Site Reliability and Infrastructure Engineer

Treeswift
New York Office · $160k – $215k · Posted 3 April 2026

Job Description

In the face of rising threats like severe storms and wildfires, increasing pressure on affordability, and unprecedented demands for system expansion, Treeswift empowers energy companies to modernize their field work to meet the growth and challenges ahead. To accomplish our mission, we deploy our sensors into our customers' field operations, typically on backpacks or vehicles. The resulting trove of LiDAR and imagery data is processed through our AI models to deliver actionable analytics through our web platform.

To date, our technology has enabled utilities to reduce wildfire, regulatory and outage risk from vegetation, avoid delays and cost overruns in new construction, and accelerate recovery from severe storms. Since our first utility pilot in June 2024, we have grown rapidly and now work with three of the five largest utilities in the United States, and we are expanding across new customers and use cases.

To tackle this challenge, we are bringing together a team of mission-driven experts with deep industry experience in robotics (Penn, Caltech, CMU) and enterprise software development (Palantir, Stripe, Oracle, MongoDB). We have raised funding from leading investors including Penny Pritzker’s Inspired Capital.

Treeswift is headquartered in lower Manhattan and maintains an office in Philadelphia. We also have some customer-facing team members based closer to our customer sites (e.g., the Bay Area). We strongly encourage our employees (including software engineers) to visit customer sites. Ask us about this! We hope you’ll join us on this journey.

About the role

- Help us scale and harden the platform that schedules our pipelines, runs machine learning training, and hosts our web app. We run Apache Airflow on Astronomer with DAGs that orchestrate high-volume processing across AWS and Kubernetes, including machine learning inference inside pipeline tasks. You will build the observability and reliability foundations that let us run this system confidently as customer data volume grows: monitoring, alerting, performance/cost visibility, and clear operational practices.
- Stay curious, collaborative, and cross-functional while also taking ownership of problems. We translate complex, real-world requirements from a critical industry into high-quality data products, so understanding the business holistically is key. We take pride in managing complexity and providing high-fidelity data that our customers can use to make better-informed decisions.
- You’ll be our first full-time SRE/infrastructure engineer, so we’ll look to you for leadership on how to improve and scale our infrastructure to support each part of the platform. Our data pipeline, machine learning training platform, and web app could all benefit from further productionization.

RESPONSIBILITIES

- Partner with the data platform and engineering teams to understand how changes propagate across pipeline execution (Astronomer-hosted Airflow DAGs), containerized workers (Kubernetes), and AWS services (S3, SQS, Lambda, Step Functions, ECS).
- Design and implement reliability and observability for high-volume pipeline operations, including:
  - actionable monitoring/alerting for DAG/task failures and reruns
  - visibility into operational workflows like flight orchestration (including DLQ/failed-message alerting and notification pathways)
  - dashboards and SLO/SLI definitions focused on correctness, throughput, and pipeline health
- Own CI/CD guardrails for production changes: build/deploy validation and safe rollout mechanics for Astronomer deployments (image builds pushed to ECR, and Airflow configuration updates via Astronomer CLI variable updates)
- Make machine learning inference operations more reliable and observable:
  - instrument inference runs executed inside pipeline runners (model checkpoint resolution, S3 sync behavior, thresholds and fallback behavior, and output correctness)
  - add operational visibility for inf ... (truncated, view full listing at source)