Lead Site Reliability Engineer

Turion Space
IT & Platform | $155k – $231k | Posted 21 February 2026

Job Description

At Turion Space, our Platform Engineering team is building the infrastructure backbone that powers the next generation of space exploration. As Lead Site Reliability Engineer, you'll build the operational foundations for our mission-critical systems by establishing SRE practices, defining reliability standards, and creating the monitoring, incident response, and automation capabilities that keep production systems running when it matters most. You'll ensure our spacecraft control systems, autonomous satellite operations, and mission-critical applications maintain world-class reliability.

Our team's mission is to enable Turion engineers to efficiently and reliably deliver products at scale, and to support missions that can't afford downtime when hardware is operating hundreds of miles above Earth. You'll help create and scale an infrastructure platform that is as reliable and cutting-edge as the missions it supports.

Key Responsibilities:

- Design and implement monitoring, alerting, and observability solutions across cloud and on-premises infrastructure
- Define and maintain SLAs, SLIs, and SLOs for critical systems
- Lead incident response, conduct postmortems, and drive systemic improvements to prevent recurrence
- Own the on-call rotation for production systems
- Identify and eliminate repetitive manual operational tasks through automation and self-healing systems
- Partner with development teams to embed reliability practices into the software development cycle and establish reliability standards
- Contribute to architecture reviews with a focus on scalability, fault tolerance, disaster recovery, and security requirements

Minimum Qualifications:

- 5+ years of working experience in DevOps or SRE-type roles and 1+ years in a technical leadership role
- Self-directed work style with the ability to own projects from conception to production in fast-moving environments
- Proficiency with AWS cloud services
- Deep understanding of networking concepts
- Development experience in at least one programming language (e.g. Python, Go, TypeScript)
- Experience with Linux system administration
- Experience with observability tools (Grafana, Prometheus, Loki, Alloy, ELK) in production environments
- Strong experience with Kubernetes, Docker, and container orchestration in production environments
- Hands-on experience with CI/CD tools and infrastructure as code (Terraform or Crossplane preferred)
- Hands-on experience with DR planning, failure mode analysis, and building resilient systems with automated failover and recovery
- Familiarity with HashiCorp Vault or similar identity/secrets management systems
- Previous experience scaling infrastructure at high-growth companies (startup to 100+ employees)

Preferred Qualifications:

- Relevant certifications such as AWS Certified Solutions Architect
- Active SECRET or TOP SECRET clearance that can be maintained

Lead Site Reliability: $155,000-$231,000

ITAR Requirements:

This position may include access to technology and/or software source code that is subject to U.S. export controls. To conform to U.S. Government export regulations, applicant must be a (i) U.S. citizen or national, (ii) U.S. lawful, permanent resident (aka green card holder), (iii) Refugee under 8 U.S.C. § 1157, or (iv) Asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State.

Benefits:

We offer a comprehensive compensation and benefits package designed to support the well-being and professional growth of our employees. In addition to a competitive base salary and company stock, determined by factors such as job-related knowledge, education, skills, experience, and market demand, full-time employees are eligible for:

- Equity: Receive equity in Turion Space, letting you benefit from the company's success.
- Health Insurance: Comprehensive medical, dental, and vision coverage for employees and their dependents.
- Retirement Plans: Access to a 401(k) plan to help you plan for your future.
- Paid Time Off: Generous vacation days, personal days, sick days, and hol ... (truncated, view full listing at source)