Staff Site Reliability Engineer

Figure AI
San Jose, CA | $175k – $250k | Posted 5 March 2026

Job Description

<p>Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.</p> <p>We are looking for a Site Reliability Engineer to own our internal systems infrastructure. This role is responsible for setting up and managing cloud and on-prem infrastructure to deliver highly available, reliable, and automated systems.</p> <p><strong>Responsibilities:</strong></p> <ul> <li>Be the go-to person for mission-critical infrastructure that enables key operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing, and more.</li> <li>Migrate SaaS to self-hosted solutions to enhance security and reliability.</li> <li>Implement monitoring and alerting systems, and define incident response plans and runbooks.</li> <li>Reduce human workload by automating deployment and scaling.</li> <li>Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives.</li> <li>Use a data-driven approach to demonstrate service robustness and track optimization work.</li> <li>Partner with the security team to ensure that security remediations and updates are applied in a timely manner.</li> </ul> <p><strong>Requirements:</strong></p> <ul> <li>Strong experience with Linux/Unix systems administration</li> <li>Proficiency in programming/scripting</li> <li>Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures</li> <li>Experience designing, deploying, and operating high-availability, fault-tolerant, distributed systems</li> <li>Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)</li> <li>Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)</li>
<li>Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)</li> <li>Experience defining Service Level Objectives (SLOs), developing runbooks and incident response plans, facilitating post-mortems, and managing system assets</li> <li>Ability to work in cross-functional teams with developers, infrastructure, and product teams</li> <li>Excellent verbal and written communication skills</li> </ul> <p>The US base salary range for this full-time position is between $175,000 and $250,000 annually.</p> <p>The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.</p>