Staff Site Reliability Engineer
Figure AI | San Jose, CA | $175k – $250k | Posted 5 March 2026
Job Description
<p>Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The company's goal is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.</p>
<p>We are looking for a Site Reliability Engineer to own our internal systems infrastructure. This role is responsible for setting up and managing cloud and on-prem infrastructure to deliver highly available, reliable, and automated systems.</p>
<p><strong>Responsibilities:</strong></p>
<ul>
<li>Be the go-to person for mission-critical infrastructure that enables key operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing, and more.</li>
<li>Migrate SaaS to self-hosted solutions to enhance security and reliability.</li>
<li>Implement monitoring and alerting systems, and define incident response plans and runbooks.</li>
<li>Reduce manual workload by automating deployment and scaling.</li>
<li>Build strong relationships with stakeholders to identify infrastructure needs and define Service Level Objectives.</li>
<li>Use a data-driven approach to demonstrate service robustness and track optimization work.</li>
<li>Partner with the security team to ensure that security remediations and updates are applied in a timely manner.</li>
</ul>
<p><strong>Requirements:</strong></p>
<ul>
<li>Strong experience with Linux/Unix systems administration</li>
<li>Proficiency in programming/scripting</li>
<li>Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures</li>
<li>Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems</li>
<li>Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)</li>
<li>Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)</li>
<li>Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)</li>
<li>Experience defining Service Level Objectives (SLOs), developing runbooks and incident response plans, facilitating post-mortems, and managing systems assets</li>
<li>Ability to work in cross-functional teams with developers, infra, and product teams</li>
<li>Excellent verbal and written communication skills</li>
</ul>
<p>The US base salary range for this full-time position is $175,000 to $250,000 annually.</p>
<p>The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended. </p>