Senior Site Reliability Engineer, Compute

Roblox
San Mateo, CA, United States
Posted 24 February 2026

Job Description

<div class="content-intro"><p><span style="font-weight: 400;">Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators. </span></p> <p><span style="font-weight: 400;">At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device.</span><strong> </strong><span style="font-weight: 400;">We’re on a mission to connect a billion people with optimism and civility, and we’re looking for amazing talent to help us get there. </span></p> <p><span style="font-weight: 400;">A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.</span></p></div><p>The Infrastructure Compute Site Reliability Engineering (SRE) team's mission is to own and manage the successful operation of our underlying cell infrastructure system, along with elements of service discovery, secrets management, and related software layers. We’re looking for skilled Site Reliability Engineers with strong programming skills to help us build Roblox's private cloud, productionize our growing Kubernetes-based infrastructure, and institute reliability best practices across the Roblox Compute team.</p> <p><strong>You will:</strong></p> <ul> <li><strong>Design and Develop</strong> systems libraries that promote fault tolerance and resilience, automate much of the management and lifecycle of our clusters, and ensure systems are observable.</li> <li><strong>Promote and Institute</strong> reliability best practices across the Infra Compute group and drive common reliability initiatives. 
Provide collaborative technical reviews and operational guidance to strengthen system reliability.</li> <li><strong>Build, Automate and Standardize</strong> processes to create a "golden path" of tooling and platform support that powers the fundamental Roblox ecosystem.</li> <li><strong>Create Tooling</strong> that provides production guardrails by evaluating release candidate capacity with load testing tooling before deploying to production.</li> <li><strong>Create Performance Monitoring Services</strong> and observability for understanding capacity issues and platform degradations and for monitoring production services and their changes, such as generalized canarying services with alerting.</li> <li><strong>Analyze</strong> systems and system designs for production readiness.</li> </ul> <p><strong>You have:</strong></p> <ul> <li>A Bachelor's degree (or equivalent professional experience) in Computer Science or a related engineering field, with a proven track record including at least 4 years as an SRE or Software Engineer.</li> <li>Fluency with high-level programming languages like <strong>Go</strong>, Java, or C#.</li> <li>Experience with Kubernetes or similar orchestration systems. Experience with Nomad, Vault, and Consul is strongly desired.</li> <li>Experience and good habits around building software and tools and getting them adopted. Your systems focus informs a view that code needs to be deeply reliable.</li> </ul> <p><strong>You are:</strong></p> <ul> <li><strong>A Partner</strong>: You know that the best tools integrate broadly with the tooling ecosystem. 
You approach partners and processes with curiosity and seek to understand a problem deeply before you start coding.</li> <li><strong>A Developer</strong>: You love building durable and reliable complex systems.</li> <li><strong>Passionate</strong> about problem-solving, finding creative work solutions, and addressing unexpected challenges as part of a team.</li> <li><strong>A Problem Solver</strong>: You ask the right questions to tackle issues within your expertise and you use data to test y ... (truncated, view full listing at source)