Staff Site Reliability Engineer - Observability
Okta · San Francisco, California · Posted 12 March 2026
Job Description
Get to know Okta
Okta is The World’s Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth. At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we’re looking for lifelong learners and people who can make us better with their unique experiences.
Join our team! We’re building a world where Identity belongs to you.
We are seeking a highly technical Observability Site Reliability Engineer with a specialty in Google Cloud to own and expand our Observability ecosystem into GCP. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners. You will treat infrastructure as code, using Terraform and strong coding proficiency in Go, Python, or Ruby to automate the deployment of agents and collectors across complex distributed systems.
Key Responsibilities
Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
GCP Observability Engineering: Optimize the collection, processing, and storage of observability data to ensure high reliability and low latency of our Splunk and Grafana services.
Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
Automation: Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.
Required Skills and Experience (The Essentials)
GKE: 5+ years of experience scaling and managing observability on Google Cloud Platform.
Visualization: Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources.
SRE Mindset: Minimum 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
Programming Proficiency: Strong coding skills in Python or Go for building internal tools and automating workflows.
Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/GKE).
Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.
Bonus Skills (The "Nice-to-Haves")
Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
Grafana Loki: Experience migrating from Splunk to Grafana Loki.
Other Cloud Platforms: Experience managing native observability tools within AWS.
Additional requirements:
This position requires the ability to access federal environments and/or have access to protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.
Below is the annual base salary range for candidates located in the San Francisco Bay Area. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program, please visit: https://rewards.okta.com/us.
The annual base salary range for this position for candidates located in the San Francisco Bay Area is $194,000 to $267,000 USD.