Site Reliability Engineer
Pendo · Herzliya, IL · Posted 11 February 2026
Job Description
<p><strong>Team Description</strong></p>
<p>The Site Reliability Engineering (SRE) team at Pendo is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives, and for working with developers and product managers to ensure that our products are not only reliable and performant, but also cost-efficient. Our platform is built on Google Kubernetes Engine (GKE) and uses several other Google technologies such as Memorystore, Cloud Datastore, Pub/Sub, Cloud Functions, BigQuery, and Vertex AI, as well as services from other vendors such as Amazon SES.</p>
<p>In the development process, SREs provide developers with stable and performant CI and release pipelines and development environments to facilitate frequent delivery of new product features. In production, SREs perform Tier 1 on-call and incident management functions, supporting a high-throughput platform that processes more than 35 billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design systems that balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that cloud infrastructure is properly secured and that sufficient controls are in place to meet our compliance goals with respect to industry standards such as SOC 2.</p>
<p><strong>Role Responsibilities</strong></p>
<ul>
<li>Write high-quality infrastructure-as-code that automates the provisioning, deployment, scaling, and monitoring of Pendo’s infrastructure to ensure that it is reliable and performant</li>
<li>Write maintainable code for product functionality with a primary emphasis on operations, scale, resiliency, and monitoring</li>
<li>Work with other engineers to ensure that new services are well-designed, are properly monitored, and have well-defined SLIs and achievable SLOs</li>
<li>Debug production issues, learn to mitigate them quickly, and find ways to prevent them</li>
<li>Maintain runbooks for manual tasks and replace those runbooks with automation whenever possible</li>
<li>Proactively track our capacity, quotas, and other performance limits to plan for growth</li>
<li>Participate in a 24x7 on-call rotation to handle product availability issues as well as urgent customer support escalations</li>
</ul>
<p><strong>Minimum Qualifications</strong></p>
<ul>
<li>Experience working with cloud infrastructure using tools such as Ansible or Terraform</li>
<li>Programming skills in a language such as Go or Python, and a willingness to learn new languages as needed</li>
<li>Ability to think and talk about systems in terms of possible failure modes, bottlenecks, etc.</li>
<li>Ability to write clear and concise English-language documentation of processes for incident runbooks and release processes</li>
<li>Good number sense for discussing performance analysis, cost analysis, and operational metrics</li>
</ul>
<p><strong>Preferred Qualifications</strong></p>
<ul>
<li>Experience designing, analyzing, and troubleshooting distributed systems</li>
<li>Experience maintaining Kubernetes clusters in a production environment</li>
<li>Previous experience as a Site Reliability Engineer, DevOps Engineer, or similar role</li>
</ul>
<p><strong>Pendo Description:</strong></p>
<p>Pendo was founded in 2013 by former product managers, who combined their heads and hearts to build something they wanted but never had as product man ...</p>
More jobs at Pendo
Senior Account Executive Enterprise Sales
Munich, Germany · 27 February 2026
Sr. AI-First Backend & Data Engineer
Herzliya, IL · 25 February 2026
Account Director Enterprise Sales
Chicago, IL / Milwaukee, WI / Minneapolis, MN / Cleveland, OH · 24 February 2026
Field Marketing Manager
San Francisco, CA · 24 February 2026
More Python jobs
[Summer 2026] People Science - PhD Intern
Roblox · San Mateo, CA, United States
Team Lead - Security Platform
Cloudflare · Distributed; Hybrid
Sr. Security Software Engineer, Applied Computing (Starshield)
SpaceX · Hawthorne, CA
Security Software Engineer, Applied Computing (Starshield)
SpaceX · Washington, DC