Site Reliability Engineer II

PagerDuty
Toronto | $115k – $165k | Posted 5 March 2026

Job Description

<div class="content-intro"><p>PagerDuty (NYSE:PD) is a leader in Digital Operations Management. In an always-on world, organizations of all sizes trust PagerDuty to help them deliver a perfect digital experience to their customers, every time. Teams use PagerDuty to identify issues and opportunities in real time and bring together the right people to fix problems faster and prevent them in the future. Over 13,000 organizations (including 60 of the Fortune 100) rely on PagerDuty to succeed with Digital Transformation, Cloud Migration, and DevOps Modernization. Notable customers include GE, Cisco, Genentech, Electronic Arts, Cox Automotive, Netflix, Shopify, Zoom, DoorDash, Lululemon, and more. We are expanding rapidly as a platform for Digital Operations Management using AI/ML and automation, and growing adoption among Development, IT, Customer Service, Security, and other teams across the organization.</p></div><p>As an intermediate Site Reliability Engineer on the Core Infrastructure team in our Toronto office, you'll help build and operate the foundational infrastructure that powers PagerDuty's real-time digital operations platform. Our systems support millions of events and alerts daily, enabling customers to detect, respond to, and resolve incidents quickly and reliably.</p> <p>You'll work at the intersection of platform evolution and operational excellence, building and evolving foundational network, compute, and ingress infrastructure while scaling and hardening existing systems. 
Your work will directly impact the reliability, scalability, and security of the services our customers rely on to keep their businesses running as PagerDuty continues to grow across products, regions, and customer use cases.</p> <p><strong>Key Responsibilities</strong></p> <ul> <li>Support and improve foundational infrastructure, including networking, compute platforms, Kubernetes clusters, and ingress/traffic management systems.</li> <li>Contribute to the reliability and scalability of PagerDuty's core platform by hardening existing systems and supporting the rollout of new infrastructure capabilities.</li> <li>Participate in agile rituals (standups, planning, retros) and communicate progress and risks early.</li> <li>Stay current on technical trends and suggest innovative tools and approaches to interesting problems.</li> <li>Monitor system health using metrics, logs, and alerts, and participate in 24/7 on-call rotations to help detect, respond to, and resolve incidents.</li> </ul> <p><strong>Basic Qualifications</strong></p> <ul> <li>3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles</li> <li>Hands-on experience operating Linux-based systems in production environments</li> <li>Working knowledge of networking fundamentals, such as load balancing, DNS, TLS, and ingress traffic flow</li> <li>Experience with container orchestration (e.g., Kubernetes, EKS)</li> <li>Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure), including networking and compute concepts</li> <li>Proficiency in at least one programming language (e.g., Python, Ruby, Go)</li> <li>Experience with Infrastructure as Code (e.g., Terraform, CloudFormation)</li> </ul> <p><strong>Preferred Qualifications</strong></p> <ul> <li>Experience with AWS cloud networking concepts such as VPCs, subnets, routing, security groups, and load balancers</li> <li>Experience operating or contributing to production Kubernetes platforms (e.g., EKS), 
including cluster upgrades, networking, or ingress configuration</li> <li>Experience with monitoring, observability, and logging platforms (e.g., Datadog, New Relic, Sumo Logic, Splunk, Prometheus, Grafana)</li> <li>Familiarity with service meshes, ingress controllers, or API gateways (e.g., Envoy, Istio, NGINX)</li> </ul> <p><strong>The base salary range for this position is 115,000–165,000 CAD.</strong> This role may also be eligible for bonus, commission, equity, and/or benefits.</p> ... (truncated, view full listing at source)