Senior Manager, Site Reliability Engineering (SRE)
SolarWinds · Bangalore, India · Posted 24 February 2026
Job Description
<div class="content-intro"><p>At SolarWinds, we’re a people-first company. Our purpose is to enrich the lives of the people we serve—including our employees, customers, shareholders, partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions.</p>
<p>The ideal candidate thrives in an innovative, fast-paced environment and is collaborative, accountable, ready, and empathetic. We’re looking for individuals who believe they can accomplish more as a team and create lasting growth for themselves and others. We hire based on attitude, competency, and commitment. Solarians are ready to advance our world-class solutions in a fast-paced environment and accept the challenge to lead with purpose. If you’re looking to build your career with an exceptional team, you’ve come to the right place. Join SolarWinds and grow with us!</p></div>
<h3 data-start="291" data-end="307">Role Overview:</h3>
<p data-start="309" data-end="514">SolarWinds is looking for a <strong data-start="337" data-end="391">Senior Manager, Site Reliability Engineering (SRE)</strong> to lead reliability, scalability, and operational excellence for large-scale, cloud-native, data-intensive SaaS platforms.</p>
<p data-start="516" data-end="927">This role combines <strong data-start="535" data-end="556">people leadership</strong>, <strong data-start="558" data-end="577">technical depth</strong>, and <strong data-start="583" data-end="608">operational ownership</strong>. You will manage and grow SRE teams responsible for production systems, while remaining close to architecture, platform reliability, incident response, and automation strategy. The ideal candidate has operated complex distributed systems at scale and knows how to balance availability, performance, velocity, and cost.</p>
<h3>Responsibilities:</h3>
<ul>
<li>Lead and mentor SRE teams responsible for the reliability, availability, and performance of mission-critical SaaS platforms</li>
<li>Own and drive production reliability outcomes, including uptime, latency, capacity, scalability, and operational readiness</li>
<li>Oversee data-intensive distributed systems, including technologies such as ClickHouse, Kafka, ZooKeeper, MySQL, Redis, and Flink</li>
<li>Guide and review Kubernetes platform operations at scale, including cluster lifecycle management, upgrades, and capacity planning</li>
<li>Establish and evolve SRE best practices, including SLIs/SLOs, alerting strategy, incident management, and post-incident reviews</li>
<li>Promote and enforce an automation-first approach, reducing manual toil through scripting, tooling, and platform improvements</li>
<li>Partner closely with Engineering, Platform, Product, and Security teams to embed reliability into system design and delivery</li>
<li>Drive adoption of GitOps, service mesh, and observability standards across teams</li>
<li>Lead cloud infrastructure operations across AWS and Azure, ensuring secure, resilient, and cost-effective usage</li>
<li>Participate in and oversee on-call and incident response practices, ensuring clear ownership, fast recovery, and continuous improvement</li>
</ul>
<h3>Must Have Qualifications:</h3>
<ul>
<li>Proven experience leading SRE, Platform, or Infrastructure teams supporting production, customer-facing SaaS systems</li>
<li>Strong hands-on Kubernetes experience in large-scale production environments, including:
<ul>
<li>Cluster operations and lifecycle management</li>
<li>Autoscaling and resilience mechanisms (HPA, VPA, KEDA, Cluster Autoscaler, Pod Disruption Budgets, Goldilocks)</li>
<li>Observability and monitoring (Prometheus, Grafana)</li>
</ul>
</li>
<li>Experience operating distributed, data-intensive systems such as ClickHouse, Kafka, ZooKeeper, MySQL, Redis, or Flink</li>
<li>Practical experience with GitOps and service mesh technologies, including Flux, Kustomize, and Istio</li>
<li>Strong automation mindset, with hands-on experience using P ... (truncated, view full listing at source)