Site Reliability Engineer
Arista Networks | Remote, OR | Posted 4 March 2026
Job Description
<p><strong>Who You’ll Work With</strong></p><p>SREs at Arista combine strong software and systems engineering with a passion for operating production systems at scale. As an SRE, you’ll be part of the team responsible for our global service fleet.</p><p><strong>What You’ll Do:</strong></p><p>CloudVision is deployed on Kubernetes across global regions, using Spinnaker for our CI/CD pipeline. Our tech stack runs on GKE, using HBase/Hadoop as the main distributed database and storage layer, Elasticsearch for powering search, ClickHouse for fast real-time queries of flow data, our own Kafka-based distributed real-time stream processing layer for analytics, and TensorFlow for ML analysis. Our monitoring system is built on top of Prometheus, Grafana, Loki, and other OSS tools.<br>
As a Senior SRE, you’ll be responsible for our global CloudVision service fleet. This includes:</p><ul><li>Build, safely and incrementally deploy, and operate critical production systems with a focus on scalability, reliability, observability, performance, and security.</li><li>Monitor, support, and enhance the product deployment experience across services.</li><li>Build automation to remove toil and efficiently operate production systems.</li><li>Proactively monitor, respond to, and enhance alerts, and set up automated alert handling.</li><li>Create and maintain the incident response runbooks.</li><li>Build and deploy new systems with scalability, reliability, and observability as primary requirements.</li><li>Triage platform and infrastructure issues and help Arista software engineers with their triage. Engage with third-party vendor support.</li><li>Deploy new systems in a staged manner.</li><li>Write postmortem documents and build solutions to prevent incidents from recurring.</li><li>Plan and communicate maintenance windows on production systems.</li><li>Work with Arista’s product development teams to identify infrastructure issues that are causing bottlenecks and limitations in their workflows. Design and implement solutions to resolve them.</li><li>Survey and adopt best practices around infrastructure and platform to maintain secure, scalable, and fault-tolerant systems.</li><li>Implement solutions to scale the systems.</li><li>Implement fault-tolerance and performance improvements to increase the availability of the systems.</li><li>Study the design and implementation of OSS systems in enough detail to triage and resolve issues effectively.</li></ul>
<ul><li>Bachelor’s in Computer Science or Engineering + 5 years’ experience, MS in Computer Science or Engineering + 5 years’ experience, or equivalent work experience.</li><li>Knowledge of one or more of Go, Python, or Bash shell scripting, sufficient to implement medium-complexity automation workflows.</li><li>Knowledge of Linux (or UNIX) from an administration and debugging perspective.</li><li>Hands-on experience operating software systems (infrastructure, complex applications, etc.) at scale.</li><li>Experience in server provisioning, especially from a storage and networking perspective.</li><li>Strong problem-solving and software troubleshooting skills.</li><li>Experience with infrastructure-as-code.</li><li>Desirable: one or more of the following skills:</li><li>Experience managing databases, e.g. PostgreSQL or an equivalent RDBMS.</li><li>Experience with Docker and virtualization technologies.</li><li>Experience managing a monitoring stack (Prometheus, Grafana, etc.).</li><li>Experience managing Artifactory, a Docker registry, etc.</li><li>Experience managing CI/CD systems such as GitLab tools, Spinnaker, etc.</li><li>Experience with infrastructure-as-code frameworks such as Terraform.</li><li>Experience with container orchestration via Kubernetes.</li></ul>
<p>Arista stands out as an engineering-centric company. Our leadership, including founders and engineering managers, are all engineers who understand sound software engineering principles and the importance of doing things right.</p><p><br>
We hire globally into our diverse team. At Arista, ... (truncated, view full listing at source)