Principal Site Reliability Engineer
Zefr | Marina del Rey, CA | Posted 27 March 2026
Job Description
WHAT WE DO:
Zefr is the leading global technology company enabling responsible marketing in walled garden social environments. Zefr’s solutions empower brands to manage their content adjacency on scaled platforms such as YouTube, Meta, TikTok, and Snap, in accordance with industry-standard frameworks. Through its patented AI technology, Zefr offers brands and agencies more accurate and transparent solutions for social walled gardens. The company is headquartered in Los Angeles, California, with additional locations across the globe.
WHAT YOU’LL DO:
As a Principal Site Reliability Engineer at Zefr, you'll serve as a technical leader and subject matter expert, helping define the technical vision and shape the direction of our reliability practices across the organization.
You'll leverage deep expertise in observability, core SRE principles, cloud infrastructure, CI/CD and DevSecOps to solve our most complex challenges and set the standard for engineering excellence.
This role requires a blend of hands-on technical expertise and strategic thinking. You'll drive cross-functional initiatives, mentor engineers across teams, and partner with leadership to ensure our AI-powered platform is robust, efficient, and scalable.
We’re looking for someone who combines technical expertise with strong leadership and a passion for continuous improvement and innovation. Zefr wants a candidate who champions reliability as a product feature and can translate complex technical concepts into strategy. This is a role where you'll shape how we build and operate systems at scale.
- Support and build systems and tools that enable other engineers to generate, deploy, and manage product features and models both quickly and safely.
- Deploy and support a multi-cloud microservice architecture, including infrastructure tailored for ML workloads, deployed via GitHub Actions, Argo CD, and Kubernetes.
- Collaborate with other engineers to architect secure, resilient, scalable, and cost-efficient applications and ML systems/pipelines in AWS and GCP.
- Foster and push our DevOps culture and philosophy by encouraging continuous improvement across all engineering teams.
- Proactively maintain the health of production environments, including monitoring application performance and resource utilization.
- Participate in a 24/7 on-call rotation and respond to system performance issues and outages.
- Debug code at the application and infrastructure level.
- Mature our CI/CD workflows and release process.
- Maintain a forward-thinking approach, actively researching and proposing new solutions.
- Propose and review Engineering Requests for Comments (RFCs) to drive Engineering architecture and practices.
TECHNOLOGY STACK AT ZEFR:
Core Infrastructure & Cloud Platforms:
- Cloud Providers: Google Cloud Platform (primary), Amazon Web Services
- Infrastructure as Code (IaC): Terraform, Terragrunt
- Containerization & Orchestration: Docker, Kubernetes (experience with GKE and/or EKS expected), Helm, Kustomize
- Service Mesh: Istio
CI/CD & Automation:
- CI/CD Pipelines: GitHub Actions
- GitOps / Continuous Delivery: Argo CD
- Primary Scripting/Automation Language: Python
Observability & Monitoring:
- Monitoring & Alerting: Prometheus, Chronosphere, PagerDuty
- Telemetry Standards: OpenTelemetry
Application & Data Ecosystem (Supporting):
- Application Languages/Frameworks: Python, FastAPI, Flask, Node.js, React
- Data Streaming: Apache Kafka
- Data Processing/Transformation: Pandas, dbt
- Workflow Orchestration: Apache Airflow, Ray
Data Stores & Databases:
- Relational Databases: PostgreSQL (including managed versions like AWS Aurora, GCP Cloud SQL)
- NoSQL Databases: DynamoDB
- Search Databases: OpenSearch
- Vector Databases: Qdrant
- Caching: Redis
- Data Warehousing: Snowflake
WHAT WE’RE LOOKING FOR:
- 10+ year job history designing, managing, deploying, and supporting Cloud Infrastructure in a production ... (truncated, view full listing at source)