Lead DevOps Engineer
Observe.AI · Bengaluru · Posted 23 March 2026
Job Description
About Us
Observe.AI is the enterprise-grade Customer Experience AI platform that unifies conversations, intelligence, and action to turn contact centers into performance engines. Built to optimize the full lifecycle of human and AI agents, Observe.AI enables enterprises to automate customer interactions, augment agent performance, and deliver governed AI at scale.
On a single platform, Observe.AI combines Voice and Chat AI Agents, real-time AI Copilots, and Conversation Intelligence with 100% interaction coverage for quality, compliance, and performance management. Trusted by brands like DoorDash, Affordable Care, Signify Health, and Verida, Observe.AI delivers fast time-to-value, measurable ROI, and consistent, high-quality customer experiences across every channel.
Why Join Us
Joining Observe.AI as a Lead DevOps Engineer puts you at the forefront of AI and cloud infrastructure, where you’ll own and scale systems powering real-world customer interactions. You’ll drive high-impact initiatives like GPU orchestration, self-hosting, and low-latency AI deployments while working closely with ML teams to productionize cutting-edge models. With end-to-end ownership, a modern tech stack, and the opportunity to shape MLOps best practices, this role offers strong technical leadership, tangible business impact, and accelerated growth in a fast-scaling AI company.
What you’ll be doing
Manage Self-Hosting Tools: Lead the transition from managed services to self-hosted Elasticsearch, Prometheus, and other critical infrastructure components to optimize performance and cost.
Optimize AI Infrastructure: Work closely with ML engineers and data scientists to efficiently deploy and scale AI/ML models, ensuring high availability and low-latency inference.
Infrastructure Scalability and Reliability: Design and implement scalable, fault-tolerant systems capable of handling large-scale AI workloads, distributed training, and high-throughput data pipelines.
Technology Evaluation and Implementation: Continuously assess and introduce new technologies to enhance automation, reliability, and security in AI model deployment and training pipelines.
CI/CD for AI Workflows: Enhance and automate ML model deployment pipelines using MLOps best practices and tools like Kubeflow, MLflow, and Argo Workflows.
Observability and Monitoring: Implement and enhance monitoring, logging, and alerting strategies using Prometheus, Grafana, ELK, OpenTelemetry, etc., tailored for AI workloads.
Security Best Practices: Implement security measures for AI data pipelines, model storage, and cloud infrastructure.
Mentorship and Best Practices: Set high standards by implementing best practices in DevOps and MLOps, mentoring team members to raise the technical bar.
What you bring to the role
6+ years of experience in DevOps, SRE, or Cloud Infrastructure roles, preferably in AI or data-intensive environments.
Strong expertise in Kubernetes (EKS, AKS preferred) for deploying AI workloads and managing GPU and other non-CPU clusters.
Experience with self-hosting services like Elasticsearch, Prometheus, Grafana, Kafka, etc.
Hands-on expertise in Infrastructure as Code (Terraform, CloudFormation).
Deep understanding of cloud platforms (AWS, Azure, GCP) and AI-focused services like AWS SageMaker, Vertex AI, or Azure ML.
Strong automation and scripting skills in Python, Bash, or Go.
Experience with CI/CD tools (Jenkins, GitHub Actions, ArgoCD, etc.) with a focus on AI model deployment.
Strong leadership and mentorship skills to guide DevOps and ML teams.
FinOps expertise for optimizing GPU and AI cloud compute costs.
Familiarity with service meshes (Istio, Linkerd) and API gateways.
Knowledge of compliance frameworks (SOC2, ISO 27001, etc.) for AI data pipelines.
Perks and Benefits
Excellent medical insurance options and free online doctor consultations
Yearly privilege and sick leave as per the Karnataka S&E Act
Generous holidays (National and Festive) recognition and parental leave po ... (truncated, view full listing at source)