Senior Machine Learning Engineer - I (MLOps/LLMOps)
Sumo Logic · United States (HQ) · Posted 19 March 2026
Job Description
As a Senior Machine Learning Engineer - MLOps/LLMOps, you will design, build, and scale production-grade infrastructure and platforms that enable the full lifecycle of ML and LLM systems. You'll architect robust pipelines for model training, evaluation, deployment, and monitoring while ensuring reliability, observability, and efficiency at scale. This role collaborates closely with ML Engineers, Data Scientists, and Product teams to operationalize AI/ML solutions from prototype to production. Remote candidates will be considered; the ability to join fellow ML staff in-office at the company HQ in Redwood City, CA, when needed is preferred.
Responsibilities
Platform Infrastructure
Design and implement scalable MLOps/LLMOps platforms supporting the full ML lifecycle: data versioning, model training, evaluation, deployment, and monitoring
Build and maintain CI/CD pipelines for ML models and LLM applications with automated testing, validation, and rollback capabilities
Develop infrastructure-as-code (IaC) for reproducible, version-controlled ML environments
Architect model serving infrastructure with auto-scaling, A/B testing, and canary deployment capabilities
LLM Operations
Build platforms for LLM fine-tuning, prompt management, and experimentation at scale
Implement evaluation frameworks for LLM performance, quality, safety, and cost optimization
Design and deploy enterprise-grade AI agents and copilots with robust monitoring and guardrails
Establish LLM observability: token usage tracking, latency monitoring, prompt/response logging, and cost attribution
Operational Excellence
Own uptime, reliability, and performance of ML/LLM services (SLIs/SLOs)
Implement comprehensive monitoring, alerting, and incident response for ML systems
Participate in on-call rotations and drive post-incident reviews to improve system resilience
Build automation and tooling to reduce toil and accelerate ML development velocity
Collaboration & Leadership
Partner with ML Engineers and Data Scientists to translate research into production-ready systems
Collaborate with platform and infrastructure teams on cloud architecture and resource optimization
Mentor team members on MLOps best practices, production ML patterns, and operational excellence
Drive technical decisions on tooling, frameworks, and architectural patterns
Required Qualifications and Skills
Education: B.S./M.S./Ph.D. in Computer Science, Engineering, or related technical field
Experience: 4+ years of software engineering experience with 2+ years focused on MLOps/LLMOps
MLOps Expertise:
Production experience with ML model serving frameworks (e.g., TensorFlow Serving, TorchServe, Triton)
Hands-on experience with ML experiment tracking and model registry tools (MLflow, Weights & Biases, Kubeflow)
Proficiency in workflow orchestration (Airflow, Prefect, Kubeflow Pipelines, Metaflow)
LLMOps Expertise:
Experience with LLM deployment, fine-tuning, and evaluation frameworks (e.g., vLLM, LangChain, LlamaIndex)
Knowledge of prompt engineering, RAG architectures, and LLM application patterns
Familiarity with LLM observability tools (e.g., LangSmith, Arize, WhyLabs)
Cloud Infrastructure:
Strong experience with major cloud providers (AWS, GCP, or Azure) and ML-specific services (SageMaker, Vertex AI, Azure ML, Bedrock)
Proficiency in containerization (Docker, Kubernetes) and infrastructure-as-code (Terraform, CloudFormation, Pulumi)
Experience with microservices architecture and API development (REST, gRPC)
Software Engineering:
Strong programming skills in Python, Terraform, and Helm; familiarity with Go, Java, or Rust is a plus
Deep understanding of CI/CD practices and tools (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
Experience with monitoring and observability stacks (Prometheus, Grafana, Datadog, ELK)
Operational Excellence:
Track record of managing production systems with defined SLIs/SLOs
Experience with on-call ... (truncated, view full listing at source)