Staff Machine Learning Engineer

Otter AI
Mountain View, CA
$210k – $275k
Posted 24 February 2026

Job Description

<div> <p><strong>The Opportunity</strong><br>Do you want to lead projects that build and deploy cutting-edge AI technology to help people get unparalleled value from meetings and conversations? Join our core AI team responsible for ML and work alongside industry-veteran scientists and engineers. As a Staff Machine Learning Engineer, you’ll bring a strong software engineering mindset to machine learning to scale and optimize our ML systems, transforming innovative research into production-ready features that power Otter’s summarization and conversational intelligence products.</p> <p><strong>Your Impact</strong></p> <ul> <li><strong>Architect, build, and evolve</strong> large-scale SID / ASR / NLP / LLM systems that power mission-critical product experiences, including summarization, chat, and speech understanding across millions of conversations.</li> <li><strong>Lead the design and implementation</strong> of training, fine-tuning, post-training, and inference strategies for large language and speech models using PyTorch and/or JAX, making principled trade-offs across quality, latency, cost, and reliability.</li> <li><strong>Design and improve model architectures,</strong> loss functions, decoding strategies, and training techniques for speech and language models, informed by both research and production constraints.</li> <li><strong>Own end-to-end ML system lifecycles</strong>, from research prototyping through production deployment, monitoring, iteration, and long-term maintenance.</li> <li><strong>Partner deeply with product and infrastructure teams</strong> to translate cutting-edge research into scalable, production-grade systems that deliver measurable user and business impact.</li> <li><strong>Drive system-level improvements</strong> in model performance, robustness, observability, and operational excellence using real-world conversational data at scale.</li> <li><strong>Set technical direction and best 
practices</strong> for ML infrastructure, data pipelines, evaluation frameworks, and deployment workflows in a cloud environment.</li> <li><strong>Identify and resolve complex, ambiguous problems</strong> in model behavior, data quality, scaling, and system interactions, often before they surface as user-visible issues.</li> <li><strong>Mentor and elevate other engineers</strong>, influencing team standards, reviewing designs, and contributing to a culture of strong technical decision-making and execution.</li> <li><strong>Influence applied research and technical roadmaps</strong> by identifying promising speech and multimodal modeling approaches and driving their validation and adoption into production systems.</li> </ul> <p><strong>We're Looking for Someone Who</strong></p> <ul> <li>Holds a <strong>Bachelor’s or Master’s degree in Computer Science or a related field with 10+ years of relevant industry experience</strong>; a PhD is preferred.</li> <li>Has deep, hands-on experience <strong>building and fine-tuning large language or foundation models</strong>, with production experience in <strong>ASR, TTS, multimodal, or modern LLM/NLP systems</strong>, and a strong understanding of model failure modes and trade-offs.</li> <li>Demonstrates a <strong>strong command of modern ML research</strong>, with the ability to critically evaluate new papers and drive innovation by identifying what is production-worthy versus experimental.</li> <li>Has <strong>extensive experience deploying, scaling, monitoring, and operating ML systems in production</strong> across training, inference, and serving infrastructure, including model versioning, rollback strategies, and performance regression detection while balancing cost, latency, and reliability constraints.</li> <li>Is comfortable working with <strong>large-scale speech and conversational datasets</strong>, including data preprocessing, augmentation, quality analysis, and labeling strategies to support model training and 
evaluation.</li> ... (truncated, view full listing at source)