Engineering Manager, Model Serving

Together AI
San Francisco, CA
$250k – $300k
Posted 5 March 2026

Job Description

<p>Together AI is building the AI Inference Model Shaping Platform that brings the most advanced generative AI models to the world. Our platform powers multi-tenant serverless workloads and dedicated endpoints, enabling developers, enterprises, and researchers to harness the latest LLMs, multimodal models, image, audio, video, and reasoning models at scale.</p>

<p>We are looking for an exceptional Engineering Manager to partner closely with our cross-functional engineering, infrastructure, research, and sales teams to ensure the excellence of our ML API offerings. Your primary focus will be delivering world-class inference and fine-tuning in our public APIs and customer deployments by building automation and operations processes.</p>

<p>This role is ideal for a highly motivated and technically adept individual who excels in fast-paced, dynamic environments. You will be in charge of designing and scaling our ML process tooling at production scale: optimizing operations to ensure availability and reliability for our services across differing tenants and user loads, and across multi-cluster deployments. You will serve as a passionate advocate for internal and external customers, providing feedback to the wider engineering and infrastructure teams to improve our systems and core business metrics.
If you thrive in a collaborative, problem-solving environment and are driven to deliver operational excellence, we encourage you to apply for this exciting opportunity.</p>

<p><strong>Key Responsibilities</strong></p>
<ul>
<li>Own availability and performance SLAs for production inference and fine-tuning services across serverless and dedicated deployments</li>
<li>Own and improve testing, deployment, configuration management, and monitoring practices for multi-cluster ML infrastructure, partnering closely with Infra SREs</li>
<li>Build self-serve tooling and automation to reduce operational toil, supporting internal users (MLOps, customer experience) and self-serve offerings</li>
<li>Define and enforce configuration best practices for inference engines (SGLang, vLLM, etc.) to prevent runtime issues</li>
<li>Lead incident response, conduct postmortems, and drive reliability improvements</li>
<li>Mentor team members, with scope to grow into hiring and team building as the organization scales</li>
<li>Partner with infrastructure and ML engineering teams to improve system reliability and cost efficiency</li>
</ul>

<p><strong>Required Qualifications</strong></p>
<ul>
<li>5+ years operating production ML inference or training systems at scale</li>
<li>2+ years in senior IC or tech lead roles, with demonstrated mentorship and technical leadership experience.
Having built or scaled teams is a plus.</li>
<li>Deep expertise with Kubernetes, multi-cluster orchestration, and ML serving frameworks</li>
<li>Experience with multi-tenant SaaS platforms</li>
<li>Proven track record of SLA ownership with specific metrics (e.g., 99.9% uptime, p99 latency targets)</li>
<li>Experience handling customer escalations and incident communications</li>
<li>Experience with LLM inference serving systems (SGLang, vLLM, TRT-LLM, or similar)</li>
<li>Ability to influence cross-functional teams and make deployment and architecture decisions</li>
</ul>

<p><strong>Nice to Have</strong></p>
<ul>
<li>Experience building internal developer platforms or self-serve tooling</li>
<li>Background in cost optimization for GPU infrastructure</li>
<li>Contributions to open-source ML infrastructure projects</li>
</ul>

<h3><strong>About Together AI</strong></h3>
<p>Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind techno ... (truncated, view full listing at source)