Research Engineer, Evaluations

Remote - New York | $210k – $260k | Posted 25 March 2026

Tech Stack

Python, Express, Go, Machine Learning, AI, NLP, LLM, Agents, GDPR

Job Description

About AssemblyAI

AssemblyAI builds best-in-class Speech AI models powering the next generation of voice applications. Our models serve 600M+ inference calls monthly, process 1M+ hours of audio daily, and power 2 billion+ end-user experiences, from voice agents and meeting assistants to contact centers and medical scribes. Companies like Zoom, Granola, Fireflies, Cluely, and Calabrio rely on AssemblyAI to ship production-ready voice AI.

We're at an inflection point in Speech AI. We released Universal-Streaming in mid-2025, and it has quickly earned its place as the model offering the best accuracy-latency-cost tradeoff on the market. Our research team drives these advances and ships with relentless velocity. Since releasing Universal-Streaming, we've already launched a keyterms prompting feature and multilingual support, with more significant improvements on the roadmap.

We've raised $115M+ from Accel, Insight Partners, Y Combinator's AI Fund, Patrick and John Collison, Nat Friedman, and Daniel Gross. We're a remote team building one of the next great AI companies, and we're looking for people who will shape its future.

About the Role

We are looking for a Senior Research Engineer to join our streaming speech-to-text research team, a new role that sits at the intersection of research, product, and engineering. You'll be the person who makes sure we're measuring the right things, benchmarking against the right competitors, building and extending evaluation tooling, and translating customer pain points into quantifiable research targets. You'll own the evaluation infrastructure that tells us whether our models are actually better, and by how much.

This role is ideal for someone with a Machine Learning / Research Engineering background who is obsessed with understanding what customers actually need, and who gets satisfaction from turning vague feedback ("the model feels slow") into concrete metrics that the whole team can align around.
You're comfortable talking to customer-facing teams one hour, designing a new evaluation framework the next, and then convincing researchers why it matters.

You'll also operate at the frontier of the voice agent ecosystem. Our streaming product integrates with orchestration frameworks like LiveKit, Pipecat, and Vapi, and you'll need to understand how ASR fits into the broader voice agent stack, alongside VAD, turn detection, TTS, and LLM components. As this stack evolves rapidly, you'll help ensure our evaluations reflect real-world integration scenarios.

You'll work directly with our research and engineering teams and become the connective tissue between what customers need and what researchers build. If you're entrepreneurial, rigorous about measurement, and want to have an outsized impact on the success of a rapidly growing product, this is your role.

What You'll Do

Evaluation and Benchmarking
- Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics (e.g., turn detection latency, endpointing accuracy)
- Build and maintain competitive benchmarking pipelines against other providers in the market
- Design and run systematic experiments to measure the impact of model changes

Dataset and Test Set Management
- Onboard, curate, and maintain evaluation datasets, both public benchmarks and internal test sets
- Create evaluation subsets that stress-test specific capabilities and edge cases

Metric Development and Research Translation
- Define evaluation metrics that capture real-world performance
- Translate qualitative customer feedback into quantifiable evaluation criteria
- Work with customer-facing teams to understand pain points and convert them into research priorities

Research Velocity
- Reduce friction for researchers by maintaining clean evaluation pipelines and clear documentation
- Identify evaluation gaps proactively and propose solutions
- Move fast: iterate on benchmarking approaches weekly, not monthly

What You'll Need

ML fundamentals: You underst ... (truncated, view full listing at source)
