Research Engineer, Evaluations
AssemblyAI | Remote - New York | $210k – $260k | Posted 25 March 2026
Job Description
About AssemblyAI
AssemblyAI builds the best-in-class Speech AI models powering the next generation of voice applications. Our models serve 600M+ inference calls monthly, process 1M+ hours of audio daily, and power 2 billion+ end-user experiences—from voice agents and meeting assistants to contact centers and medical scribes. Companies like Zoom, Granola, Fireflies, Cluely, and Calabrio rely on AssemblyAI to ship production-ready voice AI.
We're at an inflection point in Speech AI. We released Universal-Streaming in mid-2025, and it has quickly earned its place as the model offering the best accuracy-latency-cost tradeoff on the market. Our research team drives these advances and ships with relentless velocity. Since releasing Universal-Streaming, we've already launched a keyterms prompting feature and multilingual support, with more significant improvements on the roadmap.
We've raised $115M+ from Accel, Insight Partners, Y Combinator's AI Fund, Patrick and John Collison, Nat Friedman, and Daniel Gross. We're a remote team building one of the next great AI companies—and we're looking for people who will shape its future.
About the Role
We are looking for a Senior Research Engineer to join our streaming speech-to-text research team—a new role that sits at the intersection of research, product, and engineering.
You'll be the person who makes sure we're measuring the right things, benchmarking against the right competitors, building and extending evaluation tooling, and translating customer pain points into quantifiable research targets. You'll own the evaluation infrastructure that tells us whether our models are actually better, and by how much.
This role is ideal for someone with a Machine Learning / Research Engineering background who is obsessed with understanding what customers actually need, and who gets satisfaction from turning vague feedback ("the model feels slow") into concrete metrics that the whole team can align around. You're comfortable talking to customer-facing teams one hour, designing a new evaluation framework the next, and then convincing researchers why it matters.
You'll also operate at the frontier of the voice agent ecosystem. Our streaming product integrates with orchestration frameworks like LiveKit, Pipecat, and Vapi, and you'll need to understand how ASR fits into the broader voice agent stack—alongside VAD, turn detection, TTS, and LLM components. As this stack evolves rapidly, you'll help ensure our evaluations reflect real-world integration scenarios.
You'll work directly with our research and engineering teams and become the connective tissue between what customers need and what researchers build. If you're entrepreneurial, rigorous about measurement, and want to have an outsized impact on the success of a rapidly growing product, this is your role.
What You'll Do
Evaluation & Benchmarking
Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics (e.g., turn detection latency, endpointing accuracy)
Build and maintain competitive benchmarking pipelines against other providers in the market
Design and run systematic experiments to measure the impact of model changes
Dataset & Test Set Management
Onboard, curate, and maintain evaluation datasets—both public benchmarks and internal test sets
Create evaluation subsets that stress-test specific capabilities and edge cases
Metric Development & Research Translation
Define evaluation metrics that capture real-world performance
Translate qualitative customer feedback into quantifiable evaluation criteria
Work with customer-facing teams to understand pain points and convert them into research priorities
Research Velocity
Reduce friction for researchers by maintaining clean evaluation pipelines and clear documentation
Identify evaluation gaps proactively and propose solutions
Move fast—iterate on benchmarking approaches weekly, not monthly
What You'll Need
ML fundamentals : You underst ... (truncated, view full listing at source)