Job Description
The AI Platform team at Datadog builds the infrastructure that powers the next generation of generative AI features across our products.
As a Senior Software Engineer on the Evaluation and Annotation team, you will design and evolve the systems that define and measure AI quality at scale. This includes building evaluation pipelines, model performance monitoring, and annotation workflows that assess correctness, safety, bias, and reliability across production use cases.
Your work will directly shape how Datadog ships and maintains trustworthy AI capabilities. You will partner closely with product, ML, and infrastructure teams to define quality standards, integrate evaluation systems with our observability platform, and build human-in-the-loop feedback mechanisms that continuously improve model behavior.
At Datadog, we value our office culture: the relationships it builds, the creativity it brings to the table, and the collaboration of being together. We operate as a hybrid workplace so our employees can create the work-life harmony that best fits them.
What You’ll Do:
Design and scale robust evaluation systems to measure the performance and reliability of LLMs and AI agents across Datadog’s product ecosystem
Lead efforts to build human-in-the-loop and automated annotation pipelines for model assessment, ensuring high-quality training and feedback data
Define and implement continuous evaluation workflows in CI/CD and production environments to monitor model behavior in real time
Analyze model outputs for correctness, bias, safety, and reliability, and translate insights into actionable improvements
Collaborate cross-functionally with Applied Scientists, Researchers, product managers, and platform engineers to establish best practices for responsible AI
Mentor team members and contribute to long-term technical strategy focused on AI quality, trust, and safety
Who You Are:
You have 6+ years of experience building large-scale distributed systems or machine learning systems in production environments
You’ve designed infrastructure to support AI/ML model evaluation, annotation, or benchmarking workflows
You approach engineering with a focus on system design, long-term maintainability, and reliability
You bring a strong understanding of AI/ML concepts, including evaluation metrics, prompt analysis, and trust and safety challenges
You thrive in cross-functional teams and communicate effectively with technical and non-technical stakeholders
Bonus: You’ve worked with open-source evaluation frameworks (e.g., lm-eval-harness), vector databases, or human feedback loop tooling
Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we encourage you to apply.
Benefits and Growth:
New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
Continuous professional development, product training, and career pathing
Intradepartmental mentor and buddy program for in-house networking
An inclusive company culture, ability to join our Community Guilds (Datadog employee resource groups)
Access to Inclusion Talks, our internal panel discussions
Free, global mental health benefits for employees and dependents age 6+
Competitive global benefits
Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.
#LI-Hybrid
About Datadog:
Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve complexity in the cloud age by enabling digital transformation, cloud migration, and infrastructure monitoring of our customers’ entire technology stacks. Built by engineers, for engineers, Datadog is used by organizations of all sizes across a wide range of industries. Together, we champion professional devel ... (truncated, view full listing at source)