Senior Software Engineer, AI Eval

San Francisco, California

Posted 23 January 2026

Tech Stack

TypeScript, Python, Machine Learning, AI, LLM, Agents

Job Description

About Sentry

Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster so we can get back to enjoying technology.

With more than $217 million in funding and 100,000+ organizations that believe we’re on to something, we're building performance and error monitoring tools that help companies like Disney, Microsoft, and Atlassian spend less time fixing bugs and more time building products.

Sentry embraces a hybrid work model across our global hubs, with Mondays, Tuesdays, and Thursdays set as in-office anchor days to encourage meaningful collaboration. If you like to selfishly build things that make your digital life better, come help us build the next generation of software monitoring tools.

About the role

As a Senior Software Engineer on Sentry’s AI/ML team, you’ll be responsible for building the evaluation infrastructure that measures the accuracy, reliability, and real-world performance of our AI systems. This role is critical to ensuring that our debugging agents and AI-powered features behave correctly, safely, and predictably as they scale.
You’ll design datasets, benchmarks, and test harnesses that turn ambiguous AI behavior into measurable signals, helping the team ship AI with confidence.

In this role you will

- Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
- Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
- Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
- Partner closely with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
- Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring

You’ll love this job if you

- Care deeply about correctness, rigor, and measurement in AI systems
- Enjoy turning fuzzy product goals and model behavior into concrete tests and metrics
- Like building foundational infrastructure that unlocks faster iteration and higher confidence for the entire AI team
- Thrive in cross-functional environments and enjoy influencing model design through better evaluation

Qualifications

- Minimum 5+ years of professional experience with a Bachelor’s degree in computer science, machine learning, or a related field
- Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
- Comfort writing production-quality code (we use Python and TypeScript)
- Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
- Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)
- Bonus: experience evaluating LLMs, agentic systems, or AI-assisted developer tools

The base salary range (or hourly wage range, if applicable) that Sentry reasonably expects to pay for this position is $240,000 to $280,000 USD.
A successful candidate’s actual base salary (or hourly wage) amount will be determined by a variety of relevant factors including, without limitation, the candidate’s work location, education, work and other relevant experience, skills, and job-related knowledge. A successful candidate will be eligible to participate in Sentry’s employee benefit plans/programs applicable to the candidate’s position (including incentive compensation, equity grants, paid time off, and group health insurance coverage). See Sentry Benefits for more details about the Company’s benefit plans/programs.

Equal Opportunity at Sentry

Sentry is committed to providing equal employment opportunities to its employees and candidates for employment regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, or other legally-prot ... (truncated, view full listing at source)
