Senior Applied Scientist – AI Red Teaming & Model Risk

Uber
New York, United States
Posted 7 March 2026

Job Description

Department: Data Science
Team: Data Scientist
Location: New York, United States
Type: Full-Time

**About the Role**

As AI systems, particularly LLMs and agentic AI, become core to our products and internal platforms, understanding how these systems fail is just as important as improving their performance. We are looking for a Senior Applied Scientist to join our AI Red Teaming efforts and focus on adversarial evaluation, failure analysis, and risk discovery in AI models and AI agents.

In this role, you will systematically probe AI systems to uncover unsafe, unintended, or harmful behaviors, including prompt injection, jailbreaks, behavioral drift, tool misuse, and context or memory poisoning. You will design experiments, build evaluation frameworks, and analyze outcomes to surface risks that traditional ML metrics do not capture.

This role is ideal for a data scientist who enjoys working at the edge of model behavior, cares deeply about safety and robustness, and wants to apply scientific rigor to securing real-world AI systems.

**What the Candidate Will Do**

1. Design and execute AI red-teaming experiments against LLMs and AI agents to identify:
   - prompt injection (direct and indirect)
   - jailbreaking and policy bypass
   - model and tool poisoning
   - context and memory poisoning
   - behavioral drift and unsafe autonomy
2. Develop adversarial datasets, probes, and test harnesses to systematically evaluate model and agent behavior under attack.
3. Define and track AI risk metrics beyond accuracy (e.g., failure rates, drift indicators, unsafe action likelihood, confidence miscalibration).
4. Analyze agent workflows and decision traces to understand how failures emerge across multi-step reasoning and tool use.
5. Collaborate with security engineers and AI platform teams to translate findings into guardrails, mitigations, and design improvements.
6. Build reusable evaluation pipelines to support continuous red teaming and regression testing as models and agents evolve.

**What the Candidate Will Need / Bonus Points**

**Basic Qualifications**

1. 5+ years
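As a loose illustration of the "risk metrics beyond accuracy" mentioned in the duties above, the sketch below scores a batch of red-team probe results by attack category. All names (`risk_metrics`, the result-dict fields, the sample categories) are hypothetical, not part of any Uber system described in this posting.

```python
# Hypothetical sketch: aggregating red-team probe outcomes into simple
# risk metrics (overall attack success rate, failure rate per attack
# category). Illustrative only; field names are assumptions.
from collections import defaultdict

def risk_metrics(probe_results):
    """probe_results: list of dicts with 'category' (attack type) and
    'unsafe' (True if the model produced a policy-violating output)."""
    by_category = defaultdict(list)
    for r in probe_results:
        by_category[r["category"]].append(r["unsafe"])
    # Fraction of probes in each category that elicited unsafe behavior.
    per_category = {
        cat: sum(flags) / len(flags) for cat, flags in by_category.items()
    }
    # Fraction of all probes that elicited unsafe behavior.
    overall = sum(r["unsafe"] for r in probe_results) / len(probe_results)
    return {
        "overall_attack_success_rate": overall,
        "failure_rate_by_category": per_category,
    }

results = [
    {"category": "prompt_injection", "unsafe": True},
    {"category": "prompt_injection", "unsafe": False},
    {"category": "jailbreak", "unsafe": False},
    {"category": "memory_poisoning", "unsafe": True},
]
metrics = risk_metrics(results)
```

Tracking these rates per model version is one simple way to support the regression-testing loop the last duty describes: a category whose failure rate rises after a model update flags a behavioral regression.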