Manager, AI Operations & Evaluation
Chime · San Francisco, CA, United States · Up to $208k · Posted 24 February 2026
Job Description
<h3><strong>About the Role</strong></h3>
<p>AI Operations (AIOPS) defines how AI is governed, evaluated, and continuously improved across OMX. We ensure every model in Operations is accurate, fair, and aligned with Chime’s standards for operational excellence and member trust.</p>
<p>As Manager, AI Evaluation Insights, you’ll lead the team responsible for operationalizing and executing AI evaluation standards across OMX. You’ll run human and automated evaluation systems, manage model health monitoring, and apply testing and simulation frameworks that detect hallucinations, bias, or drift before they impact members or agents.</p>
<p>You’ll manage a team of TPMs and evaluation specialists who measure AI performance across risk, compliance, agent experience, and bot experience domains. You’ll ensure AI deployments meet the standards set by the AI Governance pillar and deliver measurable value to Operations.</p>
<p>The base salary offered for this role and level of experience ranges from $150,000.00 to $208,000.00. Full-time employees are also eligible for a bonus, a competitive equity package, and benefits. The actual base salary offered may be higher, depending on your location, skills, qualifications, and experience.</p>
<h3><strong>In This Role, You Will</strong></h3>
<ul>
<li>Lead the AI Evaluation team, owning staffing, coaching, performance management, and delivery of evaluation and testing frameworks.</li>
<li>Manage the AI evaluation lifecycle — including pre-launch testing, simulation, and post-deployment health monitoring — ensuring alignment with governance standards and expectations.</li>
<li>Create domain-specific evaluation tracks (e.g., Compliance Risk, Bot Experience, Agent Experience) to assess AI quality from multiple perspectives.</li>
<li>Operationalize human-in-the-loop testing, integrating reviewer feedback into continuous improvement loops.</li>
<li>Oversee simulation environments (3rd-party tools) for stress-testing LLMs and identifying hallucinations or performance regressions.</li>
<li>Partner closely with AI Platform Governance to implement evaluation metrics, reporting, and health signals in alignment with Responsible AI principles.</li>
<li>Develop dashboards and reporting frameworks to track evaluation coverage, accuracy, and confidence scores across models.</li>
<li>Collaborate with Enablement, Speech Analytics, and Data Operations to ensure AI evaluation results inform retraining, policy, and member impact analysis.</li>
<li>Coach and develop TPMs to become domain experts in responsible AI measurement. Foster a high-performing, collaborative team culture, ensuring career development and continuous skill enhancement for all team members.</li>
</ul>
<h3><strong>To Thrive in This Role, You Have</strong></h3>
<ul>
<li>7+ years in AI/ML operations, quality, or evaluation, including at least 2 years of people leadership experience.</li>
<li>Deep understanding of LLM behavior, prompt testing, and evaluation methodologies.</li>
<li>Familiarity with human-in-the-loop frameworks and prompt testing tools.</li>
<li>Strong program management and stakeholder communication skills.</li>
<li>Technical proficiency in SQL, Python (preferred), or data visualization and warehousing platforms (e.g., Looker, Snowflake).</li>
<li>Experience collaborating with Engineering, Data Science, and Risk/Compliance partners on AI-related initiatives.</li>
<li>A passion for operational excellence and responsible innovation.</li>
</ul>
<h3><strong>Why This Role Matters</strong></h3>
<p>This role creates the execution layer between AI experimentation and operational reality — ensuring governance standards are consistently applied and AI systems are safe, fair, and high-performing in production. You’ll lead the teams that deliver the evaluation signals Operations relies on to trust every AI model deployed.</p>
<div class="content-conclusion"><h2><strong>A little about us</strong></h2>
<p>At Chime, we believe that everyone can ac ...