Job Description

Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

Title and Summary
Principal AI Engineer

Overview
As a Principal AI Engineer on the AI Foundations team, you are an established subject matter expert in AI Engineering who applies expert knowledge and experience to drive achievement of key area goals and initiatives by making significant improvements to new or existing products, services, and/or processes. You lead the design and operationalization of complex, production-grade agentic systems, particularly multi-agent, multi-tool solutions that plan, call tools safely, maintain memory, and continuously improve through evaluation and feedback. You influence technical direction across programs, set engineering standards for reliability and responsible AI, and partner with platform, security, governance, and product stakeholders to ship measurable business outcomes.

Responsibilities
• Serve as an established subject matter expert in AI Engineering, influencing stakeholders and shaping technical direction across multiple initiatives.
• Architect, design, develop, and maintain advanced AI/ML systems, with emphasis on complex agentic solutions (multi-agent orchestration, tool/function-calling, memory, reflection/self-correction, and autonomy policies).
• Lead production implementation of agentic AI systems, including scalable training and evaluation pipelines, deployment frameworks, and runtime orchestration patterns.
• Define and implement safe tool-use patterns: structured outputs, robust error handling, permissioning and auditability, human-in-the-loop (HITL) approval steps for sensitive actions, and guardrail enforcement.
• Establish end-to-end AgentOps/LLMOps practices for agentic systems: release pipelines for prompts/tools/policies, canary strategies, safe rollback mechanisms, and continuous regression/safety evaluations as release gates.
• Build and optimize data ingestion, preprocessing, feature/embedding engineering, and retrieval/memory workflows to improve grounding quality and reduce failure modes.
• Own production observability for agentic systems: trace capture, cost/token telemetry, latency and reliability SLOs, and incident response practices for agent failures.
• Implement drift detection and performance decay monitoring (data drift, concept drift), and automate model/agent retraining, policy updates, and redeployment to maintain output quality over time.
• Drive measurable improvements in system effectiveness, safety, and efficiency by defining success metrics (task success, intervention rate, policy violations, cost and latency per task) and continuously improving evaluation coverage.
• Mentor and grow senior and junior engineers through design reviews, code reviews, hands-on coaching, and the creation of reusable patterns, playbooks, and standards for agentic delivery.

Key Skills
• Agentic System Architecture: multi-agent orchestration, planning and goal decomposition, tool/function-calling, memory and retrieval patterns, and behavior optimization.
• Production Engineering for Agents: scalable deployment frameworks, high-availability runtime design, robust failure handling, and operational readiness.
• AgentOps / LLMOps: CI/CD for prompts/tools/policies, release governance, experiment management, canaries, feature flags, and safe rollbacks.
• Evaluation & Safety: automated eval pipelines (behavioral, regression, adversarial), HITL workflows, guardrails, policy enforcement, and red-team testing as shipping gates.
• Observability & Reliability: distributed tracing for ... (truncated, view full listing at source)