Member of Technical Staff, AI Reliability & Monitoring Engineering Lead
Postman · San Francisco, California, United States · $256k – $276k · Posted 23 February 2026
Job Description
<div class="content-intro"><h2><strong>Who Are We?</strong></h2>
<p>Postman is the world’s leading API platform, used by more than 45 million developers and 500,000 organizations, including 98% of the Fortune 500. Postman is helping developers and professionals across the globe build the API-first world by simplifying each step of the API lifecycle and streamlining collaboration, enabling users to create better APIs, faster.</p>
<p>The company is headquartered in San Francisco and has offices in Boston, New York, Austin, Tokyo, London, and Bangalore, where Postman was founded. Postman is privately held, with funding from Battery Ventures, BOND, Coatue, CRV, Insight Partners, and Nexus Venture Partners. Learn more at postman.com or connect with Postman on X via @getpostman.</p>
<p>P.S. We highly recommend reading <a href="https://api-first-world.com/">The "API-First World" graphic novel</a> to understand the bigger picture and our vision at Postman.</p></div><div id="content" class="highlighter-context page view" data-inline-comments-target="true" data-testid="page-content-only">
<div class="_19itglyw _vchhusvi _r06hglyw _19pkidpf _2hwx1wug _otyr1epz _18u01wug _1bsb1osq">
<div id="main-content" class="wiki-content cc-1m6kbux e5xcnr80" data-testid="pageContentRendererTestId" data-vc="pageContentRendererTestId" data-test-appearance="full-page">
<div class="renderer-overrides">
<div class="cc-3qfej8">
<div class="ak-renderer-wrapper is-full-page cc-pw7jst">
<div class="cc-1drlcw4">
<div class="ak-renderer-document">
<h2><strong>The Opportunity</strong></h2>
<p data-renderer-start-pos="10479">Postman is seeking an experienced AI Systems Reliability Engineer to help define, build, and maintain the infrastructure and processes that ensure the reliability, scalability, and performance of Postman’s AI-powered API and agentic systems in production. This role focuses on monitoring, availability, incident response, and automation to support AI services and tools trusted by millions of developers globally.</p>
<h2 data-renderer-start-pos="10909"><strong>What You’ll Do</strong></h2>
<ul class="ak-ul" data-indent-level="1">
<li>
<p data-renderer-start-pos="10929">Develop and manage reliability metrics (SLOs) for AI-driven API services and agentic AI platform features</p>
</li>
<li>
<p data-renderer-start-pos="11038">Implement comprehensive observability and monitoring systems for real-time performance and fault detection</p>
</li>
<li>
<p data-renderer-start-pos="11148">Design and drive automated failover, recovery, and incident response strategies for high-availability AI infrastructure</p>
</li>
<li>
<p data-renderer-start-pos="11271">Optimize resource utilization, particularly GPU/accelerator efficiency, ensuring cost-effective AI system operation</p>
</li>
<li>
<p data-renderer-start-pos="11390">Collaborate closely with engineering, platform, and product teams to align reliability efforts with broader organizational goals</p>
</li>
<li>
<p data-renderer-start-pos="11522">Lead efforts to build internal tooling and automation focused on AI system stability and operational excellence</p>
</li>
<li>
<p data-renderer-start-pos="11637">Drive continuous improvement in deployment practices, monitoring approaches, and incident management processes</p>
</li>
</ul>
<h2><strong>About You</strong></h2>
<ul class="ak-ul" data-indent-level="1">
<li>
<p data-renderer-start-pos="11783">Have a strong background in AI reliability engineering, SRE, or DevOps for distributed systems</p>
</li>
<li>
<p data-renderer-start-pos="11881">Understand the unique challenges of maintaining large-scale AI systems and integrating AI-specific metrics into reliability frameworks</p>
</li>
<li>
<p data-renderer-start-pos="12019">Are experienced with cloud platforms, monitoring tools, and incident response automation</p>
</li>
<li>
<p data-renderer-start-pos="12111">Are comfortable collaborating across teams to influence best practices for AI system reliability and operatio ... (truncated, view full listing at source)