Technical Program Manager, AI Infrastructure

Cerebras Systems
Sunnyvale, CA
Posted 1 March 2026

Job Description

<div class="content-intro"><p>Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs.</p> <p>Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. <a href="https://openai.com/index/cerebras-partnership/">OpenAI recently announced a multi-year partnership with Cerebras</a> to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.</p> <p>Thanks to this groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.</p></div><h4>About The Role</h4> <p class="x_p1">Be part of the team that builds and operates the world's fastest AI infrastructure for training and inference. As a TPM, you will help accelerate data center buildouts to meet the explosive demand for our inference service platform.</p> <h4 class="x_p1">Responsibilities</h4> <ul> <li>Own end-to-end technical programs for multiple data center buildouts, coordinating with partners, contractors, and internal teams.</li> <li>Drive facility site readiness for power and cooling for Cerebras Wafer-Scale Engine systems.</li> <li>Coordinate equipment delivery and hold vendors accountable for schedules and quality in rack integration and inter-rack cabling.</li> <li>Act as the single-threaded owner across internal partners: Hardware Systems Engineering, Network Storage Engineering, and AI Cloud Infrastructure Operations.</li> <li>Enforce handover criteria between site completion, equipment deployment, and operations.</li> <li>Own overall schedule tracking, risk identification, and mitigation, creating clear visibility for leadership.</li> <li>Establish program governance, risk tracking, and RACI clarity.</li> <li>Present program status, metrics, and operational risks to senior leadership.</li> <li>Drive partner accountability on contractual milestones and commercial commitments.</li> <li>Document repeatable processes and implement them to scale across future data centers.</li> <li>Partner on installation, commissioning, change management, and break/fix workflows.</li> <li>Lead incident reviews and postmortems, ensuring corrective actions are completed.</li> </ul> <h4 class="x_p1"><strong>Qualifications</strong></h4> <ul> <li>Experience leading large, cross-functional infrastructure programs.</li> <li>Experience with AI/ML, HPC, or accelerator-based infrastructure.</li> <li>Strong understanding of data center power and cooling fundamentals.</li> <li>Experience installing and managing network, storage, and compute devices.</li> <li>Proven ability to define and operationalize metrics.</li>
<li>Strong written and executive-level communication skills.</li> <li>Experience working with colocation providers and facilities teams.</li> <li>Background in incident management, reliability, or service operations.</li> <li>Experience running network operations teams is a plus.</li> </ul><div class="content-conclusion"><h4><strong>Why Join Cerebras</strong></h4> ... (truncated, view full listing at source)