ML Software Tool Development Engineer
Cerebras Systems · Sunnyvale, CA or Toronto, Canada · Posted 1 March 2026
Job Description
<div class="content-intro"><p><span data-contrast="none">Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.</span></p>
<p>Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. <a href="https://openai.com/index/cerebras-partnership/">OpenAI recently announced a multi-year partnership with Cerebras</a> to deploy 750 megawatts of compute capacity, transforming key workloads with ultra-high-speed inference.</p>
<p>Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.</p></div><p><strong>Responsibilities:</strong></p>
<ul>
<li>Lead the design and implementation of system-level debugging, validation, and observability platforms.</li>
<li>Develop automated systems for collecting and analyzing numerical and execution anomalies.</li>
<li>Create visualization and analysis tools to enable efficient root-cause investigation.</li>
<li>Build frameworks for failure classification, regression detection, and anomaly monitoring.</li>
<li>Extend compilers, runtimes, and programming interfaces to support advanced profiling and instrumentation.</li>
<li>Improve system bring-up, low-level debug, and validation workflows.</li>
<li>Partner cross-functionally with compiler, hardware, firmware, runtime, and infrastructure teams.</li>
<li>Establish best practices for debuggability, reliability, and operational excellence.</li>
<li>Lead high-impact initiatives.</li>
<li>Support incident response and drive long-term corrective actions.</li>
</ul>
<p><strong>Qualifications: </strong></p>
<ul>
<li>Strong proficiency in C++ and Python, with a track record of building reliable, high-performance systems and tooling.</li>
<li>Demonstrated experience debugging complex hardware/software systems and driving issues to root cause.</li>
<li>Experience analyzing system-level data structures, execution graphs, or dependency networks for diagnostics and validation.</li>
<li>Proven ability to design and build intuitive visualization and analysis tools for complex technical data.</li>
<li>Experience with compiler internals, custom hardware interfaces, or low-level protocol design.</li>
<li>Strong written and verbal communication skills, with the ability to explain technical concepts to diverse stakeholders.</li>
<li>Ability to work independently and lead complex technical projects end-to-end.</li>
</ul>
<p><strong>Preferred Qualifications:</strong></p>
<ul>
<li>Familiarity with machine learning training and inference pipelines, especially distributed training and large-model scaling.</li>
<li>Prior work on high-performance clusters, HPC systems, or custom hardware/software co-design.</li>
</ul>
<p> </p><div class="content-conclusion"><h4><strong>Why Join Cerebras</strong></h4>
<p>People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members o ... (truncated, view full listing at source)