Full Stack LLM Engineer

Cerebras Systems
Toronto, Ontario, Canada
Posted 1 March 2026

Job Description

<div class="content-intro"><p>Cerebras Systems builds the world's largest AI chip, 56 times larger than a GPU. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications without the hassle of managing hundreds of GPUs or TPUs.</p> <p>Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. <a href="https://openai.com/index/cerebras-partnership/">OpenAI recently announced a multi-year partnership with Cerebras</a> to deploy 750 megawatts of compute, transforming key workloads with ultra-high-speed inference.</p> <p>Thanks to its groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.</p></div><p><strong data-stringify-type="bold">About the Role</strong><br>We are seeking a versatile and experienced engineer to join our Inference Core Model Bringup team. This team is responsible for rapidly bringing up state-of-the-art open-source models (e.g., LLaMA, Qwen) and customer-provided proprietary models on our Cerebras CSX systems.
Success in this role requires a system-minded generalist who thrives in fast-paced bringup environments and is comfortable working across the entire Cerebras software stack.<br>Your work will play a critical role in achieving unprecedented levels of performance, efficiency, and scalability for AI applications.</p> <div class="p-rich_text_section"><strong data-stringify-type="bold">Responsibilities</strong></div> <ul class="p-rich_text_list p-rich_text_list__bullet p-rich_text_list--nested" data-stringify-type="unordered-list" data-list-tree="true" data-indent="0" data-border="0"> <li data-stringify-indent="0" data-stringify-border="0">Contribute to the end-to-end bringup of ML models on Cerebras CSX systems.</li> <li data-stringify-indent="0" data-stringify-border="0">Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning.</li> <li data-stringify-indent="0" data-stringify-border="0">Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization.</li> <li data-stringify-indent="0" data-stringify-border="0">Propose and prototype improvements across tools, APIs, or automation flows to accelerate future bringups.</li> </ul> <div class="p-rich_text_section"><strong data-stringify-type="bold">Skills &amp; Qualifications</strong></div> <ul class="p-rich_text_list p-rich_text_list__bullet p-rich_text_list--nested" data-stringify-type="unordered-list" data-list-tree="true" data-indent="0" data-border="0"> <li data-stringify-indent="0" data-stringify-border="0">Bachelor's, Master's, or PhD in Computer Science, Engineering, or a related field.</li> <li data-stringify-indent="0" data-stringify-border="0">Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.</li> <li data-stringify-indent="0" data-stringify-border="0">Strong debugging skills across performance, numerical accuracy, and runtime
integration.</li> <li data-stringify-indent="0" data-stringify-border="0">Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with model internals (e.g., attention, MoE, diffusion).</li> ... (truncated, view full listing at source)
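For context on the "model internals" mentioned above: "attention" refers to the standard scaled dot-product attention used in Transformer-based LLMs. A minimal NumPy sketch for illustration (the function name, shapes, and toy data below are our own, not part of the listing or the Cerebras stack):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ V, weights

# Toy example: 3 query positions, 4 key/value positions, head dim 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Debugging numerical accuracy in kernels like this (softmax stability, accumulation precision, scaling factors) is representative of the correctness work the role describes.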