AI Platform Engineer II
Braze · Toronto · Posted 24 February 2026
Job Description
<div class="content-intro"><p>At Braze, we have found our people. We’re a genuinely approachable, exceptionally kind, and intensely passionate crew.</p>
<p>We seek to ignite that passion by setting high standards, championing teamwork, and creating work-life harmony as we collectively navigate rapid growth on a global scale while striving for greater equity and opportunity – inside and outside our organization.</p>
<p>To flourish here, you must be prepared to set a high bar for yourself and those around you. There is always a way to contribute: Acting with autonomy, having accountability and being open to new perspectives are essential to our continued success.</p>
<p>Our deep curiosity to learn and our eagerness to share diverse passions with others give us balance and inject a one-of-a-kind vibrancy into our culture.</p>
<p>If you are driven to solve exhilarating challenges and have a bias toward action in the face of change, you will be empowered to make a real impact here, with a sharp and passionate team at your back. If Braze sounds like a place where you can thrive, we can’t wait to meet you.</p></div><h4>WHAT YOU'LL DO</h4>
<p>Join the AI Platform team at Braze to build and scale BrazeAI Decisioning Studio, a reinforcement learning platform at the forefront of AI Decisioning. The platform runs continuous experimentation and personalizes customer engagement at the individual level, helping brands move from rule-based campaigns to autonomous, self-optimizing interactions. You'll work at the intersection of cloud-native infrastructure, data-intensive systems, and machine learning in production.</p>
<p>Main responsibilities:</p>
<ul>
<li>Build and maintain critical services and subsystems on our AI platform, balancing performance with cost-effective operations</li>
<li>Implement cloud-native solutions that ensure reliability, scalability, and fault tolerance</li>
<li>Troubleshoot production incidents end-to-end, going deep to identify root causes and implement durable fixes</li>
<li>Contribute to observability practices using Sentry and Datadog to proactively detect issues and minimize downtime</li>
<li>Collaborate with data scientists, ML engineers, and product teams to translate real-world use cases into platform capabilities</li>
<li>Improve developer experience by streamlining workflows, enhancing tooling, and supporting MLOps best practices</li>
</ul>
<p>Tech stack:</p>
<ul>
<li>Core Data &amp; ML: Python, Ibis, FastAPI, Dataproc (Spark), SQL, BigQuery, MLflow, Streamlit</li>
<li>Platform &amp; Infrastructure: Google Cloud Platform, AWS, Kubernetes, Helm, Terraform</li>
<li>Workflows &amp; Orchestration: Airflow, RabbitMQ, Celery</li>
<li>CI/CD: GitHub Actions, Jenkins</li>
<li>Observability: Sentry, Datadog</li>
</ul>
<p>Why this role:</p>
<ul>
<li>Production ML at scale: no toy datasets or notebook demos; you’re building infrastructure that powers real AI workloads</li>
<li>Engineering rigor: unit and integration tests, modular design, CI/CD, pair programming, and code reviews are how we work, not aspirations</li>
<li>Learn continuously: deep exposure to ML system architecture, end-to-end ML workflows, and reinforcement learning systems</li>
</ul>
<h4>WHO YOU ARE</h4>
<ul>
<li>2-4 years of experience in platform engineering, infrastructure, or a related backend role</li>
<li>Solid understanding of platform architecture, particularly in ML or data-intensive environments</li>
<li>Hands-on experience with Kubernetes and cloud infrastructure (GCP preferred)</li>
<li>Ability to troubleshoot complex distributed systems under pressure</li>
<li>Writes clean, modular code with a focus on testable APIs and maintainable design</li>
<li>Experience working with AI coding assistants. Understands effective prompting strategies and can articulate when these tools add value versus when they're not appropriate</li>
<li>Clear communicator who can work across technical and non-technical stakeholders</li>
<li>Proactive problem solver who identifies is ... (truncated, view full listing at source)</li>
</ul>