Senior Research Scientist - Reinforcement Learning, MoEs
Canva · London · Posted 25 February 2026
Job Description
<p>At Canva, our mission is to empower the world to design. We’re building AI that feels magical and lands real impact for millions of people - helping anyone create with confidence. We’re looking for a senior research scientist who lives and breathes reinforcement learning, agentic systems, and mixture-of-experts (MoE) models to push the frontier of reasoning, tool use, latency, and reliability - and ship it to users.</p><p><strong>About the team</strong></p><p>We are a cutting-edge post-training team developing new multimodal agentic systems. We work on all topics of multimodal modelling, post-training, and design agents. We explore multimodal agentic architectures, build scalable training and evaluation loops, and partner closely with product and platform teams to turn breakthroughs into delightful product features. We are looking for a person with experience in post-training, reinforcement learning (RL), and MoE models to join our team.</p><p><strong>About the role</strong></p><p>You’ll drive research directions and play a leading role in hands‑on work across the agent stack: reward design and policy optimization; planning, memory, and tool orchestration; dataset construction; and post-training, including the development of novel post-training approaches. You’ll design tight experiments, iterate quickly, and land trustworthy conclusions.
Most importantly, you’ll help convert research into reliable, safe, and high‑quality product experiences.</p><p><strong>What you’ll do</strong></p><ul><li><p>Develop agent systems (planning, multimodal tool use, retrieval, novel training approaches, modeling ablations) for real tasks in design, vision, and language.</p></li><li><p>Scale post-training and RL across distributed systems (PyTorch) with efficient data loaders, tracing/telemetry, stable training of mixture-of-experts (MoE) architectures, and reproducible pipelines; profile, debug, and optimize.</p></li><li><p>Contribute to the research agenda for RL/agentic systems aligned with Canva’s product goals; identify high‑leverage bets and retire dead ends quickly.</p></li><li><p>Build reward models and learning loops: RLHF/RLAIF, preference modeling, DPO/IPO‑style objectives, offline/online RL, curriculum learning, and credit assignment for multi‑step reasoning.</p></li><li><p>Develop simulation and sandbox tasks that surface failure modes (planning errors, tool‑use brittleness, hallucination, unsafe actions) and turn them into measurable targets.</p></li><li><p>Help align on rigorous evaluation for agents (task success, reliability, latency, safety, regressions). 
Stand up offline suites and online A/B tests; favor simple, controlled experiments that generalize.</p></li><li><p>Collaborate and ship: work shoulder‑to‑shoulder with product, design, safety, and platform to land research as reliable features, then iterate.</p></li><li><p>Share and elevate: mentor teammates, present findings internally, and contribute back to the community when it helps the field and our users.</p></li></ul><p><strong>You’re likely a match if you have</strong></p><ul><li><p>Depth in implementing and post-training MoEs/LLMs/VLMs/Diffusion models, with a track record of shipped research or publications in MoEs, RL or agents.</p></li><li><p>Experience modifying and adapting open-source models.</p></li><li><p>Strong experience with experimental design: tight baselines, clean ablations, reproducibility, and clear, data‑backed conclusions.</p></li><li><p>Fluency in Python and PyTorch; you’re comfortable in large ML codebases and can profile, debug, and optimize training and inference.</p></li><li><p>Practical experience building agent loops (planning, tool invocation, retrieval, memory) and evaluating multi‑step reasoning quality.</p></li><li><p>Hands‑on experience with policy optimization, reward modeling, ... (truncated, view full listing at source)