Technical Product Manager - Mission Control
Nebius
Amsterdam, Netherlands; Remote - Europe
Posted 20 March 2026
Job Description
Why work at Nebius
Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside some of the most experienced and innovative leaders and engineers in the field.
Where we work
Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team of over 1400 employees includes more than 400 highly skilled engineers with deep expertise across hardware and software engineering, as well as an in-house AI R&D team.
The role
At Nebius, we’re building a next-generation AI compute platform for large-scale ML training and inference — from a few nodes to thousands of GPUs. We’re looking for a Technical Product Manager to lead Mission Control — the product area responsible for reliability and performance across the full infrastructure stack. As PM for Mission Control, you will own foundational capabilities that determine how well AI infrastructure performs in real-world training and inference workloads — from bare metal and networking to scheduler/runtime behavior and user-facing outcomes. This is a deeply technical PM role.
A prior PM title is not mandatory: strong candidates from HPC, ML infrastructure, distributed systems, SRE, cloud engineering, or ML solution architecture who want to grow into product are welcome.
Your responsibilities will include:
• Own reliability and performance opportunities across the Nebius stack: from bare metal to applications.
• Define product direction end-to-end: problem discovery → design → delivery → adoption.
• Drive cross-functional execution across compute, networking, storage, observability, platform, and hardware teams.
• Lead deep problem research using customer interviews, analytics, workload studies, and log investigations.
• Identify and prioritize bottlenecks affecting large-scale training/inference performance and stability.
• Translate advanced ML/infrastructure research into practical, scalable product capabilities.
• Define and operationalize product metrics for cluster experience (e.g. reliability, efficiency, latency-to-start, utilization, throughput).
We expect you to have:
• 3–5+ years of experience in one or more of: product management, HPC, ML infrastructure/MLOps, distributed systems, SRE, cloud architecture, or GPU platforms.
• Strong technical foundation in distributed systems, cloud infrastructure, or ML platforms.
• Hands-on familiarity with ML orchestration environments (e.g. Slurm, Kubernetes, Ray, or similar).
• Experience delivering technically complex initiatives with multiple engineering teams.
• Strong communication skills and ability to influence engineering, research, and customer stakeholders.
• Experience using analytics and data to prioritize roadmap decisions.
• High ownership, learning speed, and comfort in fast-evolving AI infrastructure environments.
It will be an added bonus if you have:
• Experience with GPU platforms and HPC technologies (InfiniBand/RDMA, topology-aware systems).
• Familiarity with modern ML training stacks (PyTorch, DeepSpeed, FSDP/ZeRO, NCCL).
• Understanding of training efficiency metrics and operational signals (Goodput, MFU, scheduling quality, health checks).
• Exposure to large-scale LLM training or inference systems.
• Background in observability, performance tuning, or reliability engineering.
• Customer-facing technical experience supporting ML or infrastructure workloads.
About Nebius
Nebius AI is an AI cloud platform with one of the largest GPU capacities in Europe. Launched in November 2023, the Nebius AI platform provides high-end, training-optimized infrastructure for AI practitioners. As an NVIDIA preferred cloud service provider, Nebius A ... (truncated, view full listing at source)