Senior Virtualization Validation Engineer
Crusoe · San Francisco, CA - US · $173k – $210k · Posted 27 March 2026
Job Description
Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
SAN FRANCISCO, SUNNYVALE (ONSITE)
ROLE MISSION
At Crusoe, we are pioneering the future of sustainable computing. As a Senior Virtualization Validation Engineer, you will be responsible for the end-to-end validation of large-scale, multi-node GPU clusters. You will focus on high-performance GPU virtualization using QEMU and Cloud Hypervisor, ensuring that distributed workloads scale efficiently across multiple virtualized nodes. Your role is critical in validating the interconnect fabric and collective communication libraries (NCCL/RCCL) that power the world’s most demanding AI and HPC applications.
WHAT YOU’LL BE WORKING ON:
- Multi-Node Scaling Validation: Design and execute large-scale validation tests across multi-node virtualized clusters to ensure linear scaling and stability of GPU workloads.
- Interconnect & Fabric Testing: Validate high-speed interconnects—including NVLink, Infinity Fabric, InfiniBand, and RoCE—within virtualized environments to ensure low-latency, high-bandwidth communication.
- Hypervisor & GPU Virtualization: Lead the validation of QEMU and Cloud Hypervisor with a focus on PCIe passthrough (VFIO), IOMMU, and direct device assignment for GPUs and high-speed NICs.
- Collective Communication Benchmarking: Architect and run comprehensive test suites using nccl-tests and rccl-tests (e.g., AllReduce, AllGather) to verify performance across node boundaries.
- Network Stack Validation: Validate SR-IOV and RDMA configurations to ensure that virtualized guests achieve near-bare-metal networking performance for distributed GPU tasks.
- Automated Cluster Orchestration: Develop and maintain automation frameworks in Python or Go to dynamically provision, configure, and stress-test multi-node virtualized environments.
- Performance Bottleneck Analysis: Perform deep-dive analysis of performance regressions in multi-node communication, identifying root causes across the guest OS, hypervisor, and physical fabric.
WHAT YOU’LL BRING TO THE TEAM:
- Education & Experience: 2-5+ years of experience with a demonstrated ability to perform the role's responsibilities competently and independently, plus a Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related technical field.
- Virtualization Expertise: Proven experience with QEMU/KVM and Cloud Hypervisor in a production or research environment.
- Distributed GPU Ecosystems: Deep familiarity with NVIDIA (CUDA/NCCL) and/or AMD (ROCm/RCCL) stacks in a multi-node context.
- Networking Knowledge: Strong understanding of RDMA, RoCE, and InfiniBand protocols and their implementation in virtualized systems.
- System Internals: Expert-level knowledge of Linux kernel internals, specifically PCIe topology, VFIO, and memory management (HugeP ... (truncated, view full listing at source)