Member of Technical Staff - Platform (Deployment Infrastructure)
xAI | Palo Alto, CA; Washington, D.C. | $180k – $440k | Posted 27 March 2026
Job Description
About xAI
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.
Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity.
We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important.
All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
ABOUT THE ROLE:
You will build the tooling that turns a hardware listing and a deployment profile into a complete, self-contained software bundle capable of standing up xAI's full AI inference platform, from bare-metal provisioning through GPU workloads, at any site, in any environment, with no internet access required.
xAI operates GPU infrastructure across public cloud, on-premise, and classified environments. Today, these targets are served by separate codebases that drift with every release. You will build the unified deployment platform that eliminates this divergence: a single generator that reads a thin profile (site topology, compliance requirements, connectivity model) and produces everything needed to deploy — Kubernetes manifests, switch configurations, OS provisioning configs, monitoring stacks, signed container image bundles, and acceptance tests. One source, every target.
You work on the unclassified (low) side. You build the tooling; cleared engineers at classified sites execute it. The quality of what you build directly determines how effectively those engineers can operate in environments where they cannot call you for help. Your tooling must be deterministic, complete, well-tested, and foolproof.
RESPONSIBILITIES:
Design and build the deployment generator: a Go CLI that reads a YAML profile (6 deployment axes + site topology) and produces a fully-resolved deployment manifest with pinned image digests, rendered Helm values, switch configs, OSP inventory, network telemetry configuration, and AlertManager grouping/inhibition rules computed from the site topology.
Build the bundle pipeline: collect all referenced container images, Helm charts, OS boot images, NVIDIA drivers, and model weights into a signed, self-contained tarball with CycloneDX SBOM and cosign signatures. Build the update bundle pipeline for delta-only updates: diff against the previously shipped baseline manifest, package only changed artifacts, sign, and include apply-update scripts and machine-readable changelogs.
Implement profile-driven rendering: the same model deployment YAML, the same operator charts, the same monitoring stack produce correct output for public cloud (ArgoCD), enterprise on-prem (Pulumi), and classified air-gap (static manifests) targets based on profile selection.
Build and maintain the cross-profile CI matrix: every PR touching shared platform code is validated against all active deployment profiles before merge, catching cross-profile breakage at PR time.
Build the testing and validation framework: manifest validation against CRD schemas (kubeconform), profile-specific constraint checks (no external dependencies in air-gap profiles, FIPS requirements for gov profiles), acceptance test generation, and shadow cluster pre-transfer testing.
Develop the actuator: the high-side executor that receives a signed bundle, verifies signatures, loads images into a local registry, and converges the cluster to the manifest state with zero-downtime updates and automatic rollback on failure.
Build the CDS send-side pipeline: stage signed bundles for transfer through a one-way data diode or physical media, with machine-readable changelogs and verification tooling.
Own the bare met ... (truncated, view full listing at source)