Job Description
CoreWeave, the AI Hyperscaler™, acquired Weights & Biases to create the most powerful end-to-end platform to develop, deploy, and iterate AI faster. Since 2017, CoreWeave has operated a growing footprint of data centers covering every region of the US and across Europe, and was ranked as one of the TIME100 most influential companies of 2024. By bringing together CoreWeave’s industry-leading cloud infrastructure with the best-in-class tools AI practitioners know and love from Weights & Biases, we’re setting a new standard for how AI is built, trained, and scaled.
The integration of our teams and technologies is accelerating our shared mission: to empower developers with the tools and infrastructure they need to push the boundaries of what AI can do. From experiment tracking and model optimization to high-performance training clusters, agent building, and inference at scale, we’re combining forces to serve the full AI lifecycle — all in one seamless platform.
Weights & Biases has long been trusted by over 1,500 organizations, including AstraZeneca, Canva, Cohere, OpenAI, Meta, Snowflake, Square, Toyota, and Wayve, to build better models, AI agents, and applications. Now, as part of CoreWeave, that impact is amplified across a broader ecosystem of AI innovators, researchers, and enterprises.
As we unite under one vision, we’re looking for bold thinkers and agile builders who are excited to shape the future of AI alongside us. If you're passionate about solving complex problems at the intersection of software, hardware, and AI, there's never been a more exciting time to join our team.
What You’ll Do
The Production Engineering team is responsible for building and operating the platform that enables our engineers to ship software quickly, reliably, and safely. This team owns the observability patterns, tools, automation, and release processes that underpin our engineering organization's ability to deliver software to customers in GCP, AWS, Azure, and even self-hosted environments.
About the Role
We are seeking a highly skilled Engineering Manager with expertise in both cloud and on-premises environments. In this role, your team will architect, implement, and optimize scalable infrastructure solutions while leading automation, CI/CD, and deployment across hybrid platforms. You’ll mentor engineers, drive adoption of DevOps best practices, and champion a seamless developer experience by streamlining workflows and improving tooling. You will collaborate closely with Solutions Architects, Field Engineers, and product teams to deliver robust, integrated solutions tailored to customer needs.
Who You Are
10+ years of DevOps or engineering experience
5+ years of management experience
Expert in at least one major cloud platform (AWS, Azure, GCP)
Strong skills in infrastructure as code, automation, and configuration management (Terraform, CloudFormation, Ansible, etc.)
Proficient in scripting languages (Python, Bash, etc.)
Experience with containerization (Docker, Kubernetes)
Deep understanding of CI/CD tools (Jenkins, GitHub Actions, etc.)
Proven leadership and mentoring experience
Passion for improving developer experience and enabling engineering productivity
Preferred
Cloud certifications (AWS Solutions Architect, Azure Architect, etc.)
Experience with monitoring tools (Prometheus, Datadog, etc.)
Familiarity with security best practices in hybrid environments
Wondering if you’re a good fit?
We believe in investing in our people, and value candidates who can bring their own diverse experiences to our teams, even if you aren't a 100% skill or experience match. Here are a few qualities we’ve found compatible with our team. If some of this describes you, we’d love to talk.
You love building scalable infrastructure that supports both cloud and on-prem environments
You’re curious about modern DevOps practices and how to improve developer workflows
You’re an expert in automation, CI/CD, and containerized systems
The base ... (truncated, view full listing at source)