Job Description
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.
What You’ll Do:
The Platform Infrastructure Engineering team in the Data Infrastructure organization is responsible for the performance, reliability, scalability, and security of the company’s data platform. The team builds and operates the foundational systems that power ingestion, transformation, analytics, and AI workloads at scale. This includes ownership of the underlying infrastructure for orchestration, compute, and storage systems that enable data engineering teams to build and deliver data products. We operate with production-grade discipline, supporting mission-critical services with stringent uptime requirements and a focus on automation, observability, and resilience.
About the role:
As an Engineering Manager, you will lead a team of Software Engineers and Site Reliability Engineers responsible for the infrastructure that powers CoreWeave’s data platform. You will own the reliability, scalability, and performance of core systems such as compute engines, orchestration frameworks, and storage layers. You’ll partner closely with Data Engineering teams, as well as cross-functional groups including Production Engineering, Developer Experience, Security Engineering, and IT Operations, to ensure the platform is robust, secure, and easy to operate. This role balances people leadership with deep technical ownership, including stepping in hands-on when needed to support critical initiatives.
Who You Are:
7+ years of experience in software engineering, infrastructure engineering, or data platform engineering roles
2+ years of experience managing engineering teams, including hiring, coaching, performance management, and career development
Experience leading teams through the full software development lifecycle (SDLC), including planning, execution, and delivery of complex technical initiatives
Experience running and evolving engineering processes (e.g., agile development, backlog management) to drive predictable execution and continuous improvement
Experience setting team goals and metrics (e.g., OKRs) and holding teams accountable to outcomes
Strong hands-on experience operating and scaling data platform infrastructure (e.g., Spark, Airflow, Iceberg, StarRocks) in production environments
Deep expertise in Kubernetes and containerized software development, including cluster design, operations, and scaling in production environments
Experience building and operating distributed systems with high availability and performance requirements, including SLOs and incident management
Strong understanding of data platform architecture (compute, orchestration, storage) and experience driving reliability, performance, and cost optimization at the platform level
Ability to contribute code and technical solutions when needed, with proficiency in at least one programming language (e.g., Python, Java, Go, or Rust)
Experience partnering with cross-functional engineering teams (e.g., Production Engineering, Developer Experience, Security, IT) and data engineering teams to deliver cohesive platform solutions
Preferred:
Experience supporting high-scale data workloads (e.g., large-scale Spark clusters, real-time ingestion platforms)
Experience working in environments with strict uptime and reliability requirements (e.g., ≥99.99% uptime)
Experience working in regulated environments with compliance frameworks such as GDPR, SOC 2, HIPAA, or SOX
Experience building internal platforms that enable self-service analytics ...