Storage Reliability Engineer

CoreWeave
Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA
$139k – $204k
Posted 10 March 2026

Job Description

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

What You’ll Do:

CoreWeave’s Storage Reliability team sits at the intersection of infrastructure engineering, operations, and customer enablement. The team is responsible for ensuring the stability, performance, and operational excellence of the storage systems powering some of the world’s largest AI workloads. We work directly with production systems at scale, partnering closely with engineering, solutions, and customer-facing teams to maintain reliability while continuously improving the tooling, automation, and observability that support our storage platform.

About the role:

As a Storage Reliability Engineer, you will operate and support mission-critical storage systems that power large-scale AI and data-intensive workloads. You will work hands-on with production infrastructure, triaging complex incidents, debugging issues across the application, system, and kernel layers, and contributing fixes and improvements to the storage stack. This role sits at the boundary between engineering and operations, turning real-world production learnings into long-term reliability improvements through tooling, automation, and operational best practices. You’ll also partner closely with internal teams and customers to diagnose and resolve complex deployment and performance issues.
Who You Are:

- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
- 5+ years of experience working with storage systems, distributed infrastructure, or low-level systems in production environments
- Strong debugging and troubleshooting skills across user space and kernel space, including experience analyzing core dumps
- Hands-on experience working with Kubernetes and Kubernetes CSI drivers
- Experience working with storage protocols and APIs such as NFS and/or S3
- Proficiency in systems programming and debugging in Go or a comparable language
- Strong understanding of Linux internals, system performance, and system behavior under load
- Experience operating production systems within an on-call rotation and responding to high-impact incidents
- Demonstrated experience building tooling, automation, or diagnostics to improve reliability and operational efficiency
- Experience supporting complex infrastructure deployments in collaboration with customer-facing or solutions engineering teams

Preferred:

- Experience working with distributed storage systems in large-scale production environments
- Experience contributing fixes or improvements to storage infrastructure or storage-related services
- Experience building observability tooling or reliability frameworks for infrastructure systems
- Experience supporting AI, HPC, or other high-performance computing workloads

Wondering if you’re a good fit?

We believe in investing in our people and value candidates who bring diverse experiences and perspectives to our teams, even if you aren't a 100% skill or experience match. Here are a few qualities we’ve found compatible with our team. If some of this describes you, we’d love to talk.

- You love solving deep infrastructure problems and debugging complex distributed systems
- You’re curious about how large-scale storage systems behave under real-world workloads
- You’re an expert in diagnosing and improving the reliability of production infrastructure

Why CoreWeave?

At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a litt ... (truncated, view full listing at source)