Storage Reliability Engineer
CoreWeave · Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA · $139k – $204k · Posted 10 March 2026
Job Description
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.
What You’ll Do: CoreWeave’s Storage Reliability team sits at the intersection of infrastructure engineering, operations, and customer enablement. The team is responsible for ensuring the stability, performance, and operational excellence of the storage systems powering some of the world’s largest AI workloads. We work directly with production systems at scale, partnering closely with engineering, solutions, and customer-facing teams to maintain reliability while continuously improving the tooling, automation, and observability that support our storage platform.
About the role: As a Storage Reliability Engineer, you will operate and support mission-critical storage systems that power large-scale AI and data-intensive workloads. You will work hands-on with production infrastructure, triaging complex incidents, debugging issues across the application, system, and kernel layers, and contributing fixes and improvements to the storage stack. This role sits at the boundary between engineering and operations, turning real-world production learnings into long-term reliability improvements through tooling, automation, and operational best practices. You’ll also partner closely with internal teams and customers to diagnose and resolve complex deployment and performance issues.
Who You Are:
Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
5+ years of experience working with storage systems, distributed infrastructure, or low-level systems in production environments
Strong debugging and troubleshooting skills across user space and kernel space, including experience analyzing core dumps
Hands-on experience working with Kubernetes and Kubernetes CSI drivers
Experience working with storage protocols and APIs such as NFS and/or S3
Proficiency in systems programming and debugging in Go or a comparable language
Strong understanding of Linux internals, system performance, and system behavior under load
Experience operating production systems within an on-call rotation and responding to high-impact incidents
Demonstrated experience building tooling, automation, or diagnostics to improve reliability and operational efficiency
Experience supporting complex infrastructure deployments in collaboration with customer-facing or solutions engineering teams
Preferred:
Experience working with distributed storage systems in large-scale production environments
Experience contributing fixes or improvements to storage infrastructure or storage-related services
Experience building observability tooling or reliability frameworks for infrastructure systems
Experience supporting AI, HPC, or other high-performance computing workloads
Wondering if you’re a good fit?
We believe in investing in our people and value candidates who bring diverse experiences and perspectives to our teams — even if you aren't a 100% skill or experience match. Here are a few qualities we’ve found compatible with our team. If some of this describes you, we’d love to talk.
You love solving deep infrastructure problems and debugging complex distributed systems
You’re curious about how large-scale storage systems behave under real-world workloads
You’re an expert in diagnosing and improving the reliability of production infrastructure
Why CoreWeave?
At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a litt ... (truncated, view full listing at source)