Site Reliability Engineer (Kubernetes)

Crusoe
Dublin - IE
Posted 27 March 2026

Job Description

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack, from electrons to tokens, to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved: people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About This Role:
At Crusoe, our Site Reliability Engineering (SRE) team plays a pivotal role in ensuring the reliability and performance of our infrastructure. SRE at Crusoe is dedicated to detecting, analyzing, and preventing issues so that we meet our Service Level Agreements (SLAs), which we track through Service Level Indicators (SLIs) and Service Level Objectives (SLOs). Through automation and proactive remediation, our SREs resolve common errors automatically, then advise engineering teams on how to build resilient code. We anticipate and resolve issues before they impact our customers, conduct thorough post-mortems, and drive continuous improvement. Our customer-centric approach ensures that clients always have access to the virtual machines they depend on.
This role is crucial for maintaining the "gold standard" reliability and performance of Crusoe's AI platform. The ideal candidate has experience in SRE practices, an understanding of distributed systems, networking, and Linux, and a passion for automation and problem-solving. This is a full-time position.

What You’ll Be Working On:
- Building Kubernetes Platform: Focus on scaling tooling and features dedicated to Crusoe's Managed Kubernetes and Managed VM platforms for external customers.
- Collaboration and Planning: Collaborate with the team in morning stand-up meetings to discuss ongoing projects, recent incidents, and priorities for the day. Collaborate on action plans for deploying new data centers or retrofitting existing ones. Work closely with software engineers, advising on best practices for resilient code and reviewing changes before deployment.
- System Monitoring and Alerting: Review overnight alerts and system performance metrics to ensure everything is running smoothly. Analyze system logs and develop tools to enhance our monitoring capabilities.
- Incident Response and Problem Solving: Engage in incident response drills, post-mortems, and root cause analysis sessions to learn from past issues and prevent future ones. Resolve common errors automatically through automation and proactive remediation.
- Performance Monitoring and Optimization: Stay focused on maintaining high SLIs and SLOs, ensuring that our infrastructure remains robust and reliable for our customers.
- Documentation and Knowledge Sharing: Document work, share insights with the team, and plan for the next day's challenges, always with a customer-centric mindset.

What You’ll Bring to the Team:
- SRE Experience: 3-6 years of professional SRE experience.
- Kubernetes: Experience building Kubernetes platforms or Kubernetes controllers.
- Server Hardware and Provisioning: Exposure to server-class hardware & provisioning.
- Distributed Systems Architecture: Understanding of distributed system architecture; ... (truncated, view full listing at source)