Infrastructure Site Reliability Engineer

Radiant
Gloucestershire
Posted 7 April 2026

Job Description

ABOUT RADIANT

Radiant is redefining how AI infrastructure is built. We design and operate AI-native cloud platforms engineered for sovereignty, performance, and scale. Our infrastructure powers GPU-native workloads, multi-tenant control planes, and high-performance AI systems designed for the most demanding environments. We are not building a generic cloud. We are building purpose-built AI infrastructure, from powered land, to compute, to software. As we scale our platform and expand our engineering organisation, we are looking for leaders who can build strong teams, uphold high standards, and deliver reliably at pace.

Job Summary:

We're looking for an experienced Infrastructure Site Reliability Engineer to run and evolve our infrastructure stack. You'll contribute across the bare-metal, virtualization, and orchestration layers, keeping things stable and secure 24/7/365, while mentoring teammates, improving process and automation, and helping translate deep technical concepts for a wide range of collaborators and customers.

What You'll Do:

- Deploy and operate resilient, scalable infrastructure supporting AI/HPC workloads
- Optimize Linux system configuration, BIOS/firmware, kernel, and disk subsystem for performance
- Configure, monitor, and manage bare-metal infrastructure using IPMI, Redfish, etc.
- Build and maintain automation scripts and infrastructure as code to support the platform lifecycle, simplify troubleshooting during incident resolution, and provide tooling for our support organisation
- Apply ITSM frameworks: Incident, Major Incident, Change Management, and service improvement
- Maintain and enhance ORI's observability stack: Prometheus, Grafana, and custom monitoring integrations
- Operate and support services in 24x7 production environments, including on-call rotation
- Contribute to incident postmortems and root cause analysis, document learnings, and automate remediations
- Mentor junior engineers and act as an operational requirements consultant to other departments
- Communicate technical decisions clearly to non-technical stakeholders and customers
- Uphold a culture of: do, document, automate
- Willingness to cross-train with Platform Engineering/Platform SRE to fully support both our infrastructure and platform stacks
- Willingness to cross-train with HPC Engineering, supported by NVIDIA, to enhance our HPC supportability offering

What you bring:

- 5+ years of proven experience in globally scaled, performance-intensive environments operating to a 24/7 support model
- Expert-level Linux administration, especially Ubuntu distributions
- Proficiency in system tuning, disk I/O optimization, and hardware-level performance tweaks
- Familiarity with out-of-band management tools (IPMI, Redfish, PXE, etc.)
- Strong networking fundamentals: TCP/IP, DNS, DHCP, VLANs, routing, switching
- Strong experience with infrastructure scripting and automation (Bash, Python, Ansible)
- Deep understanding of observability principles and tools (Prometheus, Grafana)
- Hands-on experience operating orchestration platforms (Kubernetes, MAAS, Tinkerbell)
- Strong grasp of ITSM and service operation best practices
- Excellent communication and mentorship skills
- Comfortable interfacing with internal stakeholders and external customers
- Bonus: Knowledge of HPC workloads and GPU-based infrastructure
- Bonus: Experience with InfiniBand networks and HPC performance tuning

Nice to have:

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent experience
- LPIC certifications
- ITIL Foundation level qualification or equivalent experience

How you work:

- You approach problems with a systems mindset, balancing practical execution with long-term scalability
- You elevate the team, setting high standards for technical quality and engineering excellence
- You hold yourself and others ... (truncated, view full listing at source)