Staff Linux & DevOps SSE
FlexAI · Bangalore, India · Posted 6 April 2026
Job Description
About FlexAI
Build and Deploy AI the right way, anywhere.
The FlexAI Compute Infrastructure Platform provides an "end-to-end AI compute layer" for running and managing workloads across any cloud, any GPU, and any deployment model (public, hybrid, or on-prem). It brings together "1-click simplicity" for users with "enterprise-grade orchestration, security, and automation" under the hood.
Founded by Brijesh Tripathi, who brings experience from Nvidia, Apple, Tesla, Intel and Zoox, FlexAI is not just building a product – we’re shaping the future of AI. Our teams are strategically distributed across Paris, Silicon Valley, and Bangalore, united by a shared mission: to deliver more compute with less complexity.
If you're passionate about shaping the future of artificial intelligence, driving innovation, and contributing to a sustainable and inclusive AI ecosystem, FlexAI is the place for you!
Role Overview
FlexAI is seeking a Staff Linux & Systems Engineer to architect, build, and operate large-scale bare-metal AI/HPC GPU clusters. This role extends beyond hands-on systems engineering into technical leadership, platform architecture, and fleet-scale infrastructure ownership.
You will lead platform bring-up across the full stack (UEFI/BIOS → bootloaders → OS → kernel/device enablement), drive low-level networking performance (RoCEv2/InfiniBand), ensure GPU/accelerator stack readiness, and establish repeatable automation frameworks for provisioning, compliance, and reliability at scale.
This role is suited for engineers who are deeply comfortable operating across firmware, kernel, PCIe, and distributed AI infrastructure — and who can translate low-level expertise into scalable platform systems and engineering standards.
What You'll Do
Platform Architecture & Fleet Ownership:
Architect and lead end-to-end bring-up of AI/HPC server platforms from firmware to production cluster deployment
Define standards for UEFI/BIOS configuration, SecureBoot, TPM/MeasuredBoot, GRUB, PXE/iPXE provisioning workflows
Establish scalable patterns for fleet provisioning, configuration management, and lifecycle operations across GPU clusters
Own technical roadmap for bare-metal AI infrastructure and systems reliability at scale
Platform & Boot Enablement:
Lead server bring-up including UEFI/BIOS configuration, bootloader flows, and secure boot pipelines
Architect automated BMC/IPMI/Redfish workflows for out-of-band provisioning and fleet management
Standardize platform initialization processes across heterogeneous hardware environments
Diagnose and resolve complex boot, firmware, and hardware initialization issues
OS & Kernel Engineering:
Architect, build, and harden custom Linux (Ubuntu) images optimized for AI and HPC workloads
Lead kernel tuning for performance-sensitive workloads (NUMA, IRQ affinity, cgroups, namespaces)
Diagnose and resolve kernel and user-space performance issues using perf, ftrace, eBPF, and bpftrace
Drive system-level optimizations for latency, throughput, and resource utilization across clusters
PCIe, Driver & Device Enablement:
Lead validation of PCIe topologies and advanced features (ACS, ARI, ATS, SR-IOV, IOMMU/VFIO)
Own GPU/NIC driver bring-up, firmware validation, and device performance optimization
Root-cause complex regressions across kernel, drivers, firmware, and userspace layers
Partner with hardware vendors to resolve low-level device and platform issues
Provisioning & Automation at Scale:
Architect idempotent Ansible-based provisioning frameworks and automation pipelines
Build scalable golden images and repeatable provisioning workflows for large GPU fleets
Develop Python/Pytest validation harnesses for pre- and post-provisioning checks
Implement drift detection, remediation, and compliance automation across infrastructure
GPU / Accelerator & HPC Stack Readiness:
Lead enablement of NVIDIA CUDA, NCCL, GPUDirect RDMA and AMD RO ... (truncated, view full listing at source)