Infrastructure Operations Engineer

TensorWave
Las Vegas, Nevada
Posted 24 February 2026

Job Description

Our mission at Tensorwave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.

About the role

We are seeking an Infrastructure Operations Engineer to join our growing infrastructure team. This role is ideal for someone who thrives in hardware-centric environments, enjoys hands-on datacenter and system administration work, and can build reliable automation around large-scale infrastructure. You will be responsible for managing enterprise hardware, monitoring systems, network operations, infrastructure automation, and supporting our compute clusters across multiple data centers.

This role touches every layer of modern infrastructure - from bare-metal provisioning, to OS and Kubernetes management, to monitoring and troubleshooting hardware. If you are detail-oriented, resourceful, and comfortable working with both low-level hardware systems and higher-level DevOps tooling, we’d love to talk.

Responsibilities

- Manage and maintain enterprise-grade server hardware, including diagnostics and break/fix for CPUs, memory, disks, PSUs, and NICs
- Operate out-of-band management systems for remote access and recovery - iLO, iDRAC, IPMI, Redfish
- Design, build, and maintain infrastructure monitoring and alerting - Prometheus, Grafana, SNMP, or similar
- Administer and troubleshoot Linux systems - OS install, boot issues, services, networking, filesystems, and access controls
- Own bare-metal provisioning workflows - PXE/UEFI boot and automated node bring-up using MAAS, Foreman, or equivalents
- Build and maintain infrastructure automation - shell scripting and CLI tooling to improve reliability and scale operations
- Manage core networking - subnets, IP address management, VLANs, routing, NAT, and firewall configuration
- Configure and support secure connectivity such as VPNs - IPsec, WireGuard, OpenVPN
- Support Kubernetes clusters at the infrastructure layer - node lifecycle, access, basic troubleshooting, and scaling
- Partner with internal teams to ensure compute clusters remain reliable, secure, and scalable across multiple data centers

Required Experience

- Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
- Proven experience managing enterprise-grade hardware at scale
- Expertise with automation languages such as Python, Go, PHP, or Perl
- Strong understanding of out-of-band management systems - IPMI, BMC, Redfish
- Hands-on expertise with monitoring systems - Prometheus, Grafana, SNMP, Nagios, CheckMK, or similar
- Solid knowledge of network administration - firewalls, routing, VPNs, NAT, and managed switches
- Linux system administration experience - installation, configuration, troubleshooting
- Experience with filesystems - RAID, partitioning, and general storage management
- Familiarity with certificate management, key-based authentication, and cryptographic functions
- Experience with bare-metal provisioning - MAAS, Foreman, or similar
- Understanding of PXE/UEFI/HTTP boot systems
- Ability to write functional, maintainable Bash scripts for automation

Nice to Have

- Experience with Kubernetes - operators, cluster scaling, CRDs
- Experience with Helm chart customization
- Exposure to high-availability or distributed compute environments
- Knowledge of infrastructure security and hardening practices

What We Bring

- Mission-driven company
- Competitive Salary
- Stock Options
- 100% paid Medical, Dental, and Vision insurance
- Flexible PTO
- Paid Holidays
- 401(k)
- Parental Leave
- Flexible Spending Account
- Short Term Disability Insurance
- Life and Voluntary Supplemental Insurance
- Mental Health Benefits through Spring Health

We’re looking for resilient, adaptable people to join our team, people who believe in the mission and think at massive scale. The solutions that worked on a handful of devices will not work at Exascale. Be prepared to be pushed daily, to learn a lot, and literally build the future.

Tensorwave is an equal opportunity employer ...