Job Description
Staff Software Engineer (Backend Engineer + Streaming)
Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
About This Role:
We're looking for a Staff Streaming Software Engineer to join the Observability team within our Cloud Infrastructure organization. This team builds and operates the real-time data platforms that power metrics, logs, traces, and event streams used by engineers across the company to understand and operate Crusoe's AI cloud reliably at scale.
In this role, you'll set the technical direction for our high-throughput streaming systems, driving architectural decisions and long-term investments across the observability stack. You'll operate at the intersection of deep technical execution and organizational influence—identifying gaps before they become problems, shaping how teams build and operate streaming infrastructure, and partnering with engineering leaders to align platform strategy with company goals.
This is an opportunity to define how observability data moves at scale across a rapidly growing AI cloud, and to leave a lasting architectural footprint on systems that the entire engineering organization depends on.
What You'll Be Working On:
- Defining the technical strategy and multi-quarter roadmap for streaming infrastructure that ingests and processes observability data, including logs, metrics, traces, and operational events
- Leading architectural design for large-scale, high-throughput streaming systems using technologies such as Kafka, Kinesis, Pub/Sub, Flink, or similar platforms—setting standards that scale across teams
- Driving solutions to the hardest reliability and scalability challenges: high-cardinality workloads, bursty traffic patterns, cross-region data movement, and fault-tolerant delivery semantics
- Partnering with SREs, platform teams, and product engineering to define how streaming data integrates into internal observability tooling and operational workflows company-wide
- Establishing engineering best practices around instrumentation, CI/CD, infrastructure-as-code, and incident management for streaming systems
- Leading post-incident reviews and contributing systemic improvements that reduce toil and improve platform resilience across teams
- Representing the observability platform in cross-functional technical forums and influencing decisions that span the broader Cloud Infrastructure organization
- Mentoring and leveling up senior engineers through design reviews, technical feedback, and hands-on collaboration
What You'll Bring to the Team:
- Deep experience architecting and operating distributed streaming or real-time data platforms at significant scale
- A track record of driving technical decisions that have lasting, cross-team impact—not just building features but shaping how systems are designed
- Hands-on expertise with Kafka or similar distributed strea ...