Principal Engineer, Core Infrastructure

Klaviyo
Boston, MA
Posted 24 February 2026

Job Description

<div class="content-intro"><p><em>At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit <a class="_ymio1r31 _ypr0glyw _zcxs1o36 _mizu194a _1ah3dkaa _ra3xnqa1 _128mdkaa _1cvmnqa1 _4davt94y _4bfu18uv _1hms8stv _ajmmnqa1 _vchhusvi _kqswh2mm _ect4ttxp _syaz13af _1a3b18uv _4fpr8stv _5goinqa1 _f8pj13af _9oik18uv _1bnxglyw _jf4cnqa1 _30l313af _1nrm18uv _c2waglyw _1iohnqa1 _9h8h12zz _10531ra0 _1ien1ra0 _n0fx1ra0 _1vhv17z1" href="http://klaviyo.com/careers" data-renderer-mark="true">klaviyo.com/careers</a> to see how we empower creators to own their own destiny.</em></p></div><p>As a hands‑on principal for compute, networking, storage, runtimes (e.g., Kubernetes), CI/CD, and observability, you’ll architect the service platform that lets teams ship fast and safely. IC role—no direct reports—you lead via design, code, and incident excellence, setting technical standards and SLOs for platform services.</p> <p><strong>What You’ll Do</strong></p> <ul> <li>Architect and evolve the Kubernetes platform, service mesh, networking, storage, and CI/CD pipelines; ship golden paths and IaC modules.</li> <li>Define platform SLOs; use error budgets to guide reliability vs. 
velocity trade‑offs; drive incident learning and readiness reviews.</li> <li>Improve developer velocity (build/deploy times, flaky tests, local dev ergonomics) with measurable results.</li> <li>Lead capacity planning and commitments; build guardrails for cost, security, and compliance with Security/FinOps partners.</li> <li>Write high‑impact code, automation, and tooling; mentor across teams and raise the bar on operational excellence.</li> <li>Embed AI in the developer experience, from code generation to observability and incident response, so teams ship faster and safer by default.</li> </ul> <p><strong>Who You Are</strong></p> <ul> <li>Experience: 10+ years building and operating cloud platforms (compute, networking, storage, runtimes like Kubernetes), with a track record of multi‑region HA and SLO rigor.</li> <li>Technical expertise: Deep in Kubernetes, service mesh, Terraform/IaC, CI/CD, and production observability; you ship golden paths and guardrails that lift the whole org.</li> <li>Experience with databases and storage systems, including SQL and NoSQL databases, and object, block, or file storage platforms.</li> <li>AI tooling and automation: You’ve brought AI into platform engineering, from copilot‑assisted workflows and intelligent test generation to AIOps for incident triage, anomaly detection, and runbook automation, with clear security and cost boundaries.</li> <li>Ops leadership: You lead via design reviews, incident excellence, and SLO/error‑budget trade‑offs communicated in business terms.</li> <li>AI fluency: You’re hands‑on with AI tools and help teams adopt them responsibly.</li> </ul> <p><strong>Nice to Haves</strong></p> <ul> <li>Core SLOs and velocity: ≥99.95% SLOs for core services; 25–50% faster build/deploy times; developer‑reported friction trending down.</li> <li>AI‑enabled platform: Approved AI tooling is integrated into IDE/CI/CD with repo policies and auditability; ≥70% MAU among eligible engineers; MTTR down 20–30% via AI‑assisted triage; 
flaky‑test rate decreases through targeted, AI‑suggested fixes.</li> <li>Guardrails in place: Cost, security, and compliance controls are codified as IaC modules and enforced in paved roads.</li> <li>Experience with enterprise governance, including compliance and audit requirements.</li> <li>Familiarity with GDPR and data privacy considerations in large-scale, production environments.</li> </ul> <p><strong>Success in 6–12 Months</strong></p> ... (truncated, view full listing at source)