Job Description
<div class="content-intro"><p>Figma is growing our team of passionate creatives and builders on a mission to make design accessible to all. Figma’s platform helps teams bring ideas to life—whether you're brainstorming, creating a prototype, translating designs into code, or iterating with AI. From idea to product, Figma empowers teams to streamline workflows, move faster, and work together in real time from anywhere in the world. If you're excited to shape the future of design and collaboration, join us!</p></div><p>Figma’s Observability engineering team builds and operates the systems that give us deep visibility into the health, performance, and efficiency of our platform. From metrics, logs, and traces to cost attribution and budgeting, this team ensures that engineers across Figma can detect issues quickly, understand system behavior at scale, and make informed decisions about reliability and spend. The team owns and evolves our core observability stack—including platforms like Datadog, shared instrumentation libraries, and the agents and operators that power telemetry collection—while continuously raising the bar on signal quality and operational clarity.</p>
<p>As the Engineering Manager for Observability, you will lead a team of five engineers responsible for shaping the future of visibility and efficiency at Figma. You’ll define the strategy for instrumentation standards and cost transparency, drive initiatives to optimize observability footprint and spend, and explore innovative AI-driven approaches to anomaly detection and operational automation. This role is well-suited for a leader with strong distributed systems experience who is motivated by platform leverage, cross-functional impact, and building systems that enable every engineering team to operate with confidence and precision.</p>
<p>This is a full-time role that can be held from one of our US hubs or remotely in the United States.</p>
<h4>What you’ll do at Figma:</h4>
<ul>
<li>Lead and grow a team of engineers responsible for the reliability, scalability, and evolution of Figma’s observability and cost engineering platforms</li>
<li>Own and operate Figma’s core observability stack, including vendor platforms such as Datadog, ensuring high availability, strong data quality, and effective signal-to-noise across metrics, logs, and traces</li>
<li>Define and drive the technical strategy for instrumentation standards, observability libraries, agents, and operators used to monitor internal- and external-facing services</li>
<li>Explore and implement innovative, AI-driven approaches to anomaly detection, root cause analysis, signal correlation, and operational automation</li>
<li>Establish clear frameworks for cost attribution, budgeting, forecasting, and alerting across infrastructure and observability spend, enabling teams to make informed tradeoffs</li>
<li>Partner with infrastructure, product engineering, finance, and security teams to improve visibility into system health and cost efficiency at scale</li>
<li>Lead initiatives to optimize observability footprint and spend, balancing depth of insight with performance and cost considerations</li>
<li>Coach and mentor engineers through career development, performance feedback, and technical leadership, fostering a culture of ownership, collaboration, and high-quality execution</li>
</ul>
<div class="section page-centered">
<h4>We'd love to hear from you if you have:</h4>
<ul>
<li>4+ years of experience leading infrastructure, observability, or platform engineering teams, with a track record of delivering highly reliable production systems</li>
<li>Deep hands-on experience with modern observability platforms (e.g., Datadog, OpenTelemetry) across metrics, logs, and distributed tracing</li>
<li>Strong understanding of distributed systems, instrumentation best practices, SLO design, and incident response workflows</li>
<li>Experience driving cost transparency and accountability initiatives, including cost attribution, ... (truncated, view full listing at source)