Figma

Manager, Software Engineering - Observability

Remote • Senior • Full-time • $258,000-$376,000 USD • San Francisco, CA • New York, NY • United States

First indexed 18 Apr 2026

Description

We are seeking a Manager, Software Engineering - Observability to lead our team of engineers responsible for the reliability, scalability, and evolution of Figma's observability and cost engineering platforms.

As a key member of our engineering team, you will own and operate Figma's core observability stack, including vendor platforms such as Datadog, ensuring high availability, strong data quality, and effective signal-to-noise across metrics, logs, and traces.

You will define and drive the technical strategy for the instrumentation standards, observability libraries, agents, and operators used to monitor internal- and external-facing services. You will also explore and implement AI-driven approaches to anomaly detection, root cause analysis, signal correlation, and operational automation.

In addition, you will establish clear frameworks for cost attribution, budgeting, forecasting, and alerting across infrastructure and observability spend, enabling teams to make informed tradeoffs.

You will partner with infrastructure, product engineering, finance, and security teams to improve visibility into system health and cost efficiency at scale.

You will lead initiatives to optimize observability footprint and spend, balancing depth of insight with performance and cost considerations.

You will coach and mentor engineers through career development, performance feedback, and technical leadership, fostering a culture of ownership, collaboration, and high-quality execution.

We are looking for someone with 4+ years of experience leading infrastructure, observability, or platform engineering teams, with a track record of delivering highly reliable production systems.

You should have deep hands-on experience with modern observability platforms (e.g., Datadog, OpenTelemetry) across metrics, logs, and distributed tracing.

You should have a strong understanding of distributed systems, instrumentation best practices, SLO design, and incident response workflows.

Experience driving cost transparency and accountability initiatives in cloud environments, including cost attribution, budgeting, forecasting, and alerting, is also required.

Preferred skills include experience designing or evolving company-wide observability standards, shared libraries, and agent/operator-based integrations; a background in cost optimization for infrastructure or observability tooling, including vendor negotiations and usage modeling; and experience applying AI or machine learning techniques to anomaly detection, root cause analysis, or operational automation.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/figma/jobs/5807963004