# Senior Manager, Observability

**Company**: CoreWeave
**Location**: Sunnyvale, CA
**Work arrangement**: hybrid
**Experience**: senior
**Job type**: full-time
**Salary**: $188,000 to $275,000
**Category**: Engineering
**Industry**: Technology

**Apply**: https://job-boards.greenhouse.io/coreweave/jobs/4675051006?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_23a8d884-351

## Description

CoreWeave is seeking a Senior Manager, Observability Engineering to lead a team responsible for building, scaling, and operating observability systems across metrics, logs, traces, and telemetry pipelines.

The Observability Engineering organization at CoreWeave is responsible for the platforms and practices that help engineers understand, operate, and improve production systems at scale. This team owns and evolves the foundations for metrics, logs, traces, telemetry pipelines, and observability reliability, enabling teams to detect issues quickly, troubleshoot complex distributed systems, and operate AI infrastructure with confidence.

As CoreWeave continues to scale, observability plays a critical role in delivering reliable platform experiences, improving engineering velocity, and maintaining operational excellence across a rapidly growing cloud environment.

Responsibilities:

- Define strategy and roadmap for observability systems

- Drive platform reliability and performance improvements

- Guide architectural decisions across observability infrastructure

- Partner with infrastructure, platform, security, and application engineering teams to improve instrumentation and production visibility

- Lead a team of engineers and technical leads

Requirements:

- 8+ years of software engineering experience with production systems at scale

- 4+ years of engineering management experience leading senior engineers and technical leads

- Experience building and operating observability platforms across logs, metrics, traces, and alerting in distributed systems

- Knowledge of reliability engineering concepts including SLOs, SLIs, incident management, error budgets, and fault-tolerant design

- Experience scaling telemetry systems including collection pipelines, storage backends, and query layers

- Experience with distributed systems, performance engineering, and trade-offs involving scale, resilience, and cost

- Experience partnering with infrastructure, security, and application engineering teams to drive platform adoption

- Experience hiring and managing engineering teams

Preferred Skills:

- Experience with OpenTelemetry, Grafana, Prometheus-compatible systems, log aggregation, and distributed tracing tools

- Experience operating cloud-native infrastructure, including Kubernetes environments

- Experience supporting large-scale cloud platforms, developer platforms, or AI/ML infrastructure

- Familiarity with capacity planning for high-ingest telemetry systems

- Experience scaling platforms in high-growth environments

Benefits:

- Medical, dental, and vision insurance - 100% paid for by CoreWeave

- Company-paid life insurance

- Voluntary supplemental life insurance

- Short- and long-term disability insurance

- Flexible Spending Account

- Health Savings Account

- Tuition Reimbursement

- Ability to participate in the Employee Stock Purchase Program (ESPP)

- Mental Wellness Benefits through Spring Health

- Family-Forming support provided by Carrot

- Paid Parental Leave

- Flexible, full-service childcare support with Kinside

- 401(k) with a generous employer match

- Flexible PTO

- Catered lunch each day in our office and data center locations

- A casual work environment

- A work culture focused on innovative disruption

## Skills

### Required
- software engineering
- production systems
- observability platforms
- reliability engineering
- distributed systems
- performance engineering
- telemetry systems
- engineering management

### Nice to have
- OpenTelemetry
- Grafana
- Prometheus-compatible systems
- log aggregation
- distributed tracing tools
- cloud-native infrastructure
- Kubernetes environments
- capacity planning
- high-growth environments

---

Source: [Apply at job-boards.greenhouse.io](https://job-boards.greenhouse.io/coreweave/jobs/4675051006?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
