Description
The Weights & Biases (W&B) team builds the developer platform trusted by machine learning practitioners to track, manage, and scale their ML workflows. As a Technical Program Manager focused on platform reliability and release management, you'll be at the centre of our platform's growth and stability.
You will partner with engineering teams within W&B and CoreWeave AI/ML Platform Services (AMPS) to ensure W&B integrates seamlessly into the broader ML ecosystem, while maintaining high reliability and predictable releases.
This role is ideal for someone who thrives in cross-functional environments, has a strong grasp of developer workflows, and excels at creating repeatable, reliable program structures that scale.
Responsibilities
- Drive end-to-end program management for critical platform initiatives.
- Build and run release management processes, ensuring predictable and high-quality delivery cycles.
- Partner with engineering and product to define success metrics, manage risks, and ensure on-time delivery.
- Build and scale incident management and root cause analysis (RCA) processes for W&B services.
- Improve the predictability and visibility of releases across teams by introducing dashboards, retrospectives, and program forums.
- Collaborate with TPMs and engineering leaders across W&B and CoreWeave to ensure end-to-end reliability across the ML developer stack.
Qualifications
- Bachelor's degree in a technical field or equivalent experience.
- 5+ years of program management experience in SaaS, developer tools, or ML/AI platforms.
- Proven experience running release management programs and incident management processes.
- Strong technical fluency in cloud computing, developer workflows, and CI/CD practices.
- Excellent communication and facilitation skills with both technical and non-technical audiences.
- Track record of improving reliability, efficiency, and predictability in software delivery.
Additional Qualifications
- Familiarity with ML workflows, model training/inference, and developer productivity tools.
- Experience building integrations between SaaS platforms, APIs, and cloud services.
- Strong background in reliability engineering practices and DevOps program leadership.
This listing is enriched and indexed by YubHub. To apply, use the employer's original posting:
https://job-boards.greenhouse.io/coreweave/jobs/4610109006