Description
The Customer Experience (CX) Organisation at CoreWeave is dedicated to ensuring every client running AI workloads at scale has a seamless, reliable, and high-performance experience.
As a Manager of Bare Metal Support Engineering, you'll be responsible for keeping our dedicated infrastructure stable, reliable, and performant. You'll lead daily support operations, triage incidents, drive escalations, and ensure that hardware is monitored, maintained, and delivered effectively for our clients.
Key responsibilities include:
- Leading a skilled team responsible for maintaining and optimising physical infrastructure across multiple client environments.
- Building and developing a dedicated Infrastructure Support team that supports key infrastructure, handles escalations, and keeps hardware operations running smoothly.
- Overseeing the resolution of infrastructure-related incidents, managing escalations, and collaborating with internal teams to deliver effective solutions.
- Improving support processes to enhance efficiency and reduce downtime, ensuring the infrastructure meets client expectations.
The ideal candidate will have 5+ years of experience leading teams responsible for infrastructure support, data centre operations, or physical compute environments. They should be hands-on with Linux system administration and command-line tools, familiar with hardware-level diagnostics, troubleshooting, and replacement, and have experience working with high-performance rack-scale hardware.
Preferred skills include experience managing infrastructure support teams in high-growth or rapidly evolving environments, a proven ability to develop and implement operational processes that scale with business needs, and strong familiarity with server and GPU hardware lifecycle management.