# HPC Engineer

**Company**: CoreWeave
**Location**: New York, NY / Bellevue, WA / Sunnyvale, CA / Livingston, NJ
**Work arrangement**: hybrid
**Experience**: senior
**Job type**: full-time
**Salary**: $109,000 to $204,000
**Category**: Engineering
**Industry**: Technology

**Apply**: https://job-boards.greenhouse.io/coreweave/jobs/4645664006
**Canonical**: https://yubhub.co/jobs/job_fb9b187c-e32

## Description

We are seeking a skilled and driven NVLink Engineer to support large-scale data center deployments. In this role, you'll be at the forefront of cutting-edge infrastructure technologies, ensuring the optimal performance and stability of NVLink systems.

Key Responsibilities:

- Support the deployment of NVLink systems across large data center environments.

- Support the full lifecycle management of NVLink hardware and software components.

- Build and maintain tooling to automate and streamline deployment, monitoring, and troubleshooting workflows.

- Diagnose and resolve performance, connectivity and stability issues in complex environments.

- Collaborate with internal teams and external customers worldwide.

- Participate in a rotating on-call schedule to ensure 24/7 support coverage.

Required Qualifications:

- Solid understanding of networking fundamentals.

- Proven background in troubleshooting network and server hardware at the component level.

- Strong Linux system administration skills.

- Proficiency in at least one programming language (e.g., Python, Go).

- Proven ability to troubleshoot and debug complex application issues.

- Excellent communication and collaboration skills.

- Experience with Ansible.

Preferred Qualifications:

- Experience with InfiniBand networking.

- Experience managing large-scale environments (1,000+ switches or nodes).

- Prior experience with NVLink technologies.

- Knowledge of Redfish API for system management.

- Experience with NVUE (NVIDIA User Experience).

- Background with SONiC.

- Experience with Grafana/PromQL.

## Skills

### Required
- Networking fundamentals
- Linux system administration
- Python
- Go
- Troubleshooting and debugging
- Ansible

### Nice to have
- InfiniBand networking
- Redfish API
- NVUE
- SONiC
- Grafana/PromQL
