Description
We are seeking a skilled and driven NVLink Engineer to support large-scale data center deployments. In this role, you'll be at the forefront of cutting-edge infrastructure technologies, ensuring the optimal performance and stability of NVLink systems.
Key Responsibilities:
- Support the deployment of NVLink systems across large data center environments.
- Support the full lifecycle management of NVLink hardware and software components.
- Build and maintain tooling to automate and streamline deployment, monitoring and troubleshooting workflows.
- Diagnose and resolve performance, connectivity and stability issues in complex environments.
- Collaborate with internal teams and external customers worldwide.
- Participate in a rotating on-call schedule to ensure 24/7 support coverage.
Required Qualifications:
- Solid understanding of networking fundamentals.
- Proven background in troubleshooting network and server hardware at the component level.
- Strong Linux system administration skills.
- Proficiency in at least one programming language (e.g., Python, Go).
- Proven ability to troubleshoot and debug complex application issues.
- Excellent communication and collaboration skills.
- Experience with Ansible.
Preferred Qualifications:
- Experience with InfiniBand networking.
- Experience managing large-scale environments (1,000+ switches or nodes).
- Prior experience with NVLink technologies.
- Knowledge of Redfish API for system management.
- Experience with NVUE (NVIDIA User Experience).
- Background with SONiC.
- Experience with Grafana/PromQL.
This listing is enriched and indexed by YubHub. To apply, use the employer's original posting:
https://job-boards.greenhouse.io/coreweave/jobs/4645664006