# HPC Performance Engineer

**Company**: CoreWeave
**Location**: New York, NY
**Work arrangement**: hybrid
**Experience**: senior
**Job type**: full-time
**Salary**: $165,000 to $242,000
**Category**: Engineering
**Industry**: Technology

**Apply**: https://job-boards.greenhouse.io/coreweave/jobs/4601657006?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_b873980b-aa0

## Description

CoreWeave is seeking an experienced HPC Performance Engineer to join its HAVOCK Team, reporting to the Manager of Systems Engineering. The successful candidate will play a crucial role in designing, developing, and optimizing bare-metal systems from POST through integration with a Kubernetes cluster.

Responsibilities include:

- Developing and maintaining tools for establishing systems performance baselines

- Developing and maintaining performance regression analysis testing automation

- Designing and maintaining performance regression test pipelines for HPC workloads

- Debugging and tuning fabric-level performance for low-latency, high-throughput configurations

- Developing telemetry for performance analysis across distributed clusters of servers

- Triaging and fixing performance issues in Linux

- Collecting data, producing metrics, and visualizations to inform business decisions and drive automation improvements

- Defining Linux and OS requirements, specifications, and system architecture in collaboration with cross-functional teams

Requirements:

- 5+ years of professional experience in Systems/HPC Performance Engineering, Benchmarking, and/or Validation

- Strong experience with MPI workloads and distributed system performance analysis

- Familiarity with RoCE, InfiniBand, and GPUDirect/Data Direct I/O, NUMA, etc. in HPC workloads

- Hands-on experience with public HPC benchmarks (HPCC, HPL, OSU, MLPerf-HPC, STREAM, IO500)

- Extensive, deep experience in Linux internals

- Fluency with a programming language geared toward automation (Python preferred)

- Experience writing robust, testable code

- Experience diagnosing and fixing systems performance issues

- Experience with implementing automation testing

- Ability to effectively prioritize and communicate proposed features and fixes in a remote-employee environment

Preferred qualifications:

- Familiarity with QA/QE best practices

- Familiarity with Golang

- Experience working in Cloud environments

- Experience as a software engineer writing large-scale applications

- Experience in open-source community software development

- Experience with machine learning

CoreWeave offers a competitive salary ranging from $165,000 to $242,000, a discretionary bonus, equity awards, and a comprehensive benefits program, including medical, dental, and vision insurance, company-paid Life Insurance, short and long-term disability insurance, flexible spending accounts, health savings accounts, tuition reimbursement, and employee stock purchase plans.

## Skills

### Required
- Systems/HPC Performance Engineering
- Benchmarking
- Validation
- MPI workloads
- Distributed system performance analysis
- RoCE
- InfiniBand
- GPUDirect/Data Direct I/O
- NUMA
- Linux internals
- Python
- Automation testing

### Nice to have
- QA/QE best practices
- Golang
- Cloud environments
- Large-scale applications
- Open-source community software development
- Machine learning

---

Source: [Apply at job-boards.greenhouse.io](https://job-boards.greenhouse.io/coreweave/jobs/4601657006?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
