Description
CoreWeave is seeking an experienced HPC Performance Engineer to join its HAVOCK Team, reporting to the Manager of Systems Engineering. The successful candidate will play a crucial role in designing, developing, and optimizing bare-metal systems from POST through integration with a Kubernetes cluster.
Responsibilities include:
- Developing and maintaining tools for establishing systems performance baselines
- Developing and maintaining automation for performance regression testing and analysis
- Designing and maintaining performance regression test pipelines for HPC workloads
- Debugging and tuning fabric-level performance for low-latency, high-throughput configurations
- Developing telemetry for performance analysis across distributed clusters of servers
- Triaging and fixing performance issues in Linux
- Collecting data and producing metrics and visualizations to inform business decisions and drive automation improvements
- Defining Linux and OS requirements, specifications, and system architecture in collaboration with cross-functional teams
Requirements:
- 5+ years of professional experience in Systems/HPC Performance Engineering, Benchmarking, and/or Validation
- Strong experience with MPI workloads and distributed system performance analysis
- Familiarity with RoCE, InfiniBand, GPUDirect/Data Direct I/O, NUMA, and related technologies in HPC workloads
- Hands-on experience with public HPC benchmarks (HPCC, HPL, OSU, MLPerf-HPC, STREAM, IO500)
- Deep experience with Linux internals
- Fluency with a programming language geared toward automation (Python preferred)
- Experience writing robust, testable code
- Experience diagnosing and fixing systems performance issues
- Experience implementing automated testing
- Ability to effectively prioritize and communicate proposed features and fixes in a remote work environment
Preferred qualifications:
- Familiarity with QA/QE best practices
- Familiarity with Golang
- Experience working in Cloud environments
- Experience as a software engineer writing large-scale applications
- Experience in open-source community software development
- Experience with machine learning
CoreWeave offers a competitive salary ranging from $165,000 to $242,000, a discretionary bonus, equity awards, and a comprehensive benefits program, including medical, dental, and vision insurance, company-paid life insurance, short- and long-term disability insurance, flexible spending accounts, health savings accounts, tuition reimbursement, and employee stock purchase plans.