# Senior HPC and AI Networking Performance Research and Analysis Engineer

**Company**: NVIDIA
**Location**: Germany
**Work arrangement**: remote
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/Germany-Remote/Senior-HPC-and-AI-Networking-Performance-Research-and-Analysis-Engineer_JR2011934?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_54d873bd-b34

## Description

We are looking for a talented Performance Research and Analysis Engineer to join our Performance group. You will profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed deep learning LLM training and inference, focusing on communication patterns, collective communication, RDMA, networking, and system performance.

Your responsibilities will include:

- Profiling and analyzing AI workloads on large-scale GPU and CPU clusters for distributed deep learning LLM training and inference.

- Identifying bottlenecks and opportunities for improvement and optimization, with a strong emphasis on networking.

- Implementing performance analysis tools.

- Collaborating with many teams, from hardware to software, to provide performance analysis insights.

- Defining performance test plans, setting performance expectations for new technologies and solutions, and working to reach the performance targets.

To succeed in this role, you will need:

- A Bachelor's degree in Computer Science or Software Engineering.

- At least 6 years of experience with high-performance networking (RDMA, MPI, NCCL).

- Demonstrated performance analysis skills and methodologies.

- Experience with NVIDIA GPUs, the CUDA library, and deep learning frameworks like TensorFlow or PyTorch, combined with expertise in networking collective communication libraries (such as NCCL) and protocols (such as RoCE and RDMA).

- Ability to learn quickly and independently, with strong analytical and problem-solving skills.

- Programming Languages: Python, Bash, and C languages.

- Experience with Linux OS distros.

- Team player with good communication and interpersonal skills.

If you have in-depth knowledge of and experience with benchmarking AI workloads for distributed LLM training, and with the CUDA and NCCL libraries, this could be the perfect opportunity for you to take your skills to the next level.

## Skills

### Required
- High-performance Networking
- RDMA
- MPI
- NCCL
- NVIDIA GPUs
- CUDA library
- Deep learning frameworks
- TensorFlow
- PyTorch
- Networking collective communication libraries
- RoCE
- Linux OS distros
- Python
- Bash
- C languages

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/Germany-Remote/Senior-HPC-and-AI-Networking-Performance-Research-and-Analysis-Engineer_JR2011934?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
