# Senior Deep Learning Performance Architect

**Company**: NVIDIA
**Location**: Santa Clara
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Deep-Learning-Performance-Architect_JR2017476?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_32943cd8-683

## Description

We are now looking for a Senior Deep Learning Performance Architect!

You will design and evaluate hardware architectures to improve performance, efficiency, and scalability of production AI workloads.

Responsibilities:

- Design and evaluate hardware architectures to improve performance, efficiency, and scalability of production AI workloads.

- Analyse and optimise large-scale deep learning workloads, especially LLM inference/training in real-world deployments.

- Build and use performance and power models (Python/C++) to drive architecture and product decisions.

- Identify and resolve system bottlenecks across compute, memory, and interconnect.

- Evaluate PPA (power, performance, area) trade-offs and guide feature prioritisation for next-generation GPU/ASIC designs.

- Partner closely with software, systems, and product teams to align hardware capabilities with workload requirements.

Requirements:

- MS or PhD in a relevant field (Computer Science, Electrical Engineering, Computer Engineering, etc.) or equivalent experience.

- 5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.

- Experience with deep learning workloads in production environments (training and/or inference).

- Proficiency in Python and C++ for building performance models, simulators, or analysis tools.

- Solid understanding of system architecture: memory hierarchy, data movement, and scalability.

- Prior experience debugging, profiling, and performance tuning on real systems.

- Ability to work across teams and drive decisions in fast-paced product environments.

Benefits:

- Eligible for equity and benefits.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

## Skills

### Required
- GPU/ASIC architecture
- parallel computing
- system performance engineering
- deep learning workloads
- production environments
- Python
- C++
- system architecture
- memory hierarchy
- data movement
- scalability

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Deep-Learning-Performance-Architect_JR2017476?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
