Description

We are now looking for a Senior Deep Learning Performance Architect!

You will design and evaluate hardware architectures to improve performance, efficiency, and scalability of production AI workloads.

Key responsibilities include analysing and optimising large-scale deep learning workloads, especially LLM inference/training in real-world deployments.

You will build and use performance and power models (Python/C++) to drive architecture and product decisions.

Responsibilities:

Design and evaluate hardware architectures to improve performance, efficiency, and scalability of production AI workloads.

Analyse and optimise large-scale deep learning workloads, especially LLM inference/training in real-world deployments.

Build and use performance and power models (Python/C++) to drive architecture and product decisions.

Identify and resolve system bottlenecks across compute, memory, and interconnect.

Evaluate PPA trade-offs and guide feature prioritisation for next-generation GPU/ASIC designs.

Partner closely with software, systems, and product teams to align hardware capabilities with workload requirements.

Requirements:

MS or PhD in a relevant field (Computer Science, Electrical Engineering, Computer Engineering, etc.) or equivalent experience.

5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.

Experience with deep learning workloads in production environments (training and/or inference).

Proficiency in Python and C++ for building performance models, simulators, or analysis tools.

Solid understanding of system architecture: memory hierarchy, data movement, and scalability.

Prior experience debugging, profiling, and performance tuning on real systems.

Ability to work across teams and drive decisions in fast-paced product environments.

Benefits:

Eligible for equity and benefits.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

To apply, use the employer's original posting: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Deep-Learning-Performance-Architect_JR2017476