NVIDIA

Senior Deep Learning Performance Architect

Senior, full-time, Santa Clara

First indexed 5 May 2026

Description

We are now looking for a Senior Deep Learning Performance Architect!

Responsibilities:

  • Design and evaluate hardware architectures to improve performance, efficiency, and scalability of production AI workloads.
  • Analyse and optimise large-scale deep learning workloads, especially LLM inference/training in real-world deployments.
  • Build and use performance and power models (Python/C++) to drive architecture and product decisions.
  • Identify and resolve system bottlenecks across compute, memory, and interconnect.
  • Evaluate PPA trade-offs and guide feature prioritisation for next-generation GPU/ASIC designs.
  • Partner closely with software, systems, and product teams to align hardware capabilities with workload requirements.

Requirements:

  • MS or PhD in a relevant field (Computer Science, Electrical Engineering, Computer Engineering, etc.) or equivalent experience.
  • 5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.
  • Experience with deep learning workloads in production environments (training and/or inference).
  • Proficiency in Python and C++ for building performance models, simulators, or analysis tools.
  • Solid understanding of system architecture: memory hierarchy, data movement, and scalability.
  • Prior experience debugging, profiling, and performance tuning on real systems.
  • Ability to work across teams and drive decisions in fast-paced product environments.

Benefits:

  • Eligible for equity and benefits.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.