Description

We are seeking a Senior GPU System Architect who will architect and design multi-GPU scale-up and scale-out systems for next-generation datacenter platforms for AI and HPC. The architect in this role will explore and define system architectures that tightly couple GPU compute, high-bandwidth memory, in-package interconnects and GPU-to-GPU communication fabric subsystems to deliver industry-leading AI performance, scalability and resilience.

The ideal candidate combines deep, hands-on experience in system-level fabric and networking architecture with practical hardware-software co-design expertise.

Responsibilities:

Architect multi-GPU system topologies for scale-up and scale-out configurations, balancing AI throughput, scalability, and resilience.

Define, modify and evaluate future architectures for high-speed interconnects, such as NVLink and Ethernet, that are co-designed with the GPU memory system.

Collaborate with other teams to architect RDMA-capable hardware and define transport-layer optimizations for large-scale, GPU-based AI workload deployments.

Use and modify system models, perform simulations and bottleneck analyses to guide design trade-offs.

Work with GPU ASIC, compiler, library and software stack teams to enable efficient hardware-software co-design across compute, memory, and communication layers.

Contribute to interposer, package, PCB and switch co-design for novel high-density multi-die, multi-package, multi-node rack-scale systems consisting of hundreds of GPUs.

Requirements:

BS/MS/PhD in Electrical Engineering, Computer Engineering, or equivalent experience.

At least 8 years of relevant experience in system design and/or ASIC/SoC architecture for GPU, CPU or networking products.

Deep understanding of communication interconnect protocols such as NVLink, Ethernet, InfiniBand, CXL and PCIe.

Experience with RDMA/RoCE or InfiniBand transport offload architectures.

Proven ability to architect multi-GPU/multi-CPU topologies, with awareness of bandwidth scaling, NUMA, memory models, coherency and resilience.

Experience with hardware-software interaction, drivers and runtimes, and performance tuning for modern distributed computing systems.

Strong analytical and system modeling skills (Python, SystemC, or similar).

Excellent cross-functional collaboration skills with silicon, packaging, board, and software teams.

Preferred Qualifications:

Background in system design for AI and HPC.

Experience with NIC or DPU architecture and other transport offload engines.

Expertise in chiplet interconnect architectures or multi-node fabrics and protocols for distributed computing.

Hands-on experience with interposer or 2.5D/3D package co-design.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-GPU-System-Architect_JR2017254