Description
We are seeking a Senior GPU System Architect who will architect and design multi-GPU scale-up and scale-out systems for next-generation datacenter platforms for AI and HPC. The architect in this role will explore and define system architectures that tightly couple GPU compute, high-bandwidth memory, in-package interconnects and GPU-to-GPU communication fabric subsystems to deliver industry-leading AI performance, scalability and resilience.
The ideal candidate combines deep, hands-on experience in system-level fabric/networking architecture with practical hardware-software co-design expertise.
Responsibilities:
- Architect multi-GPU system topologies for scale-up and scale-out configurations, balancing AI throughput, scalability, and resilience.
- Define, modify and evaluate future architectures for high-speed interconnects, such as NVLink and Ethernet, co-designed with the GPU memory system.
- Collaborate with other teams to architect RDMA-capable hardware and define transport-layer optimizations for large-scale, GPU-based AI workload deployments.
- Use and modify system models, and perform simulations and bottleneck analyses, to guide design trade-offs.
- Work with GPU ASIC, compiler, library and software stack teams to enable efficient hardware-software co-design across compute, memory, and communication layers.
- Contribute to interposer, package, PCB and switch co-design for novel high-density multi-die, multi-package, multi-node rack-scale systems consisting of hundreds of GPUs.
Requirements:
- BS/MS/PhD in Electrical Engineering, Computer Engineering, or equivalent experience.
- 8 years or more of relevant experience in system design and/or ASIC/SoC architecture for GPU, CPU or networking products.
- Deep understanding of communication interconnect protocols such as NVLink, Ethernet, InfiniBand, CXL and PCIe.
- Experience with RDMA/RoCE or InfiniBand transport offload architectures.
- Proven ability to architect multi-GPU/multi-CPU topologies, with awareness of bandwidth scaling, NUMA, memory models, coherency and resilience.
- Experience with hardware-software interaction, drivers and runtimes, and performance tuning for modern distributed computing systems.
- Strong analytical and system modeling skills (Python, SystemC, or similar).
- Excellent cross-functional collaboration skills with silicon, packaging, board, and software teams.
Preferred Qualifications:
- Background in system design for AI and HPC.
- Experience with NIC or DPU architectures and other transport offload engines.
- Expertise in chiplet interconnect architectures or multi-node fabrics and protocols for distributed computing.
- Hands-on experience with interposer or 2.5D/3D package co-design.
To apply, use the employer's original posting:
https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-GPU-System-Architect_JR2017254