Description

NVIDIA is seeking a Solutions Architect, Networking to help design and deploy large-scale AI Factories across Canada. In this role, you will collaborate with customers to build end-to-end infrastructure. You will become a trusted technical advisor working on exciting projects, focused on how high-performance networking enables generative AI, large language models, and production AI inference pipelines. You will also collaborate with a diverse set of internal engineering, product, and business teams on performance analysis and modeling of these large GPU clusters. You should be comfortable working in a dynamic environment and have hands-on experience with NVIDIA networking and GPU technologies. This is an excellent opportunity to be at the center of Canada's rapidly growing AI infrastructure landscape.

Key Responsibilities:

Become the trusted technical advisor for NVIDIA Cloud Partners in Canada to rapidly bring NVIDIA Data Center GPU and networking platforms to market at scale.
Collaborate directly with customers to build, deploy, and optimize large-scale AI training and inference infrastructure using NVIDIA technology.
Analyze deployment and performance data to identify product health trends, system bottlenecks, and operational risks.
Solve challenging technical problems involving GPUs, networking, drivers, containers, firmware, and distributed system interactions.
Deliver streamlined executive-level communication on status, risks, progress, and required decisions.

Requirements:

Bachelor's degree in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or a related field (or equivalent experience).
5+ years of experience in Solution Architecture or a similar role (Sales Engineering, Systems Engineering, Cloud Engineering, or Solution Engineering).
Understanding of high-performance networking technologies (e.g., RDMA, congestion control, high-bandwidth interconnects) and their role in distributed AI workloads.
Hands-on experience with bring-up and validation of large-scale NVIDIA GPU platforms, including multi-GPU and multi-node architectures.
Familiarity with NVIDIA system software stacks: CUDA, NCCL, NVSwitch/NVLink, driver behavior, and performance tuning.
Ability to identify performance bottlenecks at the cluster, node, accelerator, network, or application layer.
Strong Linux fundamentals across drivers, kernel subsystems, cgroups, containers, and node-level performance analysis.
Excellent presentation, communication, and collaboration skills.

Nice to Have:

Prior experience deploying or optimizing deep learning training and inference at scale in production environments on large GPU clusters.
Familiarity with NVIDIA hardware (such as GPUs, networking, and storage) and systems technology such as NCCL, DCGM, UFM, Mission Control, and Base Command Manager.
Demonstrated leadership resolving multi-team infrastructure challenges across engineering, product, and customer groups.
A consistent record of taking GPU or infrastructure products from pilot to high-volume deployment in large data center environments.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/Canada-Remote/Solutions-Architect--Networking_JR2017665