NVIDIA

Senior Solutions Architect, InfiniBand and Networking Ethernet - NVIS

remote senior full-time India, IN

First indexed 12 Jun 2026

Description

NVIDIA is looking for a Senior Solutions Architect to join its NVIDIA Infrastructure Specialist (NVIS) team. The successful candidate will build AI/HPC infrastructure for new and existing customers, support the operational and reliability aspects of large-scale AI clusters, and take part in the whole lifecycle of services, from inception and design through deployment, operation, and refinement.

Primary responsibilities will include:

  • Building AI/HPC infrastructure for new and existing customers
  • Supporting operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting
  • Engaging in and improving the whole lifecycle of services, from inception and design through deployment, operation, and refinement
  • Maintaining services once they are live by measuring and monitoring availability, latency, and overall system health

The ideal candidate will have:

  • A BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields
  • At least 5 years of professional experience in networking fundamentals, in either the Ethernet or InfiniBand world
  • Hands-on experience with network switch/router platforms such as Cumulus Linux, SONiC, IOS, Junos OS, and EOS
  • Solid working knowledge of Ethernet/InfiniBand/RDMA core principles
  • Proficiency in end-to-end InfiniBand/Ethernet cluster deployment, adapter configuration, and firmware maintenance, plus the ability to run professional performance benchmarks with mainstream RDMA testing tools
  • Ability to independently diagnose and troubleshoot typical InfiniBand/Ethernet network anomalies, such as link flapping, connection failures, and bandwidth or latency jitter
  • Mastery of practical RDMA network optimization strategies, such as queue pair (QP) tuning, MTU configuration, and congestion control optimization
  • Hands-on experience with RDMA-accelerated use cases, including distributed storage and high-performance computing clusters
  • Extensive experience delivering automated network provisioning solutions using tools such as Ansible, Salt, and Python
  • Ability to develop CI/CD pipelines for network operations

Preferred qualifications include:

  • Familiarity with cloud networks (AWS, GCP, Azure)
  • Advanced Linux or networking certifications
  • Experience with high-performance computing architectures and an understanding of how job schedulers (such as Slurm and PBS) work
  • Knowledge of Lustre management technologies (experience with Base Command Manager (BCM) is a bonus)
  • Experience with GPU-focused hardware and software

NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry.