# Senior Solutions Architect, Infiniband and Networking Ethernet - NVIS

**Company**: NVIDIA
**Location**: India, IN
**Work arrangement**: remote
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/India-Pune/Senior-Solutions-Architect--Infiniband-and-Networking-Ethernet---NVIS_JR2019584?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_81cd67d2-ec3

## Description

NVIDIA is looking for a Senior Solutions Architect to join its NVIDIA Infrastructure Specialist Team. The successful candidate will be responsible for building AI/HPC infrastructure for new and existing customers, supporting operational and reliability aspects of large-scale AI clusters, and engaging in the whole lifecycle of services from inception and design through deployment, operation, and refinement.

Primary responsibilities will include:

- Building AI/HPC infrastructure for new and existing customers

- Supporting operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting

- Engaging in and improving the whole lifecycle of services, from inception and design through deployment, operation, and refinement

- Maintaining services once they are live by measuring and monitoring availability, latency, and overall system health

The ideal candidate will have:

- A BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields

- At least 5 years of professional experience with networking fundamentals in Ethernet or InfiniBand environments

- Hands-on experience with network switch/router platforms such as Cumulus Linux, SONiC, IOS, Junos OS, and EOS

- Solid working knowledge of Ethernet/InfiniBand/RDMA core principles

- Proficiency in end-to-end IB/Eth cluster deployment, adapter configuration, and firmware maintenance, and the ability to conduct professional performance benchmarking with mainstream RDMA testing tools

- Ability to independently diagnose and troubleshoot typical IB/Eth network anomalies, including link flapping, connection failures, and bandwidth and latency jitter issues

- Mastery of practical RDMA network optimization strategies such as QP tuning, MTU configuration, and congestion control optimization

- Hands-on working experience in RDMA-accelerated business scenarios, including distributed storage and high-performance computing clusters

- Extensive experience delivering automated network provisioning solutions using tools like Ansible, Salt, and Python

- Ability to develop CI/CD pipelines for network operations

Preferred qualifications include:

- Familiarity with cloud networks (AWS, GCP, Azure)

- Advanced Linux or Networking Certifications

- Experience with high-performance computing architectures and an understanding of how job schedulers (Slurm, PBS) work

- Knowledge of Lustre management technologies (bonus credit for Base Command Manager, BCM)

- Experience with GPU (Graphics Processing Unit) focused hardware/software

NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry.

## Skills

### Required
- Networking fundamentals
- Ethernet or InfiniBand networking
- Cumulus Linux
- SONiC
- IOS
- Junos OS
- EOS
- RDMA core principles
- End-to-end IB/Eth cluster deployment
- Adapter configuration and firmware maintenance
- Professional performance benchmarking with mainstream RDMA testing tools
- QP tuning
- MTU configuration and congestion control optimization
- RDMA-accelerated business scenarios
- Distributed storage and high-performance computing clusters
- Automated network provisioning solutions using tools like Ansible, Salt, and Python
- CI/CD pipelines for network operations

### Nice to have
- Cloud networks (AWS, GCP, Azure)
- Advanced Linux or Networking Certifications
- High-performance computing architectures
- Lustre management technologies
- GPU (Graphics Processing Unit) focused hardware/software

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/India-Pune/Senior-Solutions-Architect--Infiniband-and-Networking-Ethernet---NVIS_JR2019584?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
