Description
NVIDIA is looking for a Senior Cloud Infrastructure/DevOps Solutions Architect to join the NVIDIA Infrastructure Specialist Team. The role covers continuous integration and delivery pipelines, monitoring for servers, networks, and storage, and troubleshooting from bare metal to application level.
The ideal candidate will have a strong background in computer science, data science, electrical/computer engineering, physics, mathematics, or another engineering field, with at least 8 years of work or research experience in networking fundamentals, the TCP/IP stack, and data center architecture. They will also have experience with large-scale HPC/AI clusters, cloud computing platforms, job scheduling, and orchestration technologies.
Key responsibilities include:
- Developing and maintaining continuous integration and delivery pipelines
- Deploying monitoring solutions for servers, networks, and storage
- Performing troubleshooting from bare metal to application level
- Developing, redefining, and documenting standard methodologies to share with internal teams
The successful candidate will serve as a technical resource on a dynamic, customer-focused team, a role that requires excellent interpersonal skills. They will work with customers, partners, and internal teams to analyze, define, and implement large-scale networking projects.
To stand out from the crowd, the ideal candidate will have knowledge of CPU and/or GPU architecture, Kubernetes, and container-related microservice technologies, along with experience with GPU-focused hardware/software and a background in RDMA (InfiniBand or RoCE) fabrics.