Description
We are looking for a Senior Cloud Infrastructure/DevOps Solutions Architect to join our NVIDIA Infrastructure Specialist Team. The successful candidate will work on a dynamic, customer-focused team where excellent interpersonal skills are essential. The role involves working with customers, partners, and internal teams to analyze, define, and implement large-scale networking projects.
Key responsibilities include:
- Developing and maintaining continuous integration and delivery pipelines
- Developing tooling to automate deployment and management of large-scale infrastructure environments
- Deploying monitoring solutions for servers, networks, and storage
- Performing troubleshooting from bare metal to application level
- Developing, redefining, and documenting standard methodologies to share with internal teams
The ideal candidate will have a strong background in computer science, data science, or electrical/computer engineering, with at least 8 years of experience in networking fundamentals, the TCP/IP stack, and data center architecture. They should also have experience with cloud computing platforms and with job scheduling, orchestration, and container technologies such as Slurm, Kubernetes, and Singularity.
Additional requirements include:
- Excellent knowledge of Windows and Linux networking and internals
- Experience with multiple storage solutions such as Lustre, GPFS, ZFS, and XFS
- Python programming and Bash scripting experience
- Familiarity with automation and configuration management tools such as Jenkins, Ansible, and Puppet/Chef
Preferred qualifications include knowledge of CPU and/or GPU architecture, Kubernetes, container-related microservice technologies, and experience with GPU-focused hardware/software (DGX, CUDA).
If you're a creative and autonomous individual with a passion for developing innovative solutions, we want to hear from you.