# Senior Solutions Architect, Cloud Infrastructure and DevOps

**Company**: NVIDIA
**Location**: Japan
**Work arrangement**: remote
**Experience**: senior
**Job type**: full-time
**Salary**: Competitive salary and benefits package
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/Japan-Remote/Senior-Solutions-Architect--Cloud-Infrastructure-and-DevOps---NVIS_JR1997336?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_92c47b33-8a4

## Description

We are looking for a Senior Cloud Infrastructure/DevOps Solutions Architect to join our NVIDIA Infrastructure Specialist Team. As a key member of the team, you will design, implement, and maintain large-scale cloud infrastructure and DevOps solutions. You will apply your expertise to analyze, define, and implement large-scale projects that combine networking, system design, and automation. You will work with customers, partners, and internal teams to ensure seamless delivery of our solutions.

Key Responsibilities:

- Maintain large-scale HPC/AI clusters with monitoring, logging, and alerting

- Manage Linux job/workload schedulers and orchestration tools

- Develop and maintain continuous integration and delivery pipelines

- Develop tooling to automate deployment and management of large-scale infrastructure environments

- Deploy monitoring solutions for servers, network, and storage

- Perform troubleshooting from bare metal to application level

- Develop, redefine, and document standard methodologies to share with internal teams

Requirements:

- Bachelor's degree in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields

- At least 8 years of professional experience with networking fundamentals, the TCP/IP stack, and data center architecture

- Knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and supporting software

- Extensive knowledge and hands-on experience with Kubernetes, including container orchestration for AI/ML workloads, resource scheduling, scaling, and integration with HPC environments

- Experience in managing and installing HPC clusters, including deployment, optimization, and troubleshooting

- Experience with job scheduling, workload orchestration, and container technologies such as Slurm, Kubernetes, and Singularity

- Excellent knowledge of Windows and Linux systems, including internals, ACLs, OS-level security protections, and common protocols such as TCP, DHCP, and DNS

- Experience with multiple storage solutions, including Lustre, GPFS, ZFS, and XFS

- Proficiency in Python programming and bash scripting

- Knowledge of CI/CD pipelines for software deployment and automation

- Comfortable with automation and configuration management tools such as Jenkins, Ansible, Puppet, and Chef

Preferred Qualifications:

- Knowledge of CPU and/or GPU architecture

- Knowledge of Kubernetes and container-related microservice technologies

- Experience with GPU-focused hardware/software (DGX, CUDA)

- Background with RDMA (InfiniBand or RoCE) fabrics

## Skills

### Required
- Cloud Infrastructure
- DevOps
- Kubernetes
- Container Orchestration
- HPC
- AI
- Networking
- System Design
- Automation
- Linux
- Windows
- Python
- Bash Scripting
- CI/CD Pipelines
- Automation Tools
- Jenkins
- Ansible
- Puppet/Chef

### Nice to have
- CPU Architecture
- GPU Architecture
- Kubernetes Microservices
- GPU-Focused Hardware/Software
- RDMA Fabrics

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/Japan-Remote/Senior-Solutions-Architect--Cloud-Infrastructure-and-DevOps---NVIS_JR1997336?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
