# Senior Software Engineer, Infrastructure Automation and Distributed Systems

**Company**: NVIDIA
**Work arrangement**: remote
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-SC-Remote/Senior-Software-Engineer--Infrastructure-Automation-and-Distributed-Systems_JR2014877?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_5832ca39-321

## Description

We are seeking a Senior Software Engineer to join our team, focusing on building and running reliable, large-scale infrastructure platform services. You will ensure that our internal- and external-facing EDA services running on NVIDIA hardware are as reliable as needed.

**What you'll be doing:**

- Design, build, deploy, and run infrastructure services, and manage the software life cycle to meet our business goals.

- Participate in defining internal-facing service level objectives and error budgets as part of our overall observability strategy.

- Eliminate toil, or automate it where the return on investment (ROI) justifies the cost of building and maintaining the automation.

- Practice sustainable, blameless incident prevention and incident response as a member of an on-call rotation.

- Consult with peer teams and advise them on systems design best practices.

**What we need to see:**

- BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics) or equivalent experience.

- 12+ years of relevant experience.

- A track record showing a good balance between initiating your own projects, convincing others to collaborate with you, and collaborating well on projects initiated by others.

- Experience with infrastructure automation and distributed systems design, developing tools for running large-scale private or public cloud systems in production.

- Experience in one or more of the following: Python, Go, Perl, or Ruby.

- In-depth knowledge of one or more of the following: Linux, networking, storage, or containers.

**Ways to stand out from the crowd:**

- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive. Experience accelerating positive impact to the business using coding assistants, MCP servers, or AI agents.

- Experience working with or developing systems associated with bare metal as a service (BMaaS).

- Experience working with or developing multi-cloud infrastructure services and running private or public cloud systems based on one or more of Kubernetes, OpenStack, Docker, or Slurm.

- Experience teaching reliability practices (e.g., SRE) or, more generally, good practices for cloud systems to peers or to other companies (e.g., CRE).

- Background with NVIDIA Collective Communication Library (NCCL).

You will also be eligible for equity and benefits.

## Skills

### Required
- Python
- Go
- Perl
- Ruby
- Linux
- Networking
- Storage
- Containers
- Infrastructure Automation
- Distributed Systems

### Nice to have
- Bare Metal as a Service (BMaaS)
- Multi-cloud infrastructure services
- Kubernetes
- OpenStack
- Docker
- Slurm
- NVIDIA Collective Communication Library (NCCL)

