NVIDIA

Senior Full Stack Software Engineer - DGX Cloud

remote senior full-time US

First indexed 18 May 2026

Description

We are hiring a Senior Full Stack Software Engineer to help scale up our AI infrastructure. As a member of the DGX Cloud team, you will be responsible for designing and developing a massively distributed, scalable platform to identify, diagnose, and remediate non-performant GPU assets. You will work with teams across NVIDIA to ensure production AI clusters run reliably and consistently with maximum performance.

You will work with a wide range of technologies, including React, Web Components, TypeScript, Golang, PostgreSQL, Temporal, Bazel, and Kubernetes. We expect you to have significant software engineering experience with cluster operations, operator development, node health monitoring, and GPU resource scheduling.

If you're creative, passionate about GPUs, and love having fun, please apply today!

Key responsibilities include:

  • Designing and developing a massively distributed, scalable platform to identify, diagnose, and remediate non-performant GPU assets
  • Working with teams across NVIDIA to ensure production AI clusters run reliably and consistently with maximum performance
  • Evaluating system failures and improving services based on a well-defined incident management process

Requirements include:

  • 12+ years in a software engineering role within a highly technical organization with demonstrable impact from your work
  • Highly motivated with strong communication skills, able to work successfully with multi-functional teams, principals, and architects
  • Proficiency in React, TypeScript/JavaScript, and Golang
  • Proficiency with a SQL database

Preferred skills include:

  • Technical competency in managing and automating large-scale distributed systems independent of cloud providers
  • Empathy for users, attention to detail, and a passion for creating world-class user experiences
  • Prior experience in asynchronous workflows and/or event-driven architecture
  • Proven operational excellence in maintaining reliable and performant infrastructure

This role is eligible for equity and benefits.