NVIDIA

Principal Software Engineer - DGX Cloud

Onsite, senior, full-time, $120,000–$180,000, Santa Clara

First indexed 18 May 2026

Description

We are looking for a Principal Software Engineer to join our DGX Cloud team and build the foundational systems that drive NVIDIA's high-performance GPU infrastructure. You will play a meaningful role in crafting scalable automation solutions, integrating diverse systems, and enabling seamless workflows across global cloud operations.

As a Principal Engineer in DGX Cloud, you will be at the pinnacle of technical leadership. You will directly craft the platform that fuels the future of AI and cloud computing.

Responsibilities:

  • Lead the design and development of next-generation APIs, state management, and workflow orchestration systems that automate fleet lifecycle operations at massive scale.
  • Drive technical alignment across dependent systems and partner teams to ensure cohesive integration, clear interfaces, and reliable end-to-end workflows, with a strong focus on delivery.
  • Act as a force-multiplier by coaching, mentoring, and encouraging senior engineers, elevating the technical standards and guidelines across the organization.
  • Maintain a sharp focus on the customer experience and product requirements, translating deep technical insight into high-impact business solutions.
  • Partner with executive and engineering leadership to codify critical business processes into self-measuring, scalable, and operationally consistent platforms, drastically reducing manual toil.
  • Direct the integration strategy for key technologies, including common AI schedulers (e.g., Kubernetes, Slurm) and innovative observability systems (e.g., Prometheus, OpenTelemetry, Grafana).

Requirements:

  • 16+ years of progressive industry experience.
  • Master's or Bachelor's degree, or equivalent experience defining and shipping complex distributed systems.
  • Deep, hands-on expertise in establishing, operating, and scaling services in a fast-paced, high-reliability environment.
  • Ability to thrive in ambiguous, fast-paced environments by rapidly testing ideas, iterating toward working solutions, and then hardening the winners into reliable, scalable systems.
  • Outstanding proficiency in modern systems programming languages such as Go, Java, or Python.
  • Proven track record of defining, owning, and evolving the architecture of high-scale distributed systems, including advanced patterns for APIs, control planes, and data pipelines.
  • Deep understanding of global cloud infrastructure (AWS, GCP, Azure) and container ecosystems (Docker, Kubernetes).
  • Demonstrated ability to drive technical strategy and influence outcomes across organizational boundaries.
  • Outstanding ability to communicate complex technical concepts, drive organizational consensus, and mentor high-performing engineers.

Benefits:

  • Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.
  • Eligible for equity and benefits.