NVIDIA

Colossus Cloud Platform Datacenter Engineer

Onsite, senior, full-time. Competitive salaries and a comprehensive benefits package. Santa Clara.

First indexed 2 Jun 2026

Description

As an IPP Cloud Data Center Platform Development Operations Program Manager, you will play a pivotal role in driving the global-scale expansion of NVIDIA's infrastructure, supporting next-generation AI and GPU platforms. You will lead centralized operations, infrastructure coordination, and strategic capacity planning across NVIDIA's growing infrastructure footprint.

Your responsibilities will include:

  • Driving centralized operational coordination across global Data Center and Super Lab infrastructure initiatives.
  • Supporting infrastructure planning and execution across a rapidly growing portfolio of AI/GPU infrastructure sites.
  • Improving infrastructure forecasting, portfolio visibility, dependency management, and operational readiness processes.
  • Partnering with cross-functional organizations to align infrastructure capacity, deployment priorities, and execution timelines.
  • Developing executive-level operational reporting, infrastructure dashboards, hotspot analysis, and portfolio readiness metrics.
  • Supporting strategic data center capacity planning initiatives across power, liquid cooling, and space allocation requirements.
  • Coordinating operational readiness activities that support infrastructure build-outs, site readiness, and launch execution.
  • Identifying process gaps, operational bottlenecks, and scaling challenges associated with global infrastructure growth.
  • Improving prioritization rigor and keeping execution aligned across multiple collaborating organizations.
  • Facilitating operational alignment between infrastructure demand, deployment schedules, and site readiness constraints.

To succeed in this role, you will need:

  • 15+ years of experience in infrastructure operations, developer operations, technical operations, program management, or large-scale infrastructure planning environments.
  • A strong understanding of liquid cooling systems, power and space planning, infrastructure operations, data center capacity planning and forecasting, the GPU infrastructure deployment lifecycle, portfolio management, operational readiness, and infrastructure dependency management.
  • Experience managing collaboration across engineering, facilities, operations, deployment, and executive leadership groups.
  • Strong executive communication, analytical, and operational problem-solving skills.
  • Ability to operate effectively in rapidly scaling and highly matrixed environments.