Description
We are hiring a Senior Full Stack Software Engineer to help scale up our AI infrastructure. As a member of the DGX Cloud team, you will be responsible for designing and developing a massively distributed, scalable platform to identify, diagnose, and remediate non-performant GPU assets. You will work with teams across NVIDIA to ensure production AI clusters run reliably and consistently with maximum performance.
You will work with a wide range of technologies, including React, Web Components, TypeScript, Golang, PostgreSQL, Temporal, Bazel, and Kubernetes. We expect you to have significant software engineering experience with cluster operations, operator development, node health monitoring, and GPU resource scheduling.
If you're creative, passionate about GPUs, and love having fun, please apply today!
Key responsibilities include:
- Designing and developing a massively distributed, scalable platform to identify, diagnose, and remediate non-performant GPU assets
- Working with teams across NVIDIA to ensure production AI clusters run reliably and consistently with maximum performance
- Evaluating system failures and improving services based on a well-defined incident management process
Requirements include:
- 12+ years in a software engineering role within a highly technical organization with demonstrable impact from your work
- Highly motivated with strong communication skills, able to work successfully with multi-functional teams, principals, and architects
- Proficiency in React, TypeScript/JavaScript, and Golang
- Proficiency with a SQL database
Preferred skills include:
- Technical competency in managing and automating large-scale distributed systems independent of cloud providers
- Empathy for users, attention to detail, and a passion for creating world-class user experiences
- Prior experience in asynchronous workflows and/or event-driven architecture
- Proven operational excellence in maintaining reliable and performant infrastructure
This position is eligible for equity and benefits.