Description
We are looking for a senior software engineer to join our team and help us expand our enterprise GPU management and monitoring tools. As a senior software engineer, you will work closely with the broader NVIDIA team to design and build cloud-native management agents, Kubernetes integrations, and end-to-end integration solutions that combine GPUs with the rest of the datacenter software management ecosystem.
Your contributions will span many aspects of GPU system integration, including telemetry and metrics, health checks, diagnostics, configuration, and system management. These tools serve both passive background monitoring and active online management, with a core emphasis on operational transparency and seamless integration in customer environments.
To succeed, you must have a strong Linux background, familiarity with modern cloud-native systems, and a proven work ethic. You will be expected to jump in quickly and provide valuable contributions from day one.
In this role, you will:
- Develop and maintain distributed, robust, and scalable Go programs deployed to Kubernetes environments that manage large datacenters
- Develop and maintain user-space applications, containers, Go bindings, and CLI tools
- Enable GPU management integration with the state-of-the-art open-source ecosystem, including Kubernetes and Docker
- Support internal and external users through bug fixes, documentation, and feature improvements
- Maintain high-quality products through robust test coverage
We are looking for someone with a strong background in Go and Kubernetes development, as well as experience with APIs and interface design. You should also have outstanding written and verbal communication skills, business-level English, and strong motivation and commitment to learning new skills.
You will also be eligible for equity and benefits.