Description
We are seeking a strategic and technically proficient Principal Software Engineer to join the Data Center Systems and Software CSP engagements team. As a leader and technologist, you will play a pivotal role in the architecture and development of next-generation Data Center products, acting as the technical focal point for select CSPs (Cloud Service Providers) and hyperscalers.
Your responsibilities will include:
- Driving system software architecture alignment and technical deep dives, acting as the primary software engineering contact for NPI projects with key customers.
- Collaborating with major customers to understand their roadmap, use cases, and requirements, aligning them with NVIDIA's roadmap.
- Spearheading cross-functional efforts to resolve complex, high-profile customer issues during the NPI phase.
- Making key technical decisions in the face of ambiguity and mitigating execution risks by following a shift-left strategy.
- Building and maintaining customer trust by understanding and addressing their needs.
- Working closely with cross-functional architects to define system software architecture for complex server platforms.
Requirements include:
- Extensive experience in designing scalable, high-performance server systems at the SW/HW interface. Expertise in server system architecture and its impact on application performance.
- Proven leadership skills with strong project ownership in complex software and hardware environments.
- Deep understanding of computer architecture, microprocessor concepts, and expert knowledge of ARM (aarch64) and x86 architectures.
- Proficiency in system software design, OS fundamentals, Linux kernel device drivers, and low-level hardware/software interfaces.
- Skill in complex system-level debugging, performance analysis, and test design.
- BS or MS in Computer Engineering, Computer Science, or a related field, or equivalent experience, with over 15 years in system software architecture and development.
To stand out from the crowd, you should have knowledge of cloud- and cluster-level deployment and management systems, expertise in out-of-band and in-band management architectures, and experience with GPU computing (CUDA), deep learning workloads, memory fabric, and CXL architectures.