Description
Join a team that's revolutionizing the field of AI with data center scale solutions. We are seeking a highly technical Solution Architect to serve as a forward-deployed technical liaison between NVIDIA's Partner Network (NPN), NVIDIA's internal teams, and enterprise customers.
In this role, your focus will be to support customers and partners in the areas of planning, development, construction, and deployment of large-scale AI factories. You will join the team building capabilities to develop, construct, and deliver large-scale AI factories based on NVIDIA's reference designs. This includes architectural systems, power distribution, cooling systems, integration of telemetry and control systems, and all other physical infrastructure.
Acting as a trusted advisor embedded with customer and partner teams, you will bridge solution development, implementation enablement, and post-deployment optimization to accelerate AI factory adoption!
Key Responsibilities:
- Collaborate with NVIDIA product, engineering, and partner teams to understand NVIDIA's reference architecture for data center infrastructure, including power distribution, cooling systems, controls and monitoring, and network/cabling architecture.
- Support customers and partners in quickly implementing this architecture into sophisticated and reliable data center builds.
- Collaborate across the organization to build processes, partner relationships, and workflows to deliver and deploy large AI factories at the speed of light (SOL).
- Review and appraise customers' and partners' infrastructure build plans, verifying their compliance with NVIDIA reference architecture, industry standards, and regulatory requirements.
- Deliver mentorship, expertise, and suggestions to optimize performance, scalability, and cost-effectiveness.
- Review the operational efficiency, reliability, and readiness of data center infrastructure elements before initiating AI/HPC cluster deployments.
- Design and apply detailed audit plans and conduct pre-deployment audits to detect possible problems, risks, and improvement areas.
- Develop and implement quality assurance processes to ensure that deployments meet established specifications and performance benchmarks.
- Conduct detailed bring-up, testing, and commissioning to validate the functionality and reliability of infrastructure components.
- Develop partner-facing playbooks, deployment guides, and guideline documentation specific to AI factory implementations using NVIDIA reference builds.
- Provide hands-on mentorship to partners on deploying NVIDIA AI factories; conduct workshops and develop runbooks to build partner expertise.
- Serve as the representative of the customer/partner within NVIDIA, conveying field insights, deployment challenges, and optimization opportunities to NVIDIA internal and product engineering teams to support the seamless integration of data center infrastructure solutions.
Requirements:
- Bachelor's degree or equivalent experience in Engineering or a related field.
- 8+ years of experience in high-density AI/HPC data centers.
- Proven technical expertise in data center systems and operations, power distribution, liquid cooling, rack/server chassis, and cabling.
- Effective communication at both the on-the-ground implementation and executive levels, internally and with customers and partners.
- Coordination and time management: proficient at planning and scheduling job-related tasks to accomplish objectives within or ahead of designated time frames.
- Able to travel (25%).