Description
We are seeking a passionate and seasoned Senior K8s Expert to join our team, focusing on building, optimizing, and operating infrastructure for Agentic AI and Agentic Reinforcement Learning (Agentic RL) workloads. You will play a core role in bridging NVIDIA's cutting-edge accelerated computing technologies with cloud service providers (CSPs) in China, driving the deployment and scaling of Kubernetes-based Agentic AI/RL solutions, and empowering our CSP partners to build high-performance, scalable, and secure Agent Infra systems.
As a Senior Solutions Architect, you will work with the Sales, BD, and CPM teams to introduce NVIDIA technologies into assigned accounts and grow the business. You will lead the design, development, and optimization of Kubernetes-based infrastructure solutions for Agentic AI and Agentic RL workloads, addressing core challenges including massive concurrent sandbox scheduling, millisecond-level elasticity, secure isolation, and full-scenario interactive environment support.
You will collaborate closely with NVIDIA's CSP partners in China to understand their Agentic AI/RL business needs, provide professional K8s technical guidance, and tailor infrastructure solutions that align with NVIDIA's accelerated computing technologies (such as NVIDIA AI Enterprise, the GB200 platform, and NVCF).
You will optimize Kubernetes clusters to support high-throughput, low-latency Agentic RL training and inference workloads, including resource scheduling strategy optimization, GPU resource management, network and storage performance tuning, and solving bottlenecks in large-scale Pod creation and scheduling.
You will design and implement core Agent Infra components on K8s, such as secure sandbox environments, interactive trajectory recording, checkpoint and breakpoint replay, and end-to-end observability tools, to support the full lifecycle of Agentic AI/RL development and deployment.
You will work with cross-functional teams (NVIDIA's R&D, solution architecture, and technical support teams) to promote the integration of K8s with NVIDIA's software and hardware ecosystem, including NVIDIA Operators, Dynamo, Grove, and KAI Scheduler, to achieve optimal performance of Agentic workloads.
You will provide technical leadership in K8s and Agentic AI/RL infrastructure, mentor junior engineers, and drive the continuous iteration and improvement of infrastructure solutions based on industry best practices and customer feedback.
You will stay abreast of the latest trends in Kubernetes, Agentic AI, Agentic RL, and cloud-native infrastructure, introduce advanced technologies and solutions into NVIDIA's CSP ecosystem, and promote technological innovation and standardization.
You will participate in technical pre-sales support, solution demonstrations, and technical training for CSP partners, helping them build and operate K8s-based Agentic AI/RL infrastructure.