Job Description
We're looking for a software-focused Solutions Architect to drive adoption of next-generation AI infrastructure across NVIDIA CPU platforms and LPU-based inference systems.
As a Solutions Architect, you will be the primary technical point of contact between NVIDIA and our customers for CPU- and LPU-centric AI system design. You will help customers understand how NVIDIA CPUs and LPU-based systems can improve the efficiency, latency, throughput, and total cost of their AI workloads, especially when deployed alongside NVIDIA GPUs in heterogeneous production environments.
Key Responsibilities:
- Evangelize NVIDIA CPU platforms, including Grace, Vera, and future generations, as well as LPU-based systems and LPX-class platforms, with a strong focus on AI software stacks and workload efficiency.
- Help customers design and optimize AI workloads across CPU, GPU, and LPU, improving latency, throughput, utilization, and overall cost efficiency.
- Analyze and tune LLM and generative AI pipelines across serving, runtime, memory, I/O, batching, scheduling, and orchestration layers.
- Build proof-of-concepts, reference architectures, and technical guidance in partnership with Engineering, Product, and Sales teams.
- Establish trusted technical relationships with customer architects, infrastructure teams, and senior leaders, becoming a strategic advisor for heterogeneous AI system design.
Requirements:
- MS or PhD in Computer Science, Engineering, Mathematics, Physics, or a related field, or equivalent experience, plus 5+ years in AI systems, infrastructure, performance engineering, or solution architecture.
- Strong understanding of modern CPU architecture, Linux systems, and software performance tuning, along with hands-on experience in AI inference for LLM, generative AI, or agentic AI workloads.
- Experience optimizing heterogeneous systems that combine CPUs and accelerators, and familiarity with frameworks such as PyTorch, Triton, TensorRT-LLM, vLLM, or ONNX Runtime.
- Strong programming, problem-solving, and communication skills, with the ability to work effectively with both technical teams and senior customer stakeholders.
Nice to Have:
- Experience with NVIDIA CPU platforms such as Grace or Grace Hopper, or with Arm64 server environments, and familiarity with LPU-based systems or other low-latency inference accelerators.
- Deep expertise in LLM inference optimization, serving architecture, and workload placement across CPU, GPU, and LPU.
- Experience building customer-facing proof-of-concepts and measuring AI efficiency through latency, throughput, cost per token, power, or utilization.
- Familiarity with NVIDIA AI software and platform technologies.
To apply, use the employer's original posting:
https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/China-Beijing/Solutions-Architect---CPU-and-LPU_JR2015614