NVIDIA

Solutions Architect - CPU and LPU

Onsite, senior, full-time; Beijing, Shanghai

First indexed 18 May 2026

Job Description

We're looking for a software-focused Solutions Architect to drive adoption of next-generation AI infrastructure across NVIDIA CPU platforms and LPU-based inference systems.

As a Solutions Architect, you will be the first line of technical expertise between NVIDIA and our customers for CPU- and LPU-centric AI system design. You will help customers understand how NVIDIA CPUs and LPU-based systems can improve the efficiency, latency, throughput, and total cost of their AI workloads, especially when deployed alongside NVIDIA GPUs in heterogeneous production environments.

Key Responsibilities:

  • Evangelize NVIDIA CPU platforms, including Grace, Vera, and future generations, as well as LPU-based systems and LPX-class platforms, with a strong focus on AI software stacks and workload efficiency.
  • Help customers design and optimize AI workloads across CPU, GPU, and LPU, improving latency, throughput, utilization, and overall cost efficiency.
  • Analyze and tune LLM and generative AI pipelines across serving, runtime, memory, I/O, batching, scheduling, and orchestration layers.
  • Build proof-of-concepts, reference architectures, and technical guidance in partnership with Engineering, Product, and Sales teams.
  • Establish trusted technical relationships with customer architects, infrastructure teams, and senior leaders, becoming a strategic advisor for heterogeneous AI system design.

Requirements:

  • MS or PhD in Computer Science, Engineering, Mathematics, Physics, or a related field, or equivalent experience, plus 5+ years in AI systems, infrastructure, performance engineering, or solution architecture.
  • Strong understanding of modern CPU architecture, Linux systems, and software performance tuning, along with hands-on experience in AI inference for LLM, generative AI, or agentic AI workloads.
  • Experience optimizing heterogeneous systems that combine CPUs and accelerators, and familiarity with frameworks such as PyTorch, Triton, TensorRT-LLM, vLLM, or ONNX Runtime.
  • Strong programming, problem-solving, and communication skills, with the ability to work effectively with both technical teams and senior customer stakeholders.

Nice to Have:

  • Experience with NVIDIA CPU platforms such as Grace or Grace Hopper, or with Arm64 server environments, and familiarity with LPU-based systems or other low-latency inference accelerators.
  • Deep expertise in LLM inference optimization, serving architecture, and workload placement across CPU, GPU, and LPU.
  • Experience building customer-facing proof-of-concepts and measuring AI efficiency through latency, throughput, cost per token, power, or utilization.
  • Familiarity with NVIDIA AI software and platform technologies.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/China-Beijing/Solutions-Architect---CPU-and-LPU_JR2015614