Description

We're looking for outstanding AI systems engineers to develop groundbreaking technologies in the inference systems software stack. Here's what you'll be doing:

- Developing new AI systems technologies for efficient inference
- Designing, implementing, and optimising kernels for high-impact AI workloads
- Designing and implementing extensible abstractions for LLM serving engines
- Building efficient just-in-time domain-specific compilers and runtimes
- Collaborating closely with other engineers at NVIDIA across deep learning frameworks, libraries, kernels, and GPU arch teams
- Contributing to open-source communities like FlashInfer, vLLM, and SGLang

To succeed in this role, you'll need:

- A Master's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience)
- 6+ years of experience with ML/DL systems development
- Strong experience in developing or using deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX, etc.) and ideally inference engines and runtimes such as vLLM, SGLang, and MLC
- Strong Python and C/C++ programming skills
- Strong experience in GPU kernel development and performance optimisations (especially using CUDA C/C++, cuTile, Triton, or similar)

If you have expertise in domain-specific compiler and library solutions for LLM inference and training (e.g., FlashInfer, FlashAttention), inference engines like vLLM and SGLang, or machine learning compilers (e.g., Apache TVM, MLIR), you'll stand out from the crowd.

You'll also be eligible for equity and benefits.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Software-Engineer--AI-and-DL-Kernel-Libraries_JR2014704