NVIDIA

Senior DL Algorithms Engineer - Inference Performance

Remote, senior, full-time, Santa Clara

First indexed 5 May 2026

Description

We are seeking a Senior DL Algorithms Engineer to join our team. You will enable and optimize state-of-the-art open models, contribute new features, and deliver production code to open-source frameworks. Your expertise in deep learning, neural networks, and performance profiling will help us push the boundaries of inference performance.

Key responsibilities include:

  • Enabling and optimizing state-of-the-art open models on NVIDIA's accelerated inference software stack.
  • Contributing new features, fixing bugs, and delivering production code to open-source frameworks such as TRT-LLM, vLLM, SGLang, and FlashInfer.
  • Profiling and analyzing bottlenecks across the full inference stack to push the boundaries of inference performance.
  • Benchmarking state-of-the-art offerings and performing competitive analysis for NVIDIA's software/hardware stack.
  • Co-designing with partner teams to develop the next generation of AI models and services.

Requirements include:

  • PhD in CS, EE, or CSEE, or equivalent experience.
  • 3+ years of experience.
  • Strong background in deep learning and neural networks, particularly inference.
  • Experience with performance profiling, analysis, and optimization, especially for GPU-based applications.
  • Proficiency in PyTorch or equivalent AI frameworks, or in HPC-heavy application development.
  • Deep understanding of computer architecture and familiarity with the fundamentals of GPU architecture.

Preferred qualifications include:

  • Proven experience with processor and system-level performance optimization.
  • Deep understanding of modern LLM/Diffusion architectures.
  • Strong fundamentals in algorithms.
  • GPU programming experience (CUDA or OpenCL) is a strong plus.