Description

We are seeking a Senior DL Algorithms Engineer to join our team. You will enable and optimize state-of-the-art open models, contribute new features, and deliver production code to open-source frameworks. Your expertise in deep learning, neural networks, and performance profiling will help us push the boundaries of inference performance.

Key responsibilities include:

Enabling and optimizing state-of-the-art open models on NVIDIA's accelerated inference SW stack.
Contributing new features, fixing bugs, and delivering production code to open-source frameworks such as TRT-LLM, vLLM, SGLang, and FlashInfer.
Profiling and analyzing bottlenecks across the full inference stack to push the boundaries of inference performance.
Benchmarking state-of-the-art offerings and performing competitive analysis for NVIDIA's SW/HW stack.
Co-designing with partner teams to develop the next generation of AI models and services.

Requirements include:

PhD in CS, EE, or CSEE, or equivalent experience.
3+ years of experience.
Strong background in deep learning and neural networks, particularly inference.
Experience with performance profiling, analysis, and optimization, especially for GPU-based applications.
Proficiency in PyTorch or an equivalent AI framework, or experience developing HPC-heavy applications.
Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.

Preferred qualifications include:

Proven experience with processor and system-level performance optimization.
Deep understanding of modern LLM/Diffusion architectures.
Strong fundamentals in algorithms.
GPU programming experience (CUDA or OpenCL) is a strong plus.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-DL-Algorithms-Engineer---Inference-Performance_JR2017176