Description
We are seeking a Senior DL Algorithms Engineer to join our team. You will enable and optimize state-of-the-art open models, contribute new features, and deliver production code to open-source frameworks. Your expertise in deep learning, neural networks, and performance profiling will help us push the boundaries of inference performance.
Key responsibilities include:
- Enabling and optimizing state-of-the-art open models on NVIDIA's accelerated inference SW stack.
- Contributing new features, fixing bugs, and delivering production code to open-source frameworks like TRT-LLM, vLLM, SGLang, FlashInfer, etc.
- Profiling and analyzing bottlenecks across the full inference stack to push the boundaries of inference performance.
- Benchmarking state-of-the-art offerings and performing competitive analysis for NVIDIA's SW/HW stack.
- Co-designing with partner teams to develop the next generation of AI models and services.
Requirements include:
- PhD in CS, EE, or CSEE, or equivalent experience.
- 3+ years of relevant experience.
- Strong background in deep learning and neural networks, particularly inference.
- Experience with performance profiling, analysis, and optimization, especially for GPU-based applications.
- Proficiency in PyTorch or an equivalent AI framework, or in HPC-heavy application development.
- Deep understanding of computer architecture and familiarity with the fundamentals of GPU architecture.
Preferred qualifications include:
- Proven experience with processor and system-level performance optimization.
- Deep understanding of modern LLM/Diffusion architectures.
- Strong fundamentals in algorithms.
- GPU programming experience (CUDA or OpenCL) is a strong plus.
To apply, use the employer's original posting:
https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-DL-Algorithms-Engineer---Inference-Performance_JR2017176