# Senior Software Engineer – TensorRT Edge-LLM

**Company**: NVIDIA
**Location**: Santa Clara
**Work arrangement**: hybrid
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Software-Engineer---TensorRT-Edge-LLM_JR2012868?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_4a322639-1a9

## Description

Are you passionate about pushing the limits of real-time large language model inference? Join NVIDIA's TensorRT Edge-LLM team and help shape the next generation of edge AI for automotive and robotics.

We build the software stack that enables Large Language, Vision-Language, and Multimodal (LLM/VLM/VLA) models to run efficiently on embedded and edge platforms, delivering cutting-edge generative AI experiences directly on-device.

**Responsibilities:**

- Develop and evolve a state-of-the-art inference framework in modern C++ that extends TensorRT with autoregressive model serving capabilities, including speculative decoding, LoRA, MoE, and KV cache management.

- Design and implement compiler and runtime optimizations tailored for transformer-based models running on constrained, real-time platforms.

- Collaborate with teams across CUDA, kernel libraries, compilers, and robotics to deliver high-performance, production-ready solutions.

- Contribute to CUDA kernel and operator development for critical transformer components such as attention, GEMM, and MoE.

- Benchmark, profile, and optimize inference performance across diverse embedded and automotive environments.

- Stay ahead of the rapidly evolving LLM/VLM ecosystem and bring emerging techniques into product-grade software.

**Requirements:**

- BS, MS, PhD, or equivalent experience in Computer Science, Electrical/Computer Engineering, or a closely related field.

- 4+ years of relevant software development experience.

- Deep understanding of transformer models and inference optimization techniques (e.g., quantization, tensor parallelism, or memory-efficient scheduling).

- Strong programming skills in modern C++ (C++11/14/17 and beyond).

- Familiarity with popular LLM frameworks and libraries such as TensorRT, TensorRT-LLM, vLLM, SGLang, MLC-LLM, or FlashInfer.

- A track record of strong software design, execution, and cross-functional collaboration.

**Preferred Qualifications:**

- Demonstrated development experience or open-source contributions to LLM inference frameworks and libraries, such as SGLang, vLLM, or FlashInfer.

- Proficiency with CUDA, including efficient kernel development, performance profiling, and GPU architecture fundamentals.

- Prior work on autoregressive LLM serving systems, including speculative decoding or KV cache management.

- Familiarity with compiler infrastructure for large language model inference.

- Exposure to robotics or embedded AI pipelines, including optimizing for low-latency, resource-constrained systems.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We hire some of the most brilliant and forward-thinking people in the world. If you thrive on innovation, autonomy, and technical excellence, come join us to shape the future of edge AI.

## Skills

### Required
- Modern C++
- TensorRT
- Transformer models
- Inference optimization techniques
- CUDA
- LLM frameworks and libraries

### Nice to have
- SGLang
- vLLM
- FlashInfer
- CUDA kernel development
- GPU architecture fundamentals
- Autoregressive LLM serving systems
- Compiler infrastructure for large language model inference
- Robotics or embedded AI pipelines
