# Senior Deep Learning Software Engineer

**Company**: NVIDIA
**Location**: Santa Clara
**Work arrangement**: hybrid
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Deep-Learning-Software-Engineer_JR2012411?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_2ad1bc81-f00

## Description

We are looking for a Senior Deep Learning Software Engineer to design and build our automated inference and deployment solution. As part of the team, you will be instrumental in defining a scalable architecture for DL inference with an emphasis on ease of use and compute efficiency.

Your work will span multiple layers of the DL deployment stack: developing features in high-level frameworks like PyTorch and JAX, designing and implementing a high-performance execution environment, performing low-level GPU optimizations, and developing custom GPU kernels in CUDA and/or Triton.

This is an exceptional opportunity for software engineers straddling the boundaries of research and engineering, with a strong background in both machine learning fundamentals and software architecture & engineering.

**Responsibilities:**

- Play a pivotal role in defining a modular, scalable platform that seamlessly bridges training and deployment workflows, enabling tight integration of deployment tooling with training frameworks such as Megatron and NeMo.

- Leverage and build upon the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract a standardized model graph representation from arbitrary PyTorch models for our automated deployment solution.

- Develop support for inference optimization techniques such as speculative decoding and LoRA.

- Collaborate with teams across NVIDIA to use performant kernel implementations within the automated deployment solution.

- Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.

- Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TRT, TRT-LLM, TRT Model Optimizer) maintain and extend their leadership in the market.

**Requirements:**

- Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.

- 8+ years of relevant work or research experience in Deep Learning.

- Excellent software design skills, including debugging, performance analysis, and test design.

- Strong proficiency in Python, PyTorch, and related ML tools.

- Strong algorithms and programming fundamentals.

- Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.

**Nice to Have:**

- Contributions to PyTorch, JAX, or other machine learning frameworks.

- Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.

- Familiarity with NVIDIA's deep learning SDKs such as TensorRT.

- Prior experience writing high-performance GPU kernels for machine learning workloads with CUDA, CUTLASS, or Triton.

## Skills

### Required
- Python
- PyTorch
- JAX
- CUDA
- Triton
- TensorRT
- GPU architecture
- compilation stack
- debugging
- performance analysis
- test design

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Deep-Learning-Software-Engineer_JR2012411?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)