# Deep Learning Compiler Engineer - CUDA

**Company**: NVIDIA
**Location**: Shanghai
**Work arrangement**: onsite
**Experience**: mid
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/China-Shanghai/Deep-Learning-Compiler-Engineer---CUDA_JR2010731?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_7886dec4-ec0

## Description

We are now looking for a cuTile Core Compiler Architect to join our group. The NVIDIA Architecture group is looking for world-class architects and engineers to join and lead our various architecture efforts. A key part of NVIDIA's strength is to innovate in the graphics and parallel computing fields, delivering the highest performance in the world for parallel processing algorithms.

**What you'll be doing:**

- Design and implement the DSL and the core compiler of a tile-aware GPU programming model for emerging GPU architectures

- Continuously innovate and iterate on the core architecture of the compiler to consistently optimize performance

- Investigate next-generation GPU architectures and provide solutions in the DSL and compiler stack

- Analyze performance on emerging AI/LLM workloads and integrate with AI/ML frameworks

**Requirements:**

- Master's or PhD, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)

- 2+ years of relevant work experience

- Excellent C/C++ programming and software engineering skills; an ACM background is a plus

- Good fundamental knowledge of computer architecture

- Strong ability to abstract problems and a methodical approach to resolving them

- A strong compiler background, including MLIR/TVM/Triton/LLVM, is desired

- Good knowledge of GPU architecture and fast kernel programming skills are a plus

- Knowledge of LLM algorithms or a certain HPC domain is a plus

- Knowledge of multi-GPU distributed communication is a plus

- Excellent oral communication in English is a plus

## Skills

### Required
- C/C++
- Software Engineering
- Computer Architecture
- Compiler Background
- GPU Architecture
- Fast Kernel Programming

### Nice to have
- MLIR
- TVM
- Triton
- LLVM
- LLM Algorithms
- HPC Domain
- Multi-GPU Distributed Communication

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/China-Shanghai/Deep-Learning-Compiler-Engineer---CUDA_JR2010731?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
