Description

Are you passionate about programming languages, compiler technology, and GPU performance? We are looking for outstanding engineers to build CUTLASS DSL, a Python-native language for GPU kernel development, along with the MLIR dialects and lowering passes behind it.

Responsibilities:

Design, develop, and optimize CUTLASS DSL, a Python-native language for high-performance GPU kernel development
Build and advance the MLIR dialects, lowering passes, and code generation flows that power the CUTLASS DSL stack
Drive innovations that improve kernel compilation speed while maintaining performance on par with CUTLASS C++

Requirements:

MS, PhD, or equivalent experience in Computer Science, Software Engineering, or a related field
2+ years of relevant work experience
Excellent programming skills in Python and strong proficiency in C++
Hands-on experience with DSLs, compilers, or code generation systems
Strong command of the MLIR/LLVM stack, including IR design and pass optimization

Preferred Qualifications:

Deep understanding of the CUDA GPU programming model, GPU microarchitecture, and performance analysis and optimization techniques
Familiarity with key high-performance computing abstractions such as Layout, Tile, MMA, and TMA in the CuTe ecosystem

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/China-Shanghai/Deep-Learning-Performance-Architect--CUTLASS-DSL_JR2018773