Description
Are you passionate about programming languages, compiler technology, and GPU performance? We are looking for outstanding engineers to build CUTLASS DSL, a Python-native language for high-performance GPU kernel development, along with the MLIR dialects and lowering passes behind it.
Responsibilities:
- Design, develop, and optimize CUTLASS DSL, a Python-native language for high-performance GPU kernel development
- Build and advance the MLIR dialects, lowering passes, and code generation flows that power the CUTLASS DSL stack
- Drive innovations that improve kernel compilation speed while maintaining performance on par with CUTLASS C++
Requirements:
- MS, PhD, or equivalent experience in Computer Science, Software Engineering, or a related field
- 2+ years of relevant work experience
- Excellent programming skills in Python and strong proficiency in C++
- Hands-on experience with DSLs, compilers, or code generation systems
- Strong command of the MLIR/LLVM stack, including IR design and pass optimization
Preferred Qualifications:
- Deep understanding of the CUDA GPU programming model, GPU microarchitecture, and performance analysis and optimization techniques
- Familiarity with key high-performance computing abstractions such as Layout, Tile, MMA, and TMA in the CuTe ecosystem
To apply, use the employer's original posting:
https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/China-Shanghai/Deep-Learning-Performance-Architect--CUTLASS-DSL_JR2018773