NVIDIA

Senior Software Engineer, CUTLASS Kernels

Onsite, senior, full-time, $120,000–$180,000, Santa Clara

First indexed 2 Jun 2026

Description

We are seeking a Senior Software Engineer to join our CUTLASS team. As a key member of the team, you will develop and optimize math kernels that extract the highest performance from our hardware architecture.

Your primary focus will be on writing Tensor Core-based deep learning kernels such as grouped-GEMM, attention, and convolution using CUTLASS CUDA C++ and Python DSL for Blackwell, Rubin, and future architectures. You will also optimize kernels for peak throughput on both silicon and software performance simulators.

You will collaborate with teams across NVIDIA, including the GPU architecture, NVVM/PTX compiler, CUDA library, and DL frameworks teams, to ensure fast, functional, and timely kernel delivery to customers.

To be successful in this role, you should have strong proficiency in C++ programming and software design, including debugging, performance evaluation, and testing. You should also have experience with CUDA, OpenCL, HIP, SYCL, Mojo, Pallas, Triton, Mosaic, Halide, or another general-purpose or domain-specific programming language targeting highly parallel accelerators.

Experience writing code specifically targeting NVIDIA Tensor Cores, particularly through PTX or CUDA/cuTile, is highly desirable. Open-source contributions to math kernel libraries or frameworks are also a plus.

As a Senior Software Engineer, you will be eligible for equity and benefits. Applications for this job will be accepted at least until June 5, 2026.