# DL System Software Engineer - AI Platform

**Company**: NVIDIA
**Location**: Toronto
**Work arrangement**: onsite
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/Canada-Toronto/DL-System-Software-Engineer---AI-Platform_JR2002456?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_8aa7a3f3-0de

## Description

We are seeking highly motivated and skilled systems engineers to join our team and help develop an AI platform that provides efficient infrastructure for training and serving large-scale models.

As a systems engineer, you will play a crucial role in building a unified solution that brings innovative NVIDIA technologies, such as high-performance inference and training frameworks, ML compilers, performance predictors, and cluster schedulers, into a single, cohesive platform.

Responsibilities:

- Take part in developing NVIDIA's AI platform for training, fine-tuning, and serving the latest and greatest AI models with the best performance and efficiency.

- Design and build solutions for scheduling large-scale AI training and inference workloads on GPU clusters across many cloud infrastructures.

- Explore and solve open problems such as industry-scale resource management, GPU scheduling, performance prediction, and live workload migration.

- Collaborate with and contribute to adjacent teams, such as the TensorRT/Dynamo inference engine, ML compiler, KAI/Grove scheduler, and Lepton cloud teams.

Requirements:

- Bachelor's degree or equivalent experience in Computer Science, Computer Engineering, or a related technical field.

- 5+ years of relevant experience.

- Experience building large-scale systems from scratch. Prior experience with container-based deployment systems such as Kubernetes is beneficial.

- Strong coding skills in programming languages like Python, Go, Rust, and/or C/C++.

- Solid foundation in core computer science and computer engineering topics, such as algorithms and data structures, operating systems, and computer architecture. A strong understanding of AI and related technologies is a huge plus.

- Ability to quickly grasp new concepts and thrive in evolving situations.

Ways to stand out from the crowd:

- Graduate-level education or equivalent practical experience, particularly in research.

- Practical experience in building and optimizing AI applications is highly desired.

- Proficiency with container software such as containerd, CRI-O, Linux namespaces, and CRIU, and with NVIDIA GPU technologies such as CUDA Graphs and the driver/runtime, is a significant advantage.

You will also be eligible for equity and benefits.

## Skills

### Required
- Python
- Go
- Rust
- C/C++
- Kubernetes
- containerd
- CRI-O
- Linux namespaces
- CRIU
- CUDA Graphs
- NVIDIA driver/runtime

