# Senior Software Engineer, Distributed Systems - NIM Factory

**Company**: NVIDIA
**Location**: Santa Clara
**Work arrangement**: remote
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Software-Engineer--Distributed-Systems---NIM-Factory_JR2010745?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_2fb5f86f-126

## Description

We are seeking a senior engineer to design and build factory infrastructure and automation for NVIDIA Inference Microservices (NIMs). The right person for this role brings technical drive and creativity to change the way NVIDIA optimizes and serves performant inferencing for every AI model in heterogeneous cluster environments.

Our NIM offerings are easy to use, highly performant, and tested in all deployment scenarios: in the cloud, on customers’ self-hosted infrastructure, and locally on all NVIDIA GPUs. You will apply your deep technical expertise to design an efficient, scalable, and reliable automation factory infrastructure that turns AI models into NIMs validated for best-in-class performance and accuracy.

You will harness groundbreaking technologies and build a highly efficient factory to power how NVIDIA builds and validates NIMs for inferencing, all the way through deployment in heterogeneous hardware and software environments. You will influence and drive technical advances in NVIDIA's workflows and build infrastructure designed to accelerate the delivery of every AI model on NVIDIA's GPUs anywhere.

**Responsibilities:**

- Develop a factory pipeline that takes an AI model as input and produces a deployable service validated across cloud, on-prem, and Kubernetes environments.

- Work with technical leaders designing and developing scalable and reliable factory components.

- Define metrics and drive improvements based on user feedback.

**Requirements:**

- A history of using advanced programming skills to build distributed compute systems, backend services, and microservices on cloud technologies.

- Proven experience working effectively with multi-functional teams, principals, and architects across organizational boundaries.

- Deep technical expertise in distributed containerized applications using technologies such as Docker, K8s, Cloud Endpoints, Helm, and Prometheus.

- Passion for building rich build and test automation pipelines for microservice applications.

- Excellent interpersonal skills and the ability to lead multi-functional efforts.

**Nice to Have:**

- Experience delivering event-driven applications using various services such as Temporal, Kafka, Redis or others and a demonstrable ability to discuss the pros and cons of these choices.

- A history of building and deploying containers for Microservices, Cloud and On-prem deployments, and their associated CI/CD pipelines.

## Skills

### Required
- Docker
- Kubernetes
- Cloud Endpoints
- Helm
- Prometheus
- Microservices
- Cloud Technologies
- Containerization

### Nice to have
- Temporal
- Kafka
- Redis
- CI/CD Pipelines

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Software-Engineer--Distributed-Systems---NIM-Factory_JR2010745?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
