# Senior Staff Software Engineer - AI Agent Platform

**Company**: NVIDIA
**Location**: Santa Clara
**Work arrangement**: onsite
**Experience**: senior
**Job type**: full-time
**Salary**: Competitive salary and benefits package
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Staff-Software-Engineer---AI-Agent-Platform_JR2016997?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_7c3f5ab9-371

## Description

We are looking for a Senior Staff Software Engineer to design, build, and scale the infrastructure powering NVIDIA's AI agent ecosystem. You will work at the intersection of distributed systems, developer platforms, and agentic AI, building the foundational services that enable teams across the company to develop, deploy, orchestrate, and operate autonomous AI agents at production scale.

**Key Responsibilities:**

- Build platform services that own the full agent lifecycle, from registration through deployment, execution, and teardown

- Architect Kubernetes-based execution environments with pod lifecycle management, namespace isolation, persistent storage, and identity propagation

- Develop and maintain automated CI/CD pipelines using GitLab CI and ArgoCD, including reusable pipeline templates and deployment blueprints that standardize how agents are built across teams

- Build framework-agnostic infrastructure that supports multiple agent SDKs (Claude Code, OpenAI Codex, LangGraph), including harnesses, lifecycle hooks, configurable skills, observability (OTEL), and memory services

- Build and operate Kafka-based message pipelines and real-time event streaming using Redis Pub/Sub and server-sent events (SSE)

- Develop data ingestion pipelines, access interfaces, and storage layers that power AI agent knowledge and context

- Implement session management for state persistence, conversation history, and agent recovery across sessions

- Develop multi-layer authentication using OAuth 2.0, JWT validation, token exchange, and gateway integration, and manage the secrets lifecycle with Vault (provisioning, rotation, and container injection)

**Requirements:**

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience), with 12+ years in software engineering, ideally in platform engineering, infrastructure, or developer tools

- Experience building and scaling AI agents in production using frameworks like Claude Code, Codex, or LangGraph

- Deep Kubernetes expertise including pod orchestration, persistent storage, RBAC, and multi-cluster management

- Strong Python skills with production API experience using FastAPI, Flask, or similar web frameworks

- Proven track record designing distributed systems with Kafka, Redis, and MongoDB or PostgreSQL

- Expertise building and managing robust CI/CD pipelines using GitLab CI and ArgoCD for continuous delivery to Kubernetes

- Experience designing AI data platform components (ingestion pipelines, vector stores, retrieval APIs, data preprocessing workflows) and building developer-facing platform APIs consumed by multiple engineering teams

- Solid grasp of auth and identity: OAuth 2.0, JWT, token exchange, and secrets management with Vault

- History of leading sophisticated technical projects such as migrations or greenfield platform builds, with strong interpersonal skills to drive alignment across teams and write clear design documents

**Nice to Have:**

- Experience building or operating AI agent platforms or agentic workflow systems, with hands-on expertise in agent protocols and frameworks like MCP, A2A, LangChain, or LangGraph

- Hands-on experience with RAG architectures, embedding pipelines, and vector databases (Milvus, Pinecone, or Weaviate)

- Full-stack skills with React or Vue for building developer portals and dashboards

- Contributions to open-source infrastructure or platform tooling

## Skills

### Required
- Kubernetes
- Python
- FastAPI
- Flask
- GitLab CI
- ArgoCD
- Kafka
- Redis
- MongoDB
- PostgreSQL
- OAuth 2.0
- JWT
- Vault

### Nice to have
- Claude Code
- OpenAI Codex
- LangGraph
- RAG architectures
- embedding pipelines
- vector databases
- React
- Vue

