# Agent RL Infra Engineer

**Company**: NVIDIA
**Location**: Santa Clara
**Work arrangement**: onsite
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Agent-RL-Infra-Engineer_JR2015309?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_1a1bcc91-655

## Description

We're hiring an engineer to help us bring reinforcement learning to every agent team at NVIDIA. This is a rare chance to shape how autonomous, self-improving agents learn and evolve across the enterprise. The role sits at the intersection of ML research and production engineering. What if every agent developer could add self-improvement loops to their workflows without needing deep RL expertise? That's the challenge here: evaluate emerging approaches, adapt them into enterprise-ready blueprints, and make them available inside sandboxed execution environments with the security and governance the enterprise demands.

### Responsibilities

The work splits between creating enterprise-ready RL capabilities and partnering with agent teams to put them into practice.

#### Building RL Cookbooks and Environments

- Evaluate and adapt democratized RL approaches into reusable cookbooks and blueprints so agent developers can integrate self-improvement loops (GRPO, DPO, PPO, RLAIF) on their own

- Design verifiable reward environments building on NeMo Gym, extending to domain-specific environments for internal use cases

- Operationalize NVIDIA and third-party training backends as production services inside Sandbox

- Integrate with NeMo Microservices (Curator, Customizer, Evaluator, Guardrails) to enable end-to-end data flywheel workflows for RL

#### Infrastructure, Reliability, and Collaboration

- Lead data curation and active learning strategies to continuously improve training data quality

- Design RL training loops for agent self-improvement: reward modeling, policy optimization, safety constraints

- Integrate with AI Factory GPU infrastructure for throughput, data locality, and multi-node training

- Build observability for training runs and ensure workloads meet security and governance requirements

- Collaborate with platform, security, agent infrastructure, and internal customer teams on safe deployment of training outputs

### Requirements

- MS in CS, ML, or related field (or equivalent experience)

- 10+ years of experience

- Experience operationalizing fine-tuning methods (LoRA, SFT) and especially RL techniques (DPO, GRPO, PPO, RLAIF) into reusable cookbooks and self-service workflows

- Familiarity with distributed training frameworks (e.g., Megatron, NeMo, DeepSpeed, FSDP, HF Accelerate) and MLOps skills covering pipeline automation, job orchestration, and GPU cluster management

- Proficiency in Python, Go, Rust, or similar

### Ways to Stand Out

- Building RL environments or training recipes that other teams consumed as self-service capabilities

- Familiarity with NVIDIA infrastructure (DGX, AI Factory, NVLink/InfiniBand), NeMo Microservices, or the evolving RL-for-agents ecosystem (rLLM, Agent Lightning, HUD, OpenRLHF, SkyRL)

- Experience with data curation, active learning, continuous learning loops, or data flywheel architectures

You will also be eligible for equity and benefits.

## Skills

### Required
- Python
- Go
- Rust
- Megatron
- NeMo
- DeepSpeed
- FSDP
- HF Accelerate
- LoRA
- SFT
- DPO
- GRPO
- PPO
- RLAIF

### Nice to have
- NVIDIA infrastructure
- NeMo Microservices
- RL-for-agents ecosystem
- data curation
- active learning
- continuous learning loops
- data flywheel architectures

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Agent-RL-Infra-Engineer_JR2015309?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
