Description
We're hiring an engineer to help us bring reinforcement learning to every agent team at NVIDIA. This is a rare chance to shape how autonomous, self-improving agents learn and evolve across the enterprise. The role sits at the intersection of ML research and production engineering. The central question: what if every agent developer could add self-improvement loops to their workflows without needing deep RL expertise? You will help answer it by evaluating emerging approaches, adapting them into enterprise-ready blueprints, and making them available inside sandboxed execution environments with the security and governance the enterprise demands.
Responsibilities
The work splits between creating enterprise-ready RL capabilities and partnering with agent teams to put them into practice.
Building RL Cookbooks and Environments
- Evaluate and adapt democratized RL approaches into reusable cookbooks and blueprints so agent developers can integrate self-improvement loops (GRPO, DPO, PPO, RLAIF) on their own
- Design verifiable reward environments building on NeMo Gym, extending to domain-specific environments for internal use cases
- Operationalize NVIDIA and third-party training backends as production services inside Sandbox
- Integrate with NeMo Microservices (Curator, Customizer, Evaluator, Guardrails) to enable end-to-end data flywheel workflows for RL
- Lead data curation and active learning strategies to continuously improve training data quality
- Design RL training loops for agent self-improvement, covering reward modeling, policy optimization, and safety constraints
Infrastructure, Reliability, and Collaboration
- Integrate with AI Factory GPU infrastructure to support high throughput, data locality, and multi-node training
- Build observability for training runs and ensure workloads meet security and governance requirements
- Collaborate with platform, security, agent infrastructure, and internal customer teams on safe deployment of training outputs
Requirements
- MS in CS, ML, or related field (or equivalent experience)
- 10+ years of relevant experience
- Experience operationalizing fine-tuning methods (LoRA, SFT) and, especially, RL techniques (DPO, GRPO, PPO, RLAIF) into reusable cookbooks and self-service workflows
- Familiarity with distributed training frameworks (e.g., Megatron, NeMo, DeepSpeed, FSDP, HF Accelerate) and MLOps skills in pipeline automation, job orchestration, and GPU cluster management
- Proficiency in Python, Go, Rust, or similar
Ways to Stand Out
- Experience building RL environments or training recipes that other teams adopted as self-service capabilities
- Familiarity with NVIDIA infrastructure (DGX, AI Factory, NVLink/InfiniBand), NeMo Microservices, or the evolving RL-for-agents ecosystem (rLLM, Agent Lightning, HUD, OpenRLHF, SkyRL)
- Experience with data curation, active learning, continuous learning loops, or data flywheel architectures
You will also be eligible for equity and benefits.