# Senior ML Evaluation Engineer - Autonomous Vehicles

**Company**: NVIDIA
**Location**: Santa Clara
**Work arrangement**: remote
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-ML-Evaluation-Engineer---Autonomous-Vehicles_JR2015392?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_7dffc454-5ec

## Description

Join NVIDIA's AV Eval team in building the next generation of driving behaviour evaluation, moving beyond hand-crafted rules to learned evaluation using LLMs, VLMs, and agentic workflows.

As a Senior ML Evaluation Engineer, you will define how we measure whether an autonomous vehicle drives well, building systems that bridge ML research and production evaluation. You will ship systems that run at scale on real-world driving data and produce metrics that block or green-light software releases.

In this role, you will work on next-gen AV evaluation and have a direct impact on vehicle safety and shipping decisions. Join a new team being built from scratch, with high ownership and high visibility to NVIDIA AV leadership.

Responsibilities:

- Design and build learned evaluation pipelines that assess driving behaviour using LLMs, VLMs, and multimodal models

- Develop agentic workflows that chain model inference, retrieval, and structured reasoning to evaluate complex driving scenarios

- Define evaluation-of-evaluation methodology: how do we know our learned evaluators are correct?

- Build golden-set frameworks and calibration loops for learned metrics

- Partner with AML (Alpamayo Logos) teams on model-specific eval needs (e.g., COT prediction quality, AML regression coverage)

- Instrument evaluation systems with robust experiment tracking, A/B comparison tooling, and model versioning

- Contribute to the team's transition from rule-based to learned evaluation: identify metrics and analyzers that are candidates for ML replacement and build the alternatives

Requirements:

- PhD with 4+ years, MS with 6+ years, or BS (or equivalent experience) with 8+ years of relevant experience, with the degree in Computer Science, Computer Engineering, or a related technical field

- Hands-on experience building LLM/VLM-based pipelines: fine-tuning, prompt engineering, retrieval-augmented generation, chain-of-thought

- Track record of shipping ML systems to production (not just prototyping or publishing)

- Strong software engineering fundamentals: you write clean, tested, reviewable code in Python and C++

- Experience with evaluation methodology: precision/recall, inter-rater reliability, calibration, annotation pipelines

- Comfort with large-scale data processing (Spark, Dask, or similar)

- Strong Python skills. Experience with PyTorch or JAX. Comfortable with GPU-based training workflows.

Ways to stand out from the crowd:

- Autonomous driving, robotics, or safety-critical domain experience

- Familiarity with driving behaviour taxonomies (cut-ins, hard braking events, lane-keeping metrics, scenario-based evaluation)

- Experience with video understanding models or multimodal evaluation. Knowledge of agentic AI frameworks (LangChain, DSPy, CrewAI, or custom)

- Track record of influencing technical direction across team boundaries

- Experience with LLM/VLM fine-tuning or application development

## Skills

### Required
- LLMs
- VLMs
- multimodal models
- agentic workflows
- evaluation methodology
- precision/recall
- inter-rater reliability
- calibration
- annotation pipelines
- large-scale data processing
- Spark
- Dask
- Python
- PyTorch
- JAX
- GPU-based training workflows

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-ML-Evaluation-Engineer---Autonomous-Vehicles_JR2015392?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)