# Senior Scientist, Synthetic Data and Privacy

**Company**: NVIDIA
**Location**: Santa Clara, CA
**Work arrangement**: remote
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Scientist--Synthetic-Data-and-Privacy_JR2019462?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_f4e85e2b-1fd

## Description

We are looking for a Senior Scientist to join our team and help advance our capabilities in generating synthetic data and privacy-preserving AI. You will contribute to open-source libraries within the NVIDIA NeMo ecosystem that enable high-quality synthetic data generation and data privacy at scale, including context-aware anonymization. This role combines hands-on software engineering with applied research in LLMs and privacy-enhancing methods, and you will collaborate with research, engineering, product teams, and external labs.

**Responsibilities:**

- Build LLM-based methods for synthetic data generation, privacy, and context-aware anonymization, with automated evaluation across multilingual text, documents, and multimodal content.

- Optimize task-specific LLMs for low-latency, high-throughput inference (distillation, quantization), and scale our frameworks to run in real time.

- Design and maintain open-source libraries and SDKs with clean APIs and strong documentation.

- Drive software excellence with modern tooling, configuration-driven architecture, and professional Git and CI/CD practices.

- Publish original research at top machine learning and AI conferences to maintain NVIDIA's technical leadership.

- Mentor interns and junior researchers to support their technical growth within the team.

**Requirements:**

- PhD in Computer Science, Machine Learning, Statistics, or a related field, or equivalent experience.

- 2+ years of experience in applied LLM/NLP research and engineering, synthetic data generation, anonymization and PII detection, or related areas. Comparable experience is also considered.

- Proven track record of developing or maintaining software libraries used by a broad developer community.

- Strong publication record at premier venues such as NeurIPS, ICML, ICLR, ACL, or similar.

**Nice to Have:**

- Active contributions to open-source projects, particularly in ML, security, or privacy domains.

- Deep technical understanding of LLMs and inference optimization (quantization, distillation, latency/throughput tuning), with frameworks such as vLLM or TGI.

- Ability to build and optimize scalable data processing pipelines for large-scale models.

## Skills

### Required
- Synthetic data generation
- Privacy-preserving AI
- LLMs
- Machine learning
- Deep learning
- Natural language processing
- Software engineering
- Open-source development
- Research
- Publication

### Nice to have
- Quantization
- Distillation
- Latency/throughput tuning
- vLLM
- TGI
- Scalable data processing pipelines

---

Source: [Apply at nvidia.wd5.myworkdayjobs.com](https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Scientist--Synthetic-Data-and-Privacy_JR2019462?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
