Description

We are looking for a Senior Scientist to join our team and help advance our capabilities in generating synthetic data and privacy-preserving AI. You will contribute to open-source libraries within the NVIDIA NeMo ecosystem that enable high-quality synthetic data generation and data privacy at scale, including context-aware anonymization. This role combines hands-on software engineering with applied research in LLMs and privacy-enhancing methods, and you will collaborate with research, engineering, product teams, and external labs.

Responsibilities:

Build LLM-based methods for synthetic data generation, privacy, and context-aware anonymization, with automated evaluation across multilingual text, documents, and multimodal content.
Optimize task-specific LLMs for low-latency, high-throughput inference (distillation, quantization), and scale our frameworks to run in real time.
Design and maintain open-source libraries and SDKs with clean APIs and strong documentation.
Drive software excellence with modern tooling, configuration-driven architecture, and professional Git and CI/CD practices.
Publish original research at top machine learning and AI conferences to maintain NVIDIA's technical leadership.
Mentor interns and junior researchers to foster technical growth within the team.

Requirements:

PhD in Computer Science, Machine Learning, Statistics, or a related field, or equivalent experience.
2+ years of experience in applied LLM/NLP research and engineering, synthetic data generation, anonymization and PII detection, or related areas. Comparable experience is also considered.
Proven track record of developing or maintaining software libraries used by a broad developer community.
Strong publication record at premier venues such as NeurIPS, ICML, ICLR, ACL, or similar.

Nice to Have:

Active contributions to open-source projects, particularly in ML, security, or privacy domains.
Deep technical understanding of LLMs and inference optimization (quantization, distillation, latency/throughput tuning), with frameworks such as vLLM or TGI.
Ability to build and optimize scalable data processing pipelines for large-scale models.

To apply, use the employer's original posting: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Scientist--Synthetic-Data-and-Privacy_JR2019462