Description
We are looking for a Senior Scientist to join our team and help advance our capabilities in generating synthetic data and privacy-preserving AI. You will contribute to open-source libraries within the NVIDIA NeMo ecosystem that enable high-quality synthetic data generation and data privacy at scale, including context-aware anonymization. This role combines hands-on software engineering with applied research in LLMs and privacy-enhancing methods, and you will collaborate with research, engineering, product teams, and external labs.
Responsibilities:
- Build LLM-based methods for synthetic data generation, privacy, and context-aware anonymization, with automated evaluation across multilingual text, documents, and multimodal content.
- Optimize task-specific LLMs for low-latency, high-throughput inference (distillation, quantization), and scale our frameworks to run in real time.
- Design and maintain open-source libraries and SDKs with clean APIs and strong documentation.
- Drive software excellence with modern tooling, configuration-driven architecture, and disciplined Git and CI/CD practices.
- Publish original research at top machine learning and AI conferences to maintain NVIDIA's technical leadership.
- Mentor interns and junior researchers to foster technical growth within the team.
Requirements:
- PhD in Computer Science, Machine Learning, Statistics, or a related field, or equivalent experience.
- 2+ years of experience in applied LLM/NLP research and engineering, synthetic data generation, anonymization and PII detection, or related areas. Comparable experience is also considered.
- Proven track record of developing or maintaining software libraries used by a broad developer community.
- Strong publication record at premier venues such as NeurIPS, ICML, ICLR, ACL, or similar.
Nice to Have:
- Active contributions to open-source projects, particularly in ML, security, or privacy domains.
- Deep technical understanding of LLMs and inference optimization (quantization, distillation, latency/throughput tuning), with frameworks such as vLLM or TGI.
- Ability to build and optimize scalable data processing pipelines for large-scale models.
This listing is enriched and indexed by YubHub. To apply, use the employer's original posting:
https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Scientist--Synthetic-Data-and-Privacy_JR2019462