Description
Join ZoomInfo's mission to build the next-generation go-to-market platform.
As a Senior Data Scientist on our Foundation Data team, you'll be the end-to-end owner of critical projects that enhance the quality and reliability of our core datasets.
You'll work at the intersection of cutting-edge AI and massive-scale data processing to solve complex entity resolution challenges that directly impact millions of sales and marketing professionals worldwide.
You will own core retrieval, NER, and aligned entity-resolution & knowledge-graph initiatives that touch billions of records and serve millions of daily queries.
Your responsibilities will include:
- Inventing and productionizing Transformer/RAG/Graph RAG architectures that surface the right contact, company, or insight, while driving quantization, distillation, and SLM fine-tuning (GTE-Qwen, ModernBERT) so models stay fast and affordable at petabyte scale
- Prototyping and launching hybrid dense/sparse retrieval pipelines on vector DBs to build language-agnostic clustering and classification systems that power our intelligence layer
- Owning high-recall NER models that tag people, orgs, locations, and industry-specific entities across multi-language text, extracting structured insights from web data to improve our signal detection capabilities
- Building cross-dataset entity-resolution frameworks that dedupe and merge hundreds of millions of fragmented company and person records with sub-second latency, creating enriched, unified entities enhanced with knowledge-graph signals
- Designing and implementing agentic workflows with robust evaluation frameworks focused on NER and entity resolution tasks, including large-scale A/B and back-testing plans that close the loop from experiment to KPI uplift
- Scaling ML solutions and driving cross-functional impact by partnering with ML engineers to ensure production reliability, translating product goals into measurable ML KPIs, and influencing roadmap and investment decisions while mentoring junior scientists and engineers
- Driving end-to-end project ownership from problem definition through deployment, collaborating closely with engineering and product teams to understand business requirements and translate them into scalable ML solutions that enhance foundation data quality across company firmographics, professional demographics, C-suite profiles, and web-extracted signals
We're looking for a Senior Data Scientist with 6+ years of hands-on ML/NLP experience and deep expertise in modern AI architectures, including transformer stacks (BERT/GPT/T5), RAG systems, vector-based information retrieval, and latency/throughput optimization techniques.
You should have a proven track record building NER or entity-resolution systems at 100M+ record scale with experience in record linkage, data deduplication, and knowledge-graph integration.
Strong applied research capabilities (PyTorch or TensorFlow) paired with software-engineering rigor (Python, Go/Java a plus) and familiarity with embedding models and vector search technologies are required.
Executive communication skills are essential: you can persuade technical and non-technical audiences through data-driven storytelling, and you are comfortable owning strategy, budget, and cross-functional collaboration.
Actual compensation offered will be based on factors such as the candidate’s work location, qualifications, skills, experience and/or training.
In addition to comprehensive benefits, we offer holistic mind, body and lifestyle programs designed for overall well-being.
The US base salary range for this position is $164,500-$258,500 USD. Additional compensation such as bonus, commission, equity, and other benefits may also apply.