Description
Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale, unleashing the potential of businesses and people.
The Search Conversational Experiences team builds Elastic’s new conversational and agentic platform that lets customers chat with their own data in Elasticsearch.
As a Principal Data Scientist, you will help set the technical direction for how we evaluate, improve, and scale chat quality across Elastic’s agentic platform.
Responsibilities
- Define the evaluation strategy for conversational and agentic search, including offline and online evaluation, golden datasets, rubrics, LLM-as-judge calibration, groundedness and citation checks, and A/B testing.
- Lead the design of quality metrics and decision frameworks for RAG, agents, tools, model selection, agent routing, prompt behavior, and cost/latency trade-offs.
- Build, compare, and guide improvements across retrieval and re-ranking approaches, including sparse and dense retrieval, vector search, query understanding, semantic rewrites, and context enrichment.
- Turn experimental results into product and business decisions: which models to use, how to route requests efficiently, which tools should be exposed, and how agents should be customized for different Elastic use cases.
- Partner with engineering to productionize evaluation pipelines, telemetry, dashboards, CI guardrails, and regression detection for chat quality, helpfulness, dedication, latency, and cost.
- Influence the roadmap by identifying the highest-leverage quality gaps, proposing practical solutions, and communicating trade-offs clearly to product, engineering, and leadership.
- Mentor other data scientists and engineers in experiment design, evaluation methodology, statistical rigor, and practical approaches to improving LLM-powered systems.
- Share outcomes through clear docs, notebooks, PRs, dashboards, technical proposals, and cross-functional reviews.
Requirements
- 8+ years of applied DS/ML experience, with deep expertise in IR, NLP, ranking, semantic search, RAG, or LLM-powered product experiences.
- Strong track record defining and leading evaluation for production AI/ML systems, including offline metrics, online experimentation, LLM-as-judge approaches, groundedness, citation quality, and model comparison.
- Experience influencing product and technical strategy through data, especially in ambiguous or emerging domains where the “right” metric or approach is not obvious at the start.
- Hands-on ability with Python, PyTorch/Transformers, Pandas, notebooks, reproducible experiments, versioned datasets, and clean, reviewable code.
- Strong understanding of retrieval systems, including dense and sparse retrieval, re-ranking, vector search, and query understanding; evaluation metrics such as nDCG, MRR, Recall@k, and precision; and latency/cost trade-offs.
- Experience collaborating closely with engineering teams to move from prototype to production, including telemetry design, dashboards, CI guardrails, and quality regression tracking.
- Practical Elasticsearch experience, or experience with similar search and distributed data systems. ES|QL familiarity is a plus.
- Excellent written and verbal communication, with the ability to explain complex scientific and technical trade-offs to engineering, product, design, and leadership audiences.
- A collaborative, low-ego style and a strong ability to mentor others, raise standards, and foster transparency across a distributed team.
Benefits
- Competitive pay based on the work you do here and not your previous salary
- Health coverage for you and your family in many locations
- Ability to craft your calendar with flexible locations and schedules for many roles
- Generous number of vacation days each year
- Increase your impact: we match up to $2,000 (or local currency equivalent) for financial donations and service
- Up to 40 hours each year to use toward volunteer projects you love
- Embracing parenthood with a minimum of 16 weeks of parental leave
This listing is enriched and indexed by YubHub. To apply, use the employer's original posting:
https://job-boards.greenhouse.io/elastic/jobs/8008502