Description

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale, unleashing the potential of businesses and people.

The Search Conversational Experiences team builds Elastic’s new conversational and agentic platform that lets customers chat with their own data in Elasticsearch.

As a Principal Data Scientist, you will help set the technical direction for how we evaluate, improve, and scale chat quality across Elastic’s agentic platform.

Responsibilities

Define the evaluation strategy for conversational and agentic search, including offline and online evaluation, golden datasets, rubrics, LLM-as-judge calibration, groundedness and citation checks, and A/B testing.
Lead the design of quality metrics and decision frameworks for RAG, agents, tools, model selection, agent routing, prompt behavior, and cost/latency trade-offs.
Build, compare, and guide improvements across retrieval and re-ranking approaches, including sparse and dense retrieval, vector search, query understanding, semantic rewrites, and context enrichment.
Turn experimental results into product and business decisions: which models to use, how to route requests efficiently, which tools should be exposed, and how agents should be customized for different Elastic use cases.
Partner with engineering to productionize evaluation pipelines, telemetry, dashboards, CI guardrails, and regression detection for chat quality, helpfulness, dedication, latency, and cost.
Influence the roadmap by identifying the highest-leverage quality gaps, proposing practical solutions, and communicating trade-offs clearly to product, engineering, and leadership.
Mentor other data scientists and engineers in experiment design, evaluation methodology, statistical rigor, and practical approaches to improving LLM-powered systems.
Share outcomes through clear docs, notebooks, PRs, dashboards, technical proposals, and cross-functional reviews.

Requirements

8+ years of applied DS/ML experience, with deep expertise in IR, NLP, ranking, semantic search, RAG, or LLM-powered product experiences.
Strong track record defining and leading evaluation for production AI/ML systems, including offline metrics, online experimentation, LLM-as-judge approaches, groundedness, citation quality, and model comparison.
Experience influencing product and technical strategy through data, especially in ambiguous or emerging domains where the “right” metric or approach is not obvious at the start.
Hands-on ability with Python, PyTorch/Transformers, Pandas, notebooks, reproducible experiments, versioned datasets, and clean, reviewable code.
Strong understanding of retrieval systems, including dense and sparse retrieval, re-ranking, vector search, and query understanding, plus evaluation metrics such as nDCG, MRR, Recall@k, and precision, and latency/cost trade-offs.
Experience collaborating closely with engineering teams to move from prototype to production, including telemetry design, dashboards, CI guardrails, and quality regression tracking.
Practical Elasticsearch experience, or experience with similar search and distributed data systems. ES|QL familiarity is a plus.
Excellent written and verbal communication, with the ability to explain complex scientific and technical trade-offs to engineering, product, design, and leadership audiences.
A collaborative, low-ego style and a strong ability to mentor, raise standards, and foster transparency across a distributed team.

Benefits

Competitive pay based on the work you do here and not your previous salary
Health coverage for you and your family in many locations
Ability to craft your calendar with flexible locations and schedules for many roles
Generous number of vacation days each year
Increase your impact: we match up to $2,000 (or local currency equivalent) for financial donations and service
Up to 40 hours each year to use toward volunteer projects you love
Embracing parenthood with a minimum of 16 weeks of parental leave

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/elastic/jobs/8008502