# Principal Data Scientist - Agent Builder

**Company**: Elastic
**Location**: Greece
**Experience**: senior
**Job type**: full-time
**Salary**: €73,300-€115,900 EUR
**Category**: Engineering
**Industry**: Technology

**Apply**: https://job-boards.greenhouse.io/elastic/jobs/8008502?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_d6cd4870-db2

## Description

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale, unleashing the potential of businesses and people.

The Search Conversational Experiences team builds Elastic’s new conversational and agentic platform that lets customers chat with their own data in Elasticsearch.

As a Principal Data Scientist, you will help set the technical direction for how we evaluate, improve, and scale chat quality across Elastic’s agentic platform.

## Responsibilities

- Define the evaluation strategy for conversational and agentic search, including offline and online evaluation, golden datasets, rubrics, LLM-as-judge calibration, groundedness and citation checks, and A/B testing.

- Lead the design of quality metrics and decision frameworks for RAG, agents, tools, model selection, agent routing, prompt behavior, and cost/latency trade-offs.

- Build, compare, and guide improvements across retrieval and re-ranking approaches, including sparse and dense retrieval, vector search, query understanding, semantic rewrites, and context enrichment.

- Turn experimental results into product and business decisions: which models to use, how to route requests efficiently, which tools should be exposed, and how agents should be customized for different Elastic use cases.

- Partner with engineering to productionize evaluation pipelines, telemetry, dashboards, CI guardrails, and regression detection for chat quality, helpfulness, dedication, latency, and cost.

- Influence the roadmap by identifying the highest-leverage quality gaps, proposing practical solutions, and communicating trade-offs clearly to product, engineering, and leadership.

- Mentor other data scientists and engineers in experiment design, evaluation methodology, statistical rigor, and practical approaches to improving LLM-powered systems.

- Share outcomes through clear docs, notebooks, PRs, dashboards, technical proposals, and cross-functional reviews.

## Requirements

- 8+ years of applied DS/ML experience, with deep expertise in IR, NLP, ranking, semantic search, RAG, or LLM-powered product experiences.

- Strong track record defining and leading evaluation for production AI/ML systems, including offline metrics, online experimentation, LLM-as-judge approaches, groundedness, citation quality, and model comparison.

- Experience influencing product and technical strategy through data, especially in ambiguous or emerging domains where the “right” metric or approach is not obvious at the start.

- Hands-on ability with Python, PyTorch/Transformers, Pandas, notebooks, reproducible experiments, versioned datasets, and clean, reviewable code.

- Strong understanding of retrieval systems, including dense and sparse retrieval, re-ranking, vector search, query understanding, and evaluation metrics such as nDCG, MRR, Recall@k, and precision, as well as latency/cost trade-offs.

- Experience collaborating closely with engineering teams to move from prototype to production, including telemetry design, dashboards, CI guardrails, and quality regression tracking.

- Practical Elasticsearch experience, or experience with similar search and distributed data systems. ES|QL familiarity is a plus.

- Excellent written and verbal communication, with the ability to explain complex scientific and technical trade-offs to engineering, product, design, and leadership audiences.

- A collaborative, low-ego style and a strong ability to mentor, raise standards, and develop transparency for others in a distributed team.

## Benefits

- Competitive pay based on the work you do here and not your previous salary

- Health coverage for you and your family in many locations

- Ability to craft your calendar with flexible locations and schedules for many roles

- Generous number of vacation days each year

- Increase your impact - We match up to $2000 (or local currency equivalent) for financial donations and service

- Up to 40 hours each year to use toward volunteer projects you love

- Embracing parenthood with a minimum of 16 weeks of parental leave

## Skills

### Required
- Python
- PyTorch/Transformers
- Pandas
- Elasticsearch
- IR
- NLP
- ranking
- semantic search
- RAG
- LLM-powered product experiences

### Nice to have
- ES|QL

---

Source: [Apply at job-boards.greenhouse.io](https://job-boards.greenhouse.io/elastic/jobs/8008502?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)