Description

As a Senior Data Scientist for LLM Evaluation, you will develop and implement cutting-edge methodologies to help evaluate how well Copilot performs in real-world usage scenarios.

Users turn to Copilot for a wide range of tasks, so it is crucial that our AI systems assist them effectively.

Your responsibilities will include:

Developing new methods to evaluate LLMs, training classifiers, and experimenting with data collection techniques
Implementing methodologies to provide real-time signals on Copilot performance
Collaborating with user researchers and product leaders to build automated evaluation frameworks

The ideal candidate will have experience in social sciences, machine learning, and natural language analysis, with strong problem-solving skills and the ability to work independently.

Responsibilities

Apply your expertise to measure Copilot performance, identify failure modes, and develop mitigation strategies
Create and implement comprehensive evaluation frameworks across diverse scenarios
Build automated testing systems and write efficient code for model pipelines
Maintain a user-oriented perspective and serve as a trusted advisor on AI matters
Track advances in research and adapt algorithms to drive innovation

Qualifications

Doctorate or Master's degree in Data Science, Mathematics, Statistics, or a related field, with relevant experience
Experience with large language models, Python programming, and Responsible AI

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://microsoft.ai/job/senior-data-scientist-15/