Description
The Crowd Intelligence Platform (CIP) team, part of the MAI platform organization, is building an AI-driven testing platform that ensures high quality for Microsoft AI products such as Bing, Copilot, MSN, and Edge, both pre-release and in production. The platform leverages agentic AI to perform scalable, intelligent testing across diverse product surfaces.
We are looking for a Senior Applied Scientist to help design, build, and scale AI-powered testing systems that can be applied generically across a wide range of cutting-edge MAI products. This role offers a unique opportunity to apply AI at scale to real-world product quality challenges and directly influence how MAI products are tested and improved.
Responsibilities:
Build and scale AI-driven testing capabilities using LLMs, prompts, and agent-based workflows to validate MAI products across scenarios, geographies, and product surfaces.
Design and optimize prompts, models, and agent behaviors to perform functional, quality, and experience-focused testing at scale.
Collaborate closely with product and engineering teams across MAI and beyond to understand testing needs and translate them into efficient, AI-powered testing workflows.
Develop metrics and evaluation frameworks to measure test quality, coverage, effectiveness, and signal accuracy across AI-driven testing pipelines.
Create actionable outputs and insights (issues, summaries, trends, and recommendations) that product owners can directly consume to fix defects and improve product quality.
Continuously evolve the platform toward more autonomous and agentic workflows, reducing manual effort while increasing depth and reliability of testing.
Partner with engineers and platform teams to operationalize data science solutions in production, ensuring scalability, reliability, and performance.
Qualifications:
Bachelor’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 7+ years of related experience (e.g., statistics, predictive analytics, research)
OR Master’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 5+ years of related experience (e.g., statistics, predictive analytics, research)
OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years of related experience (e.g., statistics, predictive analytics, research)
OR equivalent experience.
4+ years of solid experience in Data Science, Applied AI, or Machine Learning, with a track record of building solutions that operate at scale.
Hands-on experience with LLMs, prompt engineering, and/or agentic AI systems.
Solid foundation in statistics, experimentation, and metrics design, especially for evaluating AI system quality.
Experience working with data pipelines, model evaluation, and production systems.
Ability to work across multiple product teams, influence without authority, and translate ambiguous testing needs into concrete AI solutions.
Solid communication skills to explain complex AI outputs clearly to engineering and product stakeholders.
Preferred Qualifications:
3+ years of experience creating publications (e.g., patents, libraries, peer-reviewed academic papers).
3+ years of experience developing and deploying live production systems as part of a product team.
3+ years of experience developing and deploying products or systems at multiple points in the product cycle, from ideation to shipping.