Description
We're looking to advance how OpenAI builds and understands pretraining data at scale. You'll treat data quality and curation as core research problems: developing new methods to select, combine, and transform data; creating datasets that improve model capabilities; and designing rigorous experiments to understand how data choices and interventions affect model learning and downstream behavior.
You'll work closely with frontier models and web-scale data to build evidence for which approaches work and why, then translate successful research into scalable data processing pipelines.
Requirements
- Have a strong track record of developing new or improved ML ideas, demonstrated through publications, projects, or applied research.
- Own and drive a research agenda, from choosing the right problems to carrying long-running work through to impact.
- Be excited by OpenAI's empirical, collaborative approach to research.
Nice To Have
- Thoughtfulness about AI's impact, including privacy, provenance, and data quality.
- Experience building high-performance deep learning or large-scale data processing systems.
To apply, use the employer's original posting:
https://jobs.ashbyhq.com/openai/f9731ef2-9b8a-49ec-95ca-ecef35fa996a