# Researcher, Artifacts - Agent Post-Training

**Company**: OpenAI
**Location**: San Francisco
**Work arrangement**: Hybrid
**Experience**: Mid-level
**Job type**: Full-time
**Salary**: $250K – $380K
**Category**: Engineering
**Industry**: Technology
**Wikidata**: https://www.wikidata.org/wiki/Q124605186

**Apply**: https://jobs.ashbyhq.com/openai/c701bf4a-3b17-4b14-895a-05f52be51cf8?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_d4091eed-9c5

## Description

As a member of Agent Post-Training, Artifacts, you will train frontier models to create polished, useful work products: documents, spreadsheets, slide decks, dashboards, reports, analyses, and other interactive or editable artifacts. You will help teach our models to move from a vague user goal to a finished artifact with strong structure, visual taste, domain judgment, correctness, and low latency. This work will require owning improvements across our post-training stack, including RL, data pipelines, graders, reward signals, evals, and behavioral analysis.

You will work with researchers, engineers, product teams, infrastructure teams, and safety/alignment partners to decide what should go into major model runs, measure whether it worked, and ship improvements into products used by real people. This is a high-agency role for people who want their work to land directly in frontier models.

In this role, you will:

- Design and run experiments that improve agentic model behavior for complex software and plugins.

- Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.

- Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions.

- Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements.

- Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.

- Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs.

- Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.

- Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments.

- Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes.

You might thrive in this role if you:

- Have strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field, and can quickly get up to speed in areas you have not worked in before.

- Have hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems.

- Are excited by open-ended problems where the path is unclear, the signal is noisy, and the right answer requires both research taste and engineering execution.

- Care about product impact and model behavior, not just benchmark movement. You have opinions about what makes an agent useful, reliable, honest, tasteful, and easy to work with.

- Can move from a vague behavioral problem to a concrete experiment: define the hypothesis, build the pipeline, run the model, analyze the result, and decide what to do next.

- Are comfortable working across research, product, infrastructure, data, evals, and safety boundaries, and can communicate clearly with each group.

- Like building load-bearing systems and processes when that is what the team needs, even if the work is not glamorous.

- Want to train and ship the models that make agents genuinely useful for developers, enterprises, researchers, and everyday users.

- Have some prior background in consulting, finance, marketing, operations, or data science.

## Skills

### Required
- machine learning
- software engineering
- systems
- statistics
- LLMs
- RL
- RLHF/RLAIF
- post-training
- evals
- graders
- synthetic data
- model training
- coding agents
- tool-using agents
- production ML systems

---

Source: [Apply at jobs.ashbyhq.com](https://jobs.ashbyhq.com/openai/c701bf4a-3b17-4b14-895a-05f52be51cf8?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)