# Software Engineer, Research Data Platform

**Company**: Anthropic
**Location**: San Francisco, CA | New York City, NY
**Work arrangement**: hybrid
**Experience**: mid
**Job type**: full-time
**Salary**: $320,000-$405,000 USD
**Category**: Engineering
**Industry**: Technology
**Wikidata**: https://www.wikidata.org/wiki/Q116758847

**Apply**: https://job-boards.greenhouse.io/anthropic/jobs/5191226008
**Canonical**: https://yubhub.co/jobs/job_8f03ad2d-96f

## Description

We're looking for engineers who love working directly with users and who excel at building data products. The Research Data Platform team builds the tools that Anthropic's researchers use every day to manage, query, and analyze the data that goes into training and evaluating frontier models.

As a Software Engineer on the Research Data Platform team, you will:

- Build and operate data pipelines that extract data from research training runs and land it in storage systems that are easy and fast to query

- Work closely with researchers to design and build APIs, libraries, and web interfaces that support data management, exploration, and analysis

- Develop dataset management, data cataloging, and provenance tooling that researchers use in their day-to-day work

- Embed with research teams to understand their workflows, identify high-leverage tooling opportunities, and ship solutions quickly

- Collaborate with adjacent teams to build on existing systems rather than reinventing them

We do not require prior ML or AI training experience. If you enjoy working closely with technical users, learning new domains quickly, and building tools people actually want to use, you'll pick up the research context fast.

Strong candidates may also have experience with large-scale ETL, columnar storage formats, and query engines (e.g., Spark, BigQuery, DuckDB, Parquet); high-volume time series data, including ingestion, storage, and efficient querying; data cataloging, lineage, or metadata management systems; or ML experiment tracking or metrics platforms.

## Skills

### Required
- large-scale ETL
- columnar storage formats
- query engines
- high-volume time series data
- data cataloging
- lineage
- metadata management systems
- ML experiment tracking

### Nice to have
- Spark
- BigQuery
- DuckDB
- Parquet
