# Machine Learning Systems Engineer, Research Tools

**Company**: Anthropic
**Location**: San Francisco, CA | New York City, NY | Seattle, WA
**Work arrangement**: Hybrid
**Experience**: Senior
**Job type**: Full-time
**Salary**: $320,000-$405,000 USD
**Category**: Engineering
**Industry**: Technology
**Wikidata**: https://www.wikidata.org/wiki/Q116758847

**Apply**: https://job-boards.greenhouse.io/anthropic/jobs/4952079008?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_3b359ef2-6f8

## Description

We are seeking an experienced Machine Learning Systems Engineer to join our Encodings and Tokenization team at Anthropic. This cross-functional role will be instrumental in developing and optimizing the encodings and tokenization systems used throughout our Finetuning workflows. As a bridge between our Pretraining and Finetuning teams, you'll build critical infrastructure that directly impacts how our models learn from and interpret data.

Responsibilities:

- Design, develop, and maintain tokenization systems used across Pretraining and Finetuning workflows

- Optimize encoding techniques to improve model training efficiency and performance

- Collaborate closely with research teams to understand their evolving needs around data representation

- Build infrastructure that enables researchers to experiment with novel tokenization approaches

- Implement systems for monitoring and debugging tokenization-related issues in the model training pipeline

- Create robust testing frameworks to validate tokenization systems across diverse languages and data types

- Identify and address bottlenecks in data processing pipelines related to tokenization

- Document systems thoroughly and communicate technical decisions clearly to stakeholders across teams

You May Be a Good Fit If You:

- Have significant software engineering experience with demonstrated machine learning expertise

- Are comfortable navigating ambiguity and developing solutions in rapidly evolving research environments

- Can work independently while maintaining strong collaboration with cross-functional teams

- Are results-oriented, with a bias towards flexibility and impact

- Have experience with machine learning systems, data pipelines, or ML infrastructure

- Are proficient in Python and familiar with modern ML development practices

- Have strong analytical skills and can evaluate the impact of engineering changes on research outcomes

- Pick up slack, even if it goes outside your job description

- Enjoy pair programming (we love to pair!)

- Care about the societal impacts of your work and are committed to developing AI responsibly

Strong Candidates May Also Have Experience With:

- Working with machine learning data processing pipelines

- Building or optimizing data encodings for ML applications

- Implementing or working with byte-pair encoding (BPE), WordPiece, or other tokenization algorithms

- Performance optimization of ML data processing systems

- Multi-language tokenization challenges and solutions

- Research environments where engineering directly enables scientific progress

- Distributed systems and parallel computing for ML workflows

- Large language models or other transformer-based architectures (not required)

The annual compensation range for this role is $320,000-$405,000 USD.

## Skills

### Required
- Machine Learning
- Software Engineering
- Python
- Data Pipelines
- ML Infrastructure

### Nice to have
- BPE
- WordPiece
- Tokenization Algorithms
- Performance Optimization
- Distributed Systems

---

Source: [Apply at job-boards.greenhouse.io](https://job-boards.greenhouse.io/anthropic/jobs/4952079008?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)