Description
We are seeking an experienced Machine Learning Systems Engineer to join our Encodings and Tokenization team at Anthropic. This cross-functional role will be instrumental in developing and optimizing the encodings and tokenization systems used throughout our Finetuning workflows. As a bridge between our Pretraining and Finetuning teams, you'll build critical infrastructure that directly impacts how our models learn from and interpret data.
Responsibilities:
- Design, develop, and maintain tokenization systems used across Pretraining and Finetuning workflows
- Optimize encoding techniques to improve model training efficiency and performance
- Collaborate closely with research teams to understand their evolving needs around data representation
- Build infrastructure that enables researchers to experiment with novel tokenization approaches
- Implement systems for monitoring and debugging tokenization-related issues in the model training pipeline
- Create robust testing frameworks to validate tokenization systems across diverse languages and data types
- Identify and address bottlenecks in data processing pipelines related to tokenization
- Document systems thoroughly and communicate technical decisions clearly to stakeholders across teams
You May Be a Good Fit If You:
- Have significant software engineering experience with demonstrated machine learning expertise
- Are comfortable navigating ambiguity and developing solutions in rapidly evolving research environments
- Can work independently while maintaining strong collaboration with cross-functional teams
- Are results-oriented, with a bias towards flexibility and impact
- Have experience with machine learning systems, data pipelines, or ML infrastructure
- Are proficient in Python and familiar with modern ML development practices
- Have strong analytical skills and can evaluate the impact of engineering changes on research outcomes
- Pick up slack, even if it goes outside your job description
- Enjoy pair programming (we love to pair!)
- Care about the societal impacts of your work and are committed to developing AI responsibly
Strong Candidates May Also Have Experience With:
- Working with machine learning data processing pipelines
- Building or optimizing data encodings for ML applications
- Implementing or working with BPE, WordPiece, or other tokenization algorithms
- Performance optimization of ML data processing systems
- Multi-language tokenization challenges and solutions
- Research environments where engineering directly enables scientific progress
- Distributed systems and parallel computing for ML workflows
- Large language models or other transformer-based architectures (not required)
The annual compensation range for this role is $320,000-$405,000 USD.
To apply, use the employer's original posting:
https://job-boards.greenhouse.io/anthropic/jobs/4952079008