Description
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
We're seeking a Research Engineer to join our Pre-training team, responsible for developing the next generation of large language models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems.
Key Responsibilities:
- Conduct research and implement solutions in areas such as model architecture, algorithms, data processing, and optimizer development
- Independently lead small research projects while collaborating with team members on larger initiatives
- Design, run, and analyze scientific experiments to advance our understanding of large language models
- Optimize and scale our training infrastructure to improve efficiency and reliability
- Build and improve developer tooling to enhance team productivity
- Contribute to the entire stack, from low-level optimizations to high-level model design
Qualifications:
- Advanced degree (MS or PhD) in Computer Science, Machine Learning, or a related field
- Strong software engineering skills with a proven track record of building complex systems
- Expertise in Python and experience with deep learning frameworks (PyTorch preferred)
- Familiarity with large-scale machine learning, particularly in the context of language models
- Ability to balance research goals with practical engineering constraints
- Strong problem-solving skills and a results-oriented mindset
- Excellent communication skills and ability to work in a collaborative environment
- Genuine care about the societal impacts of your work
Preferred Experience:
- Experience working on high-performance, large-scale ML systems
- Familiarity with GPUs, Kubernetes, and OS internals
- Experience with language modeling using Transformer architectures
- Knowledge of reinforcement learning techniques
- Background in large-scale ETL processes
Sample Projects:
- Optimizing the throughput of novel attention mechanisms
- Comparing compute efficiency of different Transformer variants
- Preparing large-scale datasets for efficient model consumption
- Scaling distributed training jobs to thousands of GPUs
- Designing fault tolerance strategies for our training infrastructure
- Creating interactive visualizations of model internals, such as attention patterns
Logistics:
- Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
- Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
- Minimum years of experience: The required years of experience depend on the internal job level for the position
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
- Visa sponsorship: We do sponsor visas!
Compensation:
- Annual Salary: $350,000-$850,000 USD