Description
You will contribute to exploratory experimental research on AI safety, with a focus on risks from powerful future systems. Your work will involve building and running elegant and thorough machine learning experiments to help us understand and steer the behavior of powerful AI systems.
As a Research Engineer on the Alignment Science team, you'll collaborate with other teams, including Interpretability, Fine-Tuning, and the Frontier Red Team. Your responsibilities will include testing the robustness of our safety techniques, running multi-agent reinforcement learning experiments, building tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks, and contributing ideas, figures, and writing to research papers, blog posts, and talks.
You may be a good fit if you have significant software, ML, or research engineering experience, some experience contributing to empirical AI research projects, and some familiarity with technical AI safety research. Strong candidates may also have experience authoring research papers in machine learning, NLP, or AI safety; working with LLMs; applying reinforcement learning; and working with Kubernetes clusters and complex shared codebases.
The annual compensation range for this role is $350,000-$500,000 USD.