Description

We are seeking a Research Engineer/Research Scientist to join our Audio team. As a member of this team, you will work across the full stack of audio ML: developing audio codecs and representations, sourcing and synthesizing high-quality audio data, training large-scale speech language models and large audio diffusion models, and designing novel architectures for incorporating continuous signals into LLMs.

Our team focuses primarily, but not exclusively, on speech, building advanced steerable systems that span end-to-end conversation, speech and audio understanding, and speech synthesis. The team works closely with collaborators across pretraining, finetuning, reinforcement learning, production inference, and product to take advanced audio technologies from early research to high-impact real-world deployments.

Responsibilities:

Develop and train audio models, including conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, and generative audio models
Work across abstraction levels, from signal processing fundamentals to large-scale model training and inference optimization
Collaborate with teams across the company to develop and deploy audio technologies
Communicate clearly and effectively with colleagues and stakeholders

Strong candidates may also have experience with:

Large language model pretraining and finetuning
Training diffusion models for image and audio generation
Reinforcement learning for large language models and diffusion models
End-to-end system optimization, from performance benchmarking to kernel optimization
GPUs, Kubernetes, PyTorch, or distributed training infrastructure

Representative projects:

Training state-of-the-art neural audio codecs for 48 kHz stereo audio
Developing novel algorithms for diffusion pretraining and reinforcement learning
Scaling audio datasets to millions of hours of high-quality audio
Creating robust evaluation methodologies for hard-to-measure qualities such as naturalness or expressiveness
Studying training dynamics of mixed audio-text language models
Optimizing latency and inference throughput for deployed streaming audio systems

To apply, use the employer's original posting: https://job-boards.greenhouse.io/anthropic/jobs/5074815008