Description

About the Role

We're building the foundation models that power the next wave of visual intelligence, and pretraining is where that work begins. This role sits at the center of our research effort, shaping the training objectives, architectures, data strategies, and systems behind our joint image, video, and audio foundation models.

Responsibilities

Lead large-scale pretraining experiments for our multimodal (image, video, audio) foundation models, covering architecture, objective functions, and scaling strategies
Develop and evaluate novel ideas across architecture, optimizers, and training algorithms
Contribute across the full stack: low-level GPU and systems optimizations, research code, and high-level model design
Lead focused research projects independently and drive larger cross-team initiatives

Requirements

You've led or co-owned pretraining for a foundation model (image, video, LLM, or multimodal) that shipped to production or a major release
Deep experience with large-scale distributed training: FSDP/TP/PP, multi-node runs on 500+ GPUs, and debugging loss spikes, NaNs, throughput regressions, and silent correctness issues at scale
Strong intuition for architecture and objective design: you've made calls on attention patterns, modulation schemes, loss formulations, or tokenization strategies that moved a real model
Track record of shipping: top-venue publications (NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV) paired with production impact, or unambiguous production wins at a frontier lab
Deep Python and PyTorch proficiency; comfortable reading and modifying low-level training code
Familiarity with visual generative models is a must

How We Work Together

We’re a distributed team with real offices that people actually use. Depending on your role, you’ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We’ll cover reasonable travel costs to make this possible. We think in-person time matters, and we’ve structured things to make it accessible to all. We’ll discuss what this will look like for the role during our interview process.

To apply, use the employer's original posting: https://job-boards.greenhouse.io/blackforestlabs/jobs/5193508008