Description
About This Role
We're seeking a Member of Technical Staff to pioneer the integration of vision-language models (VLMs) into our FLUX stack. As a key member of our team, you'll develop novel approaches, innovate on architectures, and answer questions that remain open.
What You'll Work On
- Lead development and training of state-of-the-art multimodal vision-language models within the FLUX stack, innovating on architectures, not just applying existing ones
- Design fine-tuning strategies that adapt VLMs to specialized creative use cases (captioning, editing instructions, prompt enhancement) that general-purpose models can't handle
- Research integrations between VLM/LLM capabilities and our diffusion and flow pipelines, finding creative ways to improve generation quality and controllability without introducing computational bottlenecks
- Evaluate emerging multimodal architectures, translating the best of recent research into practical improvements
What We're Looking For
- You've pretrained or significantly advanced a VLM (not just SFT'd or LoRA'd one) that was deployed in a production system or released publicly
- Strong publication record or unambiguous production track record showing you push the frontier on multimodal architectures
- Deep understanding of how vision and language representations interact: tokenization, alignment, grounding, cross-modal attention, and the failure modes of each
- Experience with distributed training at multi-node scale
- Comfortable at the research/production boundary: you care whether the work ships and generalizes, not just whether it reads well
- Experience with diffusion or flow-based generative models is a strong plus, especially if you've thought about how autoregressive and diffusion paradigms can compose
How We Work Together
We’re a distributed team with real offices that people actually use. Depending on your role, you’ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We’ll cover reasonable travel costs to make this possible. We think in-person time matters, and we’ve structured things to make it accessible to all.
This listing is enriched and indexed by YubHub. To apply, use the employer's original posting:
https://job-boards.greenhouse.io/blackforestlabs/jobs/5193513008