# Member of Technical Staff - VLM

**Company**: Black Forest Labs
**Location**: Freiburg (Germany)
**Work arrangement**: hybrid
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://job-boards.greenhouse.io/blackforestlabs/jobs/5193513008?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_556f5f38-c43

## Description

### About This Role

We're seeking a Member of Technical Staff to pioneer the integration of vision-language models (VLMs) into our FLUX stack. As a key member of our team, you'll develop novel approaches, innovate on architectures, and tackle questions that haven't been answered yet.

### What You'll Work On

- Lead development and training of state-of-the-art multimodal vision-language models within the FLUX stack, innovating on architectures rather than just applying existing ones

- Design fine-tuning strategies that adapt VLMs to specialized creative use cases (captioning, editing instructions, prompt enhancement) that general-purpose models can't handle

- Research integrations between VLM/LLM capabilities and our diffusion and flow pipelines, finding creative ways to improve generation quality and controllability without computational bottlenecks

- Evaluate emerging multimodal architectures, translating the best of recent research into practical improvements

### What We're Looking For

- You've pretrained or significantly advanced a VLM (not just SFT'd or LoRA'd an existing one) that was deployed in a production system or released publicly

- Strong publication record or unambiguous production track record showing you push the frontier on multimodal architectures

- Deep understanding of how vision and language representations interact: tokenization, alignment, grounding, cross-modal attention, and the failure modes of each

- Experience with distributed training at multi-node scale

- Comfortable at the research/production boundary: you care whether the work ships and generalizes, not just whether it reads well

- Experience with diffusion or flow-based generative models is a strong plus, especially if you've thought about how autoregressive and diffusion paradigms can compose

### How We Work Together

We’re a distributed team with real offices that people actually use. Depending on your role, you’ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We’ll cover reasonable travel costs to make this possible. We think in-person time matters, and we’ve structured things to make it accessible to all.

## Skills

### Required
- Pretrained or significantly advanced a VLM
- Multimodal vision-language models
- Fine-tuning strategies
- Diffusion and flow pipelines
- Emerging multimodal architectures

### Nice to have
- Distributed training at multi-node scale
- Comfort working at the research/production boundary
- Diffusion or flow-based generative models

---

Source: [Apply at job-boards.greenhouse.io](https://job-boards.greenhouse.io/blackforestlabs/jobs/5193513008?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)