World Labs

Research Engineer (Scaling Multimodal Data)

Onsite, senior, full-time, San Francisco

First indexed 17 Apr 2026

Description

We're looking for a research engineer to help improve our in-house world models through better multimodal data. This role is about figuring out what data actually moves model quality, then building the datasets, pipelines, and experiments to prove it.

The best generative models aren’t just a product of model architecture and compute; they are a product of the training data. The model output reflects someone’s obsession over what goes into the data, how it’s processed, and what gets thrown away. We’re looking for the person who does the obsessing and builds the tools to act on it at scale.

This isn’t a role where someone hands you a dataset and asks you to clean it. You will decide what data we need, figure out where to get it, build the processing and curation systems, and close the loop with model training to make sure it actually works.

Responsibilities:

  • Discover, evaluate, and acquire training data
  • Build data processing and curation systems
  • Look at the actual data constantly
  • Close the data → model → evaluation loop
  • Deploy ML models for data enrichment
  • Make systematic, documented decisions

Requirements:

  • Strong software engineering fundamentals
  • Deep experience with image and video data at scale
  • Experience with distributed computing
  • Experience using ML models as components
  • A research-oriented approach to data decisions
  • Familiarity with the model training lifecycle

Nice to Have:

  • Familiarity with columnar and large-scale data storage formats and libraries
  • Track record of independently discovering and integrating new data sources into a training pipeline
  • Direct experience closing the data → model quality loop
  • Strong visual intuition for data quality and diversity

What This Isn’t:

  • Not infrastructure
  • Not pure research
  • Not a role where you wait for instructions

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/worldlabs/jobs/4164503009