# Research, Vision Expertise

**Company**: Thinking Machines Lab
**Location**: San Francisco
**Work arrangement**: onsite
**Experience**: senior
**Job type**: full-time
**Salary**: $350,000 - $475,000 USD per year
**Category**: Engineering
**Industry**: Technology

**Apply**: https://job-boards.greenhouse.io/thinkingmachines/jobs/5002288008
**Canonical**: https://yubhub.co/jobs/job_4ced2159-802

## Description

Thinking Machines Lab is seeking a researcher to join its team in San Francisco. The successful candidate will work on advancing the science of visual perception and multimodal learning. They will design architectures that fuse pixels and text, build datasets and evaluation methods that test real-world comprehension, and develop representations that let models ground abstract concepts in the physical world.

The ideal candidate will have expertise in multimodality and experience running large-scale experiments. They will be comfortable contributing to complex engineering systems and have a strong grasp of probability, statistics, and machine learning fundamentals.

This is an evergreen role, meaning that the position is open on an ongoing basis. The company receives many applications, and there may not always be an immediate role that aligns perfectly with a candidate's experience and skills. However, the company encourages candidates to apply and reviews applications continuously.

Responsibilities:

- Own research projects on training and performance analysis of multimodal AI models.

- Curate and build large-scale datasets and evaluation benchmarks to advance vision capabilities.

- Work with data infrastructure engineers, pretraining researchers and engineers, and product teams to create frontier multimodal models and the products that leverage them.

- Publish and present research that moves the entire community forward.

Skills and Qualifications:

- Ability to design, run, and analyze experiments thoughtfully, with demonstrated research judgment and empirical rigor.

- Understanding of machine learning fundamentals, large-scale training, and distributed compute environments.

- Proficiency in Python and familiarity with at least one deep learning framework (e.g., PyTorch, TensorFlow, or JAX).

- Comfort with debugging distributed training and writing code that scales.

- Bachelor's degree or equivalent experience in Computer Science, Machine Learning, Physics, Mathematics, or a related discipline with strong theoretical and empirical grounding.

Preferred qualifications:

- Research or engineering contributions in visual reasoning, spatial understanding, or multimodal architecture design.

- Experience developing evaluation frameworks for multimodal tasks.

- Publications or open-source contributions in vision-language modeling, video understanding, or multimodal AI.

- A strong grasp of probability, statistics, and ML fundamentals.

Logistics:

- Location: San Francisco, California.

- Compensation: $350,000 - $475,000 USD per year, depending on background, skills, and experience.

- Visa sponsorship: Yes.

- Benefits: Generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.

## Skills

### Required
- Python
- Deep learning framework (e.g., PyTorch, TensorFlow, or JAX)
- Machine learning fundamentals
- Large-scale training
- Distributed compute environments

### Nice to have
- Visual reasoning
- Spatial understanding
- Multimodal architecture design
- Evaluation frameworks for multimodal tasks
- Vision-language modeling
- Video understanding
- Multimodal AI