# Member of Technical Staff - Imagine Model

**Company**: xAI
**Location**: Palo Alto, CA; Seattle, WA
**Work arrangement**: hybrid
**Experience**: staff
**Job type**: full-time
**Salary**: $180,000 - $440,000 USD
**Category**: Engineering
**Industry**: Technology
**Wikidata**: https://www.wikidata.org/wiki/Q120599684

**Apply**: https://job-boards.greenhouse.io/xai/jobs/5051985007
**Canonical**: https://yubhub.co/jobs/job_28b01ce3-8a3

## Description

As a Member of Technical Staff on the Imagine Model Team, you will develop cutting-edge AI experiences beyond text, with a strong focus on enabling high-fidelity understanding and generation across image and video modalities, while also incorporating audio where it enhances visual content.

Responsibilities:

- Create and drive engineering agendas to advance multimodal capabilities, with emphasis on image and video generation, editing, understanding, controllable/long-horizon synthesis, agentic planning, RL training, and world simulation (including audio integration for richer video experiences).

- Improve data quality through annotation, filtering, augmentation, synthetic generation, captioning, and in-depth data studies, particularly for visual and audio data.

- Design evaluation frameworks, metrics, benchmarks, evals, and reward models tailored to image/video/audio quality and coherence.

- Implement efficient algorithms for state-of-the-art model performance, including real-time inference, distillation, and scalable serving for visual content.

- Develop scalable data collection and processing pipelines for multimodal (primarily image/video-focused) datasets.

- Collaborate cross-functionally to integrate AI solutions into production and rapidly iterate based on user feedback.

Basic Qualifications:

- Track record of leading studies that significantly improve neural network capabilities and performance through better data or modeling.

- Experience in data-driven experiment design, systematic analysis, and iterative model debugging.

- Experience developing or working with large-scale distributed machine learning systems.

- Ability to deliver optimal end-to-end user experiences.

- Hands-on contributor with initiative, a high standard of excellence, a strong work ethic, prioritization skills, and excellent communication.

Preferred Skills and Experience:

- Experience in SFT, RL, evals, human/synthetic data collection, or agentic systems.

- Proficiency in Python, JAX/XLA, PyTorch, Rust/C++, Spark, Ray, and related large-scale frameworks.

- Domain expertise in multimodal applications such as graphics engines, rendering techniques, image/video understanding and generation, world models, real-time simulation, or controllable/long-horizon visual content creation (audio/speech processing or music/audio generation experience is a plus where it supports video).

- Experience with agentic RL training, controllable/long-horizon generation, or multimodal agents that reason and act across modalities (especially in visual domains).

## Skills

### Required
- Python
- JAX/XLA
- PyTorch
- Rust/C++
- Spark
- Ray
- multimodal applications
- agentic systems
- RL training
- controllable/long-horizon generation

### Nice to have
- SFT
- evals
- human/synthetic data collection
- graphics engines
- rendering techniques
- image/video understanding and generation
- world models
- real-time simulation
