xAI

Member of Technical Staff - Imagine Model

Hybrid | Staff | Full-time | $180,000 - $440,000 USD | Palo Alto, CA; Seattle, WA

First indexed 18 Apr 2026

Description

As a Member of Technical Staff on the Imagine Model Team, you will develop cutting-edge AI experiences beyond text, with a strong focus on enabling high-fidelity understanding and generation across image and video modalities, while also incorporating audio where it enhances visual content.

Responsibilities:

  • Create and drive engineering agendas to advance multimodal capabilities, with emphasis on image and video generation, editing, understanding, controllable/long-horizon synthesis, agentic planning, RL training, and world simulation (including audio integration for richer video experiences).
  • Improve data quality through annotation, filtering, augmentation, synthetic generation, captioning, and in-depth data studies, particularly for visual and audio data.
  • Design evaluation frameworks, metrics, benchmarks, and reward models tailored to image/video/audio quality and coherence.
  • Implement efficient algorithms for state-of-the-art model performance, including real-time inference, distillation, and scalable serving for visual content.
  • Develop scalable data collection and processing pipelines for multimodal (primarily image/video-focused) datasets.
  • Collaborate cross-functionally to integrate AI solutions into production and rapidly iterate based on user feedback.

Basic Qualifications:

  • Track record of leading studies that significantly improve neural network capabilities and performance through better data or modeling.
  • Experience in data-driven experiment design, systematic analysis, and iterative model debugging.
  • Experience developing or working with large-scale distributed machine learning systems.
  • Ability to deliver optimal end-to-end user experiences.
  • Hands-on contributor with initiative, excellence, strong work ethic, prioritization skills, and excellent communication.

Preferred Skills and Experience:

  • Experience in SFT, RL, evals, human/synthetic data collection, or agentic systems.
  • Proficiency in Python, JAX/XLA, PyTorch, Rust/C++, Spark, Ray, and related large-scale frameworks.
  • Domain expertise in multimodal applications such as graphics engines, rendering techniques, image/video understanding and generation, world models, real-time simulation, or controllable/long-horizon visual content creation (audio/speech processing or music/audio generation experience is a plus where it supports video).
  • Experience with agentic RL training, controllable/long-horizon generation, or multimodal agents that reason and act across modalities (especially in visual domains).

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/xai/jobs/5051985007