# Member of Technical Staff - Compute Infrastructure

**Company**: xAI
**Location**: Palo Alto, CA
**Work arrangement**: onsite
**Experience**: staff
**Job type**: full-time
**Salary**: $180,000 - $440,000 USD
**Category**: Engineering
**Industry**: Technology
**Wikidata**: https://www.wikidata.org/wiki/Q120599684

**Apply**: https://job-boards.greenhouse.io/xai/jobs/5052040007
**Canonical**: https://yubhub.co/jobs/job_5daf8f5f-60a

## Description

Join the Compute Infrastructure team at xAI, responsible for designing, building, and operating massive-scale clusters and orchestration platforms. You will push the boundaries of container orchestration, manage exascale compute resources, and collaborate closely with research and systems teams to deliver reliable, ultra-scalable infrastructure.

Responsibilities:

- Build and manage massive-scale clusters to host, persist, train, and serve AI workloads with extreme reliability and performance.

- Design, develop, and extend an in-house container orchestration platform that achieves superior scalability, isolation, resource efficiency, and fault tolerance compared to off-the-shelf solutions.

- Collaborate with research teams to architect and optimize compute clusters specifically for large-scale training runs, inference services, and real-time applications.

- Profile, debug, and resolve complex system-level performance bottlenecks, resource contention, scheduling issues, and reliability problems across the full stack.

- Own end-to-end infrastructure initiatives with first-principles design, rigorous testing, automation, and continuous optimization to support frontier AI compute demands.

Basic Qualifications:

- Deep expertise in virtualization technologies (KVM, Xen, QEMU) and advanced containerization/sandboxing (Kata, Firecracker, gVisor, Sysbox, or equivalent).

- Strong proficiency in systems programming languages such as C/C++ and Rust.

- Proven track record of profiling, debugging, and optimizing complex system-level performance issues, with deep knowledge of Linux kernel internals, resource management, scheduling, memory management, and low-level engineering.

- Hands-on experience building or significantly enhancing distributed compute platforms, orchestration systems, or high-performance infrastructure at scale.

Preferred Skills and Experience:

- Experience in Linux kernel development, hypervisor extensions, or low-level system programming for compute-intensive workloads.

- Proven track record of operating or designing large-scale AI training/inference clusters (GPU/TPU scale).

- Experience with custom runtimes, isolation techniques, or bespoke platforms for specialized AI compute.

- Familiarity with performance tools, tracing, and debugging in production distributed environments.

Compensation and Benefits:

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.

## Skills

### Required
- virtualization technologies
- advanced containerization/sandboxing
- systems programming languages
- Linux kernel internals
- resource management
- scheduling
- memory management
- low-level engineering

### Nice to have
- Linux kernel development
- hypervisor extensions
- low-level system programming
- custom runtimes
- isolation techniques
- bespoke platforms
