# Staff Software Engineer

**Company**: Alluxio
**Work arrangement**: onsite
**Experience**: staff
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://jobs.lever.co/alluxio/65f09933-df44-4f0d-b70d-7d4e6fd57330
**Canonical**: https://yubhub.co/jobs/job_aa7543fd-8bc

## Description

We're looking for experienced distributed-systems engineers to join our Core Product team and advance the next generation of Alluxio's data-orchestration engine, the foundation for AI and analytics at global scale.

As a Staff Software Engineer, you'll work on high-impact systems problems such as:

- Optimizing metadata management, caching, and replication across thousands of nodes.

- Designing concurrent, fault-tolerant services for multi-region and multi-cloud environments.

- Evolving Alluxio's storage abstraction and scheduling layer to support large-scale AI/ML data pipelines.

- Collaborating with internal product teams to push the limits of distributed I/O performance.

This is a hands-on, architecture-plus-implementation role for engineers who love deep systems work and want visible impact in a small, senior, highly technical team.

### What You'll Own

- Cache and metadata consistency - advance Alluxio's intelligent caching framework for multi-tenant environments (TTL policies, write-back consistency, invalidation protocols, and distributed metadata scaling).

- High-throughput data I/O optimization - profile and optimize Alluxio's data path across S3, GCS, HDFS, and POSIX interfaces using adaptive prefetching, async I/O, and tier-aware scheduling.

- Scaling for AI and analytics workloads - evolve the coordination layer to efficiently serve distributed AI training clusters, accelerating model loading and shuffle operations across regions and clouds.

- Observability and performance insights - build fine-grained metrics and tracing for cache efficiency, throughput, and latency across storage tiers.

- Open-source leadership - drive design discussions, mentor contributors, and represent Alluxio's core-systems direction within the OSS community.

### What You'll Do

- Design and implement core components of Alluxio's distributed file and object-access layer.

- Optimize performance for large-scale, high-throughput environments using advanced concurrency and caching techniques.

- Build scalable metadata and coordination systems that ensure strong consistency, high availability, and minimal latency.

- Collaborate cross-functionally with product, solution-engineering, and research teams to drive the product roadmap and customer success.

### What We're Looking For

- Strong computer-science fundamentals and a passion for large-scale distributed systems.

- Professional experience developing in Java, C++, or Go.

- Deep understanding of concurrency, replication, fault tolerance, and performance optimization.

- Experience with distributed storage, data-access layers, or cloud infrastructure (e.g., Spark, Presto, Hadoop, Kubernetes).

- Bachelor's or advanced degree in Computer Science or a related technical field (or equivalent experience).

- Demonstrated technical leadership: defining architecture, mentoring peers, or driving major projects from design through release.

### Why Alluxio

- Build infrastructure trusted by the world's largest AI and data-driven companies.

- Join a small, senior engineering team where your designs shape the product's evolution.

- Work directly with the original creators of open-source Alluxio.

- A culture of empathy, curiosity, and ownership, where engineers collaborate closely to solve hard problems.

## Skills

### Required
- Java
- C++
- Go
- Distributed Systems
- Concurrency
- Replication
- Fault Tolerance
- Performance Optimization
- Distributed Storage
- Data-Access Layers
- Cloud Infrastructure
