Description
We're looking for experienced distributed-systems engineers to join our Core Product team and advance the next generation of Alluxio's data-orchestration engine - the foundation for AI and analytics at global scale.
As a Staff Software Engineer, you'll work on high-impact systems problems such as optimizing metadata management, caching, and replication across thousands of nodes; designing concurrent, fault-tolerant services for multi-region and multi-cloud environments; evolving Alluxio's storage abstraction and scheduling layer to support large-scale AI/ML data pipelines; and collaborating with internal product teams to push the limits of distributed I/O performance.
This is a hands-on, architecture-plus-implementation role for engineers who love deep systems work and want visible impact in a small, senior, highly technical team.
What You'll Own
- Cache and metadata consistency - advance Alluxio's intelligent caching framework for multi-tenant environments (TTL policies, write-back consistency, invalidation protocols, and distributed metadata scaling).
- High-throughput data I/O optimization - profile and optimize Alluxio's data path across S3, GCS, HDFS, and POSIX interfaces using adaptive prefetching, async I/O, and tier-aware scheduling.
- Scaling for AI and analytics workloads - evolve the coordination layer to efficiently serve distributed AI training clusters, accelerating model load and shuffle operations across regions and clouds.
- Observability and performance insights - build fine-grained metrics and tracing for cache efficiency, throughput, and latency across storage tiers.
- Open-source leadership - drive design discussions, mentor contributors, and represent Alluxio's core-systems direction within the OSS community.
What You'll Do
- Design and implement core components of Alluxio's distributed file and object-access layer.
- Optimize performance for large-scale, high-throughput environments using advanced concurrency and caching techniques.
- Build scalable metadata and coordination systems that ensure strong consistency, high availability, and minimal latency.
- Collaborate cross-functionally with product, solution-engineering, and research teams to drive the roadmap and customer success.
What We're Looking For
- Strong computer-science fundamentals and a passion for large-scale distributed systems.
- Professional experience developing in Java, C++, or Go.
- Deep understanding of concurrency, replication, fault tolerance, and performance optimization.
- Experience with distributed storage, data-access layers, or cloud infrastructure (e.g., Spark, Presto, Hadoop, Kubernetes).
- Bachelor's or advanced degree in Computer Science or a related technical field (or equivalent experience).
- Demonstrated technical leadership: defining architecture, mentoring peers, or driving major projects from design through release.
Why Alluxio
- Build infrastructure trusted by the world's largest AI and data-driven companies.
- Join a small, senior engineering team where your designs shape the product's evolution.
- Work directly with the original creators of open-source Alluxio.
- Be part of a culture of empathy, curiosity, and ownership, where engineers collaborate closely to solve hard problems.