# Infrastructure Reliability Engineer

**Company**: Anduril
**Location**: Costa Mesa, California, United States
**Work arrangement**: onsite
**Experience**: senior
**Job type**: full-time
**Salary**: $146,000-$194,000 USD
**Category**: Engineering
**Industry**: Technology

**Apply**: https://job-boards.greenhouse.io/andurilindustries/jobs/5149139007?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_5915a331-f21

## Description

This is a small but growing team responsible for the infrastructure and operations behind core developer tools used across the entire engineering organization. You'll own the full lifecycle (patching, upgrades, backups, scaling, and incident response) for services that every engineer depends on daily. The role blends DevOps, SRE, and software engineering, and is ideal for engineers who want high ownership and company-wide impact. You should have a mindset of continuous improvement: if something is manual and repetitive, your instinct should be to automate it away. As the company's on-prem infrastructure footprint grows, this team will expand its scope to provide SRE capabilities for on-prem systems, making this an opportunity to help shape that practice from the ground up.

- Own the lifecycle of core self-hosted developer tools (e.g., GitHub Enterprise Server, CircleCI, JFrog Artifactory/Xray)

- Design and implement automated systems for patching, backups (with validation), and upgrades

- Scale infrastructure to support a fast-growing engineering org

- Use Infrastructure-as-Code (Terraform) to manage environments

- Operate and troubleshoot systems using Docker, Kubernetes, and cloud platforms (AWS, GCP, Azure)

- Define and maintain SLOs for service availability, reliability, and performance

- Build and maintain monitoring, alerting, and observability for developer tool services

- Lead and participate in incident response and root cause analysis

- Work cross-functionally with platform, security, infrastructure (on-prem and cloud), and software teams

## Skills

### Required
- Docker
- Kubernetes
- Cloud platforms (AWS, GCP, Azure)
- Infrastructure-as-Code (Terraform)
- Scripting or software development experience (e.g., Python, Go, Bash)
- CI/CD pipelines and developer tooling
- Security best practices, compliance requirements, or auditing

### Nice to have
- GitHub Enterprise Server
- JFrog Artifactory/Xray
- CircleCI
- Monitoring and observability platforms (e.g., Datadog, Prometheus, Grafana)
- Background in SRE or hybrid SWE/DevOps roles
- Experience with on-prem infrastructure operations, reliability, or capacity planning

