Description

This is a small but growing team responsible for the infrastructure and operations behind core developer tools used across the entire engineering organization. You'll own the full lifecycle (patching, upgrades, backups, scaling, and incident response) for services that every engineer depends on daily. The role blends DevOps, SRE, and software engineering, and is ideal for engineers who want high ownership and company-wide impact. You should have a mindset of continuous improvement: if something is manual and repetitive, your instinct should be to automate it away. As the company's on-prem infrastructure footprint grows, this team will expand its scope to provide SRE capabilities for on-prem systems, making this an opportunity to help shape that practice from the ground up.

Own the lifecycle of core self-hosted developer tools (e.g., GitHub Enterprise Server, CircleCI, JFrog Artifactory/Xray)
Design and implement automated systems for patching, backups (with validation), and upgrades
Scale infrastructure to support a fast-growing engineering org
Use Infrastructure-as-Code (Terraform) to manage environments
Operate and troubleshoot systems using Docker, Kubernetes, and cloud platforms (AWS, GCP, Azure)
Define and maintain SLOs for service availability, reliability, and performance
Build and maintain monitoring, alerting, and observability for developer tool services
Lead and participate in incident response and root cause analysis
Work cross-functionally with platform, security, infrastructure (on-prem and cloud), and software teams

To apply, use the employer's original posting: https://job-boards.greenhouse.io/andurilindustries/jobs/5149139007