Description
We're looking for a Staff Infrastructure Software Engineer (Kubernetes) to join our engineering team. As a member of the infrastructure team, you will be responsible for designing, building, and advancing the core infrastructure that allows the engineering team to execute quickly, productively, and securely.
You will partner with engineers to build dev tools that empower developer workflows and deployment infrastructure. You will ensure the reliability of multi-cloud Kubernetes clusters and pipelines. You will also implement metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.
You will focus on automation so we can spend energy where it matters. You will build machine learning infrastructure that enables AI teams to train, test, and deploy on large-scale datasets.
We're looking for someone with 5+ years of experience in DevOps, Site Reliability Engineering, Production Engineering, or an equivalent field. You should have deep proficiency with programming languages such as Golang or Python. You should also have deep familiarity with container-related security best practices.
Production experience with Kubernetes and a deep understanding of the Kubernetes ecosystem, including popular open-source tooling such as cert-manager or external-dns, are required. Experience with GPU-enabled clusters is a bonus.
Production experience with Kubernetes templating tools such as Helm or Kustomize, and with infrastructure-as-code (IaC) tools such as Terraform or CloudFormation, is a plus.
Production experience with AWS and services such as IAM, S3, EC2, and EKS, and with other cloud providers such as Google Cloud and Azure, is a bonus.
Experience with GitOps tooling such as Flux or Argo, and with CI/CD systems such as GitHub Actions, is a plus.
Compensation for this position includes a base salary, equity, and a variety of benefits.