Description
As a Staff Engineer on the GitLab Delivery - Upgrades team, you'll guide the technical direction for GitLab's self-managed deployment strategy so customers can deploy, upgrade, and run GitLab reliably in their own infrastructure with minimal disruption.
You'll serve as a technical anchor for the team, working closely with your engineering manager, product manager, and partners across Site Reliability Engineering, Release, Security, and Development to shape cloud-native, operator-driven deployment patterns that reduce operational complexity and upgrade friction.
In your first year, you'll help define the architecture for zero-downtime upgrades, strengthen observability and reliability practices, and guide the next generation of deployment automation for self-managed GitLab environments.
Some examples of our projects:
- Evolving GitLab Operator and Helm charts to support zero-downtime upgrades for complex, stateful GitLab installations
- Advancing the GitLab Environment Toolkit to simplify large-scale, production-ready self-managed deployments
Responsibilities
- Guide the technical vision and architecture for GitLab's cloud-native, self-managed deployments and upgrade workflows.
- Establish operational maturity standards, service integration patterns, and deployment models that help development teams manage the lifecycle of their components.
- Design and maintain Kubernetes Operators, Helm charts, and upgrade orchestration tooling for self-managed GitLab deployments across varied environments.
- Develop automation and integration frameworks for database migrations, rolling deployments, compatibility checks, and rollback paths.
- Define database and application lifecycle strategies, including safe PostgreSQL migration approaches and validation mechanisms that reduce downtime risk.
- Work with Product Management, GitLab.com Site Reliability Engineering, GitLab Dedicated, and development teams to align deployment patterns with customer needs.
- Mentor engineers and enable customer-facing teams through design reviews, code reviews, documentation, and runbooks.
- Drive observability, testing, performance, and resilience practices for self-managed deployments, and contribute to incident response and post-incident learning.
Requirements
- Strong software engineering experience designing and delivering production systems that customers install and operate in their own infrastructure.
- Proficiency in Go on large, complex codebases; familiarity with Ruby on Rails and Rails application architecture is a useful addition.
- Hands-on experience with Kubernetes in production, including building and maintaining Operators, designing Helm charts for stateful applications, and working with Custom Resource Definitions, admission controllers, and controller patterns.
- Knowledge of cloud-native systems and tooling, such as service meshes, observability stacks, infrastructure as code, and automation tools like Terraform or Ansible.
- Experience with stateful workloads and databases, including PostgreSQL schema design and migrations, persistent volumes, storage classes, and approaches for reducing downtime during upgrades.
- Understanding of Linux systems and production operations, including package management, systemd, system-level debugging, observability, incident response, and on-call participation.
- Ability to lead through influence, including writing clear technical proposals, documenting decisions, mentoring engineers, and working effectively across teams.
- Interest in open source infrastructure or deployment tooling, or transferable experience from adjacent domains, with the ability to explain technical concepts clearly to different audiences.
About the Team
The Delivery - Upgrades team sits within GitLab Delivery and focuses on delivering GitLab to self-managed users through supported, validated deployment tooling. We own and evolve the GitLab Omnibus package, Helm charts, GitLab Operator, and the GitLab Environment Toolkit, and we work asynchronously across regions with partners in Site Reliability Engineering, Release, Security, and Development.
Our work centers on enabling zero-downtime upgrades, reducing operational complexity at scale, supporting GitLab's cloud-native transition while continuing to serve existing deployments, and improving the upgrade experience for customers running GitLab in diverse environments.
For more on how we work, see [Link: Team Handbook Page].