Reddit

Staff Site Reliability Engineer - Site Experience

Onsite | Staff | Full-time | Dublin, Ireland

First indexed 9 May 2026

Description

As a Staff Site Reliability Engineer at Reddit, you will lead reliability engineering initiatives for critical user-facing systems at internet scale. You will partner closely with product and infrastructure teams to improve availability, latency, scalability, and operational excellence across Reddit's most business-critical experiences.

In this role, you will:

  • Lead Reliability Engineering for User Experience: drive reliability, scalability, and operational excellence for critical user-facing systems and services
  • Architect for Scale
  • Reduce Operational Risk
  • Drive Automation
  • Incident Management
  • Influence Engineering Standards
  • Mentor and Multiply Impact

To succeed in this role, you will need:

  • 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems
  • Strong collaboration and communication skills with the ability to influence technical direction across teams
  • Strong experience supporting high-traffic, user-facing production environments
  • Deep understanding of one or more of the following: distributed systems, networking, Linux systems, cloud-native architectures
  • Experience designing highly available systems with strong operational and reliability practices
  • Strong programming skills in languages such as Go, Python, or similar
  • Strong understanding of observability systems including metrics, logging, tracing, and alerting
  • Experience improving reliability through SLOs, automation, incident management, and performance optimization

Nice to Have:

  • Experience running systems at internet-scale traffic volumes
  • Experience with Kubernetes, containers, cloud infrastructure, and modern deployment platforms
  • Familiarity with technologies such as Prometheus, Grafana, OpenTelemetry, Envoy, Kafka, ClickHouse, Cassandra, Redis, or similar distributed infrastructure technologies
  • Experience with CDN optimization, edge reliability, traffic engineering, or global infrastructure
  • Contributions to open-source software or participation in technical communities
  • Experience leading large-scale incident response and operational transformation initiatives

Why Join Reddit?

You'll help shape the reliability and performance of one of the internet's largest platforms, influencing experiences used by millions of people every day. This is an opportunity to solve deeply complex engineering problems at massive scale while helping define the future of reliability engineering for a modern consumer platform.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/reddit/jobs/7909463