EarnIn

Senior Site Reliability Engineer

Remote · Senior · Full-time · Mexico City, Mexico; Remote, Mexico

First indexed 14 May 2026

Description

We're looking for a Senior Site Reliability Engineer to join our team. As a Senior SRE, you will be a technical leader in designing, observing, and operating our systems in production. You will focus on how services behave as a whole: reliability, performance, failure modes, and the experience of the engineers who build them.

Responsibilities:

  • Design systems with resilience, graceful degradation, and capacity in mind.
  • Define and measure SLOs and SLIs that actually reflect what our customers feel.
  • Use Datadog (logging, metrics, APM) together with CloudWatch to build signal-heavy, noise-light observability.
  • Configure alerting and routing that reach engineers through incident.io, where we run incident management and on-call, so that when a human gets paged, it really matters.
  • Continuously improve our incident lifecycle, from fast detection and solid triage, through clear communication, to blameless, actionable follow-ups.

Requirements:

  • Bachelor's or master's degree in computer science or equivalent industry experience.
  • 4+ years of experience in an SRE or Software Engineering role.
  • Hands-on coding experience in Python and/or Go.
  • Distributed Systems Expertise: Proven experience designing, operating, and shepherding large-scale distributed systems from design through production, including incident learnings that make on-call quieter over time.
  • Reliability Engineering Mindset: Deep fluency in SLOs, SLIs, error budgets, and MTTR, using them to drive decisions and explain tradeoffs, not just decorate dashboards.
  • Observability & Incident Response: Treats observability as essential, not optional; stays calm under pressure; can diagnose incidents from logs and metrics and translate findings into durable process and technical improvements.
  • Cross-functional Communication: Able to work across technical and non-technical teams, reduce silos through documentation and runbooks, and explain reliability concepts in plain language.
  • Operational Tooling & AI Fluency: Selects the right tools for production management and leverages AI-assisted development to reduce toil, accelerate RCA, and streamline infrastructure-as-code workflows.
  • Leadership & Mentorship: Can plan and lead strategic reliability initiatives across engineering, and invests in mentoring engineers as a high-leverage path to long-term reliability improvements.
This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/earnin/jobs/7895718