Elastic

Senior Site Reliability Engineer (FinOps) - Platform

remote senior full-time Spain

First indexed 18 Apr 2026

Description

As a Senior Site Reliability Engineer (FinOps) - Platform, you will be part of the Platform Engineering department, which is responsible for designing, building, scaling, and maturing the multi-cloud platform that hosts internal and external services. You will lead technical initiatives that automate systems engineering work to ensure the reliability of the global Elastic infrastructure. You will also grow our global Platform infrastructure to meet increasing scaling demands by developing and maintaining software, tooling, and automation.

Key responsibilities include:

  • Taking an engineering approach to leading technical initiatives that automate systems engineering work to ensure the reliability of the global Elastic infrastructure.
  • Growing our global Platform infrastructure to meet increasing scaling demands by developing and maintaining software, tooling, and automation.
  • Taking an inclusive approach to championing an environment focused on collaboration, operational excellence, and uplifting others.
  • Responding to major incidents and preventing repeat customer impact through prioritized problem management.

The ideal candidate will bring successes and lessons learned from striving for 'progress not perfection' in the name of Platform reliability. They will have a software engineering background that lets them collaborate with engineers to identify, implement, and deliver solutions. Experience with public cloud and managed Kubernetes services is advantageous.

The role requires a passion for developing solutions that use inclusive communication methods to grow and strengthen partner and team relationships. Experience working in distributed teams or working remotely is desirable.

Bonus points for experience in:

  • Operating a SaaS product in a public cloud.
  • Building or operating Kubernetes infrastructure at scale.
  • Writing non-trivial programs in Golang or other programming languages.
  • Working with containerized services.
  • Leading and improving standard processes, metrics, and systems for alerting and major incident management.
  • System administration, with professional Linux skills on distributed systems at scale.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/elastic/jobs/7565188