# Senior Site Reliability Engineer - Observability

**Company**: Okta
**Location**: Bellevue, Washington
**Work arrangement**: hybrid
**Experience**: senior
**Job type**: full-time
**Salary**: $147,000-$202,000 USD
**Category**: Engineering
**Industry**: Technology

**Apply**: https://job-boards.greenhouse.io/okta/jobs/7658254?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_54a37748-223

## Description

### Job Overview

We are seeking a highly technical Senior Observability Site Reliability Engineer with a specialty in Splunk to own and evolve our Splunk ecosystem. In this role, you will move beyond simple monitoring to deliver a world-class, comprehensive, scalable Observability Platform that enables our SRE teams and business partners.

### Key Responsibilities

- Design, build, and maintain scalable observability infrastructure using tools like Terraform.

- Optimize the collection, processing, and storage of log data to ensure high reliability and low latency of our Splunk services.

- Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and 'observability-driven development.'

- Eliminate 'toil' by automating the deployment and scaling of observability agents and collectors.

### Required Skills & Experience

- At least 5 years of experience managing and scaling large Splunk Cloud deployments (1000+ SVCs), including Workload Management (WLM) and HEC optimization.

- Expertise in creating intuitive, actionable Splunk dashboards that correlate data across multiple sources.

- At least 3 years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.

- Strong coding skills in SPL, Go, Python, or Ruby for building internal tools and automating workflows.

- Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).

- A data-driven approach to debugging complex, cross-service performance bottlenecks.

### Bonus Skills

- Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.

- Experience implementing a Splunk charge-back app for usage reporting.

- Experience managing native observability tools within AWS or GCP.

### Additional Requirements

- This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status.

- The successful candidate must attend in-person onboarding in our San Francisco office during the first week of employment.

### Salary Information

The annual base salary range for this position for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York, and Washington is $147,000-$202,000 USD.

## Skills

### Required
- Splunk
- Terraform
- Go
- Python
- Ruby
- SPL
- Linux
- Kubernetes
- EKS
- TCP/IP
- DNS
- Load Balancing

### Nice to have
- OpenTelemetry
- Vector
- AWS
- GCP
- Splunk charge-back app

---

Source: [Apply at job-boards.greenhouse.io](https://job-boards.greenhouse.io/okta/jobs/7658254?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
