# Site Reliability Engineer III

**Company**: Electronic Arts
**Location**: Hyderabad, Telangana
**Work arrangement**: hybrid
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology
**Ticker**: EA
**Wikidata**: https://www.wikidata.org/wiki/Q173941

**Apply**: https://jobs.ea.com/en_US/careers/JobDetail/Site-Reliability-Engineer-Infrastructure-Platform/214230?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_0bec91e6-f34

## Description

Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter. A team where everyone makes play happen.

As a Site Reliability Engineer III on the Site Reliability Engineering (SRE) team, you will contribute to the design, automation and operation of large-scale, cloud-based systems that power EA's global gaming platform. You will work closely with senior engineers to enhance service reliability, scalability and performance across multiple game studios and services.

**Responsibilities:**

- Build and Operate Scalable Systems: Support the development, deployment, and maintenance of distributed, cloud-based infrastructure leveraging modern open-source technologies (AWS/GCP/Azure, Kubernetes, Terraform, Docker, etc.).

- Platform Operations and Automation: Develop automation scripts, tools, and workflows to reduce manual effort, improve system reliability, and optimize infrastructure operations (reducing MTTD and MTTR).

- Monitoring, Alerting & Incident Response: Create and maintain dashboards, alerts, and metrics to improve system visibility and proactively identify issues. Participate in on-call rotations and assist in incident response and root cause analysis.

- Continuous Integration / Continuous Deployment (CI/CD): Contribute to the design, implementation, and maintenance of CI/CD pipelines to ensure consistent, repeatable, and reliable deployments.

- Reliability and Performance Engineering: Collaborate with cross-functional teams to identify reliability bottlenecks, define SLIs/SLOs/SLAs, and implement improvements that enhance the stability and performance of production services.

- Post-Incident Reviews & Documentation: Participate in root cause analyses, document learnings, and contribute to preventive measures to avoid recurrence of production issues. Maintain detailed operational documentation and runbooks.

- Collaboration & Mentorship: Work closely with senior SREs and software engineers to gain exposure to large-scale systems, adopt best practices, and gradually take ownership of more complex systems and initiatives.

- Modernization & Continuous Improvement: Contribute to ongoing modernization efforts by identifying areas for improvement in automation, monitoring, and reliability.

**Qualifications – Site Reliability Engineer III**

- 7+ years of experience in cloud computing (AWS preferred), virtualization, and containerization using Kubernetes, Docker, or VMware, including extensive hands-on experience with container orchestration technologies such as EKS, Kubernetes, and Docker.

- Experience supporting production-grade, high-availability systems with defined SLIs/SLOs.

- Strong Linux/Unix administration and networking fundamentals (protocols, load balancing, DNS, firewalls).

- Hands-on experience with Infrastructure as Code and automation tools such as Terraform, Helm, Ansible, or Chef.

- Proficiency in Python, Golang, Bash, or Java for scripting and automation.

- Familiarity with monitoring and observability tools like Prometheus, Grafana, Loki, or Datadog.

- Exposure to distributed systems, SQL/NoSQL databases, and CI/CD pipelines.

- Strong problem-solving, troubleshooting, and collaboration skills in cross-functional environments.

## Skills

### Required
- Cloud Computing
- Kubernetes
- Docker
- Terraform
- Python
- Golang
- Bash
- Java
- Prometheus
- Grafana
- Loki
- Datadog
- Linux/Unix administration
- Networking fundamentals
- Infrastructure as Code
- Automation tools
- Distributed systems
- SQL/NoSQL databases
- CI/CD pipelines

---

Source: [Apply at jobs.ea.com](https://jobs.ea.com/en_US/careers/JobDetail/Site-Reliability-Engineer-Infrastructure-Platform/214230?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
