Electronic Arts

Site Reliability Engineer III

Hybrid · Senior · Full-time · Hyderabad, Telangana

First indexed 18 Jun 2026

Description

Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter. A team where everyone makes play happen.

As a Site Reliability Engineer III on the Site Reliability Engineering (SRE) team, you will contribute to the design, automation, and operation of large-scale, cloud-based systems that power EA's global gaming platform. You will work closely with senior engineers to improve service reliability, scalability, and performance across multiple game studios and services.

Responsibilities:

  • Build and Operate Scalable Systems: Support the development, deployment, and maintenance of distributed, cloud-based infrastructure built on public cloud platforms (AWS/GCP/Azure) and modern infrastructure tooling (Kubernetes, Terraform, Docker, etc.).
  • Platform Operations and Automation: Develop automation scripts, tools, and workflows to reduce manual effort, improve system reliability, and optimize infrastructure operations (reducing MTTD and MTTR).
  • Monitoring, Alerting & Incident Response: Create and maintain dashboards, alerts, and metrics to improve system visibility and proactively identify issues. Participate in on-call rotations and assist in incident response and root cause analysis.
  • Continuous Integration / Continuous Deployment (CI/CD): Contribute to the design, implementation, and maintenance of CI/CD pipelines to ensure consistent, repeatable, and reliable deployments.
  • Reliability and Performance Engineering: Collaborate with cross-functional teams to identify reliability bottlenecks, define SLIs/SLOs/SLAs, and implement improvements that enhance the stability and performance of production services.
  • Post-Incident Reviews & Documentation: Participate in root cause analyses, document learnings, and contribute to preventive measures to avoid recurrence of production issues. Maintain detailed operational documentation and runbooks.
  • Collaboration & Mentorship: Work closely with senior SREs and software engineers to gain exposure to large-scale systems, adopt best practices, and gradually take ownership of more complex systems and initiatives.
  • Modernization & Continuous Improvement: Contribute to ongoing modernization efforts by identifying areas for improvement in automation, monitoring, and reliability.

Qualifications – Site Reliability Engineer III

  • 7+ years of experience in cloud computing (AWS preferred), virtualization (e.g., VMware), and containerization (e.g., Docker, Kubernetes), including extensive hands-on experience with container orchestration technologies such as Kubernetes and Amazon EKS.
  • Experience supporting production-grade, high-availability systems with defined SLIs/SLOs.
  • Strong Linux/Unix administration and networking fundamentals (protocols, load balancing, DNS, firewalls).
  • Hands-on experience with Infrastructure as Code and automation tools such as Terraform, Helm, Ansible, or Chef.
  • Proficiency in Python, Golang, Bash, or Java for scripting and automation.
  • Familiarity with monitoring and observability tools such as Prometheus, Grafana, Loki, or Datadog.
  • Exposure to distributed systems, SQL/NoSQL databases, and CI/CD pipelines.
  • Strong problem-solving, troubleshooting, and collaboration skills in cross-functional environments.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://jobs.ea.com/en_US/careers/JobDetail/Site-Reliability-Engineer-Infrastructure-Platform/214230