Description
Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. This role is part of the CT - Infrastructure & Platform team, which builds and operates distributed, large-scale, cloud-based infrastructure using modern open-source software solutions.
As a SEIII/SRE Engineer, you will be responsible for building and operating a unified platform across EA, extracting and processing massive amounts of data spanning 20+ game studios, and using the resulting insight to serve massive online requests. You will also use automation technologies to ensure repeatability, eliminate toil, reduce mean time to detection and resolution (MTTD & MTTR), and repair services.
Your responsibilities will include:
- Building and operating distributed, large-scale, cloud-based infrastructure using modern open-source software solutions
- Helping build and operate a unified platform across EA, extracting and processing massive amounts of data spanning 20+ game studios, and using the resulting insight to serve massive online requests
- Using automation technologies to ensure repeatability, eliminate toil, reduce MTTD & MTTR, and repair services
- Performing root cause analysis and post-mortems with an eye towards future prevention
- Designing and building CI/CD pipelines
- Creating monitoring, alerting and dashboarding solutions that improve visibility into EA's application performance and business metrics
- Producing documentation and support tooling for online support teams
- Developing reporting systems that inform on important metrics, detect anomalies, and forecast future results
- Developing and operating both SQL and NoSQL solutions
- Building complex queries to solve data mining problems
- Developing a large-scale online platform to personalize the player experience and provide reporting and feedback
- Helping interview and hire the best candidates for the team
- Mentoring team members and helping them grow their skill sets
- Driving growth and modernization efforts and projects for the team
To be successful in this role, you will need:
- 7+ years of experience with virtualization, containerization, cloud computing (AWS preferred), VMware ecosystems, Kubernetes, or Docker
- 7+ years of experience supporting high-availability, production-grade data infrastructure and applications with defined SLIs and SLOs
- Systems Administration or Cloud experience, including a strong understanding of Linux / Unix
- Network experience, including an understanding of standard protocols/components
- Automation and orchestration experience, including Terraform, Helm, Chef, and Packer
- Experience writing code in Python, Golang, or Java
- Experience with a monitoring tech stack such as Prometheus, Grafana, Loki, and Alertmanager
- Experience with distributed systems that serve massive concurrent requests
- Experience working with large-scale systems and data platforms/warehouses
If you are passionate about building and operating scalable, reliable, and efficient systems, and have a strong background in software development and operations, we encourage you to apply for this exciting opportunity.