Description

Joining Razer will place you on a global mission to revolutionize the way the world games. As a Senior Site Reliability Engineer, you will be part of a growing infrastructure and platform engineering team. The ideal candidate will have hands-on experience with Amazon Web Services (AWS), strong troubleshooting capabilities, and a passion for building scalable, observable, and resilient systems using modern Infrastructure as Code (IaC) and automation tools.

Responsibilities:

Design, implement, and maintain Infrastructure as Code (IaC) solutions using Terraform and/or CloudFormation across multi-account AWS environments.
Collaborate with developers, architects, and DevOps teams to build scalable, secure, and observable cloud infrastructure.
Lead and participate in architecture design sessions, focusing on system reliability, scalability, security, and performance.
Implement and manage robust monitoring, alerting, and observability solutions (e.g., CloudWatch, Prometheus, ELK, Datadog).
Set and monitor Key Performance Indicators (KPIs) for system uptime, latency, throughput, and overall reliability.
Drive incident response processes, including coordination, triaging, resolution, documentation, and post-incident reviews (PIRs).
Supervise and mentor junior SREs and infrastructure engineers, fostering knowledge-sharing and team growth.
Collaborate across development, operations, and security teams to ensure secure and compliant deployments.
Automate manual tasks and workflows through scripting and tooling (Python, Node.js, Bash, Ruby, JSON/YAML).
Troubleshoot complex infrastructure issues across Linux, Windows, Docker, and cloud-native environments.
Define and promote IaC and CI/CD best practices to ensure repeatability, scalability, and compliance across all environments.
Provide on-call support, participate in incident rotations, and lead technical investigations during outages or degradations.
Apply a strong understanding of, and experience with, Disaster Recovery (DR).
Provide support and resolution for assigned incidents and tickets.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://razer.wd3.myworkdayjobs.com/en-US/Careers/job/Bangsar-South/Site-Reliability---Engineering-3_JR2025005670