New The Skills of Tomorrow: how AI-exposed is every skill in 2026? See the data →
Razer

Senior Site Reliability Engineer

Razer
Apply →
onsite senior full-time Bangsar South

First indexed 19 May 2026

Description

Joining Razer will place you on a global mission to revolutionize the way the world games. As a Senior Site Reliability Engineer, you will be part of a growing infrastructure and platform engineering team. The ideal candidate will have hands-on experience in Amazon Web Services (AWS), strong troubleshooting capabilities, and a passion for building scalable, observable, and resilient systems using modern Infrastructure as Code (IaC) and automation tools.

Responsibilities:

  • Design, implement, and maintain Infrastructure as Code (IaC) solutions using Terraform and/or CloudFormation across multi-account AWS environments.
  • Collaborate with developers, architects, and DevOps teams to build scalable, secure, and observable cloud infrastructure.
  • Lead and participate in architecture design sessions, focusing on system reliability, scalability, security, and performance.
  • Implement and manage robust monitoring, alerting, and observability solutions (e.g., CloudWatch, Prometheus, ELK, Datadog).
  • Set and monitor Key Performance Indicators (KPIs) for system uptime, latency, throughput, and overall reliability.
  • Drive incident response processes, including coordination, triaging, resolution, documentation, and post-incident reviews (PIRs).
  • Supervise and mentor junior SREs and infrastructure engineers, fostering knowledge-sharing and team growth.
  • Collaborate across development, operations, and security teams to ensure secure and compliant deployments.
  • Automate manual tasks and workflows through scripting and tooling (Python, Node.js, Bash, Ruby, JSON/YAML).
  • Troubleshoot complex infrastructure issues across Linux, Windows, Docker, and cloud-native environments.
  • Provide IaC and CI/CD best practices to ensure repeatability, scalability, and compliance across all environments.
  • Provide on-call support, participate in incident rotations, and lead technical investigations during outages or degradations.
  • Strong understanding and experience for Disaster Recovery (DR).
  • Provide support and solution handling to incident and tickets assigned.
This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://razer.wd3.myworkdayjobs.com/en-US/Careers/job/Bangsar-South/Site-Reliability---Engineering-3_JR2025005670