Description
As a Senior Software Engineer (Infrastructure), you will be a core technical contributor on the IT Infrastructure team, owning and driving the evolution of our core infrastructure and observability platforms. This role requires a strong software engineering mindset, deep technical breadth across SRE and infrastructure, and the ability to deliver high-quality, scalable solutions to problems in systems that are currently immature.
You will be responsible for building resilient, scalable, and automated infrastructure that empowers our development teams. As a senior member of the team, you will bridge the gap between software engineering and systems architecture, ensuring our AWS environment is cost-optimised, secure, and highly available.
Key responsibilities include:
- Architect and Automate: Design and deploy production-grade infrastructure on AWS using Terraform or Pulumi.
- Orchestration: Manage and scale containerised workloads using AKS (Azure Kubernetes Service) or EKS (Amazon Elastic Kubernetes Service), focusing on cluster security and resource efficiency.
- CI/CD Excellence: Architect robust deployment pipelines using GitHub Actions, managing both GitHub-hosted and self-hosted runners for specialised build requirements.
- Drive 'Observable by Default' Frameworks: Create the underlying infrastructure that ensures new internal applications are secure and have logging and metrics enabled by default.
- Tooling, Scripting & AI: Build internal CLI tools, AI plugins and automation scripts to streamline developer workflows and enhance operational efficiency.
- Partner Cross-Functionally: Collaborate with stakeholders across Security, Engineering, Infrastructure, and Support to deliver impactful projects with real business outcomes.
- Mentor and Document: Participate in code reviews, document solutions and failure triage playbooks, and mentor junior engineers on the platforms you own.
Requirements include:
- Software Engineering Expertise: 5+ years of production-level software engineering experience, with strong proficiency in Python (non-negotiable).
- IaC: Expert-level proficiency in Terraform (modules, state management) or Pulumi (Preferred).
- Cloud & Infrastructure Breadth: Hands-on experience with AWS (or Azure/GCP), Kubernetes, Docker and containerisation concepts.
- Automation & Integration Mindset: Experience building and troubleshooting integrations between infrastructure, data pipelines, and observability platforms.
- CI/CD: Advanced knowledge of GitHub Actions, including GitHub-hosted and self-hosted runners.
- Strong Observability Mindset: Understanding of observability pillars (logging, metrics, tracing) and hands-on experience with tools like Datadog, Prometheus, or ELK.
- Distributed Systems: Proficiency in running distributed systems built on technologies such as Kafka or other message queues.
- Independent Execution: Ability to operate with minimal guidance, take ownership of ambiguous projects, and execute independently on the vision set by tech leads.