Description
As a Principal SRE (Networking) - Platform Control Plane, you will lead technical initiatives for designing, building, and automating network infrastructure and services to guarantee the reliability of the global Elastic network infrastructure. You will focus on Layer 2/3/4 of the TCP/IP stack (Ethernet and/or IP encapsulation, routing, firewalling, load balancing).
You will grow our global Platform network infrastructure to meet the increasing scaling demands by developing and maintaining software, codebases, tooling, and automations to serve our Network Infrastructure as Code principle.
You will collaborate in an inclusive environment that focuses on operational excellence and uplifts others. You will prevent repeated customer impact in response to major incidents and prioritize problem management. Our on-call rotation is well distributed, and we also address complex customer concerns.
You will take an engineering approach to leading technical initiatives, developing and maintaining software, codebases, tooling, and automations to serve our Network Infrastructure as Code principle.
You will have excellent networking skills, with knowledge of protocols such as IP/IPv6, TCP/UDP, BGP, DNS. You will have strong technical depth for building and automating networks (Terraform, Ansible) in collaboration with other engineers as an authority in identifying, implementing, and delivering solutions.
You will have good knowledge of public CSP network components (load balancers, VPC peering/Transit gateways, VPN connectivity, Direct Connects). You will bring successes and lessons learned from striving for 'progress not perfection' in the name of Platform reliability. We want to hear about your customer-first approach in solving operational problems for both today and the future.
You will have a passion for developing solutions that involve inclusive communication methods to grow and strengthen partner and team relationships. Examples of working in distributed teams or working remotely are desirable.
You will have site-reliability engineering experience. We tackle problems with code, but fundamentally, we keep things working and have proven success in operational excellence. This includes responding to major incidents, preventing repeated customer impact, and prioritizing problem management. Our on-call rotation uses a follow-the-sun model in which everyone participates during (mostly) their working hours.
Bonus points:
- You have operated a SaaS product in a public cloud ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
- You have designed and/or operated large network topologies in which dynamic routing is based on BGP.
- You have operated network topologies based on software routers.
- You have experience in IP address management (IPAM) and you have used relevant tools for automated IP allocations.
- You have designed and/or operated overlay networks using encapsulation protocols such as IPsec, GRE, and VXLAN.
- You have built or operated a Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, with knowledge of the Cilium CNI.
- You have written non-trivial programs in Golang or other programming languages.
- You have worked with containerized services (such as Docker).
- You have proven experience in leading and improving alerting and major incident management processes, using metrics systems (e.g., Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues and quantify impact for audiences at varying levels of the organization.
- You have experience in system and network administration with professional skills in Linux on distributed systems at scale.
- You have diagnosed or designed, implemented, and created solutions with the Elastic Stack.
- You have experience thriving and sharing knowledge in a self-organizing, globally distributed team environment.
- You help team members bring out the best in each other through coaching and mentoring.