Description
We are seeking experienced and highly motivated Distributed Systems Engineers to join Cloudflare's dynamic DATA Organisation. This is a pivotal opportunity to contribute to the future of data at Cloudflare, working on systems that are fundamental to our global operations and customer insights.
As a Distributed Systems Engineer in the Logs and Audit Logs group, you will design, build, and operate a robust logging platform that provides reliable logging and secure data transfer to a wide array of customer destinations and third-party integrations.
Responsibilities:
- Design, build, and operate a robust logging platform, ensuring reliable logging and secure data transfer to a wide array of customer destinations and third-party integrations.
- Develop and maintain high-performance data connectors and integrations for our log-shipping products, focusing on usability, scalability, and data integrity.
- Create and manage systems for handling comprehensive audit logs, ensuring they are delivered securely and adhere to strict compliance and performance standards.
- Scale and optimise the data delivery pipeline to handle massive data volumes with low latency, identifying and removing bottlenecks in data processing and routing.
- Work closely with Product and other engineering teams to define requirements for a new logging platform and integrations.
- Maintain the operational health of our log delivery platform through comprehensive monitoring and participation in an on-call rotation (with flexibility for out-of-hours technical issue resolution).
- Collaborate on the architectural evolution of our data egress platform, researching and implementing new technologies to improve efficiency and reliability.
Key Qualifications:
- 3+ years of experience working in software development covering distributed systems and data pipelines.
- Strong programming skills (Go is preferable), with a deep understanding of software development best practices for building resilient, high-throughput systems.
- Hands-on experience with modern observability stacks, including Prometheus and Grafana, and a strong understanding of handling high-cardinality metrics at scale.
- Strong knowledge of SQL, including experience with query optimisation.
- A solid foundation in computer science, including algorithms, data structures, distributed systems, and concurrency.
- Strong analytical and problem-solving skills, with a willingness to debug, troubleshoot, and learn about complex problems at high scale.
- Ability to work collaboratively in a team environment and communicate effectively with other teams across Cloudflare.
- Experience with data streaming technologies (e.g., Kafka, Flink) is a strong plus.
- Experience with various logging platforms or SIEMs (e.g., Splunk, Datadog, Sumo Logic) and storage destinations (e.g., S3, R2, GCS) is a plus.
- Experience with Infrastructure as Code tools like Salt or Terraform is a plus.
- Experience with Linux container technologies, such as Docker and Kubernetes, is a plus.
This listing is enriched and indexed by YubHub. To apply, use the employer's original posting:
https://job-boards.greenhouse.io/cloudflare/jobs/7462802