Description

We're looking for an ambitious Systems / Platform Engineer to join a team at the intersection of SRE and low-latency distributed systems. This team will help power Pinterest's next generation of real-time ML and measurement infrastructure, with a focus on sub-millisecond decisioning, high-throughput data access, and tight integration with Pinterest's core tech stack.

In this role, you'll think about queries and RPCs in terms of syscalls, cache lines, and wire formats, and design systems that stay fast and predictable under load. You'll help define and harden the foundation for our training and serving stack: from storage and indexing strategies, to streaming and fanout, to backpressure and failure handling across services and regions.

You'll work closely with software engineering, data infra, and SRE partners to ensure our systems are observable, debuggable, and operable in production. If topics like I/O scheduling and batching, lock-free or low-contention data structures, connection pooling, query planning, kernel and network tuning, on-disk layout and indexing, circuit-breaking, autoscaling, incident response, NixOS, Rust, and robust SLIs/SLOs sound interesting (even if only a subset does), this role gives you a chance to apply that expertise to business-critical, high-leverage infrastructure at Pinterest scale.

What you'll do:

Scale the decision-making process and tooling for the tvScientific AI team, from our workflows to our training infrastructure to our Kubernetes deployments

Improve the developer experience for the data science team

Upgrade our observability tooling

Make every deployment smooth as our infrastructure evolves

What we're looking for:

Deep understanding of Linux

Excellent writing skills

A systems-oriented mindset

Experience in high-performance software (RTB, HFT, etc.)

Software engineering experience plus reliability expertise (e.g., CI/CD)

Strong observability instincts

Demonstrated ability to use AI to improve speed and quality in your day-to-day workflow for relevant outputs

Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)

High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables

Nice-to-haves:

Reverse-engineering experience

Terraform, EKS, or MLOps experience

Python, Scala, or Zig experience

NixOS experience

Adtech or CTV experience

Experience deploying a distributed system across multiple clouds

Experience with hard real-time, low-latency systems
