Description

About the role

You will serve as a technical lead across the Machine Learning Platform space and a key contributor to the evolution of the platforms that power Stripe's ML-driven products.

Responsibilities

Take ownership of end-to-end architecture and system design for large, complex projects across ML Platform.
Define technical direction for projects with high ambiguity, transforming complex user needs into durable platform strategy.
Design the system architecture and solutions for the most challenging problems in the ML Platform domain, including low-latency model inference, large-scale feature stores, real-time monitoring, and LLM/agent orchestration.
Turn high-leverage ideas into tangible, robust solutions that shape the platform and product roadmaps, combining technical excellence with creative problem-solving.
Scope and lead large projects with significant business impact, driving them from requirements through design, implementation, and production operation.
Work directly with ML engineers, data scientists, and product teams to translate their needs into functional requirements and scalable technical solutions.
Arbitrate critical decisions that balance competing priorities while meeting latency, reliability, cost, and security constraints.
Serve as a key engineering representative, engaging senior leaders across Stripe and advising the leadership team on key technical considerations related to the end-to-end ML lifecycle.
Drive cross-team technical initiatives that improve ML development velocity and MLOps maturity across the company.
Mentor and grow other engineers. Serve as a role model for designing, implementing, and operating great software systems.

Requirements

10+ years of professional software development experience, or equivalent domain expertise, with a solid background in service-oriented architecture and large-scale distributed systems.
Track record of serving as a technical lead, with the ability to provide technical direction, lead multi-team initiatives, and mentor team members.
Experience working on production ML platform services.
Strong product instincts and a deep understanding of the business context in which you operate.
Strong communication skills with the ability to explain complex technical concepts to both technical and non-technical stakeholders.
Demonstrated ability to work cross-functionally, collaborating effectively with ML engineers, data scientists, software engineers, product managers, and business stakeholders.
Ability to thrive with a high level of autonomy and responsibility, and comfort operating in ambiguous environments.
Hands-on experience using AI tools to accelerate how you work.

Preferred qualifications

Experience building large-scale serving or data infrastructure for machine learning use cases (e.g., model inference, feature stores, real-time feature computation, model registries).
Familiarity with LLMs, LLM frameworks, and agentic AI patterns (e.g., tool use, multi-agent orchestration, retrieval-augmented generation).
Experience rapidly developing prototypes and iterating based on user feedback.
Familiarity with cloud services (e.g., AWS) and cloud-based AI/ML services (e.g., SageMaker, Bedrock, Databricks, OpenAI).
Experience training and shipping machine learning models to production to solve critical business problems.
Ability to synthesize ideas across the organization while setting a compelling technical vision.
Comfort working with geographically distributed teams.
Passion for side projects, open source, or self-driven technical initiatives.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/stripe/jobs/7939868