Description
As a Senior Staff Machine Learning Engineer, you will help define and lead the vision for Reddit's large-scale GenAI Platform, shaping the strategy, architecture, and operating model that enable teams across the company to build, deploy, and scale generative AI products with confidence.
Contribute to the design, implementation, and maintenance of the LLM Gateway, focusing on features such as unified API endpoints for internally and externally hosted LLMs, rate and token limit management, and intelligent failover mechanisms to improve uptime and reliability.
Lead and execute the vision, strategy, and roadmap for Reddit's large-scale GenAI Platform.
Define the platform architecture and operating model that enable teams to build, deploy, and scale GenAI products reliably.
Drive the strategy for a unified LLM Gateway supporting internally and externally hosted LLMs through consistent APIs and abstractions.
Set the direction for core platform capabilities such as rate and token limit management, intelligent failover, and production resilience.
Shape Reddit's approach to an enterprise-grade RAG system.
Establish the strategic direction for agentic AI workflows and tool-use patterns across the platform.
Own the end-to-end platform strategy from concept through production adoption and long-term evolution.
Drive MLOps and LLMOps standards across CI/CD, testing, versioning, evaluation, and lifecycle management.
Define best practices for observability, monitoring, governance, and operational excellence across GenAI systems.
Partner across engineering, product, and leadership to align platform investments with company priorities and user needs.
Champion platform thinking with a strong focus on scalability, reliability, performance, and developer experience.
Influence technical direction across teams by turning emerging AI capabilities into a scalable platform strategy.