Description
At Databricks, we are committed to enabling data teams to solve the world's toughest problems. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of our AI Runtime (AIR) product.
You will be responsible for shaping customer-facing capabilities while designing for scalability, extensibility, and performance of GPU training and adjacent areas. This will involve collaborating closely across the platform, product, infrastructure, and research organisations.
Key responsibilities include:
- Leading, mentoring, and growing a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure
- Defining and owning the product and technical roadmap for AIR, balancing customer experience, functionality, and foundational investments
- Collaborating closely with product, research, platform, and infrastructure teams, as well as customers, to drive end-to-end delivery
- Driving architectural decisions and product design for managed GPU training at scale
- Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact
We are looking for someone with 8+ years of software engineering experience, with 3+ years in engineering management. You should have a track record of building and operating managed GPU training infrastructure at scale, as well as deep familiarity with distributed training frameworks and parallelism strategies.
In addition, you should have experience with training resilience patterns, such as checkpointing, elastic training, and automated failure recovery for long-running jobs. You should also have a strong understanding of GPU performance fundamentals, including NCCL, interconnect topologies, and memory optimisation.
Experience building platform products with clear SLAs is essential, as is strong cross-functional leadership across platform, product, and research teams. Excellent collaboration and communication skills are required.
The pay range for this role is $228,600-$314,250 USD per year, depending on location. The total compensation package may also include eligibility for an annual performance bonus, equity, and benefits.