BlackRock

Director, Site Reliability Engineer | Senior Engineering Team Director

Hybrid | Senior | Full-time | England

First indexed 24 Apr 2026

Description

We're seeking a Site Reliability Engineering (SRE) Lead to design, build, and maintain resilient, high-scale systems supporting BlackRock's Private Markets platform. In this hands-on leadership role, you'll apply deep engineering expertise to solve complex challenges, guide a global team, shape technical direction, and communicate effectively with senior stakeholders, ensuring the reliability of mission-critical systems that power private market investment workflows and decision-making. You will drive the adoption of AI-driven solutions to accelerate incident detection and triage, reduce toil, improve forecasting and capacity planning, and strengthen end-to-end observability and resilience.

Key Responsibilities:

  • Take ownership of project priorities, deadlines and deliverables using Agile methodologies, with clear outcomes around reliability automation and AI-enabled operations
  • Understand and refine business and functional requirements, translating them into SLOs/SLIs and AI-assisted observability and support capabilities
  • Take a hands-on approach to getting work done: this role requires a “roll your sleeves up” mentality, including building and operationalizing reliability tooling and automation that measurably reduces toil and improves stability
  • Be a leader with vision and a partner in brainstorming solutions that improve team productivity, efficiency, and engineering effectiveness
  • Drive priority setting of the engineering teams, balancing foundational reliability work with delivery of new product features
  • Improve engineering culture by encouraging continuous focus on reliability across the entire application lifecycle, and by adopting AI-enabled SRE practices (e.g., intelligent alerting, automated diagnosis, and self-healing where appropriate)
  • Participate proactively in architectural and design decisions, including AI-ready telemetry, data quality, and model integration patterns for operational analytics
  • Design and implement end-to-end monitoring solutions for application and infrastructure components, leveraging modern observability platforms plus AI/ML techniques for anomaly detection, correlation, and alert noise reduction
  • Drive the engineering of capacity management and demand forecasting solutions, including predictive analytics/ML approaches where they add measurable value
  • Act as a culture carrier and leader, passing on SRE knowledge and best practices to the engineering team
  • Drive detailed root cause investigations for production incidents with rigorous focus on issue avoidance, using AI-assisted correlation/analysis to accelerate time-to-insight
  • Create/coordinate retros for significant incidents, ensuring learnings are captured in automated/AI-assisted runbooks and embedded into prevention mechanisms
  • Perform additional core engineering functions, such as adding custom telemetry (metrics, logs, and traces) to the code base of in-scope applications to enable AI/ML-driven operational insights
  • Anticipate new opportunities to continuously evolve the resiliency profile of scoped applications and infrastructure

Requirements:

  • B.S. / M.S. degree in Computer Science, Engineering or a related discipline with 10+ years of experience
  • Experience leading high-performing engineering/SRE teams, with a track record of driving continuous improvement through automation and AI-enabled operations
  • Demonstrated ability to represent engineering/SRE priorities, status, and risk to senior leadership stakeholders with clear, executive-ready communication
  • Hands-on experience building or operating AI-assisted capabilities (AIOps, ML-based anomaly detection, or GenAI workflows) in an engineering/production environment
  • A passion for providing engineering support for highly available, performant full stack applications with a “Student of Technology” attitude
  • Experience with relational and NoSQL databases (e.g. Redis, Apache Cassandra)

Benefits:

  • Retirement investment and tools designed to help you in building a sound financial future
  • Access to education reimbursement
  • Comprehensive resources to support your physical health and emotional well-being
  • Family support programs
  • Flexible Time Off (FTO) so you can relax, recharge and be there for the people you care about

Hybrid Work Model:

  • BlackRock’s hybrid work model is designed to enable a culture of collaboration and apprenticeship that enriches the experience of our employees, while supporting flexibility for all
  • Employees are currently required to work at least 4 days in the office per week, with the flexibility to work from home 1 day a week
  • Some business groups may require more time in the office due to their roles and responsibilities
  • We remain focused on increasing the impactful moments that arise when we work together in person – aligned with our commitment to performance and innovation

About BlackRock:

  • At BlackRock, we are all connected by one mission: to help more and more people experience financial well-being
  • Our clients, and the people they serve, are saving for retirement, paying for their children’s educations, buying homes and starting businesses
  • Their investments also help to strengthen the global economy: support businesses small and large; finance infrastructure projects that connect and power cities; and facilitate innovations that drive progress