Description
We are seeking a Sr. Director, Infrastructure, SRE, & Security to lead our Infrastructure, SRE & Security team in building the cloud and data infrastructure that enables our AI platform.
As a key member of our Engineering department, you will be responsible for owning the architecture, reliability, and cost efficiency of our cloud infrastructure, driving full IaC coverage and leading Kubernetes operations at scale.
You will also own data infrastructure operations, cost governance, and security hardening, partnering with Data Product Engineering on modernizing data delivery infrastructure.
In addition, you will lead security posture management across cloud, application, and identity layers. You will also define and instrument cost-per-unit metrics, implement per-team budgets with automated alerting, and give leadership direct visibility into infrastructure efficiency.
You will operate internal developer platforms with self-service onboarding, CI/CD, and observability infrastructure that improves engineering velocity.
You will also own incident response, on-call rotations, and post-mortem processes, driving a reduction in preventable operational incidents and maintaining high-availability SLAs.
Lastly, you will lead, recruit, and grow a globally distributed team of cloud, data, and security engineers, fostering a culture of ownership and technical excellence.
What you bring to Komodo Health:
- 8+ years in infrastructure, SRE, or platform engineering;
- 3+ years leading teams in an AI/ML-intensive environment;
- Hands-on experience with AI workload infrastructure (LLM serving, agent orchestration, GPU compute, or ML pipelines) and the reliability and cost challenges these workloads introduce;
- Deep AWS and production Kubernetes expertise (EKS, autoscaling, multi-cluster management) and strong IaC discipline (Terraform or equivalent);
- Demonstrated track record of driving significant cloud cost reduction through systematic FinOps: team-level budgets, cost-per-unit metrics, and leadership-facing dashboards;
- Practical security and compliance experience: cloud posture management, vulnerability lifecycle, IAM, and SOC 2 or equivalent frameworks; comfort in regulated environments;
- Strong executive communication skills, with the ability to translate infrastructure strategy into business outcomes for CTO, Finance, Legal, and Product stakeholders;
- Active user of AI tools in your own workflow; track record of driving AI-assisted automation adoption within your teams
AI Use Expectations:
- Use AI coding tools (Copilot, Cursor, Claude Code, or equivalent) to accelerate IaC authoring, runbook generation, and infrastructure automation;
- Leverage AI-assisted observability and incident triage to reduce MTTR and surface patterns across system telemetry;
- Evaluate and adopt AI-native DevOps tooling; set the standard for responsible AI use across the team
Additional skills and experience we’d prioritize (nice to have):
- Snowflake administration and data infrastructure experience at scale;
- Multi-cloud environment experience (AWS + GCP);
- Healthcare, life sciences, or regulated industry background;
- Experience with security automation or agentic security workflows;
- Familiarity with data pipeline technologies (Spark, Airflow, Temporal);
- Experience supporting multi-tenant SaaS infrastructure
The pay range for this role is $225,000-$270,000 USD per year, depending on location. This position may be eligible for performance-based bonuses and equity awards.