Description
At Mistral AI, we're looking for an experienced Site Reliability Engineer to join our Applied AI team. As a key member of the team, you will build and operate the framework that keeps our solution delivery reliable and sustainable across all our accounts.
Your mission will be to design, build, and operate the infrastructure to support our AI solutions, ensuring they are scalable, secure, and aligned with customer needs. You will work closely with our development team to identify and resolve issues, and collaborate with our technical support team to provide excellent customer service.
In this role, you will operate in four concurrent modes:
- BUILD: Design for a fleet of Mistral platforms and apps. Invest in proactive work to reduce reactive work. Productize reliability, author runbooks, create SLO templates, implement observability.
- RUN: Operate the Tier-1 customer environments that Mistral is contracted to operate. Ensure SLO compliance, own on-call and incident response, manage drift, partner with Technical Support as L3 escalation, champion high-signal post-mortems.
- ENABLE: Productize how Mistral deploys, secures, and scales our Applied AI solutions. Engineer on-demand provisioning, author security baseline packages, embed security guardrails, automate everything.
- SECURE: Own the security operations layer for our customer-side deployments. Lead CVE response across the fleet, ship supply-chain integrity controls (SBOM, signed images, provenance), co-page with InfoSec on security incidents, enforce secure-config baselines.
This is a framework-first, fleet-management role at heart. If you're excited by the difference between solving one customer's problem and structurally solving that class of problem for every customer, this is the role.