# Systems Engineer, HPC

**Company**: Mistral AI
**Location**: Paris
**Work arrangement**: remote
**Experience**: mid
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology
**Wikidata**: https://www.wikidata.org/wiki/Q119718658

**Apply**: https://jobs.lever.co/mistral/c2cf8b02-cb79-4e13-8717-25817813542d
**Canonical**: https://yubhub.co/jobs/job_92f42e08-ee9

## Description

### About Mistral

At Mistral AI, we build high-performance, open, and efficient AI systems designed to power the next generation of applications. Our infrastructure combines large-scale distributed systems, cloud platforms, and HPC environments to support cutting-edge research and production workloads.

We are a collaborative, low-ego, and highly technical team, operating across Europe, the US, and beyond. As we scale rapidly, we are building the foundational infrastructure to support thousands of nodes and petabyte-scale systems.

Join us to be part of a pioneering company shaping the future of AI. Together, we can make a meaningful impact.

## About the Role

We are looking for Systems Engineers / System Administrators to help design, operate, and scale the infrastructure behind Mistral’s AI platforms.

This is a hands-on, hybrid role combining:

- Systems administration (operating and troubleshooting large-scale Linux environments)
- Systems engineering (automation, scalability, and performance improvements)

You’ll work closely with infrastructure, HPC, and research teams to ensure our clusters and platforms run reliably at scale.

## What You’ll Work On

### Core Systems Operations

- Operate and maintain large-scale Linux environments (bare metal, clusters, cloud)
- Monitor system health, troubleshoot incidents, and ensure high availability
- Support production and research workloads across multiple environments

### Scaling Infrastructure

- Help scale clusters toward hundreds to thousands of nodes
- Work on systems handling petabyte-scale storage
- Improve performance, reliability, and resource utilisation

### Automation & Engineering

- Automate operational tasks using tools like Python, Bash, Ansible, or Terraform
- Improve deployment, provisioning, and system lifecycle management
- Contribute to system design and architecture decisions

### Cross-Functional Collaboration

- Work closely with:
  - HPC / infrastructure teams
  - Platform / DevOps engineers
  - Research teams
- Act as a bridge between users and infrastructure

## What We’re Looking For

### Must-have

- Strong Linux systems administration experience (core requirement)
- Experience working in large-scale environments (HPC clusters or cloud infrastructure)
- Experience with job schedulers (e.g. Slurm)
- Solid troubleshooting skills across systems, hardware, and networks

### Nice-to-have (any of these)

- Containers / orchestration (e.g. Kubernetes)
- Storage systems (e.g. Ceph, Lustre, NFS)
- Networking fundamentals (Ethernet; InfiniBand is a plus)
- Infrastructure as Code / automation tooling
- GPU or AI/ML experience

## Profile We Value

- Pragmatic problem solver who can operate in fast-scaling environments
- Comfortable working across multiple domains (“Swiss army knife” mindset)
- Able to go deep in one area while learning others
- Low-ego, collaborative, and hands-on

## Skills

### Required
- Linux
- Python
- Bash
- Ansible
- Terraform
- Slurm
- Job scheduler

### Nice to have
- Kubernetes
- Ceph
- Lustre
- NFS
- InfiniBand
- Infrastructure as Code
- GPU
- AI/ML
