# AI Inference Engineer

**Company**: Perplexity
**Location**: London
**Work arrangement**: onsite
**Experience**: mid
**Job type**: full-time
**Salary**: Final offer amounts are determined by multiple factors, including experience and expertise.
**Category**: Engineering
**Industry**: Technology

**Apply**: https://jobs.ashbyhq.com/perplexity/e4777627-ff8f-4257-8612-3a016bb58592
**Canonical**: https://yubhub.co/jobs/job_4054dca1-a4f

## Description

We are looking for an AI Inference Engineer to join our growing team. Our current stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

## What you'll do

- Develop APIs for AI inference that will be used by both internal and external customers

- Benchmark and address bottlenecks throughout our inference stack

- Improve the reliability and observability of our systems and respond to system outages

- Explore novel research and implement LLM inference optimizations

## What you need

- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)

- Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)

- Understanding of GPU architectures or experience with GPU kernel programming using CUDA
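To give a concrete sense of the inference optimization techniques mentioned above, here is a minimal sketch of symmetric per-tensor int8 weight quantization, one of the simplest forms of quantization used to shrink model memory footprints. This is an illustrative example only, not part of the role description; the function names and the NumPy-based approach are assumptions for the sketch.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ≈ scale * q, with q in int8.

    The scale maps the largest-magnitude weight to 127, so every
    weight fits in the signed 8-bit range after rounding.
    """
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Example: a toy weight vector round-trips with bounded error.
w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

Per-element error after the round trip is bounded by half the scale, which is the basic trade-off quantization makes: 4x less memory per weight in exchange for a small, controlled loss of precision.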

## Why this matters

As an AI Inference Engineer, you will play a critical role in developing and deploying our machine learning models. Your work will have a direct impact on the performance and reliability of our systems and will help us continue to innovate and improve our products.

## Skills

### Required
- ML systems
- deep learning frameworks
- GPU architectures

### Nice to have
- LLM architectures
- inference optimization techniques
