# Senior Software Engineer II, Inference

**Company**: CoreWeave
**Location**: Sunnyvale, CA / Bellevue, WA
**Work arrangement**: hybrid
**Experience**: senior
**Job type**: full-time
**Salary**: $165,000 to $242,000
**Category**: Engineering
**Industry**: Technology

**Apply**: https://job-boards.greenhouse.io/coreweave/jobs/4604832006
**Canonical**: https://yubhub.co/jobs/job_ec7cc743-ef4

## Description

We're seeking a senior software engineer to lead the design and development of our Kubernetes-native inference platform. In this role, you will lead design reviews, drive architecture, and ensure the platform's reliability and scalability.

Key responsibilities include:

- Leading design reviews and driving architecture within the team
- Defining and owning SLIs/SLOs, and ensuring that post-incident actions land and reliability improves release over release
- Implementing advanced optimizations such as micro-batch schedulers, speculative decoding, and KV-cache reuse
- Strengthening incident posture through capacity planning, autoscaling policy, and rollback/traffic-shift strategies
- Mentoring IC1/IC2 engineers and reviewing cross-team designs to raise coding and testing standards

We're looking for someone with strong coding skills in Python or Go, deep familiarity with networked systems and performance, and hands-on experience running Kubernetes at production scale. Experience with inference internals, batching, caching, mixed precision, and streaming token delivery is a plus.

In addition to a competitive salary, we offer a range of benefits including medical, dental, and vision insurance, company-paid life insurance, and flexible PTO. We're committed to creating a work environment that's inclusive, diverse, and supportive of our employees' well-being.

## Skills

### Required
- Python
- Go
- Kubernetes
- Networked systems
- Performance
- Inference internals
- Batching
- Caching
- Mixed precision
- Streaming token delivery

### Nice to have
- CUDA kernels
- NCCL/SHARP
- RDMA/NUMA
- GPU interconnect topologies
- Contributions to inference frameworks
- Experience with multi-team initiatives
