# Network Engineer - AI/HPC

**Company**: xAI
**Location**: Memphis, TN
**Work arrangement**: onsite
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology
**Wikidata**: https://www.wikidata.org/wiki/Q120599684

**Apply**: https://job-boards.greenhouse.io/xai/jobs/4946691007
**Canonical**: https://yubhub.co/jobs/job_99450ad6-e3b

## Description

## About the Role

We are seeking a skilled Network Engineer to join our team at xAI. As a Network Engineer, you will play a critical role in designing and operating large-scale networks for our AI and HPC systems.

## Responsibilities

- Design and operate large-scale networks, applying a deep understanding of congestion control on Ethernet and InfiniBand

- Develop and optimize network configurations to ensure high performance and availability

- Collaborate with the team to design the next iteration of our back-end and front-end networks

- Travel to Memphis to build capacity and participate in a team on-call rotation

## Requirements

- Minimum of 10 years of experience designing and operating large-scale networks, with 5 years in the Ethernet AI/HPC space

- Deep understanding of congestion control on Ethernet; InfiniBand experience is an added bonus

- Expertise in creating a portfolio of metrics for performance and operations to optimize the fleet for training and inference traffic

- Experience using Python to automate repetitive tasks and to work with and analyze large data sets as part of the daily job

## Benefits

- Opportunity to work with a highly motivated team focused on engineering excellence

- Collaborative and dynamic work environment

- Professional development opportunities

## What We Offer

- Competitive salary and benefits package

- Opportunity to work on cutting-edge AI and HPC projects

## How to Apply

If you are a motivated and experienced Network Engineer looking for a new challenge, please submit your application, including your resume and cover letter, at https://job-boards.greenhouse.io/xai/jobs/4946691007.

## Skills

### Required
- RoCEv2
- NCCL
- Python
- Ethernet
- InfiniBand
- AI training and inference workloads
