Description
Job Title: Software Engineer, ML Systems & Training Architecture
About the Role
In this role, you will be a deeply hands-on engineering force multiplier for the robotics team. You will help keep the training framework and surrounding infrastructure healthy, review and improve code quickly, debug failures across ML systems and infrastructure, and unblock researchers and engineers when the path from idea to a working training job gets rough.
We’re looking for people who love writing, reading, reviewing, and fixing code; who can get productive quickly in unfamiliar systems; and who bring strong practical judgment without a lot of ego or process overhead.
Responsibilities
- Review, improve, and clean up code across training frameworks and adjacent infrastructure.
- Identify risky or low-quality changes before they land, and raise the code quality bar without slowing the team down.
- Debug issues across ML training systems, GPUs, clusters, networking, and related infrastructure.
- Unblock researchers and engineers by fixing broken training jobs, flaky workflows, and brittle internal tooling.
- Improve the reliability, maintainability, and usability of the robotics team’s training framework.
- Move quickly on practical engineering problems that directly affect team velocity.
Benefits
- Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
- Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
- 401(k) retirement plan with employer match
- Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birth parents), plus paid medical and caregiver leave (up to 8 weeks)
- Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
- 13+ paid company holidays and several paid, coordinated company-wide office closures throughout the year for focus and recharge, plus paid sick or safe time accruing at 1 hour per 30 hours worked (or more where required by applicable state or local law)
- Mental health and wellness support
- Employer-paid basic life and disability coverage
- Annual learning and development stipend to fuel your professional growth
- Daily meals in our offices and meal delivery credits for eligible employees
- Relocation support for eligible employees
- Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.
Salary Min: 295000
Salary Max: 380000
Salary Currency: USD
Salary Period: year
Required Skills: ML systems, training frameworks, GPUs, distributed systems, infrastructure, code review, debugging, practical engineering
Preferred Skills: None