Description
Summary
Microsoft AI is looking for a talented Member of Technical Staff, Compute Orchestration & Scheduling to help build the next wave of capabilities for our personalized AI assistant, Copilot. We’re looking for someone who will bring an abundance of positive energy, empathy, and kindness to the team every day, in addition to being highly effective.
About the Role
We are looking for a highly skilled and experienced technical professional to join our team as a Member of Technical Staff, Compute Orchestration & Scheduling. The successful candidate will be responsible for designing and building our compute orchestration and scheduling layer on top of Kubernetes and Ray, working on everything from workload placement and scaling to reliability and developer experience. You’ll work closely with research and framework teams to turn their requirements into scalable abstractions, improve cluster efficiency, and ensure our compute platform is observable and easy to operate in production.
Accountabilities
- Develop and tune scalable pretraining software for NVIDIA GB200 NVL72 CX8 and AMD MIxxx architectures
- Benchmark GB200 and AMD MIxxx GPU clusters
- Gather data and insights to develop the pretraining compute roadmap
The Candidate We're Looking For
Experience:
- Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Technical skills:
- Proficiency in C, C++, C#, Java, JavaScript, or Python
- Experience with Kubernetes and Ray
Personal attributes:
- Strong problem-solving skills
- Excellent communication and collaboration skills
Benefits
- Competitive salary
- Comprehensive benefits package
- Opportunities for professional growth and development
- Collaborative and dynamic work environment