Description
We are looking for a strong technical architect to own end-to-end system software architecture for Space-1 and successor orbital platforms. You will architect the full stack (applications and libraries, the data center stack, BMC and BIOS firmware, manageability, and telemetry, through the host OS, GPU and CPU drivers, and CUDA) to deliver a production-ready inference platform that operates reliably in the radiation, thermal-cycling, and remote-operations environment of low Earth orbit (LEO).
The ideal candidate will have 15+ years of relevant experience in server/platform system software, spanning compute libraries, BMC firmware, BIOS, host OS, drivers, and manageability. They will have strong knowledge of server architecture, data center manageability, and full-stack integration of firmware with OS and accelerator software. They will also have hands-on experience with data center health management workflows, telemetry, and fault management at scale.
The successful candidate will be able to drive Redfish, MCTP, PLDM, and constellation-level management protocols across BMC, BIOS, and host software so customers can operate orbital fleets with the same tools they use on the ground. They will also be able to define the manageability architecture for an unreachable, autonomous data center: remote bring-up, in-orbit firmware update, dual-module redundancy, fault containment, recovery from SEU/SEFI events, and telemetry for fleets ranging from tens to millions of nodes.
In addition, the ideal candidate will have excellent written and oral communication skills, a strong work ethic, a strong sense of teamwork, pride in producing quality work, and a commitment to finishing their tasks every day. They will be a self-starter who loves finding creative solutions to complicated problems and who stays hands-on with coding.
Experience architecting platform software for space, aerospace, defense, or other radiation-, thermal-, and vibration-constrained environments (including SEU/SEFI mitigation, ECC strategy, TID/SEE qualification, and rad-hard design partitioning) is highly desirable. Hands-on experience with autonomous, remote, or unreachable data center operations (in-orbit or in-field firmware update, dual-module redundancy, and recovery without physical access) is also highly desirable.