Description
We are seeking an outstanding individual to join our platform SWQA team as a Senior Software Development Engineer in Test. You will develop test plans from design documents and execute them on NVIDIA HGX/DGX/MGX platforms, covering servers, OS, firmware (FW), and the CUDA software stack. You will install and test a range of systems, operating systems, server firmware, and software stacks. You will drive root cause analysis of reliability and validation test failures and see mitigations through to completion. You will build, develop, and debug server- and OS-level automation frameworks and tests, both front end and back end. You will review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed. You will work in an agile software development team with high production quality standards. You will manage bug lifecycles and collaborate with other groups to drive solutions.
The ideal candidate will have a Bachelor's degree in a STEM field and 5+ years of proven experience in OS- and server-level automation, CI/CD processes, and DevOps using Python, shell scripting, Ansible, Jenkins, C/C++, Java, and JavaScript. They will have strong server and Linux troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments. They will have good knowledge of, and hands-on experience with, model testing, AI tools and frameworks, NLP, and LLM benchmarking. They will have experience using AI development tools for test plan creation, test case development, and test automation. They will have strong experience with FW, BMC/OpenBMC, network protocols, internal and external enterprise storage devices, PCIe buses and devices, I/O sub-devices, CPU and memory, ACPI, the UEFI specification, and Redfish. They will have several years of proven experience with GitHub/GitLab/Gerrit, PXE, SLURM, and Stack/Kubernetes/Docker.
To stand out from the crowd, the ideal candidate will have experience working with NVIDIA GPU hardware, a solid understanding of virtualization in Linux, a background in parallel programming (ideally CUDA/OpenCL), and experience with AI-related tools, LLMs, and NLP.