Description
We are seeking a Senior Software Engineer to join our team in developing and executing software-driven characterization workflows on NVIDIA rack-scale systems. As part of our engineering organization, you will play a key hands-on role in running AI workloads across the full stack to analyze, characterize, and optimize power, performance, and drive behavior at the system level.
Your primary responsibilities will include:
- Developing and running software tools, automation, and workloads to characterize power, performance, and drive behavior across NVIDIA rack-scale systems.
- Executing AI and system-level workloads to stress and evaluate behavior across the stack, including GPUs, CPUs, networking, storage, firmware, drivers, and system software.
- Building automated frameworks for data collection, telemetry, validation, correlation, and analysis of characterization results.
- Investigating system behavior under different workloads and operating conditions to identify bottlenecks, anomalies, and optimization opportunities.
- Working closely with hardware, firmware, driver, system software, performance, and validation teams to define characterization methodologies and debug cross-stack issues.
- Supporting bring-up, validation, and readiness activities for new rack-scale platforms and AI infrastructure.
- Creating clear documentation, test flows, and repeatable processes to improve coverage, efficiency, and reproducibility.
To be successful in this role, you will need:
- A B.Sc. or M.Sc. in Computer Science, Electrical Engineering, or a related field.
- 5+ years of software engineering experience, preferably in system software, infrastructure, validation, or performance-focused environments.
- Strong programming skills in Python and at least one system-level language such as C/C++.
- Experience developing automation and test infrastructure for complex hardware/software systems.
- Hands-on experience running, debugging, or optimizing AI, HPC, or large-scale system workloads.
- Good understanding of system-level architecture, including interactions across hardware, firmware, drivers, operating systems, and application layers.
- Experience working in Linux environments and with scripting, telemetry, logging, and data analysis tools.
- Strong debugging and problem-solving skills, with the ability to work across multiple engineering disciplines.
- Good communication skills and the ability to drive technical work in a fast-paced, cross-functional environment.
To apply, use the employer's original posting:
https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/Israel-Yokneam/Senior-Software-Engineer--Data-Center-Workloads---Infrastructure_JR2017132