Description
We are seeking a Lead Systems Quality and Reliability Engineer to join our LPU team! You will own, build, and manage RMA and FA debug and root-cause analysis for existing and new NVIDIA AI/ML products.
Responsibilities:
- Conduct and lead debug and root-cause analysis of field RMAs. Collaborate with systems engineers, hardware engineers, software engineers, and operations engineers as required
- Scale root-cause FA capabilities within your organization
- Create FA result reports that align with the standard 8D process or a similar one
- Analyze RMA, FA, and repair data. Identify trends and raise quality alerts when necessary. Drive resolution, containment, and mitigation plans for those alerts
- Oversee hardware quality performance, monitoring field quality data and associated metrics including RMA rates, MTBF, and Reliability Ratio
- Manage operational performance of FA at CMs, ensuring partners achieve key performance indicators including FA cycle times, fault duplication rates, and fault isolation rates
- Oversee the setup of new products into Failure Analysis operations
Requirements:
- BS/MS in EE, Physics, or a related field (or equivalent experience)
- 8+ years of hands-on systems test and/or validation engineering experience
- Proven hands-on management and leadership experience
- Competence using lab equipment such as oscilloscopes, logic analyzers, and power analyzers
- Experience enabling reliability tests such as HTOL and quality tests such as burn-in
- The ideal candidate will have working knowledge of FA techniques and tools such as FIB, SEM, TDR, VNA, and CSAM
- Strong knowledge of fault isolation techniques such as OBIRCH, DLS/LADA, LVP, and LVI
- Proficiency with high-speed interfaces (SerDes, PCIe, DDR)
- Proficiency in Python, Perl, C++, or other languages on UNIX/Linux
- Excellent knowledge of PCB card and system-level test and debug, and the ability to manage factory-floor partners (CMs) for RMA/FA activities
To apply, use the employer's original posting:
https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Systems-Quality-and-Reliability-Lead---LPU_JR2013680