# Senior Software Engineer, AIOps

**Company**: NVIDIA
**Location**: Raanana, Tel Aviv
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/Israel-Raanana/Senior-Software-Engineer--AIOps_JR2019710?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_4a7c8b64-a26

## Description

NVIDIA is building a mission-critical Observability and Prediction platform to ensure the seamless operation of AI Factories. We're looking for a Senior Software Engineer to join the AIOps platform team and help build core distributed systems that ingest massive telemetry streams from GPU clusters and operationalize predictive AI models at scale.

**Responsibilities:**

- Architect and build an agentic AIOps system that autonomously monitors GPU fleet health, aggregates and correlates massive telemetry streams, surfaces intelligent alerts, and orchestrates multi-step diagnostic workflows and corrective actions.

- Research, evaluate, and prototype data storage strategies and data representations across diverse database technologies and modalities.

- Design distributed systems to handle the extreme telemetry density of large-scale AI clusters.

- Instrument services with deep observability to support rapid debugging and continuous performance improvement.

- Build and own the model-serving infrastructure that operationalizes predictive algorithms at scale.

- Contribute to the platform's core libraries and abstractions that accelerate development across the broader AIOps engineering team.

**Requirements:**

- B.Sc./M.Sc. in Computer Science, Computer Engineering, or a related technical field.

- 8+ years of software engineering experience building production distributed systems.

- Expert-level proficiency in languages such as Go, C++, or Rust.

- Solid understanding of Kubernetes and container-based deployments for production services.

- Experience deploying, monitoring, and maintaining ML models or data-intensive services in a production environment.

**Nice to Have:**

- Experience building ML model-serving platforms or MLOps tooling at scale.

- A track record of taking systems from prototype to a stable, production-grade platform serving real enterprise customers.

- A systems thinker with practical innovation skills.

NVIDIA offers competitive salaries and a generous benefits package.

## Skills

### Required
- Go
- C++
- Rust
- Kubernetes
- container-based deployments
- ML models
- data-intensive services

### Nice to have
- ML model-serving platforms
- MLOps tooling
- systems thinking
- practical innovation

