Description
Engineer the Future with Us
We're seeking a passionate and experienced Network Site Reliability Engineer to join our global Network Engineering organization. As a Staff Engineer, you will champion automation initiatives, own and evolve the observability strategy, and collaborate with cross-functional stakeholders to deliver impactful enhancements to network reliability and scalability.
Key Responsibilities:
- Champion automation initiatives that significantly reduce operational toil, enhance reliability, and boost efficiency at scale.
- Own and evolve the observability strategy, driving improvements in monitoring, alerting, logging, and telemetry across multiple teams.
- Identify and implement operational improvements, partnering with teams to ensure scalable, sustainable excellence.
- Design and build automated operations and maintenance platforms to minimize manual intervention and maximize system performance and resiliency.
- Apply deep technical judgment to proactively prevent incidents and lead complex production investigations and root-cause analyses.
- Measure and optimize system performance, anticipating customer needs and innovating to continuously improve network capabilities.
- Collaborate with cross-functional stakeholders to deliver impactful enhancements to network reliability and scalability.
- Leverage AI tools and technologies for process automation and workflow optimization.
- Integrate AI agents with network infrastructure, using MCP to bridge LLMs with real-world network devices.
- Create and maintain Model Context Protocol (MCP) servers to expose network APIs, inventory systems, and state data to LLMs, enabling agents to take actions in real time.
The Impact You Will Have:
- Drive Synopsys' network reliability to new heights, ensuring seamless connectivity for global operations and customers.
- Enable rapid incident response and recovery, minimizing downtime and safeguarding mission-critical services.
- Advance automation and infrastructure-as-code practices, setting new standards for efficiency and operational excellence.
- Elevate the observability and performance monitoring capabilities across the network stack, empowering data-driven decision-making.
- Foster cross-team collaboration, sharing best practices and mentoring peers to build a culture of reliability and innovation.
- Contribute to the continuous evolution of Synopsys' network architecture, supporting future growth and technological advancement.
- Shape the adoption of AI-assisted workflows and advanced analytics for proactive network management.
Requirements:
- Bachelor's degree in Computer Science, Electrical Engineering, or related technical field, or equivalent practical experience.
- 5+ years of industry experience in Network Site Reliability Engineering, network automation, or network operations, including hands-on work with campus and data center networks.
- Expertise in managing and troubleshooting large-scale network deployments.
- Strong knowledge of network protocols and fundamentals: TCP/UDP, IPv4/IPv6, Wireless, BGP, VPN, Layer 2 switching, firewalls, load balancers, segment routing, etc.
- Proficiency in Python (preferred) and Ansible for programming and automation; experience in DevOps culture, CI/CD pipelines, infrastructure-as-code, and automated testing/deployment workflows.
- Experience with cloud networking in AWS, Azure, and/or GCP.
- Hands-on experience with diverse networking hardware and software platforms in multi-vendor environments.
- Familiarity with network management tools: Prometheus, Grafana, Opsgenie, Rootly, SolarWinds.
- Experience implementing monitoring and observability solutions: Elasticsearch, Logstash, Kibana, Kafka, Grafana, Prometheus, and others.
- Experience with ServiceNow, Jira, Linux system fundamentals, Git, Flux, and Kubernetes.
- Demonstrated ability to leverage AI tools for process automation and workflow optimization.
- Strong understanding of performance metrics, telemetry, data pipelines, data management, and analytics.
Who You Are:
- Analytical thinker with a structured approach to problem-solving and troubleshooting.
- Proactive, self-motivated, and committed to continuous improvement.
- Collaborative team player with strong communication skills and the ability to build relationships across functions.
- Adaptable and resilient in fast-paced, changing environments.
- Detail-oriented, with a passion for operational excellence and reliability.
- Inclusive, open-minded, and eager to learn from diverse perspectives.
- Mentor and leader, willing to share knowledge and support the growth of others.
The Team You’ll Be A Part Of:
You’ll join Synopsys’ global Network Engineering organization, a high-impact, collaborative team dedicated to designing, building, and operating a secure, scalable, and highly available network infrastructure. Our team applies Site Reliability Engineering principles to networking, focusing on automation, observability, and operational excellence. We embrace innovation and continuous improvement, working closely with cross-functional partners to deliver reliable and future-ready network solutions.
Rewards and Benefits:
We offer a comprehensive range of health, wellness, and financial benefits to cater to your needs. Our total rewards include both monetary and non-monetary offerings. Your recruiter will provide more details about the salary range and benefits during the hiring process.