Description
At Synopsys, we drive the innovations that shape the way we live and connect. Our technology is central to the Era of Pervasive Intelligence, from self-driving cars to learning machines. We lead in chip design, verification, and IP integration, empowering the creation of high-performance silicon chips and software content.
You are a highly motivated Staff Site Reliability Engineer with a passion for Linux platforms and a commitment to operational excellence. You thrive in dynamic, multi-faceted environments and are energized by the challenge of deploying, maintaining, and optimizing complex systems. Your curiosity drives you to continually learn and adapt, and your technical expertise enables you to solve intricate problems efficiently.
- Administering and managing Linux operating systems, including kernel components, memory management, process scheduling, and system performance optimization.
- Performing routine and advanced system administration tasks such as monitoring, tuning, and troubleshooting across bare-metal and virtualized nodes.
- Deploying, configuring, and managing Linux-based operating systems using Kickstart and Ansible for automation and environment standardization.
- Implementing and managing MAAS (Metal as a Service) for large-scale bare-metal provisioning and lifecycle operations.
- Operating and maintaining OpenStack environments for On Demand Computing and cloud infrastructure.
- Supporting virtualization technologies (VMware, KVM, etc.), including troubleshooting and maintenance.
- Delivering basic Linux networking support: resolving connectivity, routing, firewall, NIC bonding, VLAN, and interface configuration issues.
- Collaborating with cross-functional teams to enhance infrastructure reliability, scalability, and security.
- Creating and maintaining detailed documentation, including configurations, SOPs, troubleshooting guides, and operational runbooks.
- Ensuring the reliability and uptime of critical Linux environments that underpin Synopsys' engineering and development operations.
- Enabling rapid deployment and scaling of infrastructure through automation and standardized processes.
- Reducing downtime and improving system performance by proactively identifying and resolving technical issues.
- Enhancing security and compliance across platforms through robust configuration and monitoring practices.
- Accelerating innovation by providing stable, high-performance environments for development and testing teams.
- Fostering a collaborative culture by sharing expertise, mentoring peers, and contributing to knowledge repositories.