Description
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology.
As a Senior Site Reliability Engineer on the Maritime Digital Shipbuilding team, you will build and operate the infrastructure that keeps our digital production systems running at full speed.
Responsibilities
- Build and Manage CI/CD Pipelines: Develop and maintain CI/CD pipelines using tools like GitHub Actions and JFrog Artifactory to ensure seamless integration and deployment of machine learning models and applications.
- Infrastructure as Code (IaC): Utilize Terraform and Ansible to automate infrastructure provisioning and management on cloud platforms such as Azure, AWS, or Google Cloud Platform (GCP).
- Containerization and Orchestration: Implement containerization solutions with Docker and manage container orchestration using Kubernetes to ensure reliable deployment and scaling of applications.
- Model Management and Deployment: Set up and maintain model registries, feature stores, and related ML platform tooling (e.g., MLflow, Kubeflow), and manage deployment pipelines for both batch and real-time inference.
- Monitoring and Logging: Establish comprehensive monitoring and logging solutions using tools like ELK Stack (Elasticsearch, Logstash, Kibana), Prometheus, and Grafana to ensure the smooth operation of deployment environments.
- Collaborate with Cross-Functional Teams: Work closely with development, data science, and operations teams to deploy machine learning models efficiently and reliably.
- Optimize Performance: Utilize parallel computing frameworks such as CUDA and OpenCL to accelerate high-performance computing tasks, ensuring timely processing of large datasets and complex simulations.
Requirements
- Advanced proficiency in Python for scripting and integration.
- Experience with Git and CI/CD tools such as GitHub Actions and JFrog Artifactory.
- Proficiency with IaC tools (Terraform, Ansible).
- Experience with cloud platforms (Azure, AWS, GCP).
- Proficiency in containerization (Docker) and container orchestration (Kubernetes).
- Knowledge of model registries and feature stores (e.g., MLflow, Kubeflow).
- Experience with logging and monitoring tools (ELK Stack, Prometheus, Grafana).
- Understanding of parallel computing frameworks (CUDA, OpenCL).
- Strong collaboration skills and proficiency with collaborative tools (Jira, Confluence).
- Eligible to obtain and maintain an active U.S. Secret security clearance.
Preferred Qualifications
- Previous experience in a manufacturing or industrial setting.
- Familiarity with observability concepts and tools.
- Knowledge of security best practices for DevOps and MLOps.
Benefits
- US Salary Range: $166,000-$220,000 USD
- Comprehensive medical, dental, and vision plans
- Income protection: life and disability insurance
- Generous time off: highly competitive PTO plans
- Family planning and parenting support
- Mental health resources
- Professional development
- Commuter benefits
- Relocation assistance
- Retirement savings plan
To apply, use the employer's original posting:
https://job-boards.greenhouse.io/andurilindustries/jobs/4995589007