Scale

Staff Infrastructure Software Engineer, Enterprise AI

Onsite | Senior | Full-time | $216,200-$310,500 USD | New York, NY; San Francisco, CA

First indexed 18 Apr 2026

Description

We are looking for a Staff Infrastructure Software Engineer to act as a primary technical lead, engineering the 'paved road' for our knowledge retrieval and inference engines. You will define the deployment standards for Agentic workflows at scale, bridging the gap between complex AI orchestration and world-class infrastructure.

The ideal candidate thrives in a fast-paced environment, has a passion for both deep technical work and mentoring, and is capable of setting a long-term technical strategy for a critical domain while maintaining a strong, hands-on delivery focus.

You will architect and implement solutions across multiple cloud providers (GCP, Azure, AWS) for customers in diverse, highly regulated industries such as healthcare, telecom, finance, and retail.

Key responsibilities include:

  • Architecting multi-cloud systems and abstractions that allow the SGP platform to run on top of existing cloud providers.
  • Using our own data and AI platform to analyse build and test logs and metrics to identify areas for improvement.
  • Defining the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers.
  • Enhancing engineering and infrastructure efficiency, reliability, accuracy, and response times, including CI/CD processes, test frameworks, data quality assurance, end-to-end reconciliation, and anomaly detection.
  • Collaborating with platform and product teams to develop and implement innovative infrastructure that scales to meet evolving needs.
  • Designing and championing highly scalable, reliable, and low-latency infrastructure and frameworks for building, orchestrating, and evaluating multi-agent systems at enterprise scale.
  • Leading the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies.
  • Owning the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response.
  • Driving developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization to improve workflows and operational efficiency.

The ideal candidate has proven experience in a senior role, with 5+ years of full-time software engineering experience, and a deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm Charts), container orchestration (e.g., Kubernetes) and observability platforms (e.g., Datadog, Prometheus, Grafana).

Extensive experience with at least one major cloud provider (AWS, Azure, or GCP) is required, along with strong knowledge of security and compliance in enterprise environments, particularly access management, data isolation, and customer-specific VPC setups.

Proficiency in SQL and in either Python or JavaScript/TypeScript is also necessary.

Bonus points for hands-on experience and a passion for working with Agents, LLMs, vector databases, and other emerging AI technologies.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/scaleai/jobs/4599700005