Description
We're looking for an exceptional Principal Software Engineer to serve as the de facto Technical Lead for our Web Data Acquisition (WDA) team. This is a highly visible, hands-on technical leadership role where you'll own the architectural direction for crawling systems, evolve and unify crawling platforms into a best-in-class stack, and elevate a high-performing engineering team.
As a Principal Software Engineer, you'll solve complex distributed systems challenges, build modular tooling that accelerates delivery, and set the standard for observability and operational excellence. A dedicated manager handles all HR and administrative responsibilities, and a product manager connects business needs with technical work, so your focus is 100% on technical leadership, mentorship, and hands-on execution.
Key Responsibilities:
- Technical Leadership & System Design: Proven experience building web crawling or large-scale data systems from scratch. Strong architectural skills in designing scalable, fault-tolerant distributed systems. Track record of leading complex technical initiatives and driving architectural direction for teams.
- Data Engineering Expertise: Deep background in large-scale data engineering (processing terabytes of data daily). Hands-on experience with cloud data warehouses (BigQuery, Snowflake). Experience with Apache Kafka, Kubernetes (GKE/EKS), and orchestration tools (Airflow).
- Web Crawling & Data Extraction: Deep expertise in web crawling technologies and advanced scraping (Scrapy or similar). Experience extracting structured and unstructured web data, as well as search engine results pages (SERPs). Knowledge of proxy infrastructure management, anti-bot detection, and ethical crawling.
- Leadership & Team Development: Experience mentoring engineers at all levels and fostering a collaborative culture. Strong ability to influence technical direction and establish best practices. Track record of hiring, coaching, and developing senior engineers.
Ideal Candidate Profile:
- 10+ years of software engineering experience, including 5+ years focused on data engineering and 3+ years in senior/principal-level technical leadership.
- Strong computer science fundamentals (algorithms, data structures, distributed systems). Self-starter who thrives in fast-paced environments.
Core Technical Stack:
- Python & Java
- Apache Kafka
- GCP (BigQuery, GKE, Vertex AI)
- Snowflake & Starburst/Trino
- Terraform
- Scrapy / Web Scraping Frameworks
- Proxy Management Systems
- Distributed Systems & Kubernetes
- Apache Airflow
- Large-Scale ETL Pipelines