Description
We are looking for a Data Engineer to join our Data Platform team and partner with product and business stakeholders across risk, operations, and other domains. As a Data Engineer, you will be responsible for building robust data pipelines and engineering foundations: ingesting data from disparate sources, ensuring data quality and consistency, and enabling better business decisions through reliable data infrastructure across core product areas.
Your primary focus will be building scalable data pipelines, using Airflow to orchestrate workflows that ingest, transform, and deliver data from various sources into Snowflake and Databricks. You will also design and implement data models in Snowflake that support analytics, reporting, and ML use cases, with a focus on performance, reliability, and scalability.
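To give a concrete flavor of the work, here is a minimal sketch of an Airflow pipeline of the kind this role involves, assuming Airflow 2.4+ with the TaskFlow API. The DAG, task, and record names (ingest_orders, extract, order records) are hypothetical illustrations, not our actual pipelines:

```python
# A minimal TaskFlow-style DAG sketch: extract, transform, load.
# All names and the toy data are hypothetical.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def ingest_orders():
    @task
    def extract():
        # Pull raw records from a hypothetical source system.
        return [{"order_id": 1, "amount": 42.0}]

    @task
    def transform(records):
        # Normalize or enrich fields before loading.
        return [{**r, "amount_usd": r["amount"]} for r in records]

    @task
    def load(records):
        # In a real pipeline this step would write to Snowflake
        # (e.g. via the Snowflake provider); printed here for brevity.
        print(f"loading {len(records)} records")

    load(transform(extract()))


ingest_orders()
```

In production, the load step would typically use the Snowflake provider's hooks or operators, and retries, alerting, and SLAs would be configured explicitly.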
In addition, you will develop infrastructure as code using Terraform to automate and manage cloud resources in AWS, ensuring consistent and reproducible deployments. You will monitor data pipeline health and implement data quality checks that verify the accuracy, completeness, and timeliness of data as business needs evolve.
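As one illustration of the data quality checks mentioned above, a simple completeness check might look like the sketch below; the function name, columns, and threshold are assumptions for the example, and in practice a team might standardize on a framework such as Great Expectations or dbt tests:

```python
# A minimal completeness check: fail loudly if any required column
# exceeds an allowed null rate. Names and thresholds are illustrative.
def check_completeness(rows, required_cols, max_null_rate=0.01):
    """Raise if any required column exceeds the allowed null rate."""
    failures = {}
    for col in required_cols:
        nulls = sum(1 for r in rows if r.get(col) is None)
        rate = nulls / len(rows) if rows else 0.0
        if rate > max_null_rate:
            failures[col] = rate
    if failures:
        raise ValueError(f"completeness check failed: {failures}")


# Toy usage: passes, since order_id is fully populated.
check_completeness(
    [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": None}],
    required_cols=["order_id"],
)
```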
You will also optimize data processing workflows to improve performance, reduce costs, and handle growing data volumes efficiently. A key part of your role will be troubleshooting and resolving data pipeline issues: working through ambiguity to get to the root cause and implementing long-term fixes.
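One common optimization of the kind described above is incremental (watermark-based) loading: processing only rows changed since the last successful run rather than reloading whole tables. A minimal sketch, with a hypothetical events table and updated_at column:

```python
# Build a query that selects only rows changed since the stored
# watermark. Table and column names are hypothetical; in production
# the query should be parameterized rather than string-formatted.
from datetime import datetime, timezone


def build_incremental_query(table: str, watermark: datetime) -> str:
    """Select only rows updated after the last successful run."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE updated_at > '{watermark.isoformat()}'"
    )


# Example: read the stored watermark, then fetch only new rows.
last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(build_incremental_query("events", last_run))
```

After a successful run, the pipeline would persist the new high-water mark (for example, the maximum updated_at it processed) so the next run picks up where this one left off.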
As a Data Engineer, you will bridge gaps between data and the business by working with cross-functional teams across our US and India offices to understand requirements and translate them into robust technical solutions. You will create comprehensive documentation for data pipelines, data models, and infrastructure, keeping it up to date and facilitating knowledge transfer across the team.
Requirements:
- 2+ years of data engineering experience with strong technical skills and the ability to architect scalable data solutions.
- Hands-on experience with Python for data processing, automation, and building data pipelines.
- Proficiency with workflow orchestration tools, preferably Airflow, including DAG development, task dependencies, and monitoring.
- Strong SQL skills and experience with cloud data warehouses like Snowflake, including performance optimization and data modeling.
- Experience with cloud platforms, preferably AWS (S3, Lambda, EC2, IAM, etc.), and understanding of cloud-based data architectures.
- Experience working cross-functionally with data analysts, analytics engineers, data scientists, and business stakeholders to understand requirements and deliver solutions.
- An ownership mentality: you will be responsible for the reliability and performance of your data pipelines and expected to fully understand data flows, dependencies, and their implications for downstream users.
Nice to have:
- Experience with dbt for transformation logic and analytics engineering workflows integrated with data pipelines.
- Familiarity with Databricks for large-scale data processing, including Spark optimization and Delta Lake.
- Experience with Infrastructure as Code (IaC) tools like Terraform for managing cloud resources and data infrastructure.
- Knowledge of data modeling concepts (e.g., dimensional modeling, star/snowflake schemas, slowly changing dimensions).
- Experience with CI/CD practices for data pipelines and automated testing frameworks.
- Experience with streaming data and real-time processing frameworks.