Description

We are looking for a motivated and detail-oriented Data Engineer with 2 to 5 years of experience in designing, building, and managing scalable data solutions on Google Cloud Platform (GCP).

The ideal candidate will have a strong background in data engineering, cloud-based architectures, and proficiency in implementing data pipelines to transform raw data into actionable insights.

Experience building or supporting AI and GenAI data workflows, including pipelines for LLM applications and AI/ML model training, is a strong plus.

As a Data Engineer at Brainlabs, you will be responsible for designing, developing, and maintaining ETL/ELT pipelines using GCP tools like Cloud Functions, Cloud Run, Dataflow, Dataproc, or Cloud Data Fusion.

You will also build and manage data pipelines that support LLM and GenAI applications, including Retrieval-Augmented Generation (RAG) architectures, vector data stores, and prompt context assembly workflows.

Additionally, you will integrate data from various sources into GCP services such as BigQuery, Cloud Storage, and Cloud SQL, and design and implement data warehouse/mart solutions using BigQuery for analytics and reporting.

Your responsibilities will also include building transformation logic using SQL, Python, or Spark for preparing clean and structured data, optimizing query performance and storage cost in BigQuery or other GCP storage systems, and developing processes to ensure data quality, integrity, and consistency across the pipeline.

You will work closely with cross-functional teams, including data analysts, data scientists, and business stakeholders, to understand requirements and provide technical guidance on GCP best practices and tools.

You will maintain clear documentation of processes, workflows, and data architecture, and ensure regular maintenance and version control of pipelines and scripts.

To be successful in this role, you will have hands-on experience with GCP services like Cloud Functions, Cloud Run, Cloud Scheduler, BigQuery, Dataflow, Pub/Sub, and Cloud Storage, and strong programming skills in Python and SQL.

You will also have knowledge of data modelling, schema design, and query optimization techniques, and experience in building batch and streaming data pipelines.

Excellent communication and collaboration skills are essential for this role, as well as the ability to work in a fast-paced and dynamic environment.

Familiarity with orchestration tools like Apache Airflow, Cloud Composer, or similar, and working experience with ETL on other cloud stacks (AWS or Azure), are a plus.

Experience with GCP's AI/ML platform (Vertex AI, BigQuery ML, or AutoML) for building, evaluating, or serving models is also desirable, as is hands-on experience building or supporting LLM/GenAI pipelines using frameworks such as LangChain, LlamaIndex, or Vertex AI Agent Builder.

Knowledge of CI/CD practices and tools like Git, Jenkins, or Terraform for pipeline deployments, and an understanding of data security, governance, and compliance practices on GCP, are also beneficial.

This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://job-boards.greenhouse.io/brainlabs/jobs/4873156101