# Senior Data Engineer - Real World Data

**Company**: Formation Bio
**Location**: New York, NY; Boston, MA; San Francisco, CA; Raleigh-Durham, NC
**Work arrangement**: hybrid
**Experience**: senior
**Job type**: full-time
**Salary**: $204,500 - $267,000
**Category**: Engineering
**Industry**: Healthcare

**Apply**: https://job-boards.greenhouse.io/formationbio/jobs/7757932?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_7725a8e8-e0b

## Description

Formation Bio is seeking a Senior Data Engineer to join the Scientific Data Intelligence (SDI) team. The successful candidate will help transform Real World Data (RWD) into structured, analytics-ready assets.

Responsibilities:

- Model and transform raw electronic health record (EHR) and claims data into clean, canonical, analytics-ready datasets using SQL, Python, and clinical data standards such as OMOP.

- Build and manage scalable data pipelines using Dagster for orchestration, dbt for transformation, and Snowflake as the primary compute and storage engine.

- Conduct hands-on RWD analyses to answer scientific and strategic research questions.

- Partner with Data Scientists and clinical leads to design and execute observational studies.

- Implement data validation, completeness, and observability frameworks.

- Apply Generative AI techniques within transformation and analysis layers.

- Communicate findings clearly to both technical and non-technical stakeholders.

Requirements:

- 5+ years of experience in data engineering, ideally with at least 2 years working in healthcare or life sciences.

- Experience with ontologies and biomedical schemas (e.g., UMLS, LOINC, ICD-9/ICD-10, MeSH).

- Fluency in SQL and Python, and experience building and maintaining production-grade pipelines.

- Experience building longitudinal patient cohorts from EHR or claims data.

- Solid understanding of causal inference frameworks.

- Working familiarity with real-world evidence study design concepts.

- Hands-on expertise with modern data infrastructure, such as Snowflake, dbt, and Dagster.

Total Compensation Range: $204,500 - $267,000

## Skills

### Required
- SQL
- Python
- OMOP
- Dagster
- dbt
- Snowflake
- Generative AI
- UMLS
- LOINC
- ICD-9/ICD-10
- MeSH

### Nice to have
- Experience in regulated or privacy-sensitive data environments
- Experience with commercial RWD vendors

---

Source: [Apply at job-boards.greenhouse.io](https://job-boards.greenhouse.io/formationbio/jobs/7757932?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply)
