Clinical Data Warehouse — OMOP CDM with dbt

Transformed a multi-source clinical Delta Lake into an OMOP Common Data Model using dbt on Databricks, enabling standardized cohort analysis and federated research queries across patient populations.

March 1, 2024
dbt Databricks OMOP CDM SQL Delta Lake Data Modeling Azure Python


Overview

Implemented an OMOP CDM v5.4 transformation layer on top of a clinical Delta Lake using dbt (data build tool) running on Databricks SQL warehouses, enabling research and analytics teams to run standardized phenotyping queries without touching raw EHR data.
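As a sketch of what the transformation layer looks like, a minimal dbt staging model selects from a silver-layer Delta table and renames columns toward OMOP Person conventions. The source, table, and column names here are illustrative assumptions, not the project's actual schema:

```sql
-- models/staging/stg_patients.sql (illustrative sketch; real source/column names differ)
-- Light staging layer: rename silver-layer fields toward OMOP Person terminology
-- so downstream OMOP models never reference raw EHR column names directly.
select
    patient_id   as person_source_value,
    gender_code  as gender_source_value,
    birth_date   as birth_datetime,
    race_code    as race_source_value,
    updated_at
from {{ source('silver', 'patients') }}
```

Keeping staging models this thin is a common dbt convention: all concept mapping and business logic lives in later layers, so schema drift in the silver tables is absorbed in one place.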

Architecture

  • Source: Silver-layer Delta tables containing normalized patient, encounter, order, and result data from the HL7 lakehouse pipeline
  • Transformation: dbt models map source clinical concepts to OMOP domains (Person, Visit Occurrence, Condition Occurrence, Measurement, Drug Exposure) using custom concept mapping tables maintained in Delta
  • Vocabulary: OMOP standard vocabularies (SNOMED, LOINC, RxNorm) loaded into Databricks Unity Catalog and joined at transformation time
  • Testing: dbt tests enforce referential integrity, concept coverage thresholds, and null constraints on mandatory OMOP fields
  • Orchestration: dbt runs scheduled via Azure Data Factory with incremental materialization strategies to limit Databricks compute cost
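The mapping and incremental strategy described above can be sketched as a single dbt model. This is a hedged example under assumed names (`stg_diagnoses`, `concept_map`, and their columns are illustrative, not the production code); it joins a silver-layer diagnoses staging model to a Delta-maintained concept mapping table and materializes incrementally via merge, which the dbt-databricks adapter supports for Delta tables:

```sql
-- models/omop/condition_occurrence.sql (illustrative sketch)
{{
    config(
        materialized='incremental',
        unique_key='condition_occurrence_id',
        incremental_strategy='merge'  -- Delta merge via the dbt-databricks adapter
    )
}}

select
    dx.diagnosis_id                    as condition_occurrence_id,
    p.person_id,
    coalesce(cm.target_concept_id, 0)  as condition_concept_id,  -- 0 = unmapped, per OMOP convention
    dx.diagnosis_date                  as condition_start_date,
    dx.source_code                     as condition_source_value,
    dx.updated_at
from {{ ref('stg_diagnoses') }} dx
join {{ ref('person') }} p
    on dx.patient_source_value = p.person_source_value
left join {{ ref('concept_map') }} cm
    on dx.source_code = cm.source_code
{% if is_incremental() %}
  -- On scheduled runs, only reprocess rows changed since the last materialization
  where dx.updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

The left join plus `coalesce(..., 0)` keeps unmapped source codes visible (concept_id 0) rather than silently dropping them, which is what makes concept-coverage tests meaningful downstream.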

Key Outcomes

  • The standardized OMOP layer enabled cross-site cohort queries that previously required manual data-extraction requests
  • dbt test suite caught 3 upstream schema changes before they propagated to research consumers
  • Incremental dbt models reduced daily transformation compute time by 65% vs. full refresh
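The referential-integrity and null-constraint tests that caught those upstream changes are declared in dbt's standard YAML form. A sketch with illustrative model and column names:

```yaml
# models/omop/schema.yml (illustrative sketch)
version: 2
models:
  - name: condition_occurrence
    columns:
      - name: condition_occurrence_id
        tests:
          - unique
          - not_null
      - name: person_id
        tests:
          - not_null
          - relationships:          # referential integrity back to the Person table
              to: ref('person')
              field: person_id
      - name: condition_concept_id
        tests:
          - not_null                # mandatory OMOP field
```

Because `dbt test` runs these on every scheduled build, a renamed or dropped upstream column fails the run before the broken data reaches research consumers.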