HL7-to-FHIR Clinical Data Lakehouse
Overview
Built a production clinical data integration platform for a hybrid on-prem/Azure environment. It handles all inbound and outbound HL7 v2 message traffic exchanged with hospital systems, reference labs, and internal instruments.
Architecture
- Ingestion: Mirth Connect channels receive messages on TCP/MLLP listeners, then parse and route ADT (A01–A08), ORM O01, ORU R01, and reconciliation messages
- Landing Zone: Raw HL7 payloads written to the ADLS Gen2 bronze layer as JSON-serialized segments
- Processing: Databricks PySpark jobs normalize segments into silver-layer Delta tables (patient, encounter, order, result)
- FHIR Conversion: Custom Python transformers map silver tables to FHIR R4 resources (Patient, Encounter, DiagnosticReport, Observation) served via a REST API
- Orchestration: Azure Data Factory pipelines schedule bronze-to-silver and silver-to-gold promotion jobs
- Audit & Reconciliation: PostgreSQL tracks message acknowledgment state, reprocessing queues, and SLA breach alerts
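
To make the ingestion-to-FHIR path above concrete, here is a minimal Python sketch of three of the steps: removing the MLLP envelope, serializing segments to JSON for the bronze layer, and mapping a PID segment to a FHIR R4 Patient. It is an illustration under stated assumptions, not the platform's actual code. The function names (`strip_mllp`, `parse_segments`, `to_fhir_patient`), the JSON layout, and the sample message are all hypothetical. In the real system, parsing happens in Mirth Connect and the mapping runs against silver Delta tables rather than raw segments.

```python
import json

# MLLP envelope: <VT> payload <FS><CR>
MLLP_START = b"\x0b"
MLLP_END = b"\x1c\r"

# HL7 table 0001 sex codes mapped to FHIR administrative gender.
# Assumption: any other code falls back to "unknown" (a simplification).
GENDER_MAP = {"F": "female", "M": "male", "O": "other", "U": "unknown"}

# Hypothetical sample message; HL7 v2 segments are separated by carriage returns.
SAMPLE = "\r".join([
    "MSH|^~\\&|LAB|HOSP|LAKE|AZURE|20240101120000||ADT^A01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800215|F",
])


def strip_mllp(frame: bytes) -> str:
    """Remove the MLLP start/end bytes and decode the HL7 payload."""
    if not (frame.startswith(MLLP_START) and frame.endswith(MLLP_END)):
        raise ValueError("not a complete MLLP frame")
    return frame[len(MLLP_START):-len(MLLP_END)].decode("utf-8")


def parse_segments(message: str) -> list[dict]:
    """Split a message into JSON-serializable segment records.

    The segment ID is kept at index 0 of "fields", so fields[n] lines up
    with field n for every segment except MSH (whose MSH-1 is the "|" itself).
    """
    segments = []
    for line in message.split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.append({"segment": fields[0], "fields": fields})
    return segments


def to_fhir_patient(segments: list[dict]) -> dict:
    """Map the first PID segment to a minimal FHIR R4 Patient resource."""
    pid = next((s["fields"] for s in segments if s["segment"] == "PID"), None)
    if pid is None:
        raise ValueError("message has no PID segment")
    pid = pid + [""] * (9 - len(pid))  # pad so PID-3..PID-8 can be indexed safely

    identifier = pid[3].split("~")[0].split("^")[0]  # first repetition, ID component
    name_parts = pid[5].split("^")                   # family^given^...
    given = name_parts[1] if len(name_parts) > 1 else ""
    dob = pid[7]                                     # YYYYMMDD[HHMM...]

    patient = {
        "resourceType": "Patient",
        "identifier": [{"value": identifier}],
        "name": [{"family": name_parts[0], "given": [given] if given else []}],
        "gender": GENDER_MAP.get(pid[8], "unknown"),
    }
    if len(dob) >= 8:
        patient["birthDate"] = f"{dob[:4]}-{dob[4:6]}-{dob[6:8]}"
    return patient


if __name__ == "__main__":
    frame = MLLP_START + SAMPLE.encode("utf-8") + MLLP_END
    segments = parse_segments(strip_mllp(frame))
    print(json.dumps(segments, indent=2))                   # bronze-style record
    print(json.dumps(to_fhir_patient(segments), indent=2))  # FHIR Patient
```

Keeping the segment ID at index 0 of each record means field positions in the JSON match HL7 field numbers for non-MSH segments, which tends to make downstream normalization easier to audit against the specification.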
Key Outcomes
- Reduced manual reconciliation effort by ~70% through automated ACK tracking and exception routing
- Achieved sub-5-minute latency from HL7 receipt to queryable Delta table row
- FHIR R4 output consumed by two downstream EHR systems with zero breaking schema changes across 18 months