Real-Time Clinical Event Streaming with Kafka & Flink

Event-driven pipeline using Apache Kafka and Apache Flink to stream ADT and lab result events in real time into a Databricks Delta Lake, enabling live patient census dashboards and alerting.

November 1, 2024
Apache Kafka Apache Flink Azure Event Hubs Databricks Delta Lake Python Real-time Azure

Real-Time Clinical Event Streaming with Kafka & Flink

Overview

Built a real-time event streaming platform to replace batch-based patient census reporting, cutting data latency from hours to seconds for clinical operations teams.

Architecture

  • Source: Mirth Connect publishes parsed HL7 ADT and ORU events as JSON to Azure Event Hubs (Kafka-compatible endpoint)
  • Stream Processing: Apache Flink jobs consume from Event Hubs, apply stateful sessionization (grouping ADT events per patient encounter), and compute derived fields (LOS running total, result turnaround time)
  • Windowed Aggregations: Flink tumbling 5-minute windows aggregate lab result volumes by department and specimen type
  • Sink: Flink writes enriched events to Databricks Delta Lake via the Delta Kafka connector; separate sink writes aggregates to Azure Cache for Redis for dashboard reads
  • Alerting: Flink CEP (Complex Event Processing) patterns detect critical lab value sequences and emit alerts to an Azure Service Bus topic

Key Outcomes

  • Reduced patient census reporting latency from 4-hour batch cycles to under 30 seconds
  • Flink CEP alerting flagged critical lab results for clinical review with median 18-second end-to-end latency
  • Stateful sessionization handled out-of-order ADT event delivery with a 10-minute event-time watermark