My Projects

Here's a selection of data engineering projects I've worked on, focused on clinical and genomic data pipelines built on Azure and Databricks.

Real-Time Clinical Event Streaming with Kafka & Flink

Event-driven pipeline using Apache Kafka and Apache Flink to stream ADT and lab result events in real time into a Databricks Delta Lake, enabling live patient census dashboards and alerting.

Apache Kafka, Apache Flink, Azure Event Hubs, Databricks, Delta Lake, Python, Real-time, Azure

Genomics Variant Ingestion & QC Pipeline

Scalable PySpark pipeline on Databricks for ingesting, validating, and annotating VCF files from whole-genome and targeted sequencing panels, stored in a Delta Lake variant catalog on Azure.

Databricks, PySpark, Delta Lake, VCF, Genomics, ADLS Gen 2, Python, Azure
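
To give a flavor of the validation step, here is a minimal, standard-library sketch of a per-record VCF check. It is an illustration only, not the project's PySpark code: the function names `check_vcf_record` and `qc_summary` are hypothetical, and the rules are deliberately simplified (for example, breakend ALT notation is not handled).

```python
import re

# Simplified allele pattern: one or more of A, C, G, T, N (case-insensitive).
BASES = re.compile(r"^[ACGTN]+$", re.IGNORECASE)


def check_vcf_record(line: str) -> list[str]:
    """Return a list of QC problems for one tab-separated VCF data line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        # The eight fixed columns are CHROM POS ID REF ALT QUAL FILTER INFO.
        return [f"expected at least 8 columns, found {len(cols)}"]

    chrom, pos, _id, ref, alt, qual = cols[:6]
    problems = []
    if not chrom:
        problems.append("empty CHROM")
    if not pos.isdecimal():
        problems.append(f"POS is not a non-negative integer: {pos!r}")
    if not BASES.match(ref):
        problems.append(f"REF contains non-ACGTN characters: {ref!r}")
    for allele in alt.split(","):
        symbolic = allele.startswith("<") and allele.endswith(">")
        if not (allele in {".", "*"} or symbolic or BASES.match(allele)):
            problems.append(f"unrecognized ALT allele: {allele!r}")
    if qual != ".":
        try:
            float(qual)
        except ValueError:
            problems.append(f"QUAL is not numeric: {qual!r}")
    return problems


def qc_summary(lines) -> dict[str, int]:
    """Count passing and failing data lines, skipping '#' header lines."""
    summary = {"passed": 0, "failed": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        summary["failed" if check_vcf_record(line) else "passed"] += 1
    return summary
```

In a Spark setting, the same kind of check could be applied row by row and the failing records routed to a quarantine table, but that wiring is left out of this sketch.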

HL7-to-FHIR Clinical Data Lakehouse

End-to-end clinical data integration platform ingesting HL7 v2 messages (ADT, ORM, ORU, reconciliation) via Mirth Connect into a Databricks Delta Lake on ADLS Gen 2, with full FHIR R4 conversion for downstream interoperability.

Mirth Connect, HL7 v2, FHIR R4, Databricks, Delta Lake, ADLS Gen 2, Azure Data Factory, Python, PostgreSQL
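
As a rough illustration of what HL7 v2 to FHIR R4 conversion involves, the sketch below maps the PID segment of an ADT message to a minimal FHIR Patient resource using only the Python standard library. It is not the platform's actual conversion logic: the helper names and the sample message are hypothetical, and a real converter would also handle repetitions, escape sequences, and many more fields.

```python
from datetime import datetime

# HL7 v2 administrative sex codes (PID-8) mapped to FHIR R4 gender values.
GENDER_MAP = {"M": "male", "F": "female", "O": "other", "U": "unknown"}

# Hypothetical sample message, for illustration only.
SAMPLE_ADT = (
    "MSH|^~\\&|SENDER|FAC|RECV|FAC|20240101120000||ADT^A01|MSG0001|P|2.5\r"
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800215|F\r"
)


def parse_segments(message: str) -> dict[str, list[str]]:
    """Split a message into {segment_id: fields}, keeping the first of each segment."""
    segments: dict[str, list[str]] = {}
    for line in message.replace("\n", "\r").split("\r"):
        if line.strip():
            fields = line.strip().split("|")
            segments.setdefault(fields[0], fields)
    return segments


def get_field(fields: list[str], index: int) -> str:
    """Return field `index` or '' if absent. For PID, fields[n] is PID-n."""
    return fields[index] if index < len(fields) else ""


def pid_to_fhir_patient(message: str) -> dict:
    """Build a minimal FHIR R4 Patient dict from the PID segment."""
    pid = parse_segments(message).get("PID")
    if pid is None:
        raise ValueError("message has no PID segment")

    identifier = get_field(pid, 3).split("^")[0]   # PID-3, first component
    name_parts = get_field(pid, 5).split("^")      # PID-5: family^given
    family = name_parts[0]
    given = name_parts[1] if len(name_parts) > 1 else ""
    dob = get_field(pid, 7)                        # PID-7: YYYYMMDD[...]

    patient: dict = {
        "resourceType": "Patient",
        "gender": GENDER_MAP.get(get_field(pid, 8), "unknown"),
    }
    if identifier:
        patient["identifier"] = [{"value": identifier}]
    if family or given:
        patient["name"] = [{"family": family, "given": [given] if given else []}]
    if len(dob) >= 8:
        patient["birthDate"] = datetime.strptime(dob[:8], "%Y%m%d").date().isoformat()
    return patient
```

Calling `pid_to_fhir_patient(SAMPLE_ADT)` yields a Patient with identifier `12345`, family name `DOE`, given name `JANE`, birth date `1980-02-15`, and gender `female`.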

Clinical Data Warehouse — OMOP CDM with dbt

dbt project on Databricks that transforms a multi-source clinical Delta Lake into the OMOP Common Data Model, enabling standardized cohort analysis and federated research queries across patient populations.

dbt, Databricks, OMOP CDM, SQL, Delta Lake, Data Modeling, Azure, Python

Multi-Source EHR Pipeline Orchestration with Airflow

Apache Airflow DAGs orchestrating a multi-source EHR ingestion platform that coordinates Mirth Connect polling, ADF triggers, Databricks job runs, and data quality checks across a clinical data lakehouse.

Apache Airflow, Databricks, Azure Data Factory, Python, EHR, Delta Lake, Azure, HIPAA

Interested in Working Together?

I'm always open to discussing new opportunities and interesting projects.

Get In Touch