Data Warehouse · Cedar Gate Technologies
Enterprise Data Warehouse
Overview
Developed an AWS Redshift data warehouse storing 10M+ healthcare records annually for Cedar Gate's value-based care analytics platform. The warehouse consolidated claims, eligibility, provider, and quality-measure data from multiple source systems into a unified star-schema model.
Improved query performance by 30% through careful distribution key and sort key design, compound sort keys on high-cardinality filter columns, and creation of materialised views for frequently-queried aggregations. Automated ingestion pipelines with AWS Glue (ETL jobs), Apache Airflow (orchestration), and EventBridge-driven schedules that reduced downstream reporting delays by 70%.
Engineered robust ETL pipelines processing 500K+ healthcare claims daily, developed data validation frameworks that reduced data quality issues by 85%, and contributed to SQL optimisation efforts and database performance tuning across the analytics team.
Technical Implementation
Schema Design — Star Model
Fact tables for claims, encounters, and quality measures with conformed dimensions for members, providers, dates, and diagnosis/procedure codes. Distribution keys chosen to co-locate fact-to-dimension joins on the same compute node.
ETL Pipeline — AWS Glue
Glue jobs handle extraction from operational MySQL/Postgres databases, S3 landing zones, and third-party FTP feeds. Transformation logic applies business rules (HEDIS measure logic, risk stratification) before loading.
Orchestration — Airflow + EventBridge
Airflow DAGs manage nightly full-refresh and intra-day incremental loads. EventBridge rules trigger ad-hoc Glue job runs when source files arrive in S3, reducing latency for time-sensitive feeds.
Performance Optimisation
Compound sort keys on date + member_id columns, materialised views for frequent aggregations, VACUUM / ANALYZE scheduled weekly, and WLM queue configuration to prioritise interactive vs. batch workloads.