Tutorial 46: Commercial Healthcare Operations¶

Overview¶

Build an end-to-end hospital operations analytics platform on Microsoft Fabric. This tutorial covers synthetic data generation, medallion architecture ingestion, clinical KPI computation, and Power BI dashboarding -- all with HIPAA compliance built in.

What You'll Build¶

graph LR
    A[Data Generator] -->|Parquet| B[Bronze Layer]
    B -->|Cleanse & Dedupe| C[Silver Layer]
    C -->|Aggregate KPIs| D[Gold Layer]
    D -->|Direct Lake| E[Power BI Dashboard]

    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e9
    style E fill:#fce4ec

KPIs Delivered¶

KPI	Description	Clinical Benchmark
30-Day Readmission Rate	% patients readmitted within 30 days	< 15%
Average LOS	Mean length of stay by DRG	Varies by DRG
ED-to-Admit Time	Minutes from ED arrival to inpatient bed	< 240 min
Bed Occupancy Rate	Daily census / total beds	75-85%
Denial Rate	% claims denied by payer	< 5%
Case Mix Index	Weighted DRG complexity score	~1.4 - 1.8

Prerequisites¶

Microsoft Fabric workspace (F64 or trial)
Three Lakehouses: lh_bronze, lh_silver, lh_gold
Python 3.10+ (for local data generation)
Environment variable: FABRIC_POC_HASH_SALT (for HIPAA-safe hashing)

Step 1: Generate Synthetic Data¶

1.1 Install Dependencies¶

pip install pandas numpy faker tqdm

1.2 Set HIPAA Hash Salt¶

# Required for MRN hashing -- never use a weak salt in production
export FABRIC_POC_HASH_SALT="your-secure-salt-here"

1.3 Generate Hospital Data¶

from data_generation.generators.healthcare.hospital_operations_generator import (
    HospitalOperationsGenerator,
)

gen = HospitalOperationsGenerator(seed=42)

# Generate datasets
admissions = gen.generate_admissions(10000)
claims = gen.generate_claims(25000)
staffing = gen.generate_staffing_records(5000)

# Save to Parquet
gen.to_parquet(admissions, "output/healthcare/admissions.parquet")
gen.to_parquet(claims, "output/healthcare/claims.parquet")
gen.to_parquet(staffing, "output/healthcare/staffing.parquet")

print(f"Admissions: {len(admissions):,} records")
print(f"Claims: {len(claims):,} records")
print(f"Staffing: {len(staffing):,} records")

Verification¶

Admissions file created with 10,000 rows
Claims file created with 25,000 rows
All SSNs are 900-series (synthetic)
MRN values are SHA-256 hashes (64 hex chars)

Step 2: Upload to Fabric Lakehouse¶

2.1 Upload Files¶

Open your lh_bronze Lakehouse in Fabric
Navigate to Files > create folder landing/healthcare/
Upload the three Parquet files from Step 1

2.2 Verify Upload¶

# In a Fabric notebook
files = mssparkutils.fs.ls("Files/landing/healthcare/")
for f in files:
    print(f"{f.name} - {f.size:,} bytes")

Verification¶

Three Parquet files visible in Files/landing/healthcare/
File sizes are non-zero

Step 3: Bronze Ingestion¶

3.1 Import Notebook¶

Import notebooks/bronze/50_healthcare_admissions.py into your Fabric workspace.

3.2 Run Bronze Notebook¶

Execute all cells. The notebook will: 1. Read Parquet files with explicit schema enforcement 2. Add Bronze metadata (_bronze_ingested_at, _bronze_source_file, _bronze_batch_id) 3. Write to bronze_healthcare_admissions Delta table partitioned by load date

3.3 Verify¶

SELECT COUNT(*) as total,
       COUNT(DISTINCT encounter_id) as unique_encounters,
       MIN(admit_dt) as earliest,
       MAX(admit_dt) as latest
FROM lh_bronze.dbo.bronze_healthcare_admissions

Verification¶

Delta table created with expected record count
No null encounter_id or mrn_hash values
Bronze metadata columns present

Step 4: Silver Cleansing¶

4.1 Import & Run Silver Notebook¶

Import notebooks/silver/50_healthcare_cleansed.py and execute.

4.2 Transformations Applied¶

Transformation	Description
Deduplication	Latest ingestion wins per encounter_id
LOS Recalculation	`datediff(discharge_dt, admit_dt)`
30-Day Readmission	Lag window on mrn_hash by admit_dt
ED-to-Admit Time	Minutes between ED arrival and admission
Data Quality Score	0-100 based on field completeness

4.3 Verify¶

SELECT COUNT(*) as total,
       ROUND(AVG(_dq_score), 1) as avg_quality,
       ROUND(AVG(los_calc), 1) as avg_los,
       SUM(readmit_30d) as readmissions
FROM lh_silver.dbo.silver_healthcare_encounters

Verification¶

Record count matches (after dedup)
Average quality score > 80
LOS values are positive integers
readmit_30d is 0 or 1

Step 5: Gold KPIs¶

5.1 Import & Run Gold Notebook¶

Import notebooks/gold/50_healthcare_kpis.py and execute.

5.2 Output Tables¶

gold_healthcare_kpis - Aggregated operational metrics by DRG:

Column	Description
drg_code	MS-DRG code
encounters	Count of encounters
avg_los	Mean length of stay
readmit_rate_pct	30-day readmission %
case_mix_index	Overall CMI

gold_healthcare_readmission_risk - Patient-level risk scoring:

Column	Description
encounter_id	Encounter identifier
risk_score	Composite 0-100 score
risk_tier	HIGH / MEDIUM / LOW

5.3 Verify¶

-- KPI check
SELECT drg_code, encounters, avg_los, readmit_rate_pct
FROM lh_gold.dbo.gold_healthcare_kpis
ORDER BY encounters DESC LIMIT 10;

-- Risk distribution
SELECT risk_tier, COUNT(*) as patients
FROM lh_gold.dbo.gold_healthcare_readmission_risk
GROUP BY risk_tier;

Verification¶

KPI table has rows for each DRG code
Readmission rates are between 0-100%
Risk tiers include HIGH, MEDIUM, LOW
Case Mix Index is a reasonable value (1.0-2.5)

Step 6: Power BI Dashboard¶

6.1 Create Direct Lake Semantic Model¶

In your Fabric workspace, select New > Semantic model
Choose lh_gold as the data source
Select tables: gold_healthcare_kpis, gold_healthcare_readmission_risk
Direct Lake mode is automatic

6.2 Build Dashboard Pages¶

Page 1: Executive Summary - Card: Case Mix Index, Overall Readmission Rate, Avg LOS - Bar chart: Readmission rate by DRG (top 10) - Line chart: Daily occupancy rate trend

Page 2: Readmission Risk - Donut: Risk tier distribution (HIGH/MEDIUM/LOW) - Table: High-risk encounters with DRG, LOS, payer - Slicer: Filter by payer, DRG, disposition

Page 3: ED Operations - Gauge: Median ED-to-admit time vs 240-min target - Histogram: ED wait time distribution - KPI: ED encounter volume trend

6.3 DAX Measures¶

// 30-Day Readmission Rate
Readmission Rate =
DIVIDE(
    CALCULATE(COUNTROWS('gold_healthcare_readmission_risk'),
              'gold_healthcare_readmission_risk'[risk_score] >= 40),
    COUNTROWS('gold_healthcare_readmission_risk'),
    0
)

// Case Mix Index
CMI = AVERAGE('gold_healthcare_kpis'[case_mix_index])

Verification¶

Semantic model connects via Direct Lake (no import)
All three dashboard pages render correctly
Slicers filter across visuals

HIPAA Compliance Checklist¶

This POC implements the following HIPAA safeguards:

Control	Implementation	Status
PHI De-identification	MRN hashed with HMAC-SHA-256 + salt	Implemented
Synthetic SSN	900-series only (IRS test range)	Implemented
No Real Patient Data	All data is synthetically generated	Verified
SSN Masking	Only last 4 digits shown (`XXX-XX-####`)	Implemented
Salt Management	`FABRIC_POC_HASH_SALT` env var required	Enforced
Access Control	Lakehouse schemas + workspace RBAC	Configurable
Audit Logging	Fabric unified audit log enabled	Platform
Encryption at Rest	OneLake default encryption (AES-256)	Platform
Encryption in Transit	TLS 1.2+ enforced	Platform

Important: This is a POC with synthetic data. Production healthcare workloads require a full HIPAA Business Associate Agreement (BAA) with Microsoft and additional controls per 45 CFR 164.

Architecture Diagram¶

graph TB
    subgraph "Data Generation"
        G[HospitalOperationsGenerator]
        G -->|admissions| P1[admissions.parquet]
        G -->|claims| P2[claims.parquet]
        G -->|staffing| P3[staffing.parquet]
    end

    subgraph "Bronze Layer (lh_bronze)"
        P1 -->|50_healthcare_admissions.py| B1[bronze_healthcare_admissions]
    end

    subgraph "Silver Layer (lh_silver)"
        B1 -->|50_healthcare_cleansed.py| S1[silver_healthcare_encounters]
    end

    subgraph "Gold Layer (lh_gold)"
        S1 -->|50_healthcare_kpis.py| G1[gold_healthcare_kpis]
        S1 -->|50_healthcare_kpis.py| G2[gold_healthcare_readmission_risk]
    end

    subgraph "Analytics"
        G1 -->|Direct Lake| PBI[Power BI Dashboard]
        G2 -->|Direct Lake| PBI
    end

Troubleshooting¶

Issue	Cause	Fix
`ValueError: hash_value requires salt`	Missing env var	Set `FABRIC_POC_HASH_SALT`
Bronze table empty	Wrong source path	Verify `Files/landing/healthcare/` has Parquet files
Negative LOS values	discharge < admit	Silver notebook filters these (sets to NULL)
High readmission rate	Small synthetic dataset	Generate more records (50K+) for realistic distributions
Direct Lake fallback	Table not optimized	Run `OPTIMIZE ... ZORDER BY` on Gold tables

Next Steps¶

Expand claims pipeline with bronze/silver/gold_healthcare_claims
Add staffing ratio analysis to Gold layer
Integrate with Fabric Data Activator for readmission alerts
Connect to Real-Time Intelligence for ED wait time streaming
Add Great Expectations data quality suite

References¶

Tutorial 46 | Commercial Healthcare Operations | Microsoft Fabric POC

← PreviousSemantic Model & ReportsRead more →Next →47 — Financial ServicesRead more →