Tutorial 46: Commercial Healthcare Operations¶
Overview¶
Build an end-to-end hospital operations analytics platform on Microsoft Fabric. This tutorial covers synthetic data generation, medallion architecture ingestion, clinical KPI computation, and Power BI dashboarding -- all with HIPAA compliance built in.
What You'll Build¶
graph LR
A[Data Generator] -->|Parquet| B[Bronze Layer]
B -->|Cleanse & Dedupe| C[Silver Layer]
C -->|Aggregate KPIs| D[Gold Layer]
D -->|Direct Lake| E[Power BI Dashboard]
style A fill:#e1f5fe
style B fill:#fff3e0
style C fill:#f3e5f5
style D fill:#e8f5e9
style E fill:#fce4ec KPIs Delivered¶
| KPI | Description | Clinical Benchmark |
|---|---|---|
| 30-Day Readmission Rate | % patients readmitted within 30 days | < 15% |
| Average LOS | Mean length of stay by DRG | Varies by DRG |
| ED-to-Admit Time | Minutes from ED arrival to inpatient bed | < 240 min |
| Bed Occupancy Rate | Daily census / total beds | 75-85% |
| Denial Rate | % claims denied by payer | < 5% |
| Case Mix Index | Weighted DRG complexity score | ~1.4 - 1.8 |
Prerequisites¶
- Microsoft Fabric workspace (F64 or trial)
- Three Lakehouses:
lh_bronze,lh_silver,lh_gold - Python 3.10+ (for local data generation)
- Environment variable:
FABRIC_POC_HASH_SALT(for HIPAA-safe hashing)
Step 1: Generate Synthetic Data¶
1.1 Install Dependencies¶
1.2 Set HIPAA Hash Salt¶
# Required for MRN hashing -- never use a weak salt in production
export FABRIC_POC_HASH_SALT="your-secure-salt-here"
1.3 Generate Hospital Data¶
from data_generation.generators.healthcare.hospital_operations_generator import (
HospitalOperationsGenerator,
)
gen = HospitalOperationsGenerator(seed=42)
# Generate datasets
admissions = gen.generate_admissions(10000)
claims = gen.generate_claims(25000)
staffing = gen.generate_staffing_records(5000)
# Save to Parquet
gen.to_parquet(admissions, "output/healthcare/admissions.parquet")
gen.to_parquet(claims, "output/healthcare/claims.parquet")
gen.to_parquet(staffing, "output/healthcare/staffing.parquet")
print(f"Admissions: {len(admissions):,} records")
print(f"Claims: {len(claims):,} records")
print(f"Staffing: {len(staffing):,} records")
Verification¶
- Admissions file created with 10,000 rows
- Claims file created with 25,000 rows
- All SSNs are 900-series (synthetic)
- MRN values are SHA-256 hashes (64 hex chars)
Step 2: Upload to Fabric Lakehouse¶
2.1 Upload Files¶
- Open your
lh_bronzeLakehouse in Fabric - Navigate to Files > create folder
landing/healthcare/ - Upload the three Parquet files from Step 1
2.2 Verify Upload¶
# In a Fabric notebook
files = mssparkutils.fs.ls("Files/landing/healthcare/")
for f in files:
print(f"{f.name} - {f.size:,} bytes")
Verification¶
- Three Parquet files visible in Files/landing/healthcare/
- File sizes are non-zero
Step 3: Bronze Ingestion¶
3.1 Import Notebook¶
Import notebooks/bronze/50_healthcare_admissions.py into your Fabric workspace.
3.2 Run Bronze Notebook¶
Execute all cells. The notebook will: 1. Read Parquet files with explicit schema enforcement 2. Add Bronze metadata (_bronze_ingested_at, _bronze_source_file, _bronze_batch_id) 3. Write to bronze_healthcare_admissions Delta table partitioned by load date
3.3 Verify¶
SELECT COUNT(*) as total,
COUNT(DISTINCT encounter_id) as unique_encounters,
MIN(admit_dt) as earliest,
MAX(admit_dt) as latest
FROM lh_bronze.dbo.bronze_healthcare_admissions
Verification¶
- Delta table created with expected record count
- No null encounter_id or mrn_hash values
- Bronze metadata columns present
Step 4: Silver Cleansing¶
4.1 Import & Run Silver Notebook¶
Import notebooks/silver/50_healthcare_cleansed.py and execute.
4.2 Transformations Applied¶
| Transformation | Description |
|---|---|
| Deduplication | Latest ingestion wins per encounter_id |
| LOS Recalculation | datediff(discharge_dt, admit_dt) |
| 30-Day Readmission | Lag window on mrn_hash by admit_dt |
| ED-to-Admit Time | Minutes between ED arrival and admission |
| Data Quality Score | 0-100 based on field completeness |
4.3 Verify¶
SELECT COUNT(*) as total,
ROUND(AVG(_dq_score), 1) as avg_quality,
ROUND(AVG(los_calc), 1) as avg_los,
SUM(readmit_30d) as readmissions
FROM lh_silver.dbo.silver_healthcare_encounters
Verification¶
- Record count matches (after dedup)
- Average quality score > 80
- LOS values are positive integers
- readmit_30d is 0 or 1
Step 5: Gold KPIs¶
5.1 Import & Run Gold Notebook¶
Import notebooks/gold/50_healthcare_kpis.py and execute.
5.2 Output Tables¶
gold_healthcare_kpis - Aggregated operational metrics by DRG:
| Column | Description |
|---|---|
| drg_code | MS-DRG code |
| encounters | Count of encounters |
| avg_los | Mean length of stay |
| readmit_rate_pct | 30-day readmission % |
| case_mix_index | Overall CMI |
gold_healthcare_readmission_risk - Patient-level risk scoring:
| Column | Description |
|---|---|
| encounter_id | Encounter identifier |
| risk_score | Composite 0-100 score |
| risk_tier | HIGH / MEDIUM / LOW |
5.3 Verify¶
-- KPI check
SELECT drg_code, encounters, avg_los, readmit_rate_pct
FROM lh_gold.dbo.gold_healthcare_kpis
ORDER BY encounters DESC LIMIT 10;
-- Risk distribution
SELECT risk_tier, COUNT(*) as patients
FROM lh_gold.dbo.gold_healthcare_readmission_risk
GROUP BY risk_tier;
Verification¶
- KPI table has rows for each DRG code
- Readmission rates are between 0-100%
- Risk tiers include HIGH, MEDIUM, LOW
- Case Mix Index is a reasonable value (1.0-2.5)
Step 6: Power BI Dashboard¶
6.1 Create Direct Lake Semantic Model¶
- In your Fabric workspace, select New > Semantic model
- Choose
lh_goldas the data source - Select tables:
gold_healthcare_kpis,gold_healthcare_readmission_risk - Direct Lake mode is automatic
6.2 Build Dashboard Pages¶
Page 1: Executive Summary - Card: Case Mix Index, Overall Readmission Rate, Avg LOS - Bar chart: Readmission rate by DRG (top 10) - Line chart: Daily occupancy rate trend
Page 2: Readmission Risk - Donut: Risk tier distribution (HIGH/MEDIUM/LOW) - Table: High-risk encounters with DRG, LOS, payer - Slicer: Filter by payer, DRG, disposition
Page 3: ED Operations - Gauge: Median ED-to-admit time vs 240-min target - Histogram: ED wait time distribution - KPI: ED encounter volume trend
6.3 DAX Measures¶
// 30-Day Readmission Rate
Readmission Rate =
DIVIDE(
CALCULATE(COUNTROWS('gold_healthcare_readmission_risk'),
'gold_healthcare_readmission_risk'[risk_score] >= 40),
COUNTROWS('gold_healthcare_readmission_risk'),
0
)
// Case Mix Index
CMI = AVERAGE('gold_healthcare_kpis'[case_mix_index])
Verification¶
- Semantic model connects via Direct Lake (no import)
- All three dashboard pages render correctly
- Slicers filter across visuals
HIPAA Compliance Checklist¶
This POC implements the following HIPAA safeguards:
| Control | Implementation | Status |
|---|---|---|
| PHI De-identification | MRN hashed with HMAC-SHA-256 + salt | Implemented |
| Synthetic SSN | 900-series only (IRS test range) | Implemented |
| No Real Patient Data | All data is synthetically generated | Verified |
| SSN Masking | Only last 4 digits shown (XXX-XX-####) | Implemented |
| Salt Management | FABRIC_POC_HASH_SALT env var required | Enforced |
| Access Control | Lakehouse schemas + workspace RBAC | Configurable |
| Audit Logging | Fabric unified audit log enabled | Platform |
| Encryption at Rest | OneLake default encryption (AES-256) | Platform |
| Encryption in Transit | TLS 1.2+ enforced | Platform |
Important: This is a POC with synthetic data. Production healthcare workloads require a full HIPAA Business Associate Agreement (BAA) with Microsoft and additional controls per 45 CFR 164.
Architecture Diagram¶
graph TB
subgraph "Data Generation"
G[HospitalOperationsGenerator]
G -->|admissions| P1[admissions.parquet]
G -->|claims| P2[claims.parquet]
G -->|staffing| P3[staffing.parquet]
end
subgraph "Bronze Layer (lh_bronze)"
P1 -->|50_healthcare_admissions.py| B1[bronze_healthcare_admissions]
end
subgraph "Silver Layer (lh_silver)"
B1 -->|50_healthcare_cleansed.py| S1[silver_healthcare_encounters]
end
subgraph "Gold Layer (lh_gold)"
S1 -->|50_healthcare_kpis.py| G1[gold_healthcare_kpis]
S1 -->|50_healthcare_kpis.py| G2[gold_healthcare_readmission_risk]
end
subgraph "Analytics"
G1 -->|Direct Lake| PBI[Power BI Dashboard]
G2 -->|Direct Lake| PBI
end Troubleshooting¶
| Issue | Cause | Fix |
|---|---|---|
ValueError: hash_value requires salt | Missing env var | Set FABRIC_POC_HASH_SALT |
| Bronze table empty | Wrong source path | Verify Files/landing/healthcare/ has Parquet files |
| Negative LOS values | discharge < admit | Silver notebook filters these (sets to NULL) |
| High readmission rate | Small synthetic dataset | Generate more records (50K+) for realistic distributions |
| Direct Lake fallback | Table not optimized | Run OPTIMIZE ... ZORDER BY on Gold tables |
Next Steps¶
- Expand claims pipeline with
bronze/silver/gold_healthcare_claims - Add staffing ratio analysis to Gold layer
- Integrate with Fabric Data Activator for readmission alerts
- Connect to Real-Time Intelligence for ED wait time streaming
- Add Great Expectations data quality suite
References¶
- CMS MS-DRG Definitions
- HIPAA Security Rule
- Microsoft Fabric Healthcare Solutions
- Direct Lake Documentation
Tutorial 46 | Commercial Healthcare Operations | Microsoft Fabric POC