Source: examples/tribal-health/README.md — this page is rendered live from that file.

CIPSEA awareness

The data in this example may be subject to CIPSEA (the Confidential Information Protection and Statistical Efficiency Act, 44 U.S.C. §§ 3561–3583) when collected from respondents under a pledge of confidentiality for exclusively statistical purposes.

Knowing and willful disclosure of identifiable CIPSEA data is a Class E felony (§ 3572) attaching to individual officers, employees, or designated agents — including cloud-operator personnel where applicable.

The architecture below is starting-point reference guidance only. Validate the specific compliance posture for your workload with your designating statistical agency and Confidentiality Officer before production use:

CIPSEA control mapping & narrative (DRAFT — under validation)
CIPSEA operational playbook for Azure (DRAFT — under validation)

Tribal Health Data Warehouse: IHS & Tribal Health Analytics

[!TIP] TL;DR — Population health analytics for IHS and tribal health programs, deployed exclusively in Azure Government. All data is synthetic. Features diabetes tracking, behavioral health resource allocation, and tribal data sovereignty.

📋 Table of Contents

Overview
Key Features
Data Sources
Compliance Framework
Architecture Overview
Prerequisites
Azure Government Resources
Tools Required
Compliance Prerequisites
Quick Start
1. Configure Azure Government Environment
2. Generate Synthetic Data
3. Deploy Infrastructure
4. Run dbt Models
Analytics Scenarios
1. Diabetes Prevalence Tracking
2. Behavioral Health Resource Allocation
3. Maternal & Child Health Outcomes
Data Sovereignty
Tribal Control Over Health Data
Data Products
Population Health Summary
Diabetes Registry
Behavioral Health Dashboard
Configuration
dbt Profiles
Environment Variables
Monitoring & Compliance
HIPAA-Compliant Logging
Data Quality Monitoring
Development
Adding New Health Domains
Testing
Troubleshooting
Common Issues
Logs
Ethical Considerations
Contributing
License
Acknowledgments
Disclaimer

A population health analytics platform built on Azure Cloud Scale Analytics (CSA) for Indian Health Service (IHS) area offices, tribal health organizations, and urban Indian health programs. Deployed exclusively in Azure Government with HIPAA, FedRAMP High, and tribal data sovereignty compliance.

📋 Overview

The Indian Health Service provides healthcare to approximately 2.6 million American Indians and Alaska Natives across 574 federally recognized tribes. Tribal health systems face unique challenges: vast geographic service areas, complex jurisdictional relationships, chronic disease burdens significantly exceeding national averages, and critical behavioral health needs. This platform ingests, transforms, and analyzes health data from IHS service units and tribal health programs to provide actionable insights for population health management, resource allocation, and health equity measurement.

The platform follows the medallion architecture (Bronze → Silver → Gold), uses HL7 FHIR-aligned data models, and enforces tribal data sovereignty at every layer.

[!WARNING] CRITICAL: All Data Is Synthetic. Individual-level tribal health data is restricted by tribal law, federal policy, and HIPAA. This platform uses only:

- Aggregate IHS public statistics (published reports and fact sheets)
- Fully synthetic RPMS-compatible data generated for demonstration purposes
- HL7 FHIR R4 schemas for interoperability without exposing real patient data

No real patient data, tribal member data, or Protected Health Information (PHI) is included. Any deployment with real data requires explicit tribal council authorization, a data sharing agreement, and IRB approval.

✨ Key Features

Azure Government Deployment: All resources provisioned in US Gov Virginia / US Gov Arizona — FedRAMP High baseline
HIPAA-Compliant Architecture: Encryption at rest (AES-256) and in transit (TLS 1.3), PHI audit logging, BAA coverage
Tribal Data Sovereignty: Per-tribe data isolation via Row-Level Security, tribal-controlled access policies, data sharing consent framework
HL7 FHIR Alignment: Data models map to FHIR R4 resources (Patient, Encounter, Organization, Condition) for interoperability
Population Health Analytics: Chronic disease tracking, behavioral health metrics, maternal/child health outcomes
De-identified Reporting: Automated suppression of small cell sizes (<5) per IHS data release policy

🗄️ Data Sources

All source data is synthetic. The architecture supports these real-world source patterns:

| Source | Type | Description | Access |
| --- | --- | --- | --- |
| IHS National Data Warehouse (NDW) | Aggregate | Published health statistics by IHS area | https://www.ihs.gov/dps/ |
| RPMS (Resource & Patient Management System) | Synthetic | EHR data via synthetic RPMS-compatible extracts | Included generator |
| CDC Tribal Health Data | Aggregate | SVI, BRFSS tribal supplement, vital statistics | https://wonder.cdc.gov/ |
| CMS Quality Measures | Reference | HEDIS/GPRA clinical quality measures | https://www.cms.gov/ |
| Tribal Epidemiology Centers | Aggregate | Regional health surveillance (consent-based) | By arrangement |

🔒 Compliance Framework

| Regulation | Requirement | Implementation |
| --- | --- | --- |
| HIPAA | PHI protection, audit controls | Encryption, RBAC, audit logging, BAA |
| FedRAMP High | Federal cloud security baseline | Azure Government, NIST 800-53 controls |
| Tribal Data Sovereignty | Tribal ownership of member health data | Per-tribe RLS, consent ledger, data sharing agreements |
| IHS Data Policy | Small cell suppression, aggregate-only release | Automated suppression in Gold models |
| 42 CFR Part 2 | Substance use disorder record confidentiality | Segmented access, SUD consent tracking |
| FISMA | Federal information security | Continuous monitoring, POA&M tracking |

🏗️ Architecture Overview

graph TD
    A[Synthetic Data Sources] --> B[Bronze Layer]
    B --> C[Silver Layer]
    C --> D[Gold Layer]
    D --> E[Analytics & Reporting]

    subgraph "Data Sources"
        A1[IHS NDW Extracts<br/>Aggregate Statistics]
        A2[RPMS Synthetic Extracts<br/>Patient/Encounter Data]
        A3[CDC Tribal Health<br/>Surveillance Data]
        A4[Facility Reference<br/>IHS/Tribal/Urban]
    end

    subgraph "Bronze Layer"
        B1[brz_patient_demographics]
        B2[brz_encounters]
        B3[brz_facilities]
    end

    subgraph "Silver Layer"
        C1[slv_patient_demographics]
        C2[slv_encounters]
        C3[slv_facilities]
    end

    subgraph "Gold Layer"
        D1[gld_diabetes_prevalence]
        D2[gld_behavioral_health]
        D3[gld_maternal_child_health]
    end

    subgraph "Consumption"
        E1[Population Health Dashboard]
        E2[GPRA Quality Reports]
        E3[Tribal Health Authority Reports]
        E4[Epidemiology Center Analytics]
    end

    A1 --> B1
    A2 --> B2
    A3 --> B1
    A4 --> B3

    B1 --> C1
    B2 --> C2
    B3 --> C3

    C1 --> D1
    C2 --> D1
    C3 --> D1
    C1 --> D2
    C2 --> D2
    C1 --> D3
    C2 --> D3

    D1 --> E1
    D2 --> E1
    D3 --> E1
    D1 --> E2
    D2 --> E2
    D3 --> E2
    D1 --> E3
    D2 --> E3
    D3 --> E3
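The model-to-model edges in the diagram can also be read as a dependency graph. As a rough illustration only (this helper is not part of the repository, and dbt resolves the real order itself), the sketch below copies the Bronze, Silver, and Gold edges from the diagram and uses Python's standard `graphlib` to derive one valid build order.

```python
from graphlib import TopologicalSorter

# Each model maps to the upstream models it reads from, as drawn in the diagram.
UPSTREAM = {
    "slv_patient_demographics": {"brz_patient_demographics"},
    "slv_encounters": {"brz_encounters"},
    "slv_facilities": {"brz_facilities"},
    "gld_diabetes_prevalence": {
        "slv_patient_demographics",
        "slv_encounters",
        "slv_facilities",
    },
    "gld_behavioral_health": {"slv_patient_demographics", "slv_encounters"},
    "gld_maternal_child_health": {"slv_patient_demographics", "slv_encounters"},
}


def build_order(upstream):
    """Return model names so that every model appears after its upstreams."""
    return list(TopologicalSorter(upstream).static_order())


order = build_order(UPSTREAM)
```

Any order it returns places every Bronze model before the Silver model that reads it, and every Silver model before the Gold models built on it.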

📎 Prerequisites

🔒 Azure Government Resources

[!IMPORTANT] This example deploys EXCLUSIVELY to Azure Government (usgovvirginia / usgovarizona). Azure Commercial is not supported for this workload due to FedRAMP High and IHS compliance requirements.

Azure Government subscription with contributor access
Azure Data Factory (Gov) or Synapse Analytics (Gov)
Azure Data Lake Storage Gen2 (Gov) with hierarchical namespace enabled
Azure Databricks (Gov) or Synapse SQL Pool
Azure Key Vault (Gov) with HSM backing for tribal-controlled encryption keys
Azure Monitor (Gov) with HIPAA-compliant diagnostic settings
Azure API for FHIR (Gov) — endpoint: .fhir.azurehealthcareapis.us

Tools Required

Azure CLI (2.55.0+) configured for Azure Government (az cloud set --name AzureUSGovernment)
dbt CLI (1.7.0+)
Python 3.9+
Git

📎 Compliance Prerequisites

FedRAMP High ATO or equivalent authorization
HIPAA BAA with Microsoft (included with Azure Government)
Tribal council data sharing agreement (for any real data deployment)
IRB approval (for research use cases)

🚀 Quick Start

🔒 1. Configure Azure Government Environment

# Set cloud environment to Azure Government
az cloud set --name AzureUSGovernment
az login --tenant <your-gov-tenant-id>

# Verify you're in Gov cloud
az cloud show --query name
# Expected output: "AzureUSGovernment"
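For scripted deployments, the same check can run as a preflight step. The following is a minimal sketch, assuming a hypothetical helper module of your own: `check_target` and `active_cloud_name` are illustrative names, not functions shipped with this example. It wraps the `az cloud show` call above and also checks the region against the two Gov regions this example names.

```python
import json
import subprocess

GOV_CLOUD = "AzureUSGovernment"
GOV_REGIONS = {"usgovvirginia", "usgovarizona"}


def check_target(cloud_name, region):
    """Return a list of problems; an empty list means the target looks like Azure Government."""
    problems = []
    if cloud_name != GOV_CLOUD:
        problems.append(f"active cloud is {cloud_name!r}, expected {GOV_CLOUD!r}")
    if region.lower() not in GOV_REGIONS:
        problems.append(f"region {region!r} is not one of {sorted(GOV_REGIONS)}")
    return problems


def active_cloud_name():
    """Ask the Azure CLI which cloud is active (requires `az` on PATH)."""
    out = subprocess.run(
        ["az", "cloud", "show", "--query", "name", "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)  # the CLI prints a JSON string such as "AzureUSGovernment"
```

A deployment script could call `check_target(active_cloud_name(), "usgovvirginia")` and stop if the returned list is not empty.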

2. Generate Synthetic Data

# Install dependencies
pip install -r requirements.txt

# Generate synthetic patient, encounter, and facility data
# CRITICAL: This generates ENTIRELY SYNTHETIC data. No real patient data.
python data/generators/generate_tribal_health_data.py \
    --patients 25000 \
    --days 730 \
    --facilities 50 \
    --output-dir domains/dbt/seeds \
    --seed 42

# Small dataset for quick testing
python data/generators/generate_tribal_health_data.py \
    --patients 1000 \
    --days 90 \
    --facilities 15 \
    --output-dir domains/dbt/seeds \
    --seed 42
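Passing a fixed `--seed` is what makes a generated dataset reproducible between runs. The sketch below illustrates that idea only; it is not the repository's generator, and the function name, field names, ID format, and value ranges are all hypothetical.

```python
import random


def synthetic_patients(count, seed):
    """Yield made-up patient rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # private RNG, unaffected by other uses of `random`
    for i in range(count):
        yield {
            "patient_id": f"SYN-{i:06d}",                     # hypothetical ID format
            "age": rng.randint(0, 95),                        # hypothetical range
            "service_unit": f"SU-{rng.randint(1, 12):02d}",   # hypothetical codes
        }


first = list(synthetic_patients(5, seed=42))
again = list(synthetic_patients(5, seed=42))
```

Because each call builds its own `random.Random(seed)`, `first` and `again` are identical, which is the property that keeps seed files and downstream dbt tests stable across machines.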

3. Deploy Infrastructure

# Configure deployment parameters
cp deploy/params.dev.json deploy/params.local.json
# Edit params.local.json — ensure region is usgovvirginia or usgovarizona

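# Create the resource group if it does not already exist
# (az deployment group create requires an existing resource group)
az group create \
    --name rg-tribal-health-analytics \
    --location usgovvirginia

# Deploy to Azure Government
az deployment group create \
    --resource-group rg-tribal-health-analytics \
    --template-file ../../deploy/bicep/DLZ/main.bicep \
    --parameters @deploy/params.local.json \
    --parameters azureEnvironment=AzureUSGovernment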

4. Run dbt Models

cd domains/dbt

# Verify connectivity
dbt debug

# Load synthetic seed data
dbt seed

# Run Bronze → Silver → Gold models
dbt run

# Execute data quality tests (including HIPAA validation)
dbt test

# Generate documentation
dbt docs generate
dbt docs serve

💡 Analytics Scenarios

1. Diabetes Prevalence Tracking

Type 2 diabetes affects American Indian/Alaska Native populations at 2-3x the national average. This model tracks prevalence by service unit, A1C control rates, complication rates, and intervention effectiveness with year-over-year trends.

-- Diabetes prevalence and A1C control by service unit
SELECT
    service_unit,
    reporting_period,
    total_diabetic_patients,
    total_population,
    prevalence_rate_per_1000,
    a1c_controlled_pct,
    a1c_poor_control_pct,
    complication_rate_pct,
    retinopathy_screening_pct,
    nephropathy_screening_pct,
    foot_exam_pct,
    yoy_prevalence_change_pct
FROM gold.gld_diabetes_prevalence
WHERE reporting_period >= '2023-01-01'
ORDER BY prevalence_rate_per_1000 DESC;
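The rate and trend columns in this query are simple ratios. As a sketch of the arithmetic, assuming `prevalence_rate_per_1000` is cases per 1,000 population and `yoy_prevalence_change_pct` is the percentage change between two periods (the Gold model's exact definitions may differ), with made-up input numbers:

```python
def prevalence_per_1000(cases, population):
    """Cases per 1,000 people; None when the population is not positive."""
    if population <= 0:
        return None
    return 1000.0 * cases / population


def yoy_change_pct(current, previous):
    """Percentage change from previous to current; None without a usable baseline."""
    if current is None or previous is None or previous == 0:
        return None
    return 100.0 * (current - previous) / previous


# Made-up counts for one service unit in two consecutive years.
rate_prev = prevalence_per_1000(cases=150, population=1000)
rate_curr = prevalence_per_1000(cases=165, population=1000)
change = yoy_change_pct(rate_curr, rate_prev)
```

With these inputs the rate moves from 150 to 165 per 1,000, a 10 percent year-over-year increase. Returning `None` for an empty denominator keeps a missing baseline from showing up as a misleading zero.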

2. Behavioral Health Resource Allocation

Behavioral health services are critically under-resourced in many tribal communities. This model surfaces substance use trends, mental health service utilization, provider-to-population ratios, waitlist metrics, and crisis intervention counts.

-- Behavioral health service gaps and resource needs
SELECT
    service_unit,
    reporting_period,
    sud_encounter_rate_per_1000,
    mh_encounter_rate_per_1000,
    total_bh_encounters,
    unique_bh_patients,
    provider_ratio_per_10000,
    avg_waitlist_days,
    crisis_intervention_count,
    telehealth_utilization_pct,
    no_show_rate_pct
FROM gold.gld_behavioral_health
WHERE reporting_period >= '2023-01-01'
ORDER BY avg_waitlist_days DESC;

3. Maternal & Child Health Outcomes

This model tracks prenatal visit completion, birth outcomes, immunization rates, and well-child visit adherence to help identify and reduce maternal and child health (MCH) disparities.

-- MCH outcomes by service unit and age cohort
SELECT
    service_unit,
    reporting_period,
    total_pregnancies,
    prenatal_first_trimester_pct,
    adequate_prenatal_visits_pct,
    low_birth_weight_pct,
    preterm_birth_pct,
    immunization_series_complete_pct,
    well_child_0to1_adherence_pct,
    well_child_1to2_adherence_pct,
    well_child_3to5_adherence_pct,
    teen_pregnancy_rate_per_1000
FROM gold.gld_maternal_child_health
WHERE reporting_period >= '2023-01-01'
ORDER BY prenatal_first_trimester_pct ASC;

🔒 Data Sovereignty

Tribal Control Over Health Data

This platform is designed with tribal data sovereignty as a foundational principle, not an afterthought.

Per-Tribe Data Isolation

- Each tribal affiliation maps to a Microsoft Entra ID security group
- Row-Level Security (RLS) policies on Silver and Gold tables restrict queries to authorized tribal data only
- Tribal health directors control who can access their nation's data
- Cross-tribe queries require explicit data sharing agreements registered in the consent ledger

Data Sharing Consent Framework

- Every data sharing action is logged to an immutable audit ledger in ADLS Gen2
- Tribal councils can revoke data access at any time; revocation propagates within 15 minutes
- Aggregate-only sharing mode: tribes can share de-identified aggregate statistics without exposing row-level data
- IHS area office access requires a current Tribal Resolution or equivalent authorization

De-Identification & Small Cell Suppression

- Gold-layer models automatically suppress any cell with fewer than 5 individuals
- Secondary suppression (complementary suppression) prevents back-calculation
- De-identification follows the HIPAA Safe Harbor method with tribal-specific additional protections
- Re-identification risk assessments run quarterly

# Example: Data sharing agreement configuration
tribal_data_sharing:
  tribe_code: "NAV"
  tribe_name: "Navajo Nation"
  sharing_level: "aggregate_only"
  authorized_consumers:
    - "IHS_Navajo_Area_Office"
    - "Navajo_Epi_Center"
  excluded_categories:
    - "substance_use_disorder"  # 42 CFR Part 2
    - "behavioral_health_individual"
  consent_expiry: "2025-12-31"
  tribal_resolution_number: "CJN-42-24"
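A configuration like this only helps if every sharing request is checked against it. The sketch below shows one possible check, with the agreement mirrored as a plain dict. The `may_share` function, its argument names, the `"diabetes"` category, and the choice to treat the expiry date itself as still valid are all assumptions made for illustration.

```python
from datetime import date

# The example agreement above, mirrored as a dict (normally loaded from YAML).
AGREEMENT = {
    "tribe_code": "NAV",
    "sharing_level": "aggregate_only",
    "authorized_consumers": ["IHS_Navajo_Area_Office", "Navajo_Epi_Center"],
    "excluded_categories": ["substance_use_disorder", "behavioral_health_individual"],
    "consent_expiry": "2025-12-31",
}


def may_share(agreement, consumer, category, row_level, today):
    """Return (allowed, reason) for one proposed data-sharing request."""
    if today > date.fromisoformat(agreement["consent_expiry"]):
        return False, "consent expired"
    if consumer not in agreement["authorized_consumers"]:
        return False, "consumer not authorized"
    if category in agreement["excluded_categories"]:
        return False, "category excluded"
    if row_level and agreement["sharing_level"] == "aggregate_only":
        return False, "row-level data not shared under aggregate_only"
    return True, "ok"
```

Returning a reason alongside the decision gives the consent ledger something specific to record for each denied request.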

✨ Data Products

Population Health Summary (`population-health`)

Description: Aggregated population health metrics by service unit and tribal affiliation
Classification: CUI // SP-HLTH (Controlled Unclassified Information — Health)
Freshness: Monthly updates
Coverage: All 12 IHS service units, 730 days of history
Access: Tribal health authorities, IHS area epidemiologists

Diabetes Registry (`diabetes-registry`)

Description: De-identified diabetes cohort metrics with A1C tracking
Classification: CUI // SP-HLTH
Freshness: Quarterly updates aligned with GPRA reporting
Coverage: Type 2 diabetes population across all service units

Behavioral Health Dashboard (`behavioral-health`)

Description: Service utilization, provider capacity, and access metrics
Classification: CUI // SP-HLTH, 42 CFR Part 2 restricted
Freshness: Monthly updates
Coverage: Mental health and SUD services

⚙️ Configuration

⚙️ dbt Profiles

Add to your ~/.dbt/profiles.yml:

tribal_health_analytics:
  target: dev
  outputs:
    dev:
      type: databricks
      host: "{{ env_var('DBT_HOST') }}"
      http_path: "{{ env_var('DBT_HTTP_PATH') }}"
      token: "{{ env_var('DBT_TOKEN') }}"
      schema: tribal_health_dev
      catalog: dev
    staging:
      type: databricks
      host: "{{ env_var('DBT_HOST_STAGING') }}"
      http_path: "{{ env_var('DBT_HTTP_PATH_STAGING') }}"
      token: "{{ env_var('DBT_TOKEN_STAGING') }}"
      schema: tribal_health_staging
      catalog: staging
    prod:
      type: databricks
      host: "{{ env_var('DBT_HOST_PROD') }}"
      http_path: "{{ env_var('DBT_HTTP_PATH_PROD') }}"
      token: "{{ env_var('DBT_TOKEN_PROD') }}"
      schema: tribal_health
      catalog: prod

⚙️ Environment Variables

# Azure Government configuration
AZURE_ENVIRONMENT=AzureUSGovernment
AZURE_GOV_TENANT_ID=your-gov-tenant-id

# dbt connectivity (Azure Databricks on Gov)
DBT_HOST=adb-xxxxxxxxxxxx.xx.databricks.azure.us    # Note: Gov workspaces use .databricks.azure.us
DBT_HTTP_PATH=/sql/1.0/warehouses/xxxxxxxxxxxx
DBT_TOKEN=dapi-xxxxxxxxxxxx

# HIPAA audit logging
AUDIT_LOG_STORAGE_ACCOUNT=stauditlogstribalhealth
AUDIT_LOG_CONTAINER=hipaa-audit-logs

# Data sovereignty
TRIBAL_DATA_CONSENT_LEDGER=tribal-consent-ledger
DATA_SHARING_CONFIG_PATH=./config/data-sharing-agreements.yaml

# Monitoring
LOG_LEVEL=INFO
HIPAA_AUDIT_ENABLED=true
SMALL_CELL_THRESHOLD=5
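Misconfigured variables are easier to catch before dbt runs than after. The following is a small validation sketch over the variables listed above. `validate_settings` is a hypothetical helper, and the rule that the threshold must be at least 5 is an assumption drawn from the suppression policy described in this README.

```python
def validate_settings(env):
    """Return a list of human-readable problems found in an environment mapping."""
    problems = []

    if env.get("AZURE_ENVIRONMENT") != "AzureUSGovernment":
        problems.append("AZURE_ENVIRONMENT must be AzureUSGovernment")

    host = env.get("DBT_HOST", "")
    if not host:
        problems.append("DBT_HOST is not set")
    elif host.endswith(".azuredatabricks.net"):
        problems.append("DBT_HOST points at a commercial Azure Databricks domain")

    if env.get("HIPAA_AUDIT_ENABLED", "").lower() != "true":
        problems.append("HIPAA_AUDIT_ENABLED must be true")

    try:
        if int(env.get("SMALL_CELL_THRESHOLD", "")) < 5:
            problems.append("SMALL_CELL_THRESHOLD must be at least 5")
    except ValueError:
        problems.append("SMALL_CELL_THRESHOLD must be an integer")

    return problems
```

Calling `validate_settings(os.environ)` at the start of a pipeline run (after `import os`) and aborting on a non-empty result is one way to use it.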

📊 Monitoring & Compliance

HIPAA-Compliant Logging

All data access is logged to a tamper-evident audit trail:

Who accessed data (Microsoft Entra ID principal, IP address)
What data was queried (table, columns, row count, tribal affiliation filter)
When the access occurred (UTC timestamp)
Why — linked to authorized purpose code (treatment, operations, research)
Outcome — query success/failure, rows returned, suppression applied
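One common way to make such a trail tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below illustrates that technique only; the README does not say how this platform achieves tamper evidence, and the `AuditEvent` fields are a simplified, hypothetical subset of the who/what/when/why/outcome items above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditEvent:
    principal: str       # who
    table: str           # what
    timestamp_utc: str   # when (ISO 8601)
    purpose_code: str    # why
    outcome: str         # outcome


def chain_hash(prev_hash, event):
    """Hash the event together with the previous entry's hash."""
    payload = json.dumps(asdict(event), sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(events, genesis="0" * 64):
    """Return one chained hash per event, in order."""
    hashes, prev = [], genesis
    for event in events:
        prev = chain_hash(prev, event)
        hashes.append(prev)
    return hashes


def verify_chain(events, hashes, genesis="0" * 64):
    """True if the stored hashes match a fresh recomputation over the events."""
    return build_chain(events, genesis) == hashes
```

Editing or reordering any recorded event changes every later hash, so `verify_chain` fails. Detecting deletion of the newest entries additionally requires keeping the latest hash somewhere the log writer cannot modify.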

# Query audit logs (Azure Monitor / Log Analytics)
az monitor log-analytics query \
    --workspace $LOG_ANALYTICS_WORKSPACE_ID \
    --analytics-query "
        TribalHealthAudit_CL
        | where TimeGenerated > ago(24h)
        | where AccessType_s == 'PHI_QUERY'
        | summarize QueryCount=count() by Principal_s, TableAccessed_s
        | order by QueryCount desc
    "

📊 Data Quality Monitoring

dbt Tests: Schema validation, referential integrity, clinical value range checks
HIPAA Validation: PHI field encryption verification, access control audit
Clinical Validity: ICD-10 code validation, age-appropriate diagnosis checks
Small Cell Suppression: Automated verification that no Gold-layer output contains cells < 5
Data Freshness: Alerts when source data hasn't updated within SLA

🚀 Development

Adding New Health Domains

Create Bronze model in domains/dbt/models/bronze/ with source mapping
Add HIPAA-relevant data quality tests in schema.yml
Create Silver model with clinical data standardization and de-identification flags
Add Gold aggregation with small cell suppression logic
Update data contracts in contracts/ with CUI classification
Register new tribal data access policies

🧪 Testing

# Unit tests for data generator
pytest data/tests/

# dbt model tests (includes HIPAA validation)
dbt test

# Test specific compliance tags
dbt test --select tag:hipaa_compliance
dbt test --select tag:data_sovereignty

# Integration tests
pytest data/tests/integration/

🔧 Troubleshooting

🔧 Common Issues

Azure Government Login: Run az cloud set --name AzureUSGovernment before az login. Gov endpoints differ from those of commercial Azure.
dbt Connection to Gov Databricks: Azure Government Databricks workspace URLs end in .databricks.azure.us, not the commercial .azuredatabricks.net. Verify your DBT_HOST.
Small Cell Suppression Errors: If Gold models fail validation, check that all aggregate outputs have n >= 5. Adjust grouping granularity if needed.
Tribal RLS Policy Conflicts: Ensure the querying principal belongs to exactly one tribal Microsoft Entra ID security group. Multi-tribe membership requires explicit cross-tribe authorization.
42 CFR Part 2 Access Denied: Substance use disorder data requires separate consent. Verify the SUD consent flag in the consent ledger.
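The RLS conflict case above comes down to resolving a principal's group memberships to a single tribal affiliation. A minimal sketch of that rule follows; the group names, the `"XYZ"` code, and the `resolve_tribe` helper are hypothetical, and real enforcement happens in the RLS policies rather than in application code.

```python
# Hypothetical mapping from security group names to tribal affiliation codes.
GROUP_TO_TRIBE = {
    "sg-tribal-health-nav": "NAV",
    "sg-tribal-health-xyz": "XYZ",
}


def resolve_tribe(principal_groups, cross_tribe_authorized=False):
    """Return the set of tribe codes a principal may query, or raise PermissionError."""
    tribes = {GROUP_TO_TRIBE[g] for g in principal_groups if g in GROUP_TO_TRIBE}
    if not tribes:
        raise PermissionError("principal is not in any tribal security group")
    if len(tribes) > 1 and not cross_tribe_authorized:
        raise PermissionError("multi-tribe membership without cross-tribe authorization")
    return tribes
```

Running a check like this against a user's group list can show quickly whether an access failure is a membership problem or a missing cross-tribe agreement.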

📊 Logs

Application logs: logs/tribal-health-analytics.log
dbt logs: domains/dbt/logs/dbt.log
HIPAA audit logs: Azure Monitor → Log Analytics workspace
Data pipeline logs: Azure Data Factory monitoring (Gov portal)

🔒 Ethical Considerations

This platform was designed with the following ethical principles:

Tribal Sovereignty: Tribes own their data. No deployment with real data proceeds without explicit tribal council authorization.
Community Benefit: Analytics must serve the health needs of tribal communities, not external research agendas.
Transparency: All algorithms and scoring methods are documented and auditable.
No Harm: Aggregate statistics are published at levels that prevent re-identification of individuals or small communities.
Reciprocity: Findings and dashboards are shared with tribal health programs, not locked behind paywalls.

🔗 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/new-health-domain)
Ensure HIPAA compliance in any new data models
Add appropriate data quality tests
Run dbt test --select tag:hipaa_compliance before submitting
Submit a pull request with security review tag

🔗 License

This project is licensed under the MIT License. See LICENSE file for details.

🔗 Acknowledgments

Indian Health Service for publicly available aggregate health statistics
Tribal Epidemiology Centers for population health methodology guidance
HL7 FHIR community for interoperability standards
Azure Government team for FedRAMP High platform support
Azure Cloud Scale Analytics team for the foundational platform architecture

🔗 Disclaimer

This example uses entirely synthetic data generated to reflect publicly available aggregate health statistics from IHS annual reports. No real patient data, tribal member data, or Protected Health Information (PHI) is included. The synthetic data generator produces statistically plausible distributions for development and demonstration purposes only. Any resemblance to real individuals is coincidental.

Tribal Health Architecture — Detailed platform architecture and design decisions
Examples Index — Overview of all CSA-in-a-Box example verticals
Platform Architecture — Core CSA platform architecture
Getting Started Guide — Platform setup and onboarding
Interior Natural Resources — Related federal/tribal vertical
Casino Analytics — Related tribal operations vertical

Prerequisites / Cost / Teardown

[!IMPORTANT] Cost-safety: this vertical deploys real Azure resources. Always run teardown.sh when you are done. A forgotten workshop environment can run $180-280/day.

Prerequisites

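Azure CLI 2.55.0+ (as listed under Tools Required), set to Azure Government (az cloud set --name AzureUSGovernment) and logged in (az login), with the subscription selected (az account set --subscription <id>)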
jq installed (used by teardown enumeration)
Bicep CLI 0.25+ (az bicep version)
Contributor + User Access Administrator on target subscription (or a pre-created RG with equivalent RBAC)
bash scripts/deploy/validate-prerequisites.sh passes

Cost estimate (rough, East US 2)

While running: ~$180-280/day (services: Synapse, Databricks, ADF, Purview, Storage, Key Vault (HIPAA-hardened))
Idle overnight: roughly half if you stop compute (Databricks autostop + Synapse pause)
Storage + Key Vault residual: <$5/month if you skip teardown

Numbers are indicative for a small demo dataset priced in the commercial East US 2 region; Azure Government rates may differ, and production workloads vary significantly. Use az consumption usage list or Cost Management for live numbers.
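To see how quickly a forgotten environment adds up, the arithmetic can be sketched as below. It uses only the indicative daily range and the "roughly half when idle" figure from this section; the function name and the example durations are made up.

```python
DAILY_LOW, DAILY_HIGH = 180, 280  # USD per day, from the estimate in this section


def running_cost_range(days, idle_fraction=0.0):
    """Return (low, high) USD for an environment left up for `days` days.

    idle_fraction is the share of that time with compute stopped, which is
    charged at roughly half the daily rate per the estimate in this section.
    """
    factor = (1 - idle_fraction) + 0.5 * idle_fraction
    return DAILY_LOW * days * factor, DAILY_HIGH * days * factor


long_weekend = running_cost_range(days=3)
month_half_idle = running_cost_range(days=30, idle_fraction=0.5)
```

Three days left fully running comes to about $540 to $840, and a month that is idle half the time still comes to about $4,050 to $6,300, which is why the teardown step matters.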

Runtime

Deploy: ~40-55 minutes (first run; cold Bicep)
Teardown: ~15-20 minutes (async RG delete completes in the background)

Teardown

When finished, run the per-example teardown script. It enforces a typed DESTROY-tribal-health confirmation, logs every step to reports/teardown/tribal-health-<timestamp>.log, and deletes the resource group rg-tribal-health-analytics along with any matching subscription-scope deployments.

# Interactive (recommended)
bash examples/tribal-health/deploy/teardown.sh

# Dry run (enumerate only)
bash examples/tribal-health/deploy/teardown.sh --dry-run

# From the repo root via Makefile
make teardown-example VERTICAL=tribal-health
make teardown-example VERTICAL=tribal-health DRYRUN=1

# CI automation (no prompt — only for ephemeral environments)
bash examples/tribal-health/deploy/teardown.sh --yes

See docs/QUICKSTART.md#teardown for the platform-wide teardown flow.

Directory Structure

tribal-health/
├── contracts/                # Data product contracts (schemas, SLOs, owners)
│   ├── clinical-encounters.yaml
│   ├── encounters.yaml
│   ├── healthcare-facilities.yaml
│   ├── patient-demographics.yaml
│   └── population-health.yaml
├── data/                     # Sample data + synthetic generators
│   ├── generators/
│   └── open-data/
├── deploy/                   # Deployment parameters / Bicep templates
│   ├── params.dev.json
│   ├── params.gov.json
│   └── teardown.sh
├── domains/                  # dbt models (bronze / silver / gold) and seeds
│   └── dbt/
├── notebooks/                # Synapse / Fabric / Databricks notebooks
│   ├── chronic_disease_prediction.py
│   └── population_health_dashboard.py
├── reports/                  # Power BI report templates and pbix sources
├── ARCHITECTURE.md           # Mermaid + prose architecture diagrams
└── README.md                 # This file

Expected Results

After running the medallion pipeline against the bundled seed data, the Gold layer should populate the following tables. Row counts vary with the seed-data generator parameters, and approximate figures for a default run have not been recorded yet.

| Gold Table | Approximate Rows | Notes |
| --- | --- | --- |
| `gld_behavioral_health` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
| `gld_diabetes_prevalence` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
| `gld_maternal_child_health` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |

TODO: capture exact counts after the next end-to-end seed run. These are bounded by the seed-data generator parameters in data/generators/.