Source: examples/tribal-health/README.md (this page is rendered live from that file).
CIPSEA awareness
The data in this example may be subject to CIPSEA (the Confidential Information Protection and Statistical Efficiency Act, 44 U.S.C. §§ 3561–3583) when collected from respondents under a pledge of confidentiality for exclusively statistical purposes.
Knowing and willful disclosure of identifiable CIPSEA data is a Class E felony (§ 3572) attaching to individual officers, employees, or designated agents — including cloud-operator personnel where applicable.
The architecture below is starting-point reference guidance only. Validate the specific compliance posture for your workload with your designating statistical agency and Confidentiality Officer before production use:
- CIPSEA control mapping & narrative (DRAFT — under validation)
- CIPSEA operational playbook for Azure (DRAFT — under validation)
Tribal Health Data Warehouse: IHS & Tribal Health Analytics
[!TIP] TL;DR — Population health analytics for IHS and tribal health programs, deployed exclusively in Azure Government. All data is synthetic. Features diabetes tracking, behavioral health resource allocation, and tribal data sovereignty.
📋 Table of Contents
- Overview
- Key Features
- Data Sources
- Compliance Framework
- Architecture Overview
- Prerequisites
- Azure Government Resources
- Tools Required
- Compliance Prerequisites
- Quick Start
- 1. Configure Azure Government Environment
- 2. Generate Synthetic Data
- 3. Deploy Infrastructure
- 4. Run dbt Models
- Analytics Scenarios
- 1. Diabetes Prevalence Tracking
- 2. Behavioral Health Resource Allocation
- 3. Maternal & Child Health Outcomes
- Data Sovereignty
- Tribal Control Over Health Data
- Data Products
- Population Health Summary
- Diabetes Registry
- Behavioral Health Dashboard
- Configuration
- dbt Profiles
- Environment Variables
- Monitoring & Compliance
- HIPAA-Compliant Logging
- Data Quality Monitoring
- Development
- Adding New Health Domains
- Testing
- Troubleshooting
- Common Issues
- Logs
- Ethical Considerations
- Contributing
- License
- Acknowledgments
- Disclaimer
A population health analytics platform built on Azure Cloud Scale Analytics (CSA) for Indian Health Service (IHS) area offices, tribal health organizations, and urban Indian health programs. Deployed exclusively in Azure Government with HIPAA, FedRAMP High, and tribal data sovereignty compliance.
📋 Overview
The Indian Health Service provides healthcare to approximately 2.6 million American Indians and Alaska Natives across 574 federally recognized tribes. Tribal health systems face unique challenges: vast geographic service areas, complex jurisdictional relationships, chronic disease burdens significantly exceeding national averages, and critical behavioral health needs. This platform ingests, transforms, and analyzes health data from IHS service units and tribal health programs to provide actionable insights for population health management, resource allocation, and health equity measurement.
The platform follows the medallion architecture (Bronze → Silver → Gold), uses HL7 FHIR-aligned data models, and enforces tribal data sovereignty at every layer.
[!WARNING] CRITICAL: All Data Is Synthetic. Individual-level tribal health data is restricted by tribal law, federal policy, and HIPAA. This platform uses only:
- Aggregate IHS public statistics (published reports and fact sheets)
- Fully synthetic RPMS-compatible data generated for demonstration purposes
- HL7 FHIR R4 schemas for interoperability without exposing real patient data
No real patient data, tribal member data, or Protected Health Information (PHI) is included. Any deployment with real data requires explicit tribal council authorization, a data sharing agreement, and IRB approval.
✨ Key Features
- Azure Government Deployment: All resources provisioned in US Gov Virginia / US Gov Arizona — FedRAMP High baseline
- HIPAA-Compliant Architecture: Encryption at rest (AES-256) and in transit (TLS 1.3), PHI audit logging, BAA coverage
- Tribal Data Sovereignty: Per-tribe data isolation via Row-Level Security, tribal-controlled access policies, data sharing consent framework
- HL7 FHIR Alignment: Data models map to FHIR R4 resources (Patient, Encounter, Organization, Condition) for interoperability
- Population Health Analytics: Chronic disease tracking, behavioral health metrics, maternal/child health outcomes
- De-identified Reporting: Automated suppression of small cell sizes (<5) per IHS data release policy
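As a rough illustration of the FHIR alignment listed above, a synthetic demographics row could be mapped to a minimal FHIR R4 Patient resource along these lines. The input column names (`patient_id`, `sex`, `birth_date`, `facility_id`), the sex codes, and the `to_fhir_patient` helper are assumptions made for this sketch; they are not taken from the repository's actual schema.

```python
from datetime import date

# Assumed source codes; anything unmapped falls back to FHIR's "unknown".
GENDER_MAP = {"M": "male", "F": "female"}


def to_fhir_patient(row: dict) -> dict:
    """Map one synthetic demographics row to a minimal FHIR R4 Patient resource."""
    return {
        "resourceType": "Patient",
        "id": str(row["patient_id"]),
        "gender": GENDER_MAP.get(row.get("sex"), "unknown"),
        "birthDate": row["birth_date"].isoformat(),
        # Links the patient to the facility's Organization resource.
        "managingOrganization": {"reference": f"Organization/{row['facility_id']}"},
    }


patient = to_fhir_patient(
    {"patient_id": "SYN-0001", "sex": "F", "birth_date": date(1984, 3, 9), "facility_id": "FAC-12"}
)
print(patient["gender"], patient["birthDate"])  # female 1984-03-09
```

Only fields defined on the FHIR R4 Patient resource are emitted here; tribal affiliation is deliberately left out because how it is represented is a governance decision for each deployment.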
🗄️ Data Sources
All source data is synthetic. The architecture supports these real-world source patterns:
| Source | Type | Description | Access |
|---|---|---|---|
| IHS National Data Warehouse (NDW) | Aggregate | Published health statistics by IHS area | https://www.ihs.gov/dps/ |
| RPMS (Resource & Patient Management System) | Synthetic | EHR data via synthetic RPMS-compatible extracts | Included generator |
| CDC Tribal Health Data | Aggregate | SVI, BRFSS tribal supplement, vital statistics | https://wonder.cdc.gov/ |
| CMS Quality Measures | Reference | HEDIS/GPRA clinical quality measures | https://www.cms.gov/ |
| Tribal Epidemiology Centers | Aggregate | Regional health surveillance (consent-based) | By arrangement |
🔒 Compliance Framework
| Regulation | Requirement | Implementation |
|---|---|---|
| HIPAA | PHI protection, audit controls | Encryption, RBAC, audit logging, BAA |
| FedRAMP High | Federal cloud security baseline | Azure Government, NIST 800-53 controls |
| Tribal Data Sovereignty | Tribal ownership of member health data | Per-tribe RLS, consent ledger, data sharing agreements |
| IHS Data Policy | Small cell suppression, aggregate-only release | Automated suppression in Gold models |
| 42 CFR Part 2 | Substance use disorder record confidentiality | Segmented access, SUD consent tracking |
| FISMA | Federal information security | Continuous monitoring, POA&M tracking |
🏗️ Architecture Overview
graph TD
A[Synthetic Data Sources] --> B[Bronze Layer]
B --> C[Silver Layer]
C --> D[Gold Layer]
D --> E[Analytics & Reporting]
subgraph "Data Sources"
A1[IHS NDW Extracts<br/>Aggregate Statistics]
A2[RPMS Synthetic Extracts<br/>Patient/Encounter Data]
A3[CDC Tribal Health<br/>Surveillance Data]
A4[Facility Reference<br/>IHS/Tribal/Urban]
end
subgraph "Bronze Layer"
B1[brz_patient_demographics]
B2[brz_encounters]
B3[brz_facilities]
end
subgraph "Silver Layer"
C1[slv_patient_demographics]
C2[slv_encounters]
C3[slv_facilities]
end
subgraph "Gold Layer"
D1[gld_diabetes_prevalence]
D2[gld_behavioral_health]
D3[gld_maternal_child_health]
end
subgraph "Consumption"
E1[Population Health Dashboard]
E2[GPRA Quality Reports]
E3[Tribal Health Authority Reports]
E4[Epidemiology Center Analytics]
end
A1 --> B1
A2 --> B2
A3 --> B1
A4 --> B3
B1 --> C1
B2 --> C2
B3 --> C3
C1 --> D1
C2 --> D1
C3 --> D1
C1 --> D2
C2 --> D2
C1 --> D3
C2 --> D3
D1 --> E1
D2 --> E1
D3 --> E1
D1 --> E2
D2 --> E2
D3 --> E2
D1 --> E3
D2 --> E3
D3 --> E3
📎 Prerequisites
🔒 Azure Government Resources
[!IMPORTANT] This example deploys EXCLUSIVELY to Azure Government (usgovvirginia / usgovarizona). Azure Commercial is not supported for this workload due to FedRAMP High and IHS compliance requirements.
- Azure Government subscription with contributor access
- Azure Data Factory (Gov) or Synapse Analytics (Gov)
- Azure Data Lake Storage Gen2 (Gov) with hierarchical namespace enabled
- Azure Databricks (Gov) or Synapse SQL Pool
- Azure Key Vault (Gov) with HSM backing for tribal-controlled encryption keys
- Azure Monitor (Gov) with HIPAA-compliant diagnostic settings
- Azure API for FHIR (Gov), endpoint: `.fhir.azurehealthcareapis.us`
Tools Required
- Azure CLI (2.55.0+) configured for Azure Government (`az cloud set --name AzureUSGovernment`)
- dbt CLI (1.7.0+)
- Python 3.9+
- Git
📎 Compliance Prerequisites
- FedRAMP High ATO or equivalent authorization
- HIPAA BAA with Microsoft (included with Azure Government)
- Tribal council data sharing agreement (for any real data deployment)
- IRB approval (for research use cases)
🚀 Quick Start
🔒 1. Configure Azure Government Environment
# Set cloud environment to Azure Government
az cloud set --name AzureUSGovernment
az login --tenant <your-gov-tenant-id>
# Verify you're in Gov cloud
az cloud show --query name
# Expected output: "AzureUSGovernment"
2. Generate Synthetic Data
# Install dependencies
pip install -r requirements.txt
# Generate synthetic patient, encounter, and facility data
# CRITICAL: This generates ENTIRELY SYNTHETIC data. No real patient data.
python data/generators/generate_tribal_health_data.py \
--patients 25000 \
--days 730 \
--facilities 50 \
--output-dir domains/dbt/seeds \
--seed 42
# Small dataset for quick testing
python data/generators/generate_tribal_health_data.py \
--patients 1000 \
--days 90 \
--facilities 15 \
--output-dir domains/dbt/seeds \
--seed 42
3. Deploy Infrastructure
# Configure deployment parameters
cp deploy/params.dev.json deploy/params.local.json
# Edit params.local.json — ensure region is usgovvirginia or usgovarizona
# Deploy to Azure Government
az deployment group create \
--resource-group rg-tribal-health-analytics \
--template-file ../../deploy/bicep/DLZ/main.bicep \
--parameters @deploy/params.local.json \
--parameters azureEnvironment=AzureUSGovernment
4. Run dbt Models
cd domains/dbt
# Verify connectivity
dbt debug
# Load synthetic seed data
dbt seed
# Run Bronze → Silver → Gold models
dbt run
# Execute data quality tests (including HIPAA validation)
dbt test
# Generate documentation
dbt docs generate
dbt docs serve
💡 Analytics Scenarios
1. Diabetes Prevalence Tracking
American Indian/Alaska Native populations experience type 2 diabetes at two to three times the national rate. This model tracks prevalence by service unit, A1C control rates, complication rates, and intervention effectiveness, with year-over-year trends.
-- Diabetes prevalence and A1C control by service unit
SELECT
service_unit,
reporting_period,
total_diabetic_patients,
total_population,
prevalence_rate_per_1000,
a1c_controlled_pct,
a1c_poor_control_pct,
complication_rate_pct,
retinopathy_screening_pct,
nephropathy_screening_pct,
foot_exam_pct,
yoy_prevalence_change_pct
FROM gold.gld_diabetes_prevalence
WHERE reporting_period >= '2023-01-01'
ORDER BY prevalence_rate_per_1000 DESC;
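For readers checking the numbers by hand, the two derived columns in the query above can be reproduced with simple arithmetic. This is a sketch under assumptions: the README does not say how the Gold model rounds values, nor whether `yoy_prevalence_change_pct` is a relative change or a percentage-point difference. The version below assumes a relative change and one decimal place, and the input counts are invented.

```python
def rate_per_1000(cases: int, population: int) -> float:
    """Cases per 1,000 population, rounded to one decimal place."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(cases / population * 1000, 1)


def yoy_change_pct(current: float, previous: float) -> float:
    """Relative year-over-year change in percent (assumed definition)."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return round((current - previous) / previous * 100, 1)


# Invented counts: 480 of 3,200 patients this year, 450 of 3,150 last year.
current = rate_per_1000(480, 3200)
previous = rate_per_1000(450, 3150)
change = yoy_change_pct(current, previous)
print(current, previous, change)  # 150.0 142.9 5.0
```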
2. Behavioral Health Resource Allocation
Behavioral health services are critically under-resourced in many tribal communities. This model surfaces substance use trends, mental health service utilization, provider-to-population ratios, waitlist metrics, and crisis intervention counts.
-- Behavioral health service gaps and resource needs
SELECT
service_unit,
reporting_period,
sud_encounter_rate_per_1000,
mh_encounter_rate_per_1000,
total_bh_encounters,
unique_bh_patients,
provider_ratio_per_10000,
avg_waitlist_days,
crisis_intervention_count,
telehealth_utilization_pct,
no_show_rate_pct
FROM gold.gld_behavioral_health
WHERE reporting_period >= '2023-01-01'
ORDER BY avg_waitlist_days DESC;
3. Maternal & Child Health Outcomes
This model tracks prenatal visit completion, birth outcomes, immunization rates, and well-child visit adherence to help reduce maternal and child health (MCH) disparities.
-- MCH outcomes by service unit and age cohort
SELECT
service_unit,
reporting_period,
total_pregnancies,
prenatal_first_trimester_pct,
adequate_prenatal_visits_pct,
low_birth_weight_pct,
preterm_birth_pct,
immunization_series_complete_pct,
well_child_0to1_adherence_pct,
well_child_1to2_adherence_pct,
well_child_3to5_adherence_pct,
teen_pregnancy_rate_per_1000
FROM gold.gld_maternal_child_health
WHERE reporting_period >= '2023-01-01'
ORDER BY prenatal_first_trimester_pct ASC;
🔒 Data Sovereignty
Tribal Control Over Health Data
This platform is designed with tribal data sovereignty as a foundational principle, not an afterthought.
Per-Tribe Data Isolation
- Each tribal affiliation maps to a Microsoft Entra ID security group
- Row-Level Security (RLS) policies on Silver and Gold tables restrict queries to authorized tribal data only
- Tribal health directors control who can access their nation's data
- Cross-tribe queries require explicit data sharing agreements registered in the consent ledger
Data Sharing Consent Framework
- Every data sharing action is logged to an immutable audit ledger in ADLS Gen2
- Tribal councils can revoke data access at any time; revocation propagates within 15 minutes
- Aggregate-only sharing mode: tribes can share de-identified aggregate statistics without exposing row-level data
- IHS area office access requires a current Tribal Resolution or equivalent authorization
De-Identification & Small Cell Suppression
- Gold-layer models automatically suppress any cell with fewer than 5 individuals
- Secondary suppression (complementary suppression) prevents back-calculation
- De-identification follows the HIPAA Safe Harbor method with tribal-specific additional protections
- Re-identification risk assessments run quarterly
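The suppression rules above can be sketched for a single row of category counts whose total is published. This is a simplified illustration, not the Gold models' actual logic: the `suppress_row` helper is hypothetical, zero counts are assumed to be publishable, and real complementary suppression across multi-dimensional tables is considerably more involved.

```python
from typing import Optional

THRESHOLD = 5  # matches SMALL_CELL_THRESHOLD in this README


def suppress_row(counts: dict[str, int], threshold: int = THRESHOLD) -> dict[str, Optional[int]]:
    """Suppress small cells in one row of counts whose total is published.

    Primary suppression hides every count in the range 1..threshold-1.
    If exactly one cell was hidden, the smallest remaining non-zero cell is
    hidden as well, so the first value cannot be recovered by subtracting
    the visible cells from the total.
    """
    hidden = {k for k, v in counts.items() if 0 < v < threshold}
    if len(hidden) == 1:
        visible = sorted((v, k) for k, v in counts.items() if k not in hidden and v > 0)
        if visible:
            hidden.add(visible[0][1])
    return {k: (None if k in hidden else v) for k, v in counts.items()}


row = {"0-17": 3, "18-44": 12, "45-64": 20, "65+": 8}
print(suppress_row(row))
```

In this example the `0-17` cell is hidden by the primary rule and `65+` is hidden as its complement, leaving the two larger cells visible.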
# Example: Data sharing agreement configuration
tribal_data_sharing:
tribe_code: "NAV"
tribe_name: "Navajo Nation"
sharing_level: "aggregate_only"
authorized_consumers:
- "IHS_Navajo_Area_Office"
- "Navajo_Epi_Center"
excluded_categories:
- "substance_use_disorder" # 42 CFR Part 2
- "behavioral_health_individual"
consent_expiry: "2025-12-31"
tribal_resolution_number: "CJN-42-24"
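To make the agreement fields concrete, here is a minimal sketch of how a request might be checked against the configuration above. The `may_access` helper and the way each field is interpreted are assumptions for illustration; the README does not describe how the consent ledger evaluates requests.

```python
from datetime import date

# Mirrors the example agreement above; the field semantics are assumptions.
agreement = {
    "tribe_code": "NAV",
    "sharing_level": "aggregate_only",
    "authorized_consumers": ["IHS_Navajo_Area_Office", "Navajo_Epi_Center"],
    "excluded_categories": ["substance_use_disorder", "behavioral_health_individual"],
    "consent_expiry": "2025-12-31",
}


def may_access(agreement: dict, consumer: str, category: str, row_level: bool, on: date) -> bool:
    """Return True only if every condition in the agreement is satisfied."""
    if on > date.fromisoformat(agreement["consent_expiry"]):
        return False  # consent has lapsed
    if consumer not in agreement["authorized_consumers"]:
        return False  # consumer is not named in the agreement
    if category in agreement["excluded_categories"]:
        return False  # category is carved out, e.g. 42 CFR Part 2 data
    if row_level and agreement["sharing_level"] == "aggregate_only":
        return False  # only de-identified aggregates may be shared
    return True


print(may_access(agreement, "Navajo_Epi_Center", "diabetes", False, date(2025, 6, 1)))  # True
```

Every check denies by default, so a missing or expired agreement never widens access.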
✨ Data Products
Population Health Summary (population-health)
- Description: Aggregated population health metrics by service unit and tribal affiliation
- Classification: CUI // SP-HLTH (Controlled Unclassified Information — Health)
- Freshness: Monthly updates
- Coverage: All 12 IHS service units, 730 days of history
- Access: Tribal health authorities, IHS area epidemiologists
Diabetes Registry (diabetes-registry)
- Description: De-identified diabetes cohort metrics with A1C tracking
- Classification: CUI // SP-HLTH
- Freshness: Quarterly updates aligned with GPRA reporting
- Coverage: Type 2 diabetes population across all service units
Behavioral Health Dashboard (behavioral-health)
- Description: Service utilization, provider capacity, and access metrics
- Classification: CUI // SP-HLTH, 42 CFR Part 2 restricted
- Freshness: Monthly updates
- Coverage: Mental health and SUD services
⚙️ Configuration
⚙️ dbt Profiles
Add to your ~/.dbt/profiles.yml:
tribal_health_analytics:
target: dev
outputs:
dev:
type: databricks
host: "{{ env_var('DBT_HOST') }}"
http_path: "{{ env_var('DBT_HTTP_PATH') }}"
token: "{{ env_var('DBT_TOKEN') }}"
schema: tribal_health_dev
catalog: dev
staging:
type: databricks
host: "{{ env_var('DBT_HOST_STAGING') }}"
http_path: "{{ env_var('DBT_HTTP_PATH_STAGING') }}"
token: "{{ env_var('DBT_TOKEN_STAGING') }}"
schema: tribal_health_staging
catalog: staging
prod:
type: databricks
host: "{{ env_var('DBT_HOST_PROD') }}"
http_path: "{{ env_var('DBT_HTTP_PATH_PROD') }}"
token: "{{ env_var('DBT_TOKEN_PROD') }}"
schema: tribal_health
catalog: prod
⚙️ Environment Variables
# Azure Government configuration
AZURE_ENVIRONMENT=AzureUSGovernment
AZURE_GOV_TENANT_ID=your-gov-tenant-id
# dbt connectivity (Azure Databricks on Gov)
DBT_HOST=adb-xxxxxxxxxxxx.xx.azuredatabricks.us # Note: .us for Gov
DBT_HTTP_PATH=/sql/1.0/warehouses/xxxxxxxxxxxx
DBT_TOKEN=dapi-xxxxxxxxxxxx
# HIPAA audit logging
AUDIT_LOG_STORAGE_ACCOUNT=stauditlogstribalhealth
AUDIT_LOG_CONTAINER=hipaa-audit-logs
# Data sovereignty
TRIBAL_DATA_CONSENT_LEDGER=tribal-consent-ledger
DATA_SHARING_CONFIG_PATH=./config/data-sharing-agreements.yaml
# Monitoring
LOG_LEVEL=INFO
HIPAA_AUDIT_ENABLED=true
SMALL_CELL_THRESHOLD=5
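A small pre-flight check can catch the most common misconfigurations in these variables before dbt runs. The sketch below is an assumption about what is worth validating, not a script shipped with the example; in particular, rejecting a `SMALL_CELL_THRESHOLD` below 5 is a choice made here to match the suppression rule described in this README.

```python
def validate_gov_settings(env: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if env.get("AZURE_ENVIRONMENT") != "AzureUSGovernment":
        problems.append("AZURE_ENVIRONMENT must be AzureUSGovernment")
    # Gov Databricks workspaces use the .us domain, not .net.
    if not env.get("DBT_HOST", "").endswith(".azuredatabricks.us"):
        problems.append("DBT_HOST must end in .azuredatabricks.us")
    threshold = env.get("SMALL_CELL_THRESHOLD", "")
    if not threshold.isdigit() or int(threshold) < 5:
        problems.append("SMALL_CELL_THRESHOLD must be an integer of at least 5")
    return problems


sample = {
    "AZURE_ENVIRONMENT": "AzureUSGovernment",
    "DBT_HOST": "adb-000000000000.0.azuredatabricks.net",  # wrong domain on purpose
    "SMALL_CELL_THRESHOLD": "5",
}
print(validate_gov_settings(sample))
```

Passing `dict(os.environ)` instead of `sample` would run the same checks against the live environment.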
📊 Monitoring & Compliance
HIPAA-Compliant Logging
All data access is logged to a tamper-evident audit trail:
- Who accessed data (Microsoft Entra ID principal, IP address)
- What data was queried (table, columns, row count, tribal affiliation filter)
- When the access occurred (UTC timestamp)
- Why — linked to authorized purpose code (treatment, operations, research)
- Outcome — query success/failure, rows returned, suppression applied
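One way to picture the five fields listed above is as a single immutable record serialized to JSON. The `AuditRecord` class and its field names are hypothetical and do not correspond to the columns of the `TribalHealthAudit_CL` table; the purpose codes are the three named in the list.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

ALLOWED_PURPOSES = {"treatment", "operations", "research"}


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AuditRecord:
    principal: str             # who: Microsoft Entra ID principal
    ip_address: str            # who: caller IP address
    table: str                 # what: table queried
    tribal_filter: str         # what: tribal affiliation filter applied
    purpose_code: str          # why: authorized purpose
    succeeded: bool            # outcome: query success or failure
    rows_returned: int         # outcome: rows returned
    suppression_applied: bool  # outcome: whether small cells were suppressed
    timestamp_utc: str = field(default_factory=_utc_now)  # when

    def __post_init__(self) -> None:
        # Reject records that are not tied to an authorized purpose.
        if self.purpose_code not in ALLOWED_PURPOSES:
            raise ValueError(f"unknown purpose code: {self.purpose_code}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


record = AuditRecord(
    principal="analyst@example.gov",
    ip_address="10.0.0.5",
    table="gold.gld_diabetes_prevalence",
    tribal_filter="NAV",
    purpose_code="operations",
    succeeded=True,
    rows_returned=12,
    suppression_applied=True,
)
print(record.to_json())
```

Freezing the dataclass keeps a record from being altered after creation in process; tamper evidence at rest still depends on the storage layer.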
# Query audit logs (Azure Monitor / Log Analytics)
az monitor log-analytics query \
--workspace $LOG_ANALYTICS_WORKSPACE_ID \
--analytics-query "
TribalHealthAudit_CL
| where TimeGenerated > ago(24h)
| where AccessType_s == 'PHI_QUERY'
| summarize QueryCount=count() by Principal_s, TableAccessed_s
| order by QueryCount desc
"
📊 Data Quality Monitoring
- dbt Tests: Schema validation, referential integrity, clinical value range checks
- HIPAA Validation: PHI field encryption verification, access control audit
- Clinical Validity: ICD-10 code validation, age-appropriate diagnosis checks
- Small Cell Suppression: Automated verification that no Gold-layer output contains cells < 5
- Data Freshness: Alerts when source data hasn't updated within SLA
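Two of the checks above, ICD-10 code validation and clinical value ranges, can be approximated in a few lines. This is a hedged sketch: the regular expression only tests the general shape of an ICD-10-CM code and does not confirm that the code exists, and the A1C plausibility window is an assumed value that should be set by clinical policy.

```python
import re

# ICD-10-CM shape: letter, digit, alphanumeric, then optionally a dot
# followed by one to four more alphanumeric characters.
ICD10_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")

# Assumed plausibility window for HbA1c results, in percent.
A1C_MIN, A1C_MAX = 3.0, 20.0


def icd10_shape_ok(code: str) -> bool:
    """True if the string has the general shape of an ICD-10-CM code."""
    return ICD10_SHAPE.fullmatch(code) is not None


def a1c_in_range(value: float) -> bool:
    """True if the A1C value falls inside the assumed plausibility window."""
    return A1C_MIN <= value <= A1C_MAX


print(icd10_shape_ok("E11.9"), icd10_shape_ok("11.9"), a1c_in_range(7.2))  # True False True
```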
🚀 Development
Adding New Health Domains
- Create Bronze model in `domains/dbt/models/bronze/` with source mapping
- Add HIPAA-relevant data quality tests in `schema.yml`
- Create Silver model with clinical data standardization and de-identification flags
- Add Gold aggregation with small cell suppression logic
- Update data contracts in `contracts/` with CUI classification
- Register new tribal data access policies
🧪 Testing
# Unit tests for data generator
pytest data/tests/
# dbt model tests (includes HIPAA validation)
dbt test
# Test specific compliance tags
dbt test --select tag:hipaa_compliance
dbt test --select tag:data_sovereignty
# Integration tests
pytest data/tests/integration/
🔧 Troubleshooting
🔧 Common Issues
- Azure Government Login: Run `az cloud set --name AzureUSGovernment` before `az login`. Gov endpoints differ from commercial Azure.
- dbt Connection to Gov Databricks: Gov Databricks URLs end in `.azuredatabricks.us`, not `.azuredatabricks.net`. Verify your `DBT_HOST`.
- Small Cell Suppression Errors: If Gold models fail validation, check that all aggregate outputs have n >= 5. Adjust grouping granularity if needed.
- Tribal RLS Policy Conflicts: Ensure the querying principal belongs to exactly one tribal Microsoft Entra ID security group. Multi-tribe membership requires explicit cross-tribe authorization.
- 42 CFR Part 2 Access Denied: Substance use disorder data requires separate consent. Verify the SUD consent flag in the consent ledger.
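The RLS conflict described above is easier to reason about with a small model of the rule. The sketch below is one assumed interpretation: a principal in a single tribal group sees that tribe's rows, while a principal in several tribal groups sees only tribes covered by an explicit cross-tribe grant. The group names, the `TRIBE_B` code, and both helpers are hypothetical.

```python
def authorized_tribes(
    principal_groups: set[str],
    group_to_tribe: dict[str, str],
    cross_tribe_grants: set[str],
) -> set[str]:
    """Resolve which tribe codes a principal may query."""
    tribes = {group_to_tribe[g] for g in principal_groups if g in group_to_tribe}
    if len(tribes) <= 1:
        return tribes
    # Multi-tribe membership: keep only tribes covered by an explicit grant.
    return tribes & cross_tribe_grants


def filter_rows(rows: list[dict], tribes: set[str]) -> list[dict]:
    """Apply the row-level filter for the resolved tribe codes."""
    return [r for r in rows if r["tribe_code"] in tribes]


group_to_tribe = {"sg-tribe-nav": "NAV", "sg-tribe-b": "TRIBE_B"}
rows = [{"tribe_code": "NAV", "n": 10}, {"tribe_code": "TRIBE_B", "n": 7}]

single = authorized_tribes({"sg-tribe-nav"}, group_to_tribe, set())
print(filter_rows(rows, single))  # [{'tribe_code': 'NAV', 'n': 10}]
```

Under this interpretation a principal in two tribal groups with no grant resolves to an empty set, which would surface as a query returning no rows.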
📊 Logs
- Application logs: `logs/tribal-health-analytics.log`
- dbt logs: `domains/dbt/logs/dbt.log`
- HIPAA audit logs: Azure Monitor → Log Analytics workspace
- Data pipeline logs: Azure Data Factory monitoring (Gov portal)
🔒 Ethical Considerations
This platform was designed with the following ethical principles:
- Tribal Sovereignty: Tribes own their data. No deployment with real data proceeds without explicit tribal council authorization.
- Community Benefit: Analytics must serve the health needs of tribal communities, not external research agendas.
- Transparency: All algorithms and scoring methods are documented and auditable.
- No Harm: Aggregate statistics are published at levels that prevent re-identification of individuals or small communities.
- Reciprocity: Findings and dashboards are shared with tribal health programs, not locked behind paywalls.
🔗 Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-health-domain`)
- Ensure HIPAA compliance in any new data models
- Add appropriate data quality tests
- Run `dbt test --select tag:hipaa_compliance` before submitting
- Submit a pull request with security review tag
🔗 License
This project is licensed under the MIT License. See LICENSE file for details.
🔗 Acknowledgments
- Indian Health Service for publicly available aggregate health statistics
- Tribal Epidemiology Centers for population health methodology guidance
- HL7 FHIR community for interoperability standards
- Azure Government team for FedRAMP High platform support
- Azure Cloud Scale Analytics team for the foundational platform architecture
🔗 Disclaimer
This example uses entirely synthetic data generated to reflect publicly available aggregate health statistics from IHS annual reports. No real patient data, tribal member data, or Protected Health Information (PHI) is included. The synthetic data generator produces statistically plausible distributions for development and demonstration purposes only. Any resemblance to real individuals is coincidental.
🔗 Related Documentation
- Tribal Health Architecture — Detailed platform architecture and design decisions
- Examples Index — Overview of all CSA-in-a-Box example verticals
- Platform Architecture — Core CSA platform architecture
- Getting Started Guide — Platform setup and onboarding
- Interior Natural Resources — Related federal/tribal vertical
- Casino Analytics — Related tribal operations vertical
Prerequisites / Cost / Teardown
[!IMPORTANT] Cost-safety: this vertical deploys real Azure resources. Always run `teardown.sh` when you are done. A forgotten workshop environment can run $180-280/day.
Prerequisites
- Azure CLI 2.50+ logged in (`az login`), subscription selected (`az account set --subscription <id>`)
- `jq` installed (used by teardown enumeration)
- Bicep CLI 0.25+ (`az bicep version`)
- Contributor + User Access Administrator on target subscription (or a pre-created RG with equivalent RBAC)
- `bash scripts/deploy/validate-prerequisites.sh` passes
Cost estimate (rough, East US 2)
- While running: ~$180-280/day (services: Synapse, Databricks, ADF, Purview, Storage, Key Vault (HIPAA-hardened))
- Idle overnight: roughly half if you stop compute (Databricks autostop + Synapse pause)
- Storage + Key Vault residual: <$5/month if you skip teardown
Numbers are indicative for a small demo dataset; production workloads vary significantly. Use az consumption usage list or Cost Management for live numbers.
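To see what a forgotten environment might cost, the figures above can be turned into a quick range calculation. This is back-of-the-envelope arithmetic only: it uses the $180-280/day range and the "roughly half" idle factor quoted in this section, and the `cost_range` helper is hypothetical.

```python
DAILY_LOW, DAILY_HIGH = 180.0, 280.0  # USD/day range quoted in this section
IDLE_FACTOR = 0.5                      # "roughly half" when compute is stopped


def cost_range(running_days: float, idle_days: float = 0.0) -> tuple[float, float]:
    """Return (low, high) estimated spend in USD for the given durations."""
    effective_days = running_days + idle_days * IDLE_FACTOR
    return (DAILY_LOW * effective_days, DAILY_HIGH * effective_days)


print(cost_range(2))      # two-day workshop, torn down afterwards: (360.0, 560.0)
print(cost_range(2, 12))  # same workshop, left idle for 12 more days: (1440.0, 2240.0)
```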
Runtime
- Deploy: ~40-55 minutes (first run; cold Bicep)
- Teardown: ~15-20 minutes (async RG delete completes in the background)
Teardown
When finished, run the per-example teardown script. It enforces a typed DESTROY-tribal-health confirmation, logs every step to reports/teardown/tribal-health-<timestamp>.log, and deletes the resource group rg-tribal-health-analytics along with any matching subscription-scope deployments.
# Interactive (recommended)
bash examples/tribal-health/deploy/teardown.sh
# Dry run (enumerate only)
bash examples/tribal-health/deploy/teardown.sh --dry-run
# From the repo root via Makefile
make teardown-example VERTICAL=tribal-health
make teardown-example VERTICAL=tribal-health DRYRUN=1
# CI automation (no prompt — only for ephemeral environments)
bash examples/tribal-health/deploy/teardown.sh --yes
See docs/QUICKSTART.md#teardown for the platform-wide teardown flow.
Directory Structure
tribal-health/
├── contracts/ # Data product contracts (schemas, SLOs, owners)
│ ├── clinical-encounters.yaml
│ ├── encounters.yaml
│ ├── healthcare-facilities.yaml
│ ├── patient-demographics.yaml
│ └── population-health.yaml
├── data/ # Sample data + synthetic generators
│ ├── generators/
│ └── open-data/
├── deploy/ # Deployment parameters / Bicep templates
│ ├── params.dev.json
│ ├── params.gov.json
│ └── teardown.sh
├── domains/ # dbt models (bronze / silver / gold) and seeds
│ └── dbt/
├── notebooks/ # Synapse / Fabric / Databricks notebooks
│ ├── chronic_disease_prediction.py
│ └── population_health_dashboard.py
├── reports/ # Power BI report templates and pbix sources
├── ARCHITECTURE.md # Mermaid + prose architecture diagrams
└── README.md # This file
Expected Results
After running the medallion pipeline against the bundled seed data, the Gold layer should populate the following tables. Row counts vary with the seed-data generator parameters; the figures below are the approximate scale you should see on a default run.
| Gold Table | Approximate Rows | Notes |
|---|---|---|
| `gld_behavioral_health` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
| `gld_diabetes_prevalence` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
| `gld_maternal_child_health` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
TODO: capture exact counts after the next end-to-end seed run. These are bounded by the seed-data generator parameters in `data/generators/`.