# Department of Interior Natural Resources Analytics Platform
> [!TIP]
> TL;DR — Natural resources analytics covering USGS seismic monitoring, NPS visitor management (300M+ visitors/year), wildfire risk prediction, and water resource optimization with real-time earthquake alerting.
## 📋 Table of Contents

- Overview
- Key Features
- Data Sources
- Architecture Overview
- Prerequisites
  - Azure Resources
  - Tools Required
  - API Access
- Quick Start
  - 1. Environment Setup
  - 2. Configure API Keys
  - 3. Generate Sample Data
  - 4. Deploy Infrastructure
  - 5. Run dbt Models
- Sample Analytics Scenarios
  - 1. Earthquake Probability Assessment
  - 2. National Park Capacity Management
  - 3. Wildfire Risk Prediction
- Data Products
  - Earthquake Probability
  - Park Capacity
  - Wildfire Risk
- Configuration
  - dbt Profiles
  - Environment Variables
- Azure Government Notes
- Monitoring & Alerts
- Troubleshooting
  - Common Issues
- Contributing
- License
- Acknowledgments
A comprehensive natural resources analytics platform built on Azure Cloud Scale Analytics (CSA), providing insights into seismic activity, water resources, national park management, wildfire risk, and wildlife conservation using official USGS, NPS, BLM, and FWS data sources.
## 📋 Overview
The Department of Interior manages 500 million acres of federal land, monitors 1.5 million stream-flow measurements daily, tracks seismic activity from 8,000+ sensors, welcomes 300 million national park visitors annually, and oversees conservation of 2,700+ threatened and endangered species. This platform ingests, processes, and analyzes data from USGS, NPS, BLM, and FWS to enable earthquake probability assessment, national park capacity management, wildfire risk prediction, and water resource optimization.
## ✨ Key Features
- Earthquake Monitoring & Probability: Real-time seismic event tracking with statistical forecasting
- Water Resource Analytics: Stream-flow, groundwater, and water quality monitoring across 13,000+ sites
- National Park Capacity Management: Visitor trends, carrying capacity, and reservation optimization
- Wildfire Risk Prediction: Fire weather indices, historical burn patterns, and fuel load analysis
- Wildlife Conservation: Species distribution, critical habitat, and population trend tracking
- Land Management Dashboards: BLM resource extraction, grazing, and recreation permits
## 🗄️ Data Sources
| Source | Agency | Description | URL |
|---|---|---|---|
| Earthquake Catalog | USGS | Real-time and historical earthquake events worldwide | https://earthquake.usgs.gov/fdsnws/event/1/ |
| Water Services API | USGS | Stream-flow, groundwater levels, water quality | https://waterservices.usgs.gov/ |
| NPS Visitor Stats | NPS | Monthly recreation visits by park unit | https://irma.nps.gov/Stats/ |
| NPS API | NPS | Park info, alerts, campgrounds, activities | https://www.nps.gov/subjects/developer/api-documentation.htm |
| NIFC Wildfire Data | DOI/USDA | Active fire perimeters, historical burns, fire weather | https://data-nifc.opendata.arcgis.com/ |
| ECOS | FWS | Endangered species listings, critical habitat, recovery plans | https://ecos.fws.gov/ecp/ |
| BLM Public Data | BLM | Land status, mining claims, grazing allotments, oil & gas leases | https://gbp-blm-egis.hub.arcgis.com/ |
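
As a point of reference for the REST ingestion shown in the architecture below, the USGS earthquake catalog can be queried directly through its public FDSN event service (the endpoint listed above). This is an illustrative polling sketch, not code from this repo; the query parameters follow the documented FDSN API:

```python
import requests

# USGS FDSN event service (see the Earthquake Catalog row above).
FDSN_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"

# Illustrative poll: M4.5+ events since a given timestamp, newest first.
resp = requests.get(FDSN_URL, params={
    "format": "geojson",
    "starttime": "2024-01-01",   # in practice, the watermark from the previous poll
    "minmagnitude": 4.5,
    "orderby": "time",
}, timeout=60)
resp.raise_for_status()

for feature in resp.json()["features"][:5]:
    props = feature["properties"]
    print(props["mag"], props["place"], props["time"])  # time is epoch milliseconds
```
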
## 🏗️ Architecture Overview

```mermaid
graph TD
subgraph "Data Sources"
A1[USGS Earthquake API<br/>Real-Time Seismic]
A2[USGS Water Services<br/>Stream & Groundwater]
A3[NPS Stats API<br/>Visitor Counts]
A4[NIFC Fire Data<br/>Wildfire Perimeters]
A5[FWS ECOS<br/>Species & Habitat]
A6[BLM Data Hub<br/>Land & Resources]
end
subgraph "Ingestion Layer"
I1[ADF Pipeline<br/>Batch Ingestion]
I2[REST Connectors<br/>API Polling]
I3[GeoJSON Loader<br/>Spatial Data]
end
subgraph "Bronze Layer — Raw"
B1[brz_earthquakes]
B2[brz_water_sites]
B3[brz_water_measurements]
B4[brz_park_visits]
B5[brz_fire_perimeters]
B6[brz_species_listings]
B7[brz_blm_permits]
end
subgraph "Silver Layer — Cleansed"
S1[slv_seismic_events]
S2[slv_water_resources]
S3[slv_park_operations]
S4[slv_fire_history]
S5[slv_species_status]
end
subgraph "Gold Layer — Analytics"
G1[gld_earthquake_probability]
G2[gld_park_capacity]
G3[gld_wildfire_risk]
G4[gld_water_availability]
G5[gld_interior_dashboard]
end
subgraph "Consumption"
C1[Seismic Dashboard]
C2[Park Capacity Tool]
C3[Fire Risk Maps]
C4[Resource APIs]
end
A1 --> I2
A2 --> I2
A3 --> I2
A4 --> I3
A5 --> I1
A6 --> I1
I1 --> B6
I1 --> B7
I2 --> B1
I2 --> B2
I2 --> B3
I2 --> B4
I3 --> B5
B1 --> S1
B2 --> S2
B3 --> S2
B4 --> S3
B5 --> S4
B6 --> S5
S1 --> G1
S2 --> G4
S3 --> G2
S4 --> G3
S1 --> G5
S2 --> G5
S3 --> G5
S4 --> G5
G1 --> C1
G2 --> C2
G3 --> C3
    G5 --> C4
```

## 📎 Prerequisites
### Azure Resources
- Azure subscription with contributor access
- Azure Data Factory or Synapse Analytics
- Azure Data Lake Storage Gen2
- Azure SQL Database or Synapse SQL Pool
- Azure Key Vault for API credentials
### Tools Required
- Azure CLI (2.55.0 or later)
- dbt CLI (1.7.0 or later)
- Python 3.9+
- Git
- GDAL/OGR (optional, for geospatial data conversion)
### API Access
- NPS API key (free at https://www.nps.gov/subjects/developer/get-started.htm)
- USGS APIs (no key required — open access)
- NIFC ArcGIS (no key required — open access)
## 🚀 Quick Start
### 1. Environment Setup

```bash
# Clone the repository
git clone <repository-url>
cd csa-inabox/examples/interior

# Install Python dependencies
pip install -r requirements.txt

# Install dbt packages
cd domains/dbt
dbt deps
```
### 2. Configure API Keys
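
The fetch scripts in the next step read the NPS key from the `NPS_API_KEY` environment variable (see Environment Variables under Configuration); the USGS and NIFC sources need no key. A minimal sketch, assuming the key is exported in your shell, that verifies it against the public NPS Developer API before you run the fetchers:

```python
import os
import sys

import requests

api_key = os.environ.get("NPS_API_KEY")
if not api_key:
    sys.exit("NPS_API_KEY is not set; export it or load it from Key Vault at runtime")

# Smoke-test the key against the public NPS Developer API.
resp = requests.get(
    "https://developer.nps.gov/api/v1/parks",
    params={"parkCode": "yell", "limit": 1},
    headers={"X-Api-Key": api_key},
    timeout=30,
)
resp.raise_for_status()
print("NPS API key OK:", resp.json()["data"][0]["fullName"])
```
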
### 3. Generate Sample Data

```bash
# Generate synthetic natural resources data
python data/generators/generate_interior_data.py --output-dir domains/dbt/seeds

# Or fetch real data from APIs
python data/open-data/fetch_earthquakes.py \
    --starttime "2023-01-01" --endtime "2023-12-31" \
    --minmagnitude 2.5 --maxmagnitude 9.0

python data/open-data/fetch_water.py \
    --sites "09380000,02037500,12354500" \
    --parameters "00060,00065" \
    --period "P365D"

python data/open-data/fetch_park_visits.py --years "2020,2021,2022,2023"

python data/open-data/fetch_fire_perimeters.py --year 2023 --min-acres 100
```
### 4. Deploy Infrastructure

```bash
# Configure parameters
cp deploy/params.dev.json deploy/params.local.json
# Edit params.local.json with your values

# Deploy using Azure CLI
az deployment group create \
    --resource-group rg-interior-analytics \
    --template-file ../../deploy/bicep/DLZ/main.bicep \
    --parameters @deploy/params.local.json
```
### 5. Run dbt Models

```bash
cd domains/dbt

# Test connections
dbt debug

# Load seed data
dbt seed

# Run models
dbt run

# Run tests
dbt test

# Generate documentation
dbt docs generate
dbt docs serve
```
## 💡 Sample Analytics Scenarios
### 1. Earthquake Probability Assessment
Analyze seismic event clustering using the USGS earthquake catalog to estimate probability of significant aftershocks and identify regions with elevated seismic risk.

```sql
-- Seismic risk zones with probability estimates
SELECT
    seismic_zone,
    region_name,
    total_events_10yr,
    m4_plus_events,
    m5_plus_events,
    avg_depth_km,
    max_magnitude,
    last_significant_event,
    days_since_m5_plus,
    gutenberg_richter_b_value,
    annual_m4_probability,
    risk_tier
FROM gold.gld_earthquake_probability
WHERE annual_m4_probability > 0.10
ORDER BY annual_m4_probability DESC;
```
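
For intuition on the probability columns: one common formulation (not necessarily the exact model behind `gld_earthquake_probability`) fits the Gutenberg-Richter relation log10 N(M) = a - b*M to the zone's event counts and treats occurrences as a Poisson process, so the chance of at least one M4+ event in a year is 1 - exp(-lambda), where lambda is the annual M4+ rate. A sketch with made-up counts:

```python
import math

def annual_probability(m4_events_10yr: int, years: float = 10.0) -> float:
    """Poisson probability of at least one M>=4 event in a year,
    given the count observed over `years` (a simplified rate model)."""
    annual_rate = m4_events_10yr / years
    return 1.0 - math.exp(-annual_rate)

def gutenberg_richter_b(m4_plus: int, m5_plus: int) -> float:
    """Rough b-value from the ratio of M>=4 to M>=5 counts:
    log10(N4 / N5) = b * (5 - 4)."""
    return math.log10(m4_plus / m5_plus)

# Illustrative zone: 35 M4+ and 4 M5+ events in the last decade
print(round(annual_probability(35), 2))       # 0.97
print(round(gutenberg_richter_b(35, 4), 2))   # 0.94
```
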
### 2. National Park Capacity Management
Model park carrying capacity using visitor count trends, infrastructure data, and resource sensitivity to optimize reservation systems and reduce overcrowding.

```sql
-- Park capacity utilization and management recommendations
SELECT
    park_name,
    park_code,
    state,
    annual_visits_2023,
    peak_month,
    peak_month_visits,
    estimated_carrying_capacity,
    utilization_pct_peak,
    avg_length_of_stay_hours,
    campground_fill_rate_pct,
    overcrowding_risk,
    recommended_action
FROM gold.gld_park_capacity
WHERE overcrowding_risk IN ('HIGH', 'CRITICAL')
ORDER BY utilization_pct_peak DESC;
```
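
The utilization and risk-tier columns can be reproduced from the underlying counts. A sketch with illustrative thresholds; the gold model's actual cutoffs may differ:

```python
def peak_utilization(peak_month_visits: int, carrying_capacity: int) -> tuple[float, str]:
    """Peak-month utilization percentage plus an illustrative risk tier.
    Thresholds are examples only, not the platform's actual cutoffs."""
    utilization = 100.0 * peak_month_visits / carrying_capacity
    if utilization >= 120:
        tier = "CRITICAL"
    elif utilization >= 100:
        tier = "HIGH"
    elif utilization >= 80:
        tier = "MODERATE"
    else:
        tier = "LOW"
    return round(utilization, 1), tier

# A park whose peak month exceeds its estimated carrying capacity
print(peak_utilization(520_000, 450_000))  # (115.6, 'HIGH')
```
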
### 3. Wildfire Risk Prediction
Combine historical fire perimeters, fuel load estimates, drought indices, and weather forecasts to score wildfire risk at the landscape level.

```sql
-- Wildfire risk scoring by region
SELECT
    region_name,
    state,
    total_acres,
    fire_history_score,
    fuel_load_index,
    drought_severity_index,
    wind_exposure_score,
    wui_population,
    suppression_difficulty_score,
    composite_fire_risk,
    historical_fires_10yr,
    total_acres_burned_10yr,
    risk_tier
FROM gold.gld_wildfire_risk
WHERE composite_fire_risk >= 70
ORDER BY composite_fire_risk DESC
LIMIT 50;
```
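
The composite score reads naturally as a weighted blend of the component indices, each on a 0-100 scale. The weights below are illustrative only; the actual weighting lives in the `gld_wildfire_risk` dbt model:

```python
def composite_fire_risk(fire_history: float, fuel_load: float, drought: float,
                        wind: float, suppression_difficulty: float) -> float:
    """Illustrative weighted blend of 0-100 component scores."""
    weights = {
        "fire_history": 0.25,
        "fuel_load": 0.25,
        "drought": 0.20,
        "wind": 0.15,
        "suppression": 0.15,
    }
    return round(
        fire_history * weights["fire_history"]
        + fuel_load * weights["fuel_load"]
        + drought * weights["drought"]
        + wind * weights["wind"]
        + suppression_difficulty * weights["suppression"],
        1,
    )

# Example region scoring above the 70 threshold used in the query above
print(composite_fire_risk(80, 70, 90, 60, 70))  # 75.0
```
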
## ✨ Data Products
### Earthquake Probability (`earthquake-probability`)
- Description: Seismic zone risk assessment with Gutenberg-Richter modeling
- Freshness: Daily (earthquake catalog updates every 5 minutes; models retrained weekly)
- Coverage: Global (emphasis on CONUS, Alaska, Hawaii, territories)
- API: `/api/v1/earthquake-probability`
### Park Capacity (`park-capacity`)
- Description: National park visitor trends with capacity modeling
- Freshness: Monthly visitor counts with annual model recalibration
- Coverage: All 423 NPS units (63 national parks + monuments, seashores, etc.)
- API: `/api/v1/park-capacity`
### Wildfire Risk (`wildfire-risk`)
- Description: Landscape-level wildfire risk scoring with multi-factor analysis
- Freshness: Daily (fire weather) / Annual (fuel load and historical recalc)
- Coverage: All federal and adjacent lands in the western U.S.
- API: `/api/v1/wildfire-risk`
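
If these products are exposed behind an API gateway, a consumer call might look like the sketch below. The base URL, query parameters, and response shape are placeholders; only the `/api/v1/...` paths come from the product definitions above.

```python
import requests

# Placeholder host; substitute your API Management / gateway endpoint.
BASE_URL = "https://<your-api-gateway-host>"

resp = requests.get(
    f"{BASE_URL}/api/v1/wildfire-risk",
    params={"state": "CA", "min_risk": 70},  # hypothetical filter parameters
    timeout=30,
)
resp.raise_for_status()
for region in resp.json():  # assumed: a JSON array of gold-table rows
    print(region["region_name"], region["composite_fire_risk"])
```
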
## ⚙️ Configuration
### dbt Profiles

Add to your `~/.dbt/profiles.yml`:

```yaml
interior_analytics:
  target: dev
  outputs:
    dev:
      type: databricks
      host: "{{ env_var('DBT_HOST') }}"
      http_path: "{{ env_var('DBT_HTTP_PATH') }}"
      token: "{{ env_var('DBT_TOKEN') }}"
      schema: interior_dev
      catalog: dev
    prod:
      type: databricks
      host: "{{ env_var('DBT_HOST_PROD') }}"
      http_path: "{{ env_var('DBT_HTTP_PATH_PROD') }}"
      token: "{{ env_var('DBT_TOKEN_PROD') }}"
      schema: interior
      catalog: prod
```
### Environment Variables

```bash
# Required for data fetching
NPS_API_KEY=your-nps-api-key

# Required for dbt
DBT_HOST=your-databricks-host
DBT_HTTP_PATH=your-sql-warehouse-path
DBT_TOKEN=your-access-token

# Optional
INTERIOR_LOG_LEVEL=INFO
INTERIOR_BATCH_SIZE=5000
```
## 🔒 Azure Government Notes
This example is compatible with Azure Government (US) regions. When deploying to Azure Government:
- Use `usgovvirginia` or `usgovarizona` as your Azure region
- Update ARM/Bicep endpoint references to `.usgovcloudapi.net`
- USGS APIs are publicly accessible from government networks
- BLM permit data may contain lessee PII — apply data masking in Silver layer for non-privileged users
- NIFC fire data is public; operational fire data during active incidents may have access restrictions
- Endangered species location data may be restricted to prevent poaching — consult FWS before exposing precise coordinates
## 📊 Monitoring & Alerts
- Earthquake Alerts: Automated notifications for M4.0+ events in monitored zones
- Data Freshness: Alerts when USGS water data or NPS visitor counts are overdue
- Data Quality: Automated tests on magnitude ranges, coordinate bounds, and flow measurements
- Fire Season: Elevated monitoring during April–October fire season
- Cost Management: Daily compute spend tracking with budget thresholds
## 🔧 Troubleshooting
### Common Issues
- USGS Earthquake API Limits: Queries returning >20,000 events will fail. Use time and magnitude filters to partition requests (see the sketch after this list).
- Water Services Time Series: Long time series (10+ years) should use the `--period` parameter rather than date ranges to avoid timeouts.
- NPS Stats Seasonality: Monthly data is only available after ~90 days. Use `--year-to-date` for preliminary figures.
- Fire Perimeter Shapefiles: Active fire perimeters update multiple times daily. Use the `--latest` flag for current boundaries.
- Geospatial Join Performance: BLM land status data is large (~50 million polygons). Pre-filter by state before spatial joins.
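
A minimal sketch of the request partitioning mentioned in the first item, splitting a year into monthly windows. The parameter names follow the public FDSN event API; `fetch_earthquakes.py` may already handle this internally:

```python
from datetime import date, timedelta

import requests

FDSN_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"

def month_windows(year: int):
    """Yield (start, end) date pairs covering each month of `year`."""
    start = date(year, 1, 1)
    while start.year == year:
        next_month = (start.replace(day=28) + timedelta(days=4)).replace(day=1)
        yield start, next_month
        start = next_month

events = []
for window_start, window_end in month_windows(2023):
    resp = requests.get(FDSN_URL, params={
        "format": "geojson",
        "starttime": window_start.isoformat(),
        "endtime": window_end.isoformat(),
        "minmagnitude": 2.5,
    }, timeout=60)
    resp.raise_for_status()
    events.extend(resp.json()["features"])

print(f"Fetched {len(events)} events in monthly partitions")
```
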
## 🔗 Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/new-data-source`)
- Make changes and add tests
- Run quality checks (`make lint test`)
- Submit a pull request
## 🔗 License
This project is licensed under the MIT License. See LICENSE file for details.
## 🔗 Acknowledgments
- USGS, NPS, BLM, and FWS for maintaining comprehensive natural resource data programs
- NIFC for open wildfire data and the Wildland Fire Decision Support System
- Azure Cloud Scale Analytics team for the foundational platform
- Contributors and the open-source community
## 🔗 Related Documentation
- Interior Architecture — Detailed platform architecture and design decisions
- Examples Index — Overview of all CSA-in-a-Box example verticals
- Platform Architecture — Core CSA platform architecture
- Getting Started Guide — Platform setup and onboarding
- DOT Transportation Analytics — Related federal infrastructure vertical
- Tribal Health Analytics — Related federal/tribal vertical
## Prerequisites / Cost / Teardown
> [!IMPORTANT]
> Cost-safety: this vertical deploys real Azure resources. Always run `teardown.sh` when you are done. A forgotten workshop environment can run $120-200/day.
### Prerequisites

- Azure CLI 2.50+ logged in (`az login`), subscription selected (`az account set --subscription <id>`)
- `jq` installed (used by teardown enumeration)
- Bicep CLI 0.25+ (`az bicep version`)
- Contributor + User Access Administrator on the target subscription (or a pre-created RG with equivalent RBAC)
- `bash scripts/deploy/validate-prerequisites.sh` passes
### Cost estimate (rough, East US 2)

- While running: ~$120-200/day (services: Synapse, Databricks, ADF, Storage, Key Vault)
- Idle overnight: roughly half if you stop compute (Databricks autostop + Synapse pause)
- Storage + Key Vault residual: <$5/month if you skip teardown
Numbers are indicative for a small demo dataset; production workloads vary significantly. Use `az consumption usage list` or Cost Management for live numbers.
### Runtime
- Deploy: ~30-45 minutes (first run; cold Bicep)
- Teardown: ~10-15 minutes (async RG delete completes in the background)
### Teardown

When finished, run the per-example teardown script. It enforces a typed `DESTROY-interior` confirmation, logs every step to `reports/teardown/interior-<timestamp>.log`, and deletes the resource group `rg-interior-analytics` along with any matching subscription-scope deployments.
```bash
# Interactive (recommended)
bash examples/interior/deploy/teardown.sh

# Dry run (enumerate only)
bash examples/interior/deploy/teardown.sh --dry-run

# From the repo root via Makefile
make teardown-example VERTICAL=interior
make teardown-example VERTICAL=interior DRYRUN=1

# CI automation (no prompt — only for ephemeral environments)
bash examples/interior/deploy/teardown.sh --yes
```

See `docs/QUICKSTART.md#teardown` for the platform-wide teardown flow.
## Directory Structure

```
interior/
├── contracts/                        # Data product contracts (schemas, SLOs, owners)
│   ├── earthquake-events.yaml
│   ├── earthquake-monitoring.yaml
│   ├── natural-resources.yaml
│   └── park-visitors.yaml
├── data/                             # Sample data + synthetic generators
│   ├── generators/
│   └── open-data/
├── deploy/                           # Deployment parameters / Bicep templates
│   ├── params.dev.json
│   ├── params.gov.json
│   └── teardown.sh
├── domains/                          # dbt models (bronze / silver / gold) and seeds
│   └── dbt/
├── notebooks/                        # Synapse / Fabric / Databricks notebooks
│   ├── geological_hazard_analysis.py
│   └── park_capacity_forecasting.py
├── reports/                          # Power BI report templates and pbix sources
├── ARCHITECTURE.md                   # Mermaid + prose architecture diagrams
└── README.md                         # This file
```
## Expected Results
After running the medallion pipeline against the bundled seed data, the Gold layer should populate the following tables. Row counts vary with the seed-data generator parameters; the figures below are the approximate scale you should see on a default run.
| Gold Table | Approximate Rows | Notes |
|---|---|---|
| `gld_park_capacity` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
| `gld_seismic_risk` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
| `gld_wildfire_risk` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
TODO: capture exact counts after the next end-to-end seed run. These are bounded by the seed-data generator parameters in `data/generators/`.