Skip to content

Interior Natural Resources

Department of Interior Natural Resources Analytics on Azure

The U.S. Department of the Interior manages roughly 500 million acres of federal land and oversees natural resource programs spanning six major bureaus: the U.S. Geological Survey (USGS), National Park Service (NPS), Bureau of Land Management (BLM), U.S. Fish and Wildlife Service (FWS), Bureau of Reclamation, and the Office of Surface Mining. Each bureau publishes operational data through independent APIs and bulk-download portals — earthquake catalogs, stream-gauge telemetry, visitor counts, wildfire perimeters, species occurrence records, and land-use permits.

Individually, these datasets answer narrow operational questions. Combined in a unified lakehouse, they enable cross-bureau analysis: correlating seismic activity with groundwater-level changes, forecasting park capacity against wildfire smoke advisories, or mapping endangered species habitat overlap with active mining claims. This use case applies the CSA-in-a-Box medallion architecture to Interior's public data catalog, demonstrating both real-time streaming for sensor networks and batch ingestion for administrative records.


Architecture

The architecture separates two ingestion paths: a streaming path for high-frequency sensor telemetry (USGS earthquake feeds, water-gauge readings) and a batch path for periodically updated administrative datasets (NPS visitor stats, FWS species listings, BLM permit records, NIFC fire perimeters). Both converge in ADLS Gen2 bronze, flow through dbt-managed silver and gold layers, and serve downstream consumers through Power BI and Azure Data Explorer.

graph LR
    subgraph Interior Data Sources
        A[USGS Earthquake API<br/>8K+ seismic sensors]
        B[USGS Water Services<br/>13K+ stream gauges]
        C[NPS Visitor Stats<br/>400+ park units]
        D[NIFC Wildfire Data<br/>Fire perimeters]
        E[FWS ECOS<br/>Endangered species]
        F[BLM Public Data<br/>Land & permits]
    end

    subgraph Ingestion
        G[Azure Event Hubs<br/>Real-time streaming]
        H[Azure Data Factory<br/>Batch pipelines]
    end

    subgraph "Medallion Lakehouse"
        I[ADLS Gen2<br/>Bronze — raw JSON/CSV]
        J[dbt Models<br/>Silver — normalized]
        K[dbt Models<br/>Gold — analytics-ready]
    end

    subgraph Consumption
        L[Power BI<br/>Operational dashboards]
        M[Azure Data Explorer<br/>Real-time monitoring]
        N[Synapse Notebooks<br/>Geospatial analysis]
    end

    A --> G
    B --> G
    C --> H
    D --> H
    E --> H
    F --> H
    G --> I
    H --> I
    G --> M
    I --> J
    J --> K
    K --> L
    K --> M
    K --> N

Data Sources

Source Description Volume / Coverage Update Frequency Access Method
USGS Earthquake API ComCat earthquake catalog — origin time, magnitude, depth, felt reports, tsunami flags from the Advanced National Seismic System 8,000+ sensors, ~20,000 events/year globally (M2.5+) Real-time (seconds) REST API (GeoJSON)
USGS Water Services National Water Information System (NWIS) — daily and instantaneous streamflow, gauge height, water temperature, dissolved oxygen 13,000+ active stream gauges, 850,000+ historical sites Real-time (15 min) to daily REST API (JSON/RDB)
NPS Visitor Statistics Monthly recreation visits, camping, lodging, and backcountry use across all NPS units 400+ park units, 300M+ recreation visits/year Monthly IRMA Stats
NIFC Wildfire Data National Interagency Fire Center — active fire perimeters, prescribed burns, fire weather zones 50,000+ fires/year, GIS perimeter polygons Daily during fire season ArcGIS REST / GeoJSON
FWS ECOS Environmental Conservation Online System — threatened/endangered species listings, critical habitat designations, recovery plans 1,600+ listed species, 800+ critical habitat designations Quarterly ECOS REST API
BLM Public Data Land status, mining claims, grazing allotments, oil & gas lease parcels, right-of-way grants 245M acres of public land, 63,000+ grazing permits Monthly BLM Geospatial Services

Data Access

All Interior datasets listed above are publicly available at no cost. The USGS Earthquake API and Water Services require no authentication. The NPS Stats API is open but rate-limited. NIFC and BLM data are available through ArcGIS Open Data portals. FWS ECOS provides both a web interface and downloadable bulk files.

Rate Limits

USGS Water Services requests should include a User-Agent header identifying your application. Batch fetches against NPS IRMA should observe a 300 ms delay between requests to avoid throttling. The USGS Earthquake API limits queries to 20,000 events per request — use date windowing for larger extractions.


Step 1 — Real-Time Earthquake Monitoring

The USGS Earthquake Hazards Program publishes events within seconds of detection through the ComCat API. A streaming producer polls the API on a short interval, serializes events to Avro, and pushes them to Event Hubs for sub-minute alerting in Azure Data Explorer alongside batch persistence to the lakehouse.

Streaming Producer

The following producer fetches recent earthquake events from the USGS API and forwards them to Azure Event Hubs. This pattern mirrors the fetcher in examples/interior/data/open-data/fetch_usgs_nps.py but adds Event Hubs integration for real-time downstream consumption.

"""USGS Earthquake streaming producer for Azure Event Hubs."""

import json
import time
from datetime import datetime, timedelta, timezone

import requests
from azure.eventhub import EventHubProducerClient, EventData

USGS_API = "https://earthquake.usgs.gov/fdsnws/event/1/query"
EVENT_HUB_CONN_STR = "Endpoint=sb://<namespace>.servicebus.windows.net/;..."
EVENT_HUB_NAME = "usgs-earthquakes"
POLL_INTERVAL_SEC = 60
MIN_MAGNITUDE = 2.5


def fetch_recent_earthquakes(since: datetime) -> list[dict]:
    """Fetch earthquake events since a given timestamp."""
    params = {
        "format": "geojson",
        "starttime": since.strftime("%Y-%m-%dT%H:%M:%S"),
        "minmagnitude": MIN_MAGNITUDE,
        "orderby": "time",
    }
    resp = requests.get(USGS_API, params=params, timeout=30)
    resp.raise_for_status()
    features = resp.json().get("features", [])

    events = []
    for feat in features:
        props = feat["properties"]
        coords = feat["geometry"]["coordinates"]
        events.append({
            "event_id": feat["id"],
            "event_time": datetime.fromtimestamp(
                props["time"] / 1000, tz=timezone.utc
            ).isoformat(),
            "latitude": coords[1],
            "longitude": coords[0],
            "depth_km": coords[2],
            "magnitude": props["mag"],
            "magnitude_type": props.get("magType", ""),
            "place": props.get("place", ""),
            "tsunami_flag": bool(props.get("tsunami", 0)),
            "felt_reports": props.get("felt"),
            "alert_level": props.get("alert", ""),
            "status": props.get("status", ""),
        })
    return events


def stream_to_event_hub():
    """Continuously poll USGS and push events to Event Hubs."""
    producer = EventHubProducerClient.from_connection_string(
        conn_str=EVENT_HUB_CONN_STR,
        eventhub_name=EVENT_HUB_NAME,
    )
    last_poll = datetime.now(timezone.utc) - timedelta(minutes=5)

    with producer:
        while True:
            events = fetch_recent_earthquakes(since=last_poll)
            if events:
                batch = producer.create_batch()
                for event in events:
                    batch.add(EventData(json.dumps(event)))
                producer.send_batch(batch)
                print(f"Sent {len(events)} earthquake events to Event Hubs")

            last_poll = datetime.now(timezone.utc)
            time.sleep(POLL_INTERVAL_SEC)


if __name__ == "__main__":
    stream_to_event_hub()

KQL — Real-Time Earthquake Monitoring

Once events land in Azure Data Explorer, analysts can query for significant seismic activity, detect swarms, and correlate with nearby water-gauge anomalies.

// Recent significant earthquakes (M4.0+) in the last 24 hours
UsgsEarthquakes
| where event_time > ago(24h)
| where magnitude >= 4.0
| project event_time, magnitude, magnitude_type, depth_km,
          latitude, longitude, place, alert_level, felt_reports
| order by magnitude desc

// Earthquake swarm detection: 10+ events within 50 km in 6 hours
UsgsEarthquakes
| where event_time > ago(7d)
| extend geo_hash = geo_point_to_s2cell(longitude, latitude, 12)
| summarize event_count = count(),
            avg_magnitude = avg(magnitude),
            max_magnitude = max(magnitude),
            first_event = min(event_time),
            last_event = max(event_time)
    by geo_hash, bin(event_time, 6h)
| where event_count >= 10
| order by event_count desc

// Cross-reference: water-gauge anomalies near recent M5+ earthquakes
let significant_quakes = UsgsEarthquakes
    | where event_time > ago(48h) and magnitude >= 5.0
    | project quake_time = event_time, quake_lat = latitude,
              quake_lon = longitude, magnitude, place;
UsgsWaterGauges
| where measurement_time > ago(48h)
| extend distance_km = geo_distance_2points(
    longitude, latitude,
    toscalar(significant_quakes | top 1 by magnitude | project quake_lon),
    toscalar(significant_quakes | top 1 by magnitude | project quake_lat)
  ) / 1000
| where distance_km < 100
| summarize avg_flow = avg(streamflow_cfs),
            stddev_flow = stdev(streamflow_cfs)
    by site_id, bin(measurement_time, 1h)
| where stddev_flow > avg_flow * 0.5
| order by measurement_time desc

Step 2 — Water Resource Analytics

USGS operates over 13,000 active stream gauges reporting streamflow (discharge) and gauge height at 15-minute to daily intervals. The silver layer normalizes raw NWIS records, calculates drought indices, classifies flood stages against NWS reference levels, and derives seasonal percentile rankings.

dbt Silver Model — Water Observations

The silver model transforms raw bronze records into analytics-ready water observations with drought classification, flood stage mapping, and specific discharge normalization. Full model: examples/interior/domains/dbt/models/silver/slv_water_resources.sql.

-- Silver layer: key transformations (excerpt from slv_water_resources.sql)

-- Flood stage classification against NWS reference levels
CASE
    WHEN parameter_code = '00065'
         AND COALESCE(daily_mean, value) >= major_flood_stage_ft
        THEN 'MAJOR_FLOOD'
    WHEN parameter_code = '00065'
         AND COALESCE(daily_mean, value) >= moderate_flood_stage_ft
        THEN 'MODERATE_FLOOD'
    WHEN parameter_code = '00065'
         AND COALESCE(daily_mean, value) >= flood_stage_ft
        THEN 'MINOR_FLOOD'
    WHEN parameter_code = '00065'
         AND COALESCE(daily_mean, value) >= action_stage_ft
        THEN 'ACTION_STAGE'
    WHEN parameter_code = '00065' THEN 'NORMAL'
    ELSE NULL
END AS flood_stage_class,

-- Drought index classification using streamflow percentile
CASE
    WHEN COALESCE(daily_mean, value) <= percentile_10
        THEN 'EXTREME_DROUGHT'
    WHEN COALESCE(daily_mean, value) <= percentile_25
        THEN 'MODERATE_DROUGHT'
    WHEN COALESCE(daily_mean, value) <= percentile_75
        THEN 'NORMAL'
    WHEN COALESCE(daily_mean, value) <= percentile_90
        THEN 'ABOVE_NORMAL'
    WHEN COALESCE(daily_mean, value) > percentile_90
        THEN 'MUCH_ABOVE_NORMAL'
    ELSE 'INSUFFICIENT_HISTORY'
END AS drought_index,

-- Specific discharge: flow normalized by drainage area
CASE
    WHEN parameter_code = '00060' AND drainage_area_sq_mi > 0
    THEN ROUND(COALESCE(daily_mean, value) / drainage_area_sq_mi, 4)
    ELSE NULL
END AS specific_discharge_cfs_per_sqmi

Hydrologic Unit Codes

The silver model extracts HUC region and sub-region codes from the full HUC identifier, enabling watershed-level aggregation. HUC-2 regions align with major river basins (e.g., 14 = Upper Colorado, 15 = Lower Colorado), while HUC-4 sub-regions provide finer drainage area groupings.


Step 3 — National Park Capacity Management

NPS publishes monthly visitation statistics for each park unit — recreation visits, camping (tent, RV, backcountry), lodging, and recreation hours. Combined with park infrastructure data (campground capacity, parking spaces, trail miles), the gold layer produces capacity utilization metrics and seasonal forecasting inputs.

Capacity Analysis Pattern

-- Gold layer: park capacity utilization and seasonal patterns
WITH monthly_metrics AS (
    SELECT
        park_code,
        park_name,
        state,
        region,
        measurement_year,
        measurement_month,
        recreation_visits,
        tent_campers + rv_campers + backcountry_campers AS total_campers,
        campground_capacity,
        CASE
            WHEN campground_capacity > 0
            THEN ROUND(
                (tent_campers + rv_campers) * 100.0 / campground_capacity, 1
            )
            ELSE NULL
        END AS campground_utilization_pct,
        -- Year-over-year comparison
        LAG(recreation_visits, 12) OVER (
            PARTITION BY park_code ORDER BY measurement_year, measurement_month
        ) AS visits_prior_year
    FROM {{ ref('slv_park_visitors') }}
),

seasonal_rank AS (
    SELECT *,
        PERCENT_RANK() OVER (
            PARTITION BY park_code, measurement_month
            ORDER BY recreation_visits
        ) AS seasonal_percentile
    FROM monthly_metrics
)

SELECT *,
    CASE
        WHEN campground_utilization_pct >= 95 THEN 'AT_CAPACITY'
        WHEN campground_utilization_pct >= 80 THEN 'HIGH_UTILIZATION'
        WHEN campground_utilization_pct >= 50 THEN 'MODERATE'
        ELSE 'LOW'
    END AS capacity_status,
    CASE
        WHEN visits_prior_year > 0
        THEN ROUND(
            (recreation_visits - visits_prior_year) * 100.0 / visits_prior_year, 1
        )
        ELSE NULL
    END AS yoy_change_pct
FROM seasonal_rank

Step 4 — Wildfire Risk Prediction

NIFC publishes active fire perimeters, historical fire occurrence data, and fire weather zone boundaries. The gold layer combines wildfire history with USGS vegetation and terrain data to produce risk scores at the HUC-8 watershed level.

Cross-Bureau Wildfire Risk Model

-- Gold layer: wildfire risk scoring by watershed (excerpt from gld_wildfire_risk.sql)
SELECT
    w.huc_code,
    w.huc_name,
    w.state_code,
    -- Historical fire frequency
    COUNT(DISTINCT f.fire_id) AS fires_last_10yr,
    SUM(f.acres_burned) AS total_acres_burned_10yr,
    -- Current conditions
    AVG(h.drought_index_numeric) AS avg_drought_severity,
    -- Infrastructure exposure
    COUNT(DISTINCT p.park_code) AS nps_units_in_watershed,
    SUM(p.annual_recreation_visits) AS total_park_visits_at_risk,
    -- BLM land overlap
    SUM(b.blm_acres) AS blm_acres_in_watershed,
    COUNT(DISTINCT b.grazing_allotment_id) AS active_grazing_allotments,
    -- Composite risk score
    (
        COALESCE(fires_last_10yr, 0) * 0.3 +
        COALESCE(avg_drought_severity, 0) * 0.3 +
        CASE WHEN total_park_visits_at_risk > 1000000 THEN 20
             WHEN total_park_visits_at_risk > 100000 THEN 10
             ELSE 5 END * 0.2 +
        COALESCE(total_acres_burned_10yr / NULLIF(blm_acres_in_watershed, 0), 0) * 100 * 0.2
    ) AS composite_fire_risk_score
FROM dim_watershed w
LEFT JOIN fct_wildfires f ON w.huc_code = f.huc_code AND f.fire_year >= YEAR(CURRENT_DATE) - 10
LEFT JOIN slv_water_resources h ON w.huc_code = h.huc_code
LEFT JOIN fct_park_annual p ON w.huc_code = p.huc_code
LEFT JOIN dim_blm_lands b ON w.huc_code = b.huc_code
GROUP BY w.huc_code, w.huc_name, w.state_code

Step 5 — Wildlife Conservation Analytics

FWS ECOS tracks 1,600+ species listed under the Endangered Species Act, including critical habitat boundary polygons. Overlaying these with BLM land-use permits and USGS water data enables impact assessment for proposed activities on federal land.

Conservation Overlap Analysis

Analysis Data Sources Output
Critical habitat vs. active mining claims FWS ECOS habitat polygons + BLM mining claims Conflict parcels requiring Section 7 consultation
Species range vs. wildfire perimeters FWS species occurrence + NIFC active fires At-risk populations for emergency response
Aquatic species vs. drought conditions FWS aquatic species + USGS streamflow drought index Streams where listed fish face low-flow stress
Habitat connectivity vs. BLM rights-of-way FWS habitat corridors + BLM ROW grants Fragmentation risk scoring for permit review

Section 7 Consultation

Under the Endangered Species Act, federal agencies must consult with FWS before authorizing actions that may affect listed species or critical habitat. The overlap analysis automates the initial screening step, identifying parcels where formal consultation is likely required.


Step 6 — BLM Land Management

BLM administers 245 million acres of public land, issuing permits for grazing, mining, oil and gas exploration, renewable energy, and rights-of-way. Combining BLM permit data with environmental layers from USGS, FWS, and NIFC enables data-driven land-use planning.

Key BLM Datasets

Dataset Records Description
Grazing allotments 18,000+ Boundaries, AUMs, permittee, livestock type
Mining claims 350,000+ active Lode, placer, mill site, tunnel site claims
Oil & gas leases 25,000+ active Lease parcels, production volumes, royalties
Rights-of-way 63,000+ Roads, pipelines, power lines, communication sites
Solar/wind energy zones 700+ Designated renewable energy development areas

Cross-Bureau Analytics

The unified lakehouse enables queries that span bureau boundaries — analysis that is difficult or impossible when each bureau maintains isolated data stores.

Scenario Bureaus Question Answered
Seismic impact on water supply USGS Earthquake + USGS Water Do M5+ earthquakes cause measurable streamflow changes within 100 km?
Wildfire smoke and park visitation NIFC + NPS How do active fires within 50 miles affect monthly park visits?
Drought and grazing permits USGS Water + BLM Which grazing allotments are in watersheds experiencing extreme drought?
Habitat and energy development FWS + BLM Do proposed solar energy zones overlap with critical habitat?
Park infrastructure and flood risk NPS + USGS Water Which park facilities are within flood-stage watersheds?

Implementation Reference

The examples/interior/ directory contains the complete implementation:

examples/interior/
  ARCHITECTURE.md                          # System design documentation
  contracts/
    earthquake-monitoring.yaml             # Streaming data contract
    earthquake-events.yaml                 # Batch data contract
    natural-resources.yaml                 # Cross-domain contract
    park-visitors.yaml                     # NPS data contract
  data/
    generators/generate_interior_data.py   # Synthetic test data
    open-data/fetch_usgs_nps.py            # USGS/NPS API fetcher
  deploy/
    params.dev.json                        # Development parameters
    params.gov.json                        # GovCloud parameters
  domains/dbt/
    models/bronze/                         # Raw ingestion models
    models/silver/                         # Normalized & enriched
    models/gold/                           # Analytics-ready aggregates
    seeds/                                 # Reference/test data
  notebooks/
    geological_hazard_analysis.py          # Seismic analysis notebook
    park_capacity_forecasting.py           # Visitor forecasting


Sources

Resource URL
USGS Earthquake Hazards API https://earthquake.usgs.gov/fdsnws/event/1/
USGS Water Services (NWIS) https://waterservices.usgs.gov/
NPS Visitor Use Statistics https://irma.nps.gov/Stats/
NPS Stats Portal https://irma.nps.gov/Stats/
NIFC Open Data https://data-nifc.opendata.arcgis.com/
FWS ECOS https://ecos.fws.gov/ecp/
FWS Critical Habitat Maps https://ecos.fws.gov/ecp/report/table/critical-habitat.html
BLM Navigator (replaced) https://www.blm.gov/services/geospatial (BLM Navigator was retired; geospatial services consolidated here)
BLM Geospatial Services https://www.blm.gov/services/geospatial
USGS National Map https://apps.nationalmap.gov/