DOT Transportation Analytics
DOT Multi-Modal Transportation Analytics on Azure¶
This page documents the end-to-end architecture for building a multi-modal transportation analytics platform on Azure. The platform ingests data from five DOT operating administrations — NHTSA (highway safety), FHWA (bridges and highway performance), FTA (transit), FAA (aviation), and BTS (cross-modal statistics) — and produces analytical outputs for crash hotspot detection, bridge maintenance prioritization, transit performance monitoring, and airport operations analysis.
The implementation follows the CSA-in-a-Box medallion architecture pattern. Working code is in examples/dot/.
Architecture¶
flowchart LR
subgraph sources["Data Sources"]
FARS["NHTSA FARS<br/>Fatal Crash Data"]
NBI["FHWA NBI<br/>Bridge Inventory"]
HPMS["FHWA HPMS<br/>Highway Performance"]
NTD["FTA NTD<br/>Transit Database"]
ATADS["FAA ATADS<br/>Airport Operations"]
BTS["BTS<br/>Cross-Modal Stats"]
GTFS["GTFS Real-Time<br/>Transit Feeds"]
end
subgraph ingestion["Ingestion"]
ADF["Azure Data Factory<br/>Batch Pipelines"]
EH["Event Hub<br/>GTFS-RT / SWIM"]
end
subgraph medallion["Medallion Architecture (ADLS Gen2)"]
BRZ["Bronze<br/>Raw JSON / CSV"]
SLV["Silver<br/>Cleaned & Conformed"]
GLD["Gold<br/>Analytical Models"]
end
subgraph serving["Serving"]
PBI["Power BI<br/>Dashboards"]
ADX["Azure Data Explorer<br/>Sub-second Queries"]
SYN["Synapse Serverless<br/>Ad-hoc SQL"]
end
FARS & NBI & HPMS & NTD & ATADS & BTS --> ADF
GTFS --> EH
ADF --> BRZ
EH --> BRZ
BRZ -->|"dbt"| SLV
SLV -->|"dbt"| GLD
GLD --> PBI & ADX & SYN Data Sources¶
| Source | Agency | Description | Volume | Update Frequency | API / Access |
|---|---|---|---|---|---|
| FARS | NHTSA | Fatality Analysis Reporting System — every fatal motor vehicle crash in the U.S. | ~6.4M crash records (1975–present) | Annual + preliminary quarterly | CrashAPI |
| NBI | FHWA | National Bridge Inventory — structural condition of every bridge >20ft | 620,000+ bridges | Annual | data.transportation.gov |
| HPMS | FHWA | Highway Performance Monitoring System — pavement condition, traffic volume, lane miles | All federal-aid highways | Annual | HPMS |
| NTD | FTA | National Transit Database — ridership, financial, safety data for transit agencies | 900+ agencies, ~2,200 modes | Monthly + Annual | NTD |
| ATADS | FAA | Air Traffic Activity Data System — tower operations, instrument operations | 500+ airports | Monthly | ATADS |
| BTS | DOT | Bureau of Transportation Statistics — cross-modal performance metrics | National | Monthly + Annual | BTS |
GTFS Real-Time
Over 1,000 U.S. transit agencies publish GTFS-realtime feeds (vehicle positions, trip updates, service alerts). These are ingested via Event Hub for near-real-time transit monitoring. The MobilityData GTFS-RT catalog lists available feeds.
Highway Safety Analytics¶
FARS Data Ingestion¶
The examples/dot/data/open-data/fetch_fars.py script fetches crash data from two endpoints:
- NHTSA CrashAPI — case-level fatal crash records by state and year
- data.transportation.gov SODA API — supplemental fatality and highway safety datasets
"""Fetch FARS crash cases for a state and year via CrashAPI."""
import requests
CRASH_API = "https://crashviewer.nhtsa.dot.gov/CrashAPI"
def fetch_crash_cases(state_fips: int, year: int, max_results: int = 5000):
response = requests.get(
f"{CRASH_API}/crashes/GetCaseList",
params={
"StateCase": state_fips,
"CaseYear": year,
"MaxResults": max_results,
"format": "json",
},
timeout=60,
)
response.raise_for_status()
data = response.json()
results = data.get("Results", [{}])
return results[0].get("CrashResultsCase", []) if results else []
# Example: Texas 2022
cases = fetch_crash_cases(state_fips=48, year=2022)
Crash Severity Analysis¶
The silver layer model slv_crash_data standardizes raw FARS records into a consistent schema with computed fields: severity_score, grid_cell_id (0.1-degree lat/lon bucketing), is_alcohol_related, is_pedestrian_involved, is_nighttime, and temporal buckets.
Hotspot Detection¶
The gold layer model gld_safety_hotspots (see examples/dot/domains/dbt/models/gold/gld_safety_hotspots.sql) aggregates crashes by grid cell and year to produce:
- Severity-weighted scores per grid cell
- Year-over-year trend classification (WORSENING / STABLE / IMPROVING)
- Contributing factor breakdown (alcohol, pedestrian, nighttime, adverse weather)
- Hotspot tier ranking (TIER_1_CRITICAL through TIER_4_STANDARD based on percentile cutoffs)
- State and national hotspot ranks
-- Simplified excerpt from gld_safety_hotspots.sql
SELECT
grid_cell_id,
state_code,
analysis_year,
COUNT(DISTINCT case_id) AS total_crashes,
SUM(fatality_count) AS total_fatalities,
SUM(severity_score) AS severity_weighted_score,
ROUND(AVG(severity_score), 2) AS avg_severity_score,
ROUND(AVG(latitude), 6) AS cluster_center_lat,
ROUND(AVG(longitude), 6) AS cluster_center_lon,
-- Hotspot tier via percentile window
CASE
WHEN severity_weighted_score >= PERCENTILE_CONT(0.95)
WITHIN GROUP (ORDER BY severity_weighted_score)
OVER (PARTITION BY analysis_year)
THEN 'TIER_1_CRITICAL'
...
END AS hotspot_tier
FROM silver.crash_data
GROUP BY grid_cell_id, state_code, analysis_year
ORDER BY severity_weighted_score DESC;
Geospatial Extension
For sub-county corridor-level analysis, pair grid-cell hotspots with road network geometry from HPMS. See the Geoanalytics patterns (planned) for Azure Maps and H3 hex-grid integration.
Bridge Infrastructure Analytics¶
NBI Data Ingestion¶
The National Bridge Inventory is available as a single flat file (~620K rows) via data.transportation.gov. ADF pulls the full dataset annually and incrementally detects changed records based on the structure_number_008 key and date_of_inspect_090.
Condition Scoring¶
The silver model slv_highway_conditions normalizes NBI condition ratings (0–9 scale for deck, superstructure, substructure) and computes a composite condition index.
Maintenance Priority Index¶
The gold model gld_infrastructure_priority (see examples/dot/domains/dbt/models/gold/gld_infrastructure_priority.sql) produces a weighted priority score combining:
| Factor | Weight | Source Field |
|---|---|---|
| Structural condition | 30% | Deck + superstructure + substructure ratings |
| Functional obsolescence | 20% | Functional classification vs. current demand |
| Average daily traffic | 20% | ADT from HPMS overlay |
| Detour length | 15% | NBI field detour_kilos_019 |
| Age since last rehab | 15% | year_reconstructed_106 vs. current year |
The output ranks every bridge within its state and nationally, enabling DOTs to allocate Highway Bridge Program (HBP) funding to the highest-risk structures.
Transit Analytics¶
NTD Ridership Trends¶
FTA publishes monthly ridership data (unlinked passenger trips by mode) and annual financial/operating statistics for every agency receiving federal transit funds. ADF ingests both via the NTD data portal.
Agency Performance Comparison¶
The gold model gld_transit_dashboard computes per-agency KPIs:
- Ridership recovery ratio — current UPT vs. pre-pandemic baseline (2019)
- Cost per trip — total operating expense / unlinked passenger trips
- Farebox recovery — fare revenue / operating expense
- Vehicle revenue miles per peak vehicle — fleet utilization
- Mean distance between failures — reliability metric
These KPIs are partitioned by mode (bus, heavy rail, light rail, commuter rail, demand response) for cross-modal comparison.
GTFS Real-Time Monitoring¶
For agencies publishing GTFS-RT feeds, Event Hub ingests protobuf messages at 15–30 second intervals. ADX stores vehicle positions and trip updates for sub-second query performance, enabling:
- Real-time vehicle tracking dashboards
- Schedule adherence analysis (on-time performance)
- Dwell time computation at stop level
- Headway regularity monitoring
FAA Aviation Operations¶
This section provides a summary of aviation analytics. For the complete treatment — including real-time flight operations, safety trend analysis, wildlife strike correlation, and NAS performance dashboards — see the dedicated FAA Aviation Safety & Operations Analytics page.
ATADS Airport Operations Data¶
The Air Traffic Activity Data System (ATADS) records tower operations (takeoffs, landings, overflights) and instrument operations at towered airports. ADF pulls monthly summaries via the OPSNET portal.
Capacity Analysis¶
The gold layer computes airport utilization rates by comparing actual operations against declared capacity (airport acceptance rate). Airports consistently operating above 85% of declared capacity are flagged for capacity constraint analysis.
Delay Pattern Detection (KQL)¶
For airports with ADX-integrated operations data, the following KQL query detects delay anomalies:
// Detect airports with delay rates exceeding 2 standard deviations
let baseline = AirportOperations
| where Timestamp between (ago(365d) .. ago(30d))
| summarize avg_delay = avg(AvgDepartureDelayMin),
stddev_delay = stdev(AvgDepartureDelayMin)
by AirportCode;
AirportOperations
| where Timestamp > ago(30d)
| summarize recent_avg_delay = avg(AvgDepartureDelayMin) by AirportCode
| join kind=inner baseline on AirportCode
| extend z_score = (recent_avg_delay - avg_delay) / stddev_delay
| where z_score > 2.0
| project AirportCode, recent_avg_delay, avg_delay, z_score
| order by z_score desc
Airport Performance Dashboards¶
Power BI dashboards surface airport-level KPIs: operations per hour, average delay minutes, on-time departure percentage, and capacity utilization. Drill-through pages link airport-level metrics to carrier-level and runway-level detail.
Dedicated FAA Page
For ASPM, OPSNET, wildlife strike analysis, SWIM real-time integration, and NAS-wide performance metrics, see FAA Aviation Safety & Operations Analytics.
Predictive Analytics¶
Crash Risk Scoring¶
Using the historical hotspot data and contributing factor profiles, a gradient-boosted model scores each grid cell's probability of experiencing a fatal crash in the next 12 months. Features include:
- 3-year crash trend (from
gld_safety_hotspots) - Road functional class and speed limit
- Rural/urban classification
- Alcohol-related crash percentage
- Nighttime crash percentage
- Adjacent cell severity (spatial lag)
The model is trained in Azure Machine Learning and scores are written back to the gold layer for Power BI consumption.
Infrastructure Deterioration Forecasting¶
Bridge condition trajectories are modeled using Markov chain transition matrices fitted to historical NBI inspection data. For each bridge, the model estimates:
- Expected years to structurally deficient — based on current condition and deterioration rate
- Optimal intervention year — minimizing lifecycle cost (repair vs. replace)
- Risk-adjusted priority — combining deterioration forecast with traffic impact
Data Contracts¶
The examples/dot/contracts/ directory contains data contract definitions (YAML) for each domain:
| Contract | Purpose |
|---|---|
crash-analytics.yaml | Schema, SLAs, and quality checks for crash data pipeline |
highway-conditions.yaml | Bridge condition and highway performance data contract |
transit-performance.yaml | Transit ridership and agency performance contract |
These contracts enforce schema validation, freshness checks, and data quality thresholds at the silver layer boundary.
Deployment¶
The examples/dot/deploy/ directory includes parameterized deployment configurations:
params.dev.json— development environment (Azure Commercial)params.gov.json— Azure Government Cloud (IL4/IL5 compliant)
Azure Government
Not all Azure services are available in Azure Government regions. Verify service availability against the Gov Service Matrix before deploying. ADX and Event Hub are available in Gov; confirm Synapse Serverless and Azure ML availability for your target IL.
Cross-References¶
- Government Data Analytics on Azure — compliance frameworks and Azure Government Cloud
- FAA Aviation Safety & Operations Analytics — dedicated aviation deep-dive
- Real-Time Intelligence & Anomaly Detection — streaming patterns for GTFS-RT and SWIM
- Fabric Unified Analytics — Microsoft Fabric alternative for transit analytics
examples/dot/ARCHITECTURE.md— detailed technical architectureexamples/dot/README.md— quickstart and local development
Sources¶
| Resource | URL |
|---|---|
| NHTSA FARS CrashAPI | https://crashviewer.nhtsa.dot.gov/CrashAPI |
| NHTSA FARS Encyclopedia | https://www-fars.nhtsa.dot.gov/Main/index.aspx |
| data.transportation.gov (SODA) | https://data.transportation.gov |
| FHWA National Bridge Inventory | https://www.fhwa.dot.gov/bridge/nbi.cfm |
| FHWA HPMS | https://www.fhwa.dot.gov/policyinformation/hpms.cfm |
| FTA National Transit Database | https://www.transit.dot.gov/ntd |
| FAA ATADS / OPSNET | https://aspm.faa.gov/opsnet/sys/Main.asp |
| FAA ASPM | https://aspm.faa.gov |
| BTS National Transportation Statistics | https://www.bts.gov/topics/national-transportation-statistics |
| GTFS Realtime Specification | https://gtfs.org/realtime/ |
| MobilityData GTFS-RT Catalog | https://gtfs.org/resources/gtfs-realtime/ |