USPS Postal Operations
USPS Postal Operations Analytics on Azure¶
The United States Postal Service operates the largest civilian logistics network in the world — 34,000+ post offices, 230,000+ delivery routes, 160 million delivery points, and roughly 127 billion mail pieces processed annually. The operational data generated across this network — delivery scan events, facility throughput metrics, mail volume counts by product class, and address-level geographic records — represents a high-value analytical corpus that is largely underutilized outside of internal USPS reporting.
This use case applies the CSA-in-a-Box medallion architecture to USPS public data sources, combining Web Tools API outputs, quarterly service performance reports, Census ZCTA geographic boundaries, USPS Facts & Figures statistical publications, and OpenAddresses geocoding records. The result is an analytics platform capable of last-mile route optimization, seasonal volume forecasting, facility utilization analysis, and ZIP code market segmentation — built entirely on publicly available data.
Data Sources¶
| Source | Description | Volume / Coverage | Update Frequency | Access Method |
|---|---|---|---|---|
| USPS Web Tools APIs | Address validation (Verify API), city/state lookup, ZIP code lookup, package tracking (TrackV2), and domestic rate calculation (RateV4) | 160M+ delivery points, 41K+ ZIP codes | On-demand | XML REST API (free User ID required) |
| Service Performance Reports | Quarterly on-time delivery rates by product class (First-Class, Priority, Marketing Mail) and postal district | All 67 postal districts, 7 product classes | Quarterly | PDF / spreadsheet download |
| Census ZCTA Boundaries | ZIP Code Tabulation Areas from TIGER/Line shapefiles with demographic overlays from the American Community Survey | 33,000+ ZCTAs, national coverage | Annual (decennial base) | Shapefile / REST API |
| USPS Facts & Figures | Annual operational statistics: total mail volume, revenue by product, facility counts, employee headcount, vehicle fleet size | Historical data back to 2000 | Annual | Web / downloadable datasets |
| OpenAddresses | Community-sourced address point data with latitude/longitude coordinates for geocoding and delivery point validation | 500M+ address points globally, US focus | Continuous community updates | Bulk download (CSV/GeoJSON) |
API Registration
The USPS Web Tools APIs require a free User ID obtained through the USPS Web Tools Registration. Rate limits are approximately 5 requests per second for individual operations; the batch Address Validation API supports up to 5 addresses per call. Census TIGER/Line data is freely available via the TIGERweb REST API.
Architecture¶
The architecture follows a batch ingestion pattern. USPS Web Tools API responses, service performance report extracts, Census ZCTA shapefiles, USPS Facts & Figures datasets, and OpenAddresses geocoding files are all ingested through Azure Data Factory into ADLS Gen2 bronze. dbt models handle silver-layer cleansing (address standardization, product class normalization, on-time flag calculation) and gold-layer analytics (route optimization scoring, volume forecasting, facility utilization). Power BI consumes gold-layer tables for operational dashboards.
graph LR
subgraph External Sources
A[USPS Web Tools APIs<br/>Address, ZIP, Tracking]
B[Service Performance<br/>Quarterly Reports]
C[Census ZCTA<br/>TIGER/Line Boundaries]
D[USPS Facts & Figures<br/>Annual Statistics]
E[OpenAddresses<br/>Geocoding Points]
end
subgraph Ingestion
F[Azure Data Factory<br/>REST + File Pipelines]
end
subgraph "ADLS Gen2 — Medallion Architecture"
G[Bronze<br/>Raw JSON/XML/CSV]
H[dbt Silver<br/>Cleaned & Conformed]
I[dbt Gold<br/>Analytics-Ready]
end
subgraph Consumption
J[Power BI<br/>Operations Dashboards]
K[Jupyter Notebooks<br/>Forecasting & Optimization]
L[Azure SQL<br/>Data Product APIs]
end
A --> F
B --> F
C --> F
D --> F
E --> F
F --> G
G --> H
H --> I
I --> J
I --> K
I --> L Step-by-Step Implementation¶
1. API Ingestion and Raw Data Landing¶
Data Factory pipelines handle three ingestion patterns:
- REST API extraction: USPS Web Tools API calls for address validation, city/state lookups, and ZIP code geography. Responses arrive as XML, parsed and stored as JSON in bronze.
- File-based ingestion: Service performance PDFs/spreadsheets and USPS Facts & Figures datasets are downloaded, parsed, and landed as CSV in bronze.
- Geospatial ingestion: Census TIGER/Line shapefiles and OpenAddresses CSV/GeoJSON files are loaded through ADF file connectors with schema inference.
All raw data lands in ADLS Gen2 under a bronze/usps/ partition structure organized by source and ingestion date.
Batch Address Validation
The USPS Verify API supports batching up to 5 addresses per request. For bulk validation workloads, the fetch_usps_data.py script in examples/usps/data/open-data/ implements automatic batching with configurable rate limiting (default 0.3s delay between calls).
2. Bronze Layer — Raw Data Preservation¶
Bronze dbt models (brz_delivery_performance, brz_facility_operations, brz_mail_volume) land raw records with source-system metadata, ingestion timestamps, and record hashes for deduplication. No business logic is applied at this layer.
Key bronze tables:
| Model | Source | Grain | Partition Strategy |
|---|---|---|---|
brz_delivery_performance | Synthetic delivery scans | One row per delivery event | product_class, acceptance_year |
brz_facility_operations | USPS Facts & Figures | One row per facility-month | facility_type |
brz_mail_volume | USPS Facts & Figures | One row per product-class-day | product_class |
3. Silver Layer — Cleansing and Conforming¶
Silver models apply address standardization, product class normalization, on-time delivery flag computation, and geographic enrichment. The slv_delivery_performance model demonstrates the core transformation pattern:
- ZIP code standardization: Left-padded to 5 digits, 3-digit Sectional Center Facility (SCF) code derived for routing analysis.
- Product class normalization: Aliases mapped to canonical values (
FC→FIRST_CLASS,PM→PRIORITY,PS→PARCEL_SELECT, etc.). - On-time flag: Computed by comparing actual delivery date against expected delivery date or service standard (e.g., 3 days for First-Class, 2 days for Priority).
- Intra-SCF and intra-state flags: Derived for local vs. cross-country routing segmentation.
- Surrogate keys: MD5 hash of tracking ID and acceptance date for incremental merge.
4. Gold Layer — Last-Mile Route Optimization¶
The gld_route_optimization model scores 230,000+ carrier routes on a 0–100 efficiency scale using four weighted factors:
| Factor | Weight | Scoring Logic |
|---|---|---|
| Volume density (stops/day) | 30% | ≥100 stops = 100 pts, ≥75 = 85, ≥50 = 70, ≥25 = 50, ≥10 = 30, <10 = 15 |
| On-time delivery rate | 30% | Direct percentage (capped at 100) |
| First-attempt delivery rate | 20% | ≤1.05 avg attempts = 100, ≤1.10 = 80, ≤1.20 = 60, ≤1.30 = 40, >1.30 = 20 |
| Delivery speed vs. standard | 20% | ≤1 day = 100, ≤2 = 85, ≤3 = 70, ≤5 = 50, >5 = 25 |
Routes are classified into optimization categories (WELL_OPTIMIZED, ADEQUATE, NEEDS_IMPROVEMENT, HIGH_OPTIMIZATION_POTENTIAL) with specific action recommendations:
REDUCE_REDELIVERY_RATE— routes with >1.20 average delivery attemptsCONSIDER_ROUTE_CONSOLIDATION— routes with <25 stops per dayIMPROVE_ON_TIME_PERFORMANCE— routes below 80% on-time rateSEPARATE_PARCEL_ROUTE— parcel-heavy routes (>50% parcels, avg weight >32 oz)
5. Gold Layer — Seasonal Volume Forecasting¶
The gld_volume_forecast model decomposes mail volume time series by product class and region into trend, seasonal, and residual components. It identifies:
- Peak periods: Holiday mailing surge (November–December), tax season (March–April), election mail cycles
- Secular trends: Year-over-year volume decline in First-Class Mail, growth in parcel volume
- Staffing signals: Resource allocation recommendations based on forecast volume and historical throughput
6. Gold Layer — Facility Utilization Analysis¶
The gld_facility_analysis model evaluates processing facility performance across 34,000+ post offices and 200+ distribution centers:
- Utilization rate: Actual throughput vs. rated capacity, flagging facilities consistently above 95% (overcapacity risk) or below 40% (consolidation candidate)
- Catchment area overlap: Geographic analysis identifying facilities with overlapping service areas suitable for consolidation
- Equipment utilization: Sorting machine and vehicle fleet usage rates by facility
7. ZIP Code Market Analytics¶
By joining silver delivery data with Census ZCTA demographic overlays, the platform enables market segmentation at the ZIP code level:
- Delivery density (pieces per household per day) by ZIP
- Product class mix (letter vs. parcel ratio) by demographic profile
- Service quality variance across urban, suburban, and rural ZCTAs
Code Samples¶
Python — USPS Web Tools API Address Validation¶
The fetch_usps_data.py script in examples/usps/data/open-data/ provides a complete USPS API client. The address validation method demonstrates the XML-based request pattern:
from examples.usps.data.open_data.fetch_usps_data import USPSDataFetcher
fetcher = USPSDataFetcher(api_key="YOUR_USPS_USER_ID")
# Validate a batch of addresses (up to 5 per API call)
addresses = [
{
"address2": "1600 Pennsylvania Ave NW",
"city": "Washington",
"state": "DC",
"zip5": "20500"
},
{
"address2": "31 Center Drive",
"city": "Bethesda",
"state": "MD",
"zip5": "20892"
}
]
results = fetcher.validate_address(addresses)
for addr in results:
if addr["is_valid"]:
print(f"{addr['Address2']}, {addr['City']}, {addr['State']} {addr['Zip5']}-{addr.get('Zip4', '')}")
else:
print(f"Invalid: {addr.get('error', 'Unknown error')}")
The underlying API call builds an XML request and sends it to https://secure.shippingapis.com/ShippingAPI.dll:
# Internal request structure (from USPSDataFetcher.validate_address)
xml_request = (
f'<AddressValidateRequest USERID="{api_key}">'
f' <Address ID="0">'
f' <Address1></Address1>'
f' <Address2>1600 Pennsylvania Ave NW</Address2>'
f' <City>Washington</City>'
f' <State>DC</State>'
f' <Zip5>20500</Zip5>'
f' <Zip4></Zip4>'
f' </Address>'
f'</AddressValidateRequest>'
)
response = requests.get(
"https://secure.shippingapis.com/ShippingAPI.dll",
params={"API": "Verify", "XML": xml_request},
timeout=30
)
dbt — Silver Delivery Performance Model¶
The slv_delivery_performance model standardizes raw delivery records with product class normalization and on-time flag computation:
-- From examples/usps/domains/dbt/models/silver/slv_delivery_performance.sql
{{ config(
materialized='incremental',
unique_key='delivery_sk',
tags=['silver', 'delivery_performance', 'cleaned']
) }}
WITH base AS (
SELECT * FROM {{ ref('brz_delivery_performance') }}
WHERE is_valid_record = TRUE
{% if is_incremental() %}
AND _dbt_loaded_at > (SELECT MAX(_dbt_loaded_at) FROM {{ this }})
{% endif %}
),
standardized AS (
SELECT
MD5(CONCAT_WS('|', tracking_id, CAST(acceptance_date AS STRING))) AS delivery_sk,
tracking_id,
carrier_route,
-- ZIP code standardization (left-pad to 5 digits)
LPAD(LEFT(TRIM(origin_zip), 5), 5, '0') AS origin_zip,
LPAD(LEFT(TRIM(destination_zip), 5), 5, '0') AS destination_zip,
-- Product class normalization
CASE UPPER(TRIM(product_class))
WHEN 'FC' THEN 'FIRST_CLASS'
WHEN 'PM' THEN 'PRIORITY'
WHEN 'PME' THEN 'PRIORITY_EXPRESS'
WHEN 'PS' THEN 'PARCEL_SELECT'
WHEN 'MM' THEN 'MEDIA_MAIL'
WHEN 'MKT' THEN 'MARKETING_MAIL'
ELSE UPPER(TRIM(product_class))
END AS product_class,
-- On-time delivery flag
CASE
WHEN actual_delivery_date <= expected_delivery_date THEN TRUE
ELSE FALSE
END AS is_on_time,
DATEDIFF(actual_delivery_date, acceptance_date) AS calculated_delivery_days,
CURRENT_TIMESTAMP() AS _dbt_loaded_at
FROM base
)
SELECT * FROM standardized
SQL — Delivery Performance Dashboard Query¶
A typical Power BI dataset query aggregating gold-layer route optimization data for an operations dashboard:
-- District-level delivery performance summary
-- Feeds Power BI operations dashboard
SELECT
r.district,
r.region,
r.analysis_year,
r.analysis_month,
-- Volume metrics
COUNT(DISTINCT r.route_id) AS total_routes,
SUM(r.total_deliveries) AS total_deliveries,
ROUND(AVG(r.stops_per_day), 1) AS avg_stops_per_day,
-- Performance metrics
ROUND(AVG(r.on_time_rate_pct), 1) AS avg_on_time_rate,
ROUND(AVG(r.avg_delivery_days), 2) AS avg_delivery_days,
ROUND(AVG(r.avg_delivery_attempts), 3) AS avg_delivery_attempts,
-- Efficiency distribution
SUM(CASE WHEN r.optimization_category = 'HIGH_OPTIMIZATION_POTENTIAL' THEN 1 ELSE 0 END) AS high_potential_routes,
SUM(CASE WHEN r.optimization_category = 'NEEDS_IMPROVEMENT' THEN 1 ELSE 0 END) AS needs_improvement_routes,
SUM(CASE WHEN r.optimization_category = 'WELL_OPTIMIZED' THEN 1 ELSE 0 END) AS well_optimized_routes,
-- Estimated savings
SUM(r.estimated_savings_minutes) AS total_savings_minutes,
ROUND(SUM(r.estimated_savings_minutes) / 60.0, 1) AS total_savings_hours
FROM gold.gld_route_optimization r
WHERE r.analysis_year = YEAR(CURRENT_DATE)
GROUP BY r.district, r.region, r.analysis_year, r.analysis_month
ORDER BY avg_on_time_rate ASC
Power BI Dashboards¶
The gold-layer models feed three primary dashboard surfaces:
1. National Operations Dashboard
- Delivery performance heatmap by district (67 districts, color-coded by on-time rate)
- Product class on-time trend lines with quarterly service standard targets
- Volume trend charts with seasonal forecast overlay
- District and region scorecards with drill-through to route-level detail
2. Route Planning Console
- Route efficiency ranking table (sortable by optimization score, stops/day, on-time rate)
- Geographic stop density visualization using ZCTA centroids
- Optimization category distribution (pie chart: well-optimized vs. needs-improvement)
- Estimated savings waterfall chart by recommendation type
3. Facility Capacity Report
- Processing plant utilization gauges (threshold alerts at 85% and 95%)
- Throughput trend by facility type (post office, distribution center, hub)
- Consolidation candidate map with catchment area overlap visualization
- Equipment utilization breakdown by sorting machine type
Data Quality and Validation¶
dbt tests enforce data integrity at each layer:
# From examples/usps/domains/dbt/models/schema.yml
models:
- name: slv_delivery_performance
columns:
- name: delivery_sk
tests:
- unique
- not_null
- name: origin_zip
tests:
- not_null
- dbt_utils.length_equal_to:
length: 5
- name: product_class
tests:
- accepted_values:
values:
[
"FIRST_CLASS",
"PRIORITY",
"PRIORITY_EXPRESS",
"PARCEL_SELECT",
"MEDIA_MAIL",
"MARKETING_MAIL",
"PERIODICALS",
]
- name: is_on_time
tests:
- not_null:
where: "delivery_status = 'DELIVERED'"
Synthetic Data
The examples/usps/ implementation uses synthetic delivery records generated by examples/usps/data/generators/generate_usps_data.py. Volume counts and performance rates approximate real-world distributions but are not actual USPS operational data. Service performance baselines are derived from publicly reported quarterly figures.
Deployment¶
The example includes environment-specific parameter files for both commercial and government Azure regions:
| File | Target Environment | Notes |
|---|---|---|
examples/usps/deploy/params.dev.json | Commercial Azure (dev) | Standard ADLS + Databricks deployment |
examples/usps/deploy/params.gov.json | Azure Government | FedRAMP-compliant endpoints, Gov cloud region mappings |
examples/usps/deploy/teardown.sh | Cleanup | Removes all deployed resources |
Government Cloud
For federal agency deployments, the params.gov.json configuration targets Azure Government regions with FedRAMP High-compliant service endpoints. USPS operational data, while largely public, may require FISMA controls when combined with route-level carrier performance or individual tracking records.
Cross-References¶
| Resource | Description |
|---|---|
examples/usps/ | Complete example implementation — dbt models, data generators, API fetcher, deployment configs |
examples/usps/ARCHITECTURE.md | Detailed architecture document with security, partitioning, and monitoring specifications |
examples/usps/notebooks/ | Jupyter notebooks for delivery optimization and volume forecasting analysis |
examples/usps/contracts/ | Data contracts for delivery analytics, mail volume, and facility operations |
| NOAA Climate Analytics | Related use case demonstrating similar medallion architecture with environmental data |
| EPA Environmental Analytics | Related federal agency analytics use case |
Sources¶
| Resource | URL |
|---|---|
| USPS Web Tools API Registration | https://www.usps.com/business/web-tools-apis/ |
| USPS Web Tools API Technical Documentation | https://www.usps.com/business/web-tools-apis/documentation-updates.htm |
| USPS Address Validation API (Verify) | https://www.usps.com/business/web-tools-apis/address-information-api.htm |
| USPS Package Tracking API (TrackV2) | https://www.usps.com/business/web-tools-apis/track-and-confirm-api.htm |
| USPS Service Performance Reports | https://about.usps.com/what/performance/service-performance/ |
| USPS Facts & Figures | https://facts.usps.com/ |
| Census TIGER/Line Shapefiles (ZCTA) | https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html |
| TIGERweb REST API | https://tigerweb.geo.census.gov/arcgis/rest/services |
| Census Geocoder | https://geocoding.geo.census.gov/geocoder/ |
| OpenAddresses Project | https://openaddresses.io/ |
| American Community Survey (ACS) | https://www.census.gov/programs-surveys/acs |