Skip to content

USPS Postal Operations

USPS Postal Operations Analytics on Azure

The United States Postal Service operates the largest civilian logistics network in the world — 34,000+ post offices, 230,000+ delivery routes, 160 million delivery points, and roughly 127 billion mail pieces processed annually. The operational data generated across this network — delivery scan events, facility throughput metrics, mail volume counts by product class, and address-level geographic records — represents a high-value analytical corpus that is largely underutilized outside of internal USPS reporting.

This use case applies the CSA-in-a-Box medallion architecture to USPS public data sources, combining Web Tools API outputs, quarterly service performance reports, Census ZCTA geographic boundaries, USPS Facts & Figures statistical publications, and OpenAddresses geocoding records. The result is an analytics platform capable of last-mile route optimization, seasonal volume forecasting, facility utilization analysis, and ZIP code market segmentation — built entirely on publicly available data.


Data Sources

Source Description Volume / Coverage Update Frequency Access Method
USPS Web Tools APIs Address validation (Verify API), city/state lookup, ZIP code lookup, package tracking (TrackV2), and domestic rate calculation (RateV4) 160M+ delivery points, 41K+ ZIP codes On-demand XML REST API (free User ID required)
Service Performance Reports Quarterly on-time delivery rates by product class (First-Class, Priority, Marketing Mail) and postal district All 67 postal districts, 7 product classes Quarterly PDF / spreadsheet download
Census ZCTA Boundaries ZIP Code Tabulation Areas from TIGER/Line shapefiles with demographic overlays from the American Community Survey 33,000+ ZCTAs, national coverage Annual (decennial base) Shapefile / REST API
USPS Facts & Figures Annual operational statistics: total mail volume, revenue by product, facility counts, employee headcount, vehicle fleet size Historical data back to 2000 Annual Web / downloadable datasets
OpenAddresses Community-sourced address point data with latitude/longitude coordinates for geocoding and delivery point validation 500M+ address points globally, US focus Continuous community updates Bulk download (CSV/GeoJSON)

API Registration

The USPS Web Tools APIs require a free User ID obtained through the USPS Web Tools Registration. Rate limits are approximately 5 requests per second for individual operations; the batch Address Validation API supports up to 5 addresses per call. Census TIGER/Line data is freely available via the TIGERweb REST API.


Architecture

The architecture follows a batch ingestion pattern. USPS Web Tools API responses, service performance report extracts, Census ZCTA shapefiles, USPS Facts & Figures datasets, and OpenAddresses geocoding files are all ingested through Azure Data Factory into ADLS Gen2 bronze. dbt models handle silver-layer cleansing (address standardization, product class normalization, on-time flag calculation) and gold-layer analytics (route optimization scoring, volume forecasting, facility utilization). Power BI consumes gold-layer tables for operational dashboards.

graph LR
    subgraph External Sources
        A[USPS Web Tools APIs<br/>Address, ZIP, Tracking]
        B[Service Performance<br/>Quarterly Reports]
        C[Census ZCTA<br/>TIGER/Line Boundaries]
        D[USPS Facts & Figures<br/>Annual Statistics]
        E[OpenAddresses<br/>Geocoding Points]
    end

    subgraph Ingestion
        F[Azure Data Factory<br/>REST + File Pipelines]
    end

    subgraph "ADLS Gen2 — Medallion Architecture"
        G[Bronze<br/>Raw JSON/XML/CSV]
        H[dbt Silver<br/>Cleaned & Conformed]
        I[dbt Gold<br/>Analytics-Ready]
    end

    subgraph Consumption
        J[Power BI<br/>Operations Dashboards]
        K[Jupyter Notebooks<br/>Forecasting & Optimization]
        L[Azure SQL<br/>Data Product APIs]
    end

    A --> F
    B --> F
    C --> F
    D --> F
    E --> F
    F --> G
    G --> H
    H --> I
    I --> J
    I --> K
    I --> L

Step-by-Step Implementation

1. API Ingestion and Raw Data Landing

Data Factory pipelines handle three ingestion patterns:

  • REST API extraction: USPS Web Tools API calls for address validation, city/state lookups, and ZIP code geography. Responses arrive as XML, parsed and stored as JSON in bronze.
  • File-based ingestion: Service performance PDFs/spreadsheets and USPS Facts & Figures datasets are downloaded, parsed, and landed as CSV in bronze.
  • Geospatial ingestion: Census TIGER/Line shapefiles and OpenAddresses CSV/GeoJSON files are loaded through ADF file connectors with schema inference.

All raw data lands in ADLS Gen2 under a bronze/usps/ partition structure organized by source and ingestion date.

Batch Address Validation

The USPS Verify API supports batching up to 5 addresses per request. For bulk validation workloads, the fetch_usps_data.py script in examples/usps/data/open-data/ implements automatic batching with configurable rate limiting (default 0.3s delay between calls).

2. Bronze Layer — Raw Data Preservation

Bronze dbt models (brz_delivery_performance, brz_facility_operations, brz_mail_volume) land raw records with source-system metadata, ingestion timestamps, and record hashes for deduplication. No business logic is applied at this layer.

Key bronze tables:

Model Source Grain Partition Strategy
brz_delivery_performance Synthetic delivery scans One row per delivery event product_class, acceptance_year
brz_facility_operations USPS Facts & Figures One row per facility-month facility_type
brz_mail_volume USPS Facts & Figures One row per product-class-day product_class

3. Silver Layer — Cleansing and Conforming

Silver models apply address standardization, product class normalization, on-time delivery flag computation, and geographic enrichment. The slv_delivery_performance model demonstrates the core transformation pattern:

  • ZIP code standardization: Left-padded to 5 digits, 3-digit Sectional Center Facility (SCF) code derived for routing analysis.
  • Product class normalization: Aliases mapped to canonical values (FCFIRST_CLASS, PMPRIORITY, PSPARCEL_SELECT, etc.).
  • On-time flag: Computed by comparing actual delivery date against expected delivery date or service standard (e.g., 3 days for First-Class, 2 days for Priority).
  • Intra-SCF and intra-state flags: Derived for local vs. cross-country routing segmentation.
  • Surrogate keys: MD5 hash of tracking ID and acceptance date for incremental merge.

4. Gold Layer — Last-Mile Route Optimization

The gld_route_optimization model scores 230,000+ carrier routes on a 0–100 efficiency scale using four weighted factors:

Factor Weight Scoring Logic
Volume density (stops/day) 30% ≥100 stops = 100 pts, ≥75 = 85, ≥50 = 70, ≥25 = 50, ≥10 = 30, <10 = 15
On-time delivery rate 30% Direct percentage (capped at 100)
First-attempt delivery rate 20% ≤1.05 avg attempts = 100, ≤1.10 = 80, ≤1.20 = 60, ≤1.30 = 40, >1.30 = 20
Delivery speed vs. standard 20% ≤1 day = 100, ≤2 = 85, ≤3 = 70, ≤5 = 50, >5 = 25

Routes are classified into optimization categories (WELL_OPTIMIZED, ADEQUATE, NEEDS_IMPROVEMENT, HIGH_OPTIMIZATION_POTENTIAL) with specific action recommendations:

  • REDUCE_REDELIVERY_RATE — routes with >1.20 average delivery attempts
  • CONSIDER_ROUTE_CONSOLIDATION — routes with <25 stops per day
  • IMPROVE_ON_TIME_PERFORMANCE — routes below 80% on-time rate
  • SEPARATE_PARCEL_ROUTE — parcel-heavy routes (>50% parcels, avg weight >32 oz)

5. Gold Layer — Seasonal Volume Forecasting

The gld_volume_forecast model decomposes mail volume time series by product class and region into trend, seasonal, and residual components. It identifies:

  • Peak periods: Holiday mailing surge (November–December), tax season (March–April), election mail cycles
  • Secular trends: Year-over-year volume decline in First-Class Mail, growth in parcel volume
  • Staffing signals: Resource allocation recommendations based on forecast volume and historical throughput

6. Gold Layer — Facility Utilization Analysis

The gld_facility_analysis model evaluates processing facility performance across 34,000+ post offices and 200+ distribution centers:

  • Utilization rate: Actual throughput vs. rated capacity, flagging facilities consistently above 95% (overcapacity risk) or below 40% (consolidation candidate)
  • Catchment area overlap: Geographic analysis identifying facilities with overlapping service areas suitable for consolidation
  • Equipment utilization: Sorting machine and vehicle fleet usage rates by facility

7. ZIP Code Market Analytics

By joining silver delivery data with Census ZCTA demographic overlays, the platform enables market segmentation at the ZIP code level:

  • Delivery density (pieces per household per day) by ZIP
  • Product class mix (letter vs. parcel ratio) by demographic profile
  • Service quality variance across urban, suburban, and rural ZCTAs

Code Samples

Python — USPS Web Tools API Address Validation

The fetch_usps_data.py script in examples/usps/data/open-data/ provides a complete USPS API client. The address validation method demonstrates the XML-based request pattern:

from examples.usps.data.open_data.fetch_usps_data import USPSDataFetcher

fetcher = USPSDataFetcher(api_key="YOUR_USPS_USER_ID")

# Validate a batch of addresses (up to 5 per API call)
addresses = [
    {
        "address2": "1600 Pennsylvania Ave NW",
        "city": "Washington",
        "state": "DC",
        "zip5": "20500"
    },
    {
        "address2": "31 Center Drive",
        "city": "Bethesda",
        "state": "MD",
        "zip5": "20892"
    }
]

results = fetcher.validate_address(addresses)
for addr in results:
    if addr["is_valid"]:
        print(f"{addr['Address2']}, {addr['City']}, {addr['State']} {addr['Zip5']}-{addr.get('Zip4', '')}")
    else:
        print(f"Invalid: {addr.get('error', 'Unknown error')}")

The underlying API call builds an XML request and sends it to https://secure.shippingapis.com/ShippingAPI.dll:

# Internal request structure (from USPSDataFetcher.validate_address)
xml_request = (
    f'<AddressValidateRequest USERID="{api_key}">'
    f'  <Address ID="0">'
    f'    <Address1></Address1>'
    f'    <Address2>1600 Pennsylvania Ave NW</Address2>'
    f'    <City>Washington</City>'
    f'    <State>DC</State>'
    f'    <Zip5>20500</Zip5>'
    f'    <Zip4></Zip4>'
    f'  </Address>'
    f'</AddressValidateRequest>'
)

response = requests.get(
    "https://secure.shippingapis.com/ShippingAPI.dll",
    params={"API": "Verify", "XML": xml_request},
    timeout=30
)

dbt — Silver Delivery Performance Model

The slv_delivery_performance model standardizes raw delivery records with product class normalization and on-time flag computation:

-- From examples/usps/domains/dbt/models/silver/slv_delivery_performance.sql
{{ config(
    materialized='incremental',
    unique_key='delivery_sk',
    tags=['silver', 'delivery_performance', 'cleaned']
) }}

WITH base AS (
    SELECT * FROM {{ ref('brz_delivery_performance') }}
    WHERE is_valid_record = TRUE
    {% if is_incremental() %}
        AND _dbt_loaded_at > (SELECT MAX(_dbt_loaded_at) FROM {{ this }})
    {% endif %}
),

standardized AS (
    SELECT
        MD5(CONCAT_WS('|', tracking_id, CAST(acceptance_date AS STRING))) AS delivery_sk,
        tracking_id,
        carrier_route,

        -- ZIP code standardization (left-pad to 5 digits)
        LPAD(LEFT(TRIM(origin_zip), 5), 5, '0') AS origin_zip,
        LPAD(LEFT(TRIM(destination_zip), 5), 5, '0') AS destination_zip,

        -- Product class normalization
        CASE UPPER(TRIM(product_class))
            WHEN 'FC' THEN 'FIRST_CLASS'
            WHEN 'PM' THEN 'PRIORITY'
            WHEN 'PME' THEN 'PRIORITY_EXPRESS'
            WHEN 'PS' THEN 'PARCEL_SELECT'
            WHEN 'MM' THEN 'MEDIA_MAIL'
            WHEN 'MKT' THEN 'MARKETING_MAIL'
            ELSE UPPER(TRIM(product_class))
        END AS product_class,

        -- On-time delivery flag
        CASE
            WHEN actual_delivery_date <= expected_delivery_date THEN TRUE
            ELSE FALSE
        END AS is_on_time,

        DATEDIFF(actual_delivery_date, acceptance_date) AS calculated_delivery_days,
        CURRENT_TIMESTAMP() AS _dbt_loaded_at

    FROM base
)

SELECT * FROM standardized

SQL — Delivery Performance Dashboard Query

A typical Power BI dataset query aggregating gold-layer route optimization data for an operations dashboard:

-- District-level delivery performance summary
-- Feeds Power BI operations dashboard
SELECT
    r.district,
    r.region,
    r.analysis_year,
    r.analysis_month,

    -- Volume metrics
    COUNT(DISTINCT r.route_id) AS total_routes,
    SUM(r.total_deliveries) AS total_deliveries,
    ROUND(AVG(r.stops_per_day), 1) AS avg_stops_per_day,

    -- Performance metrics
    ROUND(AVG(r.on_time_rate_pct), 1) AS avg_on_time_rate,
    ROUND(AVG(r.avg_delivery_days), 2) AS avg_delivery_days,
    ROUND(AVG(r.avg_delivery_attempts), 3) AS avg_delivery_attempts,

    -- Efficiency distribution
    SUM(CASE WHEN r.optimization_category = 'HIGH_OPTIMIZATION_POTENTIAL' THEN 1 ELSE 0 END) AS high_potential_routes,
    SUM(CASE WHEN r.optimization_category = 'NEEDS_IMPROVEMENT' THEN 1 ELSE 0 END) AS needs_improvement_routes,
    SUM(CASE WHEN r.optimization_category = 'WELL_OPTIMIZED' THEN 1 ELSE 0 END) AS well_optimized_routes,

    -- Estimated savings
    SUM(r.estimated_savings_minutes) AS total_savings_minutes,
    ROUND(SUM(r.estimated_savings_minutes) / 60.0, 1) AS total_savings_hours

FROM gold.gld_route_optimization r
WHERE r.analysis_year = YEAR(CURRENT_DATE)
GROUP BY r.district, r.region, r.analysis_year, r.analysis_month
ORDER BY avg_on_time_rate ASC

Power BI Dashboards

The gold-layer models feed three primary dashboard surfaces:

1. National Operations Dashboard

  • Delivery performance heatmap by district (67 districts, color-coded by on-time rate)
  • Product class on-time trend lines with quarterly service standard targets
  • Volume trend charts with seasonal forecast overlay
  • District and region scorecards with drill-through to route-level detail

2. Route Planning Console

  • Route efficiency ranking table (sortable by optimization score, stops/day, on-time rate)
  • Geographic stop density visualization using ZCTA centroids
  • Optimization category distribution (pie chart: well-optimized vs. needs-improvement)
  • Estimated savings waterfall chart by recommendation type

3. Facility Capacity Report

  • Processing plant utilization gauges (threshold alerts at 85% and 95%)
  • Throughput trend by facility type (post office, distribution center, hub)
  • Consolidation candidate map with catchment area overlap visualization
  • Equipment utilization breakdown by sorting machine type

Data Quality and Validation

dbt tests enforce data integrity at each layer:

# From examples/usps/domains/dbt/models/schema.yml
models:
    - name: slv_delivery_performance
      columns:
          - name: delivery_sk
            tests:
                - unique
                - not_null
          - name: origin_zip
            tests:
                - not_null
                - dbt_utils.length_equal_to:
                      length: 5
          - name: product_class
            tests:
                - accepted_values:
                      values:
                          [
                              "FIRST_CLASS",
                              "PRIORITY",
                              "PRIORITY_EXPRESS",
                              "PARCEL_SELECT",
                              "MEDIA_MAIL",
                              "MARKETING_MAIL",
                              "PERIODICALS",
                          ]
          - name: is_on_time
            tests:
                - not_null:
                      where: "delivery_status = 'DELIVERED'"

Synthetic Data

The examples/usps/ implementation uses synthetic delivery records generated by examples/usps/data/generators/generate_usps_data.py. Volume counts and performance rates approximate real-world distributions but are not actual USPS operational data. Service performance baselines are derived from publicly reported quarterly figures.


Deployment

The example includes environment-specific parameter files for both commercial and government Azure regions:

File Target Environment Notes
examples/usps/deploy/params.dev.json Commercial Azure (dev) Standard ADLS + Databricks deployment
examples/usps/deploy/params.gov.json Azure Government FedRAMP-compliant endpoints, Gov cloud region mappings
examples/usps/deploy/teardown.sh Cleanup Removes all deployed resources

Government Cloud

For federal agency deployments, the params.gov.json configuration targets Azure Government regions with FedRAMP High-compliant service endpoints. USPS operational data, while largely public, may require FISMA controls when combined with route-level carrier performance or individual tracking records.


Cross-References

Resource Description
examples/usps/ Complete example implementation — dbt models, data generators, API fetcher, deployment configs
examples/usps/ARCHITECTURE.md Detailed architecture document with security, partitioning, and monitoring specifications
examples/usps/notebooks/ Jupyter notebooks for delivery optimization and volume forecasting analysis
examples/usps/contracts/ Data contracts for delivery analytics, mail volume, and facility operations
NOAA Climate Analytics Related use case demonstrating similar medallion architecture with environmental data
EPA Environmental Analytics Related federal agency analytics use case

Sources

Resource URL
USPS Web Tools API Registration https://www.usps.com/business/web-tools-apis/
USPS Web Tools API Technical Documentation https://www.usps.com/business/web-tools-apis/documentation-updates.htm
USPS Address Validation API (Verify) https://www.usps.com/business/web-tools-apis/address-information-api.htm
USPS Package Tracking API (TrackV2) https://www.usps.com/business/web-tools-apis/track-and-confirm-api.htm
USPS Service Performance Reports https://about.usps.com/what/performance/service-performance/
USPS Facts & Figures https://facts.usps.com/
Census TIGER/Line Shapefiles (ZCTA) https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
TIGERweb REST API https://tigerweb.geo.census.gov/arcgis/rest/services
Census Geocoder https://geocoding.geo.census.gov/geocoder/
OpenAddresses Project https://openaddresses.io/
American Community Survey (ACS) https://www.census.gov/programs-surveys/acs