

USPS Postal Operations Analytics Platform


[!TIP] TL;DR — Postal operations analytics for delivery performance, route optimization, and facility capacity planning across 34,000+ post offices and 230,000+ routes with seasonal volume prediction.



A comprehensive postal operations analytics platform built on Azure Cloud Scale Analytics (CSA), providing insights into delivery performance, facility utilization, geographic coverage, and volume forecasting using USPS public data and APIs.


📋 Overview

The United States Postal Service delivers 127 billion pieces of mail annually through a network of 34,000+ post offices, 230,000+ delivery routes, and 640,000+ employees. This platform ingests, processes, and analyzes USPS operational data — ZIP code geography, address validation, delivery performance, and facility metrics — to enable route optimization, capacity planning, and service-level management.

✨ Key Features

  • Last-Mile Delivery Analytics: Route-level performance with geospatial optimization
  • Volume Forecasting: Seasonal and trend-based mail/parcel volume prediction
  • Facility Utilization: Processing plant and post office capacity analysis
  • Address Intelligence: Address validation, geocoding, and deliverability scoring
  • Service Performance Dashboards: On-time delivery rates by product, region, and season
  • ZIP Code Demographics: Market analytics overlaying Census data on postal geography

🗄️ Data Sources

| Source | Description | URL |
| --- | --- | --- |
| USPS Web Tools APIs | Address validation, ZIP lookup, tracking, rate calculation | https://www.usps.com/business/web-tools-apis/ |
| USPS Service Performance | Quarterly on-time delivery reports by product class | https://about.usps.com/what/performance/ |
| ZIP Code Tabulation Areas | Census ZCTA boundaries with demographic overlays | https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2023&layergroup=ZIP+Code+Tabulation+Areas |
| USPS Facts & Figures | Annual operational statistics (volume, revenue, facilities) | https://facts.usps.com/ |
| USPS Postal Bulletin | Rate changes, service updates, new ZIP codes | https://about.usps.com/postal-bulletin/ |
| OpenAddresses | Open-source address point data for geocoding | https://openaddresses.io/ |
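Most of these sources are bulk downloads; the Web Tools APIs are called per request. The snippet below is a minimal sketch of a single address validation ("Verify") call, assuming the classic XML endpoint and the USERID issued at registration; confirm field names against the current Web Tools documentation before building on it.

```python
# Minimal sketch of a USPS Web Tools "Verify" (address validation) call.
# Assumptions: the classic XML endpoint and USERID auth; field names are
# illustrative and should be checked against the current Web Tools docs.
import os
import requests

USPS_USER_ID = os.environ["USPS_USER_ID"]  # from the free Web Tools registration

xml = (
    f'<AddressValidateRequest USERID="{USPS_USER_ID}">'
    '<Address ID="0">'
    "<Address1></Address1>"                    # suite/apartment (optional)
    "<Address2>475 LENFANT PLZ SW</Address2>"  # street address
    "<City>WASHINGTON</City>"
    "<State>DC</State>"
    "<Zip5></Zip5><Zip4></Zip4>"
    "</Address></AddressValidateRequest>"
)

resp = requests.get(
    "https://secure.shippingapis.com/ShippingAPI.dll",
    params={"API": "Verify", "XML": xml},
    timeout=30,
)
resp.raise_for_status()
print(resp.text)  # XML response containing the standardized address and ZIP+4
```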

🏗️ Architecture Overview

graph TD
    subgraph "Data Sources"
        A1[USPS Web Tools APIs<br/>Address & ZIP Lookup]
        A2[Service Performance<br/>On-Time Delivery Reports]
        A3[Census ZCTA<br/>ZIP Boundaries & Demographics]
        A4[USPS Facts & Figures<br/>Operational Statistics]
        A5[OpenAddresses<br/>Address Point Data]
    end

    subgraph "Ingestion Layer"
        I1[ADF Pipeline<br/>Batch Ingestion]
        I2[REST Connectors<br/>API Polling]
        I3[Shapefile Loader<br/>Geospatial Ingest]
    end

    subgraph "Bronze Layer — Raw"
        B1[brz_zip_codes]
        B2[brz_address_records]
        B3[brz_delivery_performance]
        B4[brz_facility_operations]
        B5[brz_mail_volume]
        B6[brz_zcta_boundaries]
    end

    subgraph "Silver Layer — Cleansed"
        S1[slv_zip_geography]
        S2[slv_validated_addresses]
        S3[slv_service_performance]
        S4[slv_facility_metrics]
        S5[slv_volume_timeseries]
    end

    subgraph "Gold Layer — Analytics"
        G1[gld_route_optimization]
        G2[gld_volume_forecast]
        G3[gld_facility_utilization]
        G4[gld_service_scorecards]
        G5[gld_postal_dashboard]
    end

    subgraph "Consumption"
        C1[Operations Dashboard]
        C2[Route Planning Tools]
        C3[Capacity Reports]
        C4[Performance APIs]
    end

    A1 --> I2
    A2 --> I1
    A3 --> I3
    A4 --> I1
    A5 --> I1

    I1 --> B2
    I1 --> B3
    I1 --> B4
    I1 --> B5
    I2 --> B1
    I2 --> B2
    I3 --> B6

    B1 --> S1
    B6 --> S1
    B2 --> S2
    B3 --> S3
    B4 --> S4
    B5 --> S5

    S1 --> G1
    S2 --> G1
    S3 --> G4
    S4 --> G3
    S5 --> G2
    S1 --> G5
    S3 --> G5
    S5 --> G5

    G1 --> C2
    G2 --> C1
    G3 --> C3
    G4 --> C1
    G5 --> C4
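To make the Bronze-to-Silver hop concrete, the sketch below shows how brz_zip_codes and brz_zcta_boundaries could be combined into slv_zip_geography. This is an illustrative PySpark fragment rather than the project's actual model: the schema names, extra columns, and the ZIP-to-ZCTA join key are assumptions (ZCTAs only approximate ZIP codes).

```python
# Illustrative sketch of the brz_zip_codes + brz_zcta_boundaries -> slv_zip_geography hop.
# Schema and column names beyond those in the diagram are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

zip_codes = spark.table("bronze.brz_zip_codes")
zcta = spark.table("bronze.brz_zcta_boundaries")

slv_zip_geography = (
    zip_codes
    .withColumn("zip_code", F.lpad(F.col("zip_code"), 5, "0"))    # normalize to 5 digits
    .join(zcta, F.col("zip_code") == F.col("zcta_code"), "left")  # ZCTA ~ ZIP, not an exact match
    .select("zip_code", "city", "state", "zcta_code", "geometry_wkt", "population")
    .dropDuplicates(["zip_code"])
)

slv_zip_geography.write.mode("overwrite").saveAsTable("silver.slv_zip_geography")
```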

📎 Prerequisites

Azure Resources

  • Azure subscription with contributor access
  • Azure Data Factory or Synapse Analytics
  • Azure Data Lake Storage Gen2
  • Azure SQL Database or Synapse SQL Pool
  • Azure Key Vault for API credentials

Tools Required

  • Azure CLI (2.55.0 or later)
  • dbt CLI (1.7.0 or later)
  • Python 3.9+
  • Git

API Access

  • USPS Web Tools API credentials (free registration at https://www.usps.com/business/web-tools-apis/documentation-updates.htm)
  • Census API key (free at https://api.census.gov/data/key_signup.html) for demographic overlays

🚀 Quick Start

1. Environment Setup

# Clone the repository
git clone <repository-url>
cd csa-inabox/examples/usps

# Install Python dependencies
pip install -r requirements.txt

# Install dbt packages
cd domains/dbt
dbt deps

2. Configure API Keys

# Add to Azure Key Vault or local environment
export USPS_USER_ID="your-usps-user-id"
export CENSUS_API_KEY="your-census-api-key"  # Optional for demographic overlays
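If the credentials live in Key Vault rather than the local shell, the fetch scripts can resolve them at runtime. The sketch below uses the azure-identity and azure-keyvault-secrets packages; the vault URL and secret names are placeholders for your deployment.

```python
# Sketch: resolve USPS/Census credentials from Azure Key Vault at runtime.
# Vault URL and secret names are placeholders; adjust to your deployment.
import os
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

vault_url = os.environ.get("KEY_VAULT_URL", "https://<your-vault-name>.vault.azure.net")
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

os.environ["USPS_USER_ID"] = client.get_secret("usps-user-id").value
os.environ["CENSUS_API_KEY"] = client.get_secret("census-api-key").value
```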

3. Generate Sample Data

# Generate synthetic postal operations data
python data/generators/generate_usps_data.py --output-dir domains/dbt/seeds

# Or fetch real ZIP code data
python data/open-data/fetch_zip_codes.py --states "CA,TX,NY,FL"
python data/open-data/fetch_zcta_boundaries.py --year 2023
python data/open-data/fetch_service_performance.py --quarters "2023Q1,2023Q2,2023Q3,2023Q4"

4. Deploy Infrastructure

# Configure parameters
cp deploy/params.dev.json deploy/params.local.json
# Edit params.local.json with your values

# Deploy using Azure CLI
az deployment group create \
  --resource-group rg-usps-analytics \
  --template-file ../../deploy/bicep/DLZ/main.bicep \
  --parameters @deploy/params.local.json

5. Run dbt Models

cd domains/dbt

# Test connections
dbt debug

# Load seed data
dbt seed

# Run models
dbt run

# Run tests
dbt test

# Generate documentation
dbt docs generate
dbt docs serve

💡 Sample Analytics Scenarios

1. Last-Mile Delivery Optimization

Analyze delivery route performance to identify inefficiencies and recommend consolidation or re-sequencing opportunities.

-- Routes with highest optimization potential
SELECT
    route_id,
    zip_code,
    carrier_type,
    avg_delivery_time_minutes,
    stops_per_route,
    miles_per_route,
    packages_per_stop,
    time_per_stop_minutes,
    optimization_score,
    estimated_savings_minutes
FROM gold.gld_route_optimization
WHERE optimization_score >= 70
    AND carrier_type = 'CITY'
ORDER BY estimated_savings_minutes DESC
LIMIT 25;
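The optimization_score surfaced by this query is a derived metric. The pandas sketch below shows one plausible way such a score could be computed from the route-level ratios; the column names follow the query above, but the weighting is purely illustrative and not the project's actual scoring logic.

```python
# Hypothetical sketch of deriving a 0-100 route optimization score from Silver metrics.
# Column names mirror the query above; the weights are illustrative only.
import pandas as pd

routes = pd.DataFrame({
    "route_id": ["R001", "R002", "R003"],
    "stops_per_route": [520, 410, 610],
    "miles_per_route": [42.0, 55.5, 38.2],
    "avg_delivery_time_minutes": [470, 495, 455],
})

routes["time_per_stop_minutes"] = routes["avg_delivery_time_minutes"] / routes["stops_per_route"]
routes["miles_per_stop"] = routes["miles_per_route"] / routes["stops_per_route"]

# Score each route by how far it sits above the fleet median on both ratios (higher = more waste).
for col in ["time_per_stop_minutes", "miles_per_stop"]:
    routes[f"{col}_idx"] = routes[col] / routes[col].median()

routes["optimization_score"] = (
    50 * (routes["time_per_stop_minutes_idx"] - 1).clip(lower=0)
    + 50 * (routes["miles_per_stop_idx"] - 1).clip(lower=0)
).clip(upper=100).round(1)

print(routes[["route_id", "time_per_stop_minutes", "optimization_score"]])
```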

2. Seasonal Volume Prediction

Forecast mail and parcel volumes by product class and region to pre-position resources for peak seasons (Holiday, Tax, Election).

-- Volume forecast for upcoming quarter
SELECT
    region,
    product_class,
    forecast_month,
    predicted_volume,
    lower_bound_95,
    upper_bound_95,
    yoy_growth_pct,
    peak_season_flag,
    recommended_staffing_delta
FROM gold.gld_volume_forecast
WHERE forecast_month BETWEEN '2024-11-01' AND '2025-01-31'
ORDER BY region, product_class, forecast_month;
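gld_volume_forecast is refreshed by the monthly retraining job. As a rough illustration of the kind of model that could produce predicted_volume with 95% bounds, the sketch below fits a SARIMA model to a monthly series with statsmodels; the model order and the toy data are illustrative, not the platform's tuned configuration.

```python
# Minimal seasonal forecasting sketch (not the platform's tuned model).
# Fits a SARIMA model on a monthly volume series and emits point forecasts
# with 95% intervals, mirroring predicted_volume / lower_bound_95 / upper_bound_95.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# slv_volume_timeseries exported to a pandas Series indexed by month (toy data).
volume = pd.Series(
    [100, 98, 102, 105, 110, 108, 107, 111, 118, 130, 155, 170] * 3,
    index=pd.date_range("2021-01-01", periods=36, freq="MS"),
    dtype=float,
)

model = SARIMAX(volume, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)

forecast = result.get_forecast(steps=3)  # three-month horizon
bounds = forecast.conf_int(alpha=0.05)   # 95% interval

print(pd.DataFrame({
    "predicted_volume": forecast.predicted_mean,
    "lower_bound_95": bounds.iloc[:, 0],
    "upper_bound_95": bounds.iloc[:, 1],
}))
```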

3. Facility Consolidation Analysis

Evaluate processing facility utilization rates and geographic overlap to identify consolidation candidates.

-- Under-utilized facilities with consolidation potential
SELECT
    facility_id,
    facility_name,
    facility_type,
    city,
    state,
    current_utilization_pct,
    max_throughput_daily,
    actual_throughput_daily,
    nearest_facility_miles,
    nearest_facility_spare_capacity_pct,
    consolidation_score
FROM gold.gld_facility_utilization
WHERE current_utilization_pct < 50
    AND consolidation_score >= 60
ORDER BY consolidation_score DESC;
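The nearest_facility_miles and nearest_facility_spare_capacity_pct columns imply a pairwise distance pass over the facility inventory. The sketch below shows that step with a pure-Python haversine distance; the coordinates and capacity fields are assumptions rather than the project schema.

```python
# Sketch: nearest-neighbor distance between facilities using haversine miles.
# Facility coordinates and capacity fields are assumed, not the project schema.
from math import asin, cos, radians, sin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3958.8 * 2 * asin(sqrt(a))  # mean Earth radius ~3958.8 miles

facilities = [
    {"facility_id": "F100", "lat": 33.45, "lon": -112.07, "spare_capacity_pct": 38},
    {"facility_id": "F101", "lat": 33.61, "lon": -112.20, "spare_capacity_pct": 12},
    {"facility_id": "F102", "lat": 32.22, "lon": -110.97, "spare_capacity_pct": 55},
]

for f in facilities:
    others = [o for o in facilities if o["facility_id"] != f["facility_id"]]
    nearest = min(others, key=lambda o: haversine_miles(f["lat"], f["lon"], o["lat"], o["lon"]))
    f["nearest_facility_miles"] = round(haversine_miles(f["lat"], f["lon"], nearest["lat"], nearest["lon"]), 1)
    f["nearest_facility_spare_capacity_pct"] = nearest["spare_capacity_pct"]

print(facilities)
```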

✨ Data Products

Route Optimization (route-optimization)

  • Description: Delivery route metrics with optimization scoring
  • Freshness: Weekly updates
  • Coverage: All city and rural delivery routes
  • API: /api/v1/route-optimization

Volume Forecast (volume-forecast)

  • Description: Time-series forecasts by product class, region, and season
  • Freshness: Monthly model retraining
  • Coverage: All mail/parcel product classes, 67 districts
  • API: /api/v1/volume-forecast

Facility Utilization (facility-utilization)

  • Description: Processing plant and post office capacity metrics
  • Freshness: Daily operational updates
  • Coverage: 200+ processing plants, 34,000+ post offices
  • API: /api/v1/facility-utilization
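The endpoints listed for each data product are relative paths; consumers call them against whatever host the platform exposes. The example below is a hedged sketch using requests, with the host and query parameters as placeholders (only the /api/v1 paths come from this README).

```python
# Sketch: consuming the facility-utilization data product API.
# The host and query parameters are placeholders; only the /api/v1 path comes from this README.
import os
import requests

API_BASE = os.environ.get("USPS_ANALYTICS_API_BASE", "https://<your-api-host>")

resp = requests.get(
    f"{API_BASE}/api/v1/facility-utilization",
    params={"state": "TX", "max_utilization_pct": 50},  # assumed filter parameters
    timeout=30,
)
resp.raise_for_status()
for facility in resp.json():
    print(facility)
```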

⚙️ Configuration

⚙️ dbt Profiles

Add to your ~/.dbt/profiles.yml:

usps_analytics:
  target: dev
  outputs:
    dev:
      type: databricks
      host: "{{ env_var('DBT_HOST') }}"
      http_path: "{{ env_var('DBT_HTTP_PATH') }}"
      token: "{{ env_var('DBT_TOKEN') }}"
      schema: usps_dev
      catalog: dev
    prod:
      type: databricks
      host: "{{ env_var('DBT_HOST_PROD') }}"
      http_path: "{{ env_var('DBT_HTTP_PATH_PROD') }}"
      token: "{{ env_var('DBT_TOKEN_PROD') }}"
      schema: usps
      catalog: prod

⚙️ Environment Variables

# Required for data fetching
USPS_USER_ID=your-usps-user-id
CENSUS_API_KEY=your-census-api-key

# Required for dbt
DBT_HOST=your-databricks-host
DBT_HTTP_PATH=your-sql-warehouse-path
DBT_TOKEN=your-access-token

# Optional
USPS_LOG_LEVEL=INFO
USPS_BATCH_SIZE=5000

🔒 Azure Government Notes

This example is compatible with Azure Government (US) regions. When deploying to Azure Government:

  • Use usgovvirginia or usgovarizona as your Azure region
  • Update ARM/Bicep endpoint references to .usgovcloudapi.net
  • USPS APIs are accessible from government networks without special authorization
  • Note: USPS operational data at the route level may be considered sensitive — confirm data classification with your AO before deploying granular route data in cloud environments

📊 Monitoring & Alerts

  • Data Freshness: Alerts when service performance reports or volume feeds are overdue
  • Data Quality: Automated dbt tests on address validation rates and volume anomalies
  • API Health: USPS Web Tools API availability monitoring
  • Cost Management: Daily compute spend tracking with budget thresholds

🔧 Troubleshooting

🔧 Common Issues

  1. USPS API Rate Limits: Web Tools APIs throttle at ~5 requests/second. Use the --delay parameter in the fetch scripts (see the throttling sketch after this list).
  2. ZCTA vs. ZIP Mismatch: Census ZCTAs approximate USPS ZIP codes but do not match them exactly. See data/schemas/zip_zcta_crosswalk.csv for the mapping.
  3. Service Performance PDF Parsing: Quarterly reports are published as PDFs. Use data/parsers/parse_service_report.py for extraction.
  4. Large Address Datasets: OpenAddresses files can exceed 5 GB per state. Use --county-filter for targeted loads.

🔗 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-data-source)
  3. Make changes and add tests
  4. Run quality checks (make lint test)
  5. Submit a pull request

🔗 License

This project is licensed under the MIT License. See LICENSE file for details.


🔗 Acknowledgments

  • USPS for public operational data and developer APIs
  • U.S. Census Bureau for ZCTA boundary and demographic data
  • Azure Cloud Scale Analytics team for the foundational platform
  • Contributors and the open-source community


Prerequisites / Cost / Teardown

[!IMPORTANT] Cost-safety: this vertical deploys real Azure resources. Always run teardown.sh when you are done. A forgotten workshop environment can run $120-200/day.

Prerequisites

  • Azure CLI 2.50+ logged in (az login), subscription selected (az account set --subscription <id>)
  • jq installed (used by teardown enumeration)
  • Bicep CLI 0.25+ (az bicep version)
  • Contributor + User Access Administrator on target subscription (or a pre-created RG with equivalent RBAC)
  • bash scripts/deploy/validate-prerequisites.sh passes

Cost estimate (rough, East US 2)

  • While running: ~$120-200/day (services: Synapse, Databricks, ADF, Storage, Key Vault)
  • Idle overnight: roughly half if you stop compute (Databricks autostop + Synapse pause)
  • Storage + Key Vault residual: <$5/month if you skip teardown

Numbers are indicative for a small demo dataset; production workloads vary significantly. Use az consumption usage list or Cost Management for live numbers.

Runtime

  • Deploy: ~30-45 minutes (first run; cold Bicep)
  • Teardown: ~10-15 minutes (async RG delete completes in the background)

Teardown

When finished, run the per-example teardown script. It enforces a typed DESTROY-usps confirmation, logs every step to reports/teardown/usps-<timestamp>.log, and deletes the resource group rg-usps-analytics along with any matching subscription-scope deployments.

# Interactive (recommended)
bash examples/usps/deploy/teardown.sh

# Dry run (enumerate only)
bash examples/usps/deploy/teardown.sh --dry-run

# From the repo root via Makefile
make teardown-example VERTICAL=usps
make teardown-example VERTICAL=usps DRYRUN=1

# CI automation (no prompt — only for ephemeral environments)
bash examples/usps/deploy/teardown.sh --yes

See docs/QUICKSTART.md#teardown for the platform-wide teardown flow.

Directory Structure

usps/
├── contracts/                # Data product contracts (schemas, SLOs, owners)
│   ├── delivery-analytics.yaml
│   ├── facility-operations.yaml
│   └── mail-volume.yaml
├── data/                     # Sample data + synthetic generators
│   ├── generators/
│   └── open-data/
├── deploy/                   # Deployment parameters / Bicep templates
│   ├── params.dev.json
│   ├── params.gov.json
│   └── teardown.sh
├── domains/                  # dbt models (bronze / silver / gold) and seeds
│   └── dbt/
├── notebooks/                # Synapse / Fabric / Databricks notebooks
│   ├── delivery_optimization.py
│   └── volume_forecasting.py
├── reports/                  # Power BI report templates and pbix sources
├── ARCHITECTURE.md           # Mermaid + prose architecture diagrams
└── README.md                 # This file

Expected Results

After running the medallion pipeline against the bundled seed data, the Gold layer should populate the following tables. Row counts vary with the seed-data generator parameters; the figures below are the approximate scale you should see on a default run.

| Gold Table | Approximate Rows | Notes |
| --- | --- | --- |
| gld_facility_analysis | TODO: capture after first run | Populated from Silver via dbt run --select tag:gold |
| gld_route_optimization | TODO: capture after first run | Populated from Silver via dbt run --select tag:gold |
| gld_volume_forecast | TODO: capture after first run | Populated from Silver via dbt run --select tag:gold |

TODO: capture exact counts after the next end-to-end seed run. These are bounded by the seed-data generator parameters in data/generators/.
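Once a full seed run completes, the row counts can be read straight from the warehouse to replace the TODO entries. The sketch below uses the databricks-sql-connector package and reuses the DBT_* variables and the dev catalog / usps_dev schema from the configuration section; adjust names if your environment differs.

```python
# Sketch: capture Gold-layer row counts after a seed run (to fill in the table above).
# Uses databricks-sql-connector; catalog/schema names follow the dbt dev profile above.
import os
from databricks import sql

tables = ["gld_facility_analysis", "gld_route_optimization", "gld_volume_forecast"]

with sql.connect(
    server_hostname=os.environ["DBT_HOST"],
    http_path=os.environ["DBT_HTTP_PATH"],
    access_token=os.environ["DBT_TOKEN"],
) as conn, conn.cursor() as cursor:
    for table in tables:
        cursor.execute(f"SELECT COUNT(*) FROM dev.usps_dev.{table}")
        print(table, cursor.fetchone()[0])
```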