Source:
examples/commerce/README.md— this page is rendered live from that file.
CIPSEA awareness
The data in this example may be subject to CIPSEA (the Confidential Information Protection and Statistical Efficiency Act, 44 U.S.C. §§ 3561–3583) when collected from respondents under a pledge of confidentiality for exclusively statistical purposes.
Knowing and willful disclosure of identifiable CIPSEA data is a Class E felony (§ 3572) attaching to individual officers, employees, or designated agents — including cloud-operator personnel where applicable.
The architecture below is starting-point reference guidance only. Validate the specific compliance posture for your workload with your designating statistical agency and Confidentiality Officer before production use:
- CIPSEA control mapping & narrative (DRAFT — under validation)
- CIPSEA operational playbook for Azure (DRAFT — under validation)
Department of Commerce Economic Analytics Platform¶
Examples > Commerce
[!TIP] TL;DR — Economic analytics platform analyzing Census demographics, BEA GDP/trade data, and business formation statistics. Provides regional economic resilience scoring, international trade pattern analysis, and small business growth prediction.
📋 Table of Contents¶
- Overview
- Key Features
- Data Sources
- Open Data APIs
- Architecture Overview
- Prerequisites
- Azure Resources
- Tools Required
- API Access
- Quick Start
- 1. Environment Setup
- 2. Configure API Keys
- 3. Generate Sample Data
- 4. Deploy Infrastructure
- 5. Run dbt Models
- Sample Analytics Scenarios
- 1. Regional Economic Resilience Scoring
- 2. International Trade Pattern Analysis
- 3. Small Business Growth Prediction
- Data Products
- Economic Resilience Index
- International Trade Patterns
- Business Growth Analytics
- Configuration
- dbt Profiles
- Environment Variables
- Government Cloud Deployment Notes
- Azure Government (GovCloud)
- FISMA Compliance
- Data Sensitivity
- Monitoring & Alerts
- Troubleshooting
- Common Issues
- Logs
- Development
- Adding New Data Sources
- Testing
- Contributing
- License
- Support
- Acknowledgments
A comprehensive economic analytics platform built on Azure Cloud Scale Analytics (CSA), providing insights into regional economic resilience, international trade patterns, and small business growth using official Commerce Department data sources.
📋 Overview¶
This platform ingests, processes, and analyzes data from multiple Commerce Department bureaus to provide actionable insights for economic policy, trade analysis, and business development decision-making. The platform follows the medallion architecture (Bronze -> Silver -> Gold) and implements modern data engineering best practices for federal data systems.
✨ Key Features¶
- Regional Economic Analytics: Employment diversity, industry concentration, and GDP stability scoring
- International Trade Intelligence: Bilateral trade flows, commodity trends, and tariff impact analysis
- Small Business Growth Prediction: Business formation rates, survival curves, and economic indicator correlation
- Census Demographic Insights: Population, income, education, and employment data at census-tract granularity
- Interactive Dashboards: Executive dashboards with KPIs and drill-down capabilities
- API-First Architecture: RESTful APIs for all data products
🗄️ Data Sources¶
- Census Bureau (ACS/Decennial): Demographic, economic, and housing data from the American Community Survey (1-year and 5-year estimates) and Decennial Census
- BEA (Bureau of Economic Analysis): GDP by state and industry, personal income, trade in goods and services, and national economic accounts
- NIST (National Institute of Standards and Technology): Manufacturing extension partnership data, industry competitiveness metrics
- ITA (International Trade Administration): Bilateral trade statistics, export/import data by partner country and commodity, tariff schedules
🗄️ Open Data APIs¶
| API | URL | Auth | Rate Limit |
|---|---|---|---|
| Census API | https://api.census.gov/data | API key (free) | 500 requests/day |
| BEA API | https://apps.bea.gov/api | API key (free) | 100 requests/minute |
| USA Trade Online | https://usatrade.census.gov | Public | Varies |
| Census Business Builder | https://www.census.gov/data/data-tools/cbb.html | Public | N/A |
🏗️ Architecture Overview¶
graph TD
A[Commerce APIs] --> B[Bronze Layer]
B --> C[Silver Layer]
C --> D[Gold Layer]
D --> E[Analytics & Dashboards]
subgraph "Data Sources"
A1[Census ACS API<br/>Demographics & Economy]
A2[BEA API<br/>GDP & Trade]
A3[ITA Trade Data<br/>International Trade]
A4[Census Business Builder<br/>Business Statistics]
end
subgraph "Bronze Layer"
B1[brz_census_demographics]
B2[brz_gdp_data]
B3[brz_trade_data]
end
subgraph "Silver Layer"
C1[slv_census_demographics]
C2[slv_gdp_data]
C3[slv_trade_data]
end
subgraph "Gold Layer"
D1[gld_economic_resilience]
D2[gld_trade_patterns]
D3[gld_business_growth]
end
subgraph "Consumption"
E1[Executive Dashboard]
E2[Trade Policy Reports]
E3[Economic Research APIs]
E4[Business Intelligence]
end
A1 --> B1
A2 --> B2
A3 --> B3
A4 --> B1
B1 --> C1
B2 --> C2
B3 --> C3
C1 --> D1
C2 --> D1
C1 --> D3
C2 --> D2
C3 --> D2
C2 --> D3
D1 --> E1
D2 --> E1
D3 --> E1
D1 --> E2
D2 --> E2
D1 --> E3
D2 --> E3
D3 --> E4 📎 Prerequisites¶
Azure Resources¶
- Azure subscription with contributor access
- Azure Data Factory or Synapse Analytics
- Azure Data Lake Storage Gen2
- Azure SQL Database or Synapse SQL Pool
- Azure Key Vault for API credentials
Tools Required¶
- Azure CLI (2.55.0 or later)
- dbt CLI (1.7.0 or later)
- Python 3.9+
- Git
API Access¶
- Census API key (free registration at https://api.census.gov/data/key_signup.html)
- BEA API key (free registration at https://apps.bea.gov/API/signup/)
🚀 Quick Start¶
1. Environment Setup¶
# Clone the repository
git clone <repository-url>
cd csa-inabox/examples/commerce
# Install Python dependencies
pip install -r requirements.txt
# Install dbt packages
cd domains/dbt
dbt deps
2. Configure API Keys¶
# Add to Azure Key Vault or local environment
export CENSUS_API_KEY="your-census-api-key"
export BEA_API_KEY="your-bea-api-key"
3. Generate Sample Data¶
# Generate synthetic data for development (no API keys required)
python data/generators/generate_commerce_data.py \
--records 5000 \
--output-dir domains/dbt/seeds \
--seed 42
# Or fetch real data from Census API
python data/open-data/fetch_census.py \
--api-key $CENSUS_API_KEY \
--dataset acs5 \
--year 2022 \
--geography tract
4. Deploy Infrastructure¶
# Configure parameters
cp deploy/params.dev.json deploy/params.local.json
# Edit params.local.json with your values
# Deploy using Azure CLI
az deployment group create \
--resource-group rg-commerce-analytics \
--template-file ../../deploy/bicep/DLZ/main.bicep \
--parameters @deploy/params.local.json
5. Run dbt Models¶
cd domains/dbt
# Test connections
dbt debug
# Load seed data
dbt seed
# Run models
dbt run
# Run tests
dbt test
# Generate documentation
dbt docs generate
dbt docs serve
💡 Sample Analytics Scenarios¶
1. Regional Economic Resilience Scoring¶
Assess the economic resilience of metropolitan statistical areas by combining employment diversity, industry concentration (HHI), and GDP stability metrics.
-- Top 20 most economically resilient regions
SELECT
state_code,
region_name,
resilience_score,
employment_diversity_index,
hhi_score,
gdp_stability_score,
resilience_category,
dominant_industry
FROM gold.gld_economic_resilience
WHERE year = 2023
ORDER BY resilience_score DESC
LIMIT 20;
2. International Trade Pattern Analysis¶
Analyze bilateral trade flows, identify commodity trends, and assess trade balance dynamics by partner country.
-- Trade balance by top 10 partner countries
SELECT
partner_country,
total_exports,
total_imports,
trade_balance,
export_growth_yoy_pct,
top_export_commodity,
top_import_commodity
FROM gold.gld_trade_patterns
WHERE year = 2023
AND flow_type = 'BILATERAL_SUMMARY'
ORDER BY ABS(trade_balance) DESC
LIMIT 10;
3. Small Business Growth Prediction¶
Identify regions with strong business formation trends and predict growth potential based on economic indicators.
-- States with highest small business growth potential
SELECT
state_code,
state_name,
net_business_formation_rate,
business_survival_rate_5yr,
growth_score,
median_household_income,
unemployment_rate,
gdp_per_capita,
growth_prediction_category
FROM gold.gld_business_growth
WHERE year = 2023
ORDER BY growth_score DESC
LIMIT 15;
✨ Data Products¶
Economic Resilience Index (economic-resilience)¶
- Description: Regional economic resilience scores combining employment diversity, industry concentration, and GDP stability
- Freshness: Quarterly updates (aligned with BEA GDP releases)
- Coverage: 2010-present, all 50 states and DC, MSA-level detail
- API:
/api/v1/economic-resilience
International Trade Patterns (trade-patterns)¶
- Description: Bilateral trade analysis with commodity-level detail and trend indicators
- Freshness: Monthly updates (1-month lag for customs data)
- Coverage: 2015-present, 200+ partner countries, 6-digit HS codes
- API:
/api/v1/trade-patterns
Business Growth Analytics (business-growth)¶
- Description: Small business formation, survival rates, and growth prediction scores
- Freshness: Quarterly updates
- Coverage: 2010-present, state and county level
- API:
/api/v1/business-growth
⚙️ Configuration¶
⚙️ dbt Profiles¶
Add to your ~/.dbt/profiles.yml:
commerce_analytics:
target: dev
outputs:
dev:
type: databricks
host: "{{ env_var('DBT_HOST') }}"
http_path: "{{ env_var('DBT_HTTP_PATH') }}"
token: "{{ env_var('DBT_TOKEN') }}"
schema: commerce_dev
catalog: dev
prod:
type: databricks
host: "{{ env_var('DBT_HOST_PROD') }}"
http_path: "{{ env_var('DBT_HTTP_PATH_PROD') }}"
token: "{{ env_var('DBT_TOKEN_PROD') }}"
schema: commerce
catalog: prod
⚙️ Environment Variables¶
# Required for data fetching
CENSUS_API_KEY=your-census-api-key
BEA_API_KEY=your-bea-api-key
# Required for dbt
DBT_HOST=your-databricks-host
DBT_HTTP_PATH=your-sql-warehouse-path
DBT_TOKEN=your-access-token
# Optional
COMMERCE_LOG_LEVEL=INFO
COMMERCE_BATCH_SIZE=1000
🔒 Government Cloud Deployment Notes¶
🔒 Azure Government (GovCloud)¶
- Use
usgovvirginiaorusgovarizonaregions - Azure Government endpoints differ from commercial Azure:
- Storage:
*.blob.core.usgovcloudapi.net - Key Vault:
*.vault.usgovcloudapi.net - Microsoft Entra ID:
login.microsoftonline.us - Ensure FedRAMP High compliance for production workloads
- Use Azure Government-specific resource providers
- FIPS 140-2 validated cryptographic modules required
FISMA Compliance¶
- All data at rest encrypted using AES-256
- TLS 1.2+ enforced for all data in transit
- Microsoft Entra Conditional Access policies for user authentication
- Audit logging enabled for all data access operations
- Network segmentation via VNet with private endpoints
- Data classification: CUI (Controlled Unclassified Information) for economic data
Data Sensitivity¶
- Census data: Publicly available aggregated data (no PII at tract level)
- BEA data: Public economic statistics
- Trade data: Aggregated customs data (individual transaction details are restricted)
- Apply data masking for any sub-threshold census cells to prevent re-identification
📊 Monitoring & Alerts¶
The platform includes built-in monitoring for:
- Data Freshness: Alerts when data sources haven't updated within SLA (Census: weekly, BEA: quarterly, Trade: monthly)
- Data Quality: Automated dbt tests with Slack notifications on failure
- API Performance: Response time and error rate monitoring via Application Insights
- Cost Management: Daily Azure spend alerts and optimization recommendations
- Pipeline Health: Azure Data Factory pipeline run monitoring with auto-retry
🔧 Troubleshooting¶
🔧 Common Issues¶
-
Census API Rate Limits: The Census API allows 500 requests/day with a free key. Use
--delay 2in fetch scripts for bulk operations. Request a bulk data agreement for higher limits. -
BEA API Throttling: BEA limits to 100 requests/minute. The data generator includes built-in backoff. For production, use cached responses where possible.
-
Authentication Errors: Verify API keys are correctly set in environment variables or Key Vault. Census keys are 40-character alphanumeric strings.
-
dbt Connection Issues: Verify Databricks credentials and that your IP is allowlisted. Run
dbt debugto test connectivity. -
Large Data Volumes: Use incremental models and partitioning for historical census data. The ACS 5-year detailed tables can exceed 10GB per year.
-
Trade Data HS Code Changes: Harmonized System codes are revised every 5 years. The silver layer includes a concordance table for cross-year comparisons.
📊 Logs¶
- Application logs:
logs/commerce-analytics.log - dbt logs:
domains/dbt/logs/dbt.log - Data pipeline logs: Azure Data Factory monitoring
🚀 Development¶
🗄️ Adding New Data Sources¶
- Create Bronze model in
domains/dbt/models/bronze/ - Add data quality tests in
schema.yml - Create corresponding Silver model with transformations
- Add to Gold aggregations as needed
- Update data contracts in
contracts/
🧪 Testing¶
# Unit tests
pytest data/tests/
# dbt tests
dbt test
# Integration tests
pytest data/tests/integration/
# Load tests
python data/tests/load_test.py
🔗 Contributing¶
- Fork the repository
- Create a feature branch (
git checkout -b feature/new-data-source) - Make changes and add tests
- Run quality checks (
make lint test) - Submit a pull request
🔗 License¶
This project is licensed under the MIT License. See LICENSE file for details.
🔗 Support¶
- Documentation: https://csa-inabox.docs.microsoft.com/commerce
- Issues: Use GitHub Issues for bug reports and feature requests
- Security: Report security issues to security@contoso.com
- Community: Join our Slack channel
#csa-commerce-analytics
🔗 Acknowledgments¶
- U.S. Census Bureau for comprehensive demographic and economic data APIs
- Bureau of Economic Analysis for GDP and trade statistics
- International Trade Administration for trade data
- Azure Cloud Scale Analytics team for the foundational platform
- Contributors and the open-source community
🔗 Related Documentation¶
- Commerce Architecture — Detailed platform architecture and design decisions
- Examples Index — Overview of all CSA-in-a-Box example verticals
- Platform Architecture — Core CSA platform architecture
- Getting Started Guide — Platform setup and onboarding
- Casino Analytics — Related economic analytics vertical
- USPS Postal Operations — Related logistics and trade vertical
Prerequisites / Cost / Teardown¶
[!IMPORTANT] Cost-safety: this vertical deploys real Azure resources. Always run
teardown.shwhen you are done. A forgotten workshop environment can run $120-200/day.
Prerequisites¶
- Azure CLI 2.50+ logged in (
az login), subscription selected (az account set --subscription <id>) jqinstalled (used by teardown enumeration)- Bicep CLI 0.25+ (
az bicep version) - Contributor + User Access Administrator on target subscription (or a pre-created RG with equivalent RBAC)
bash scripts/deploy/validate-prerequisites.shpasses
Cost estimate (rough, East US 2)¶
- While running: ~$$120-200/day (services: Synapse, Databricks, Cosmos DB, ADF, Storage, Key Vault)
- Idle overnight: roughly half if you stop compute (Databricks autostop + Synapse pause)
- Storage + Key Vault residual: <$5/month if you skip teardown
Numbers are indicative for a small demo dataset; production workloads vary significantly. Use az consumption usage list or Cost Management for live numbers.
Runtime¶
- Deploy: ~30-45 minutes (first run; cold Bicep)
- Teardown: ~10-15 minutes (async RG delete completes in the background)
Teardown¶
When finished, run the per-example teardown script. It enforces a typed DESTROY-commerce confirmation, logs every step to reports/teardown/commerce-<timestamp>.log, and deletes the resource group rg-commerce-analytics along with any matching subscription-scope deployments.
# Interactive (recommended)
bash examples/commerce/deploy/teardown.sh
# Dry run (enumerate only)
bash examples/commerce/deploy/teardown.sh --dry-run
# From the repo root via Makefile
make teardown-example VERTICAL=commerce
make teardown-example VERTICAL=commerce DRYRUN=1
# CI automation (no prompt — only for ephemeral environments)
bash examples/commerce/deploy/teardown.sh --yes
See docs/QUICKSTART.md#teardown for the platform-wide teardown flow.
Directory Structure¶
commerce/
├── contracts/ # Data product contracts (schemas, SLOs, owners)
│ ├── census-demographics.yaml
│ ├── economic-indicators.yaml
│ └── trade-data.yaml
├── data/ # Sample data + synthetic generators
│ ├── generators/
│ └── open-data/
├── deploy/ # Deployment parameters / Bicep templates
│ ├── params.dev.json
│ ├── params.gov.json
│ └── teardown.sh
├── domains/ # dbt models (bronze / silver / gold) and seeds
│ └── dbt/
├── notebooks/ # Synapse / Fabric / Databricks notebooks
│ ├── economic_analysis.py
│ └── trade_pattern_prediction.py
├── reports/ # Power BI report templates and pbix sources
├── ARCHITECTURE.md # Mermaid + prose architecture diagrams
└── README.md # This file
Expected Results¶
After running the medallion pipeline against the bundled seed data, the Gold layer should populate the following tables. Row counts vary with the seed-data generator parameters; the figures below are the approximate scale you should see on a default run.
| Gold Table | Approximate Rows | Notes |
|---|---|---|
gld_business_growth | TODO: capture after first run | Populated from Silver via dbt --select tag:gold |
gld_economic_resilience | TODO: capture after first run | Populated from Silver via dbt --select tag:gold |
gld_trade_patterns | TODO: capture after first run | Populated from Silver via dbt --select tag:gold |
TODO: capture exact counts after the next end-to-end seed run. These are bounded by the seed-data generator parameters in
data/generators/.