USDA Agricultural Analytics

USDA Agricultural Analytics on Azure¶

This use case covers the ingestion, transformation, and analysis of data from multiple USDA agencies — NASS, FNS, FSIS, and FoodData Central — using Azure Cloud Scale Analytics patterns. The implementation combines crop production statistics with nutrition assistance program data, food safety inspection records, and nutritional composition databases to produce yield forecasts, enrollment trend analysis, food safety risk scores, and agricultural dashboards.

Reference Implementation

The complete working code for this domain lives in examples/usda/. This page explains the architecture, data sources, and step-by-step build process.

Architecture Overview¶

The platform follows a batch-first ingestion model. Azure Data Factory pulls data from USDA REST APIs and file-based sources into ADLS Gen2, where dbt models implement the Bronze-Silver-Gold medallion architecture. Gold-layer analytical models feed Power BI dashboards and downstream APIs.

graph LR
    subgraph Sources
        NASS[NASS QuickStats API<br/>Crop Yields & Production]
        FNS[FNS Data<br/>SNAP Enrollment & Benefits]
        FSIS[FSIS Inspections<br/>Meat/Poultry/Egg Safety]
        FDC[FoodData Central<br/>Nutritional Composition]
    end

    subgraph Ingestion
        ADF[Azure Data Factory]
        ADLS[ADLS Gen2<br/>Raw Landing Zone]
    end

    subgraph "Bronze Layer"
        BRZ_CY[brz_crop_yields]
        BRZ_SN[brz_snap_enrollment]
        BRZ_FI[brz_food_inspections]
    end

    subgraph "Silver Layer"
        SLV_CY[slv_crop_yields<br/>Standardized Commodities<br/>Quality Scoring]
        SLV_SN[slv_snap_enrollment<br/>Normalized Enrollment<br/>Benefits Calculations]
        SLV_FI[slv_food_inspections<br/>Violation Classification<br/>Compliance Status]
    end

    subgraph "Gold Layer"
        GLD_YF[gld_crop_yield_forecast<br/>Trend Analysis & Forecasting]
        GLD_ST[gld_snap_trends<br/>Enrollment Growth & Seasonality]
        GLD_RS[gld_food_safety_risk_score<br/>Risk Categorization]
        GLD_AD[gld_agricultural_dashboard<br/>Executive Summary]
    end

    subgraph Consumption
        PBI[Power BI<br/>Executive Dashboards]
        API[Data Product APIs]
        ML[Azure ML<br/>Yield Prediction]
    end

    NASS --> ADF
    FNS --> ADF
    FSIS --> ADF
    FDC --> ADF
    ADF --> ADLS

    ADLS --> BRZ_CY
    ADLS --> BRZ_SN
    ADLS --> BRZ_FI

    BRZ_CY --> SLV_CY
    BRZ_SN --> SLV_SN
    BRZ_FI --> SLV_FI

    SLV_CY --> GLD_YF
    SLV_SN --> GLD_ST
    SLV_FI --> GLD_RS
    SLV_CY --> GLD_AD
    SLV_SN --> GLD_AD
    SLV_FI --> GLD_AD

    GLD_YF --> PBI
    GLD_ST --> PBI
    GLD_RS --> PBI
    GLD_AD --> PBI
    GLD_YF --> API
    GLD_YF --> ML

Data Sources¶

Source	Agency	Data Content	Format	Update Frequency
NASS QuickStats API	National Agricultural Statistics Service	Crop yields (bu/acre), planted & harvested acres, production volumes by commodity, state, and county	JSON (REST)	Daily (current year), monthly (historical)
FNS SNAP Data	Food and Nutrition Service	SNAP enrollment counts, benefit amounts by state and county, household demographics	CSV (data.gov)	Monthly (2-month lag)
FSIS Inspection Data	Food Safety and Inspection Service	Meat, poultry, and egg inspection results, violation records, establishment compliance status	XML / CSV	Daily batch
FoodData Central	Agricultural Research Service	Nutritional composition per food item, nutrient amounts, food categories	JSON (REST)	Quarterly

API Registration

NASS QuickStats API keys are free. Register at quickstats.nass.usda.gov/api. The free tier allows 1,000 requests per day — sufficient for daily incremental loads but requires rate-limiting logic for historical backfills.

Step-by-Step Walkthrough¶

Step 1 — Ingest NASS Crop Data via QuickStats API¶

The ingestion script queries the NASS QuickStats REST API for crop yield, production, and acreage data. It supports multiple commodities (corn, soybeans, wheat, cotton, rice) and handles pagination, rate limiting, and error recovery.

"""Fetch crop statistics from NASS QuickStats API."""
import requests
import time

NASS_BASE_URL = "https://quickstats.nass.usda.gov/api/api_GET/"

def fetch_crop_yields(
    api_key: str,
    commodity: str,
    states: list[str],
    years: list[int],
) -> list[dict]:
    """Retrieve yield data for a commodity across states and years."""
    records = []
    for state in states:
        for year in years:
            params = {
                "key": api_key,
                "source_desc": "SURVEY",
                "sector_desc": "CROPS",
                "group_desc": "FIELD CROPS",
                "commodity_desc": commodity,
                "statisticcat_desc": "YIELD",
                "state_alpha": state,
                "year": year,
                "agg_level_desc": "STATE",
                "format": "JSON",
            }
            resp = requests.get(NASS_BASE_URL, params=params, timeout=30)
            resp.raise_for_status()
            data = resp.json().get("data", [])
            records.extend(data)
            time.sleep(0.2)  # respect rate limits
    return records

Rate Limits

The free-tier NASS API enforces 1,000 requests/day. Use the --delay flag on fetch_nass.py to throttle requests. For county-level backfills across all states, schedule the load across multiple days or request an elevated quota from NASS.

The full fetcher with multi-commodity support, CSV/JSON export, and connection testing is at examples/usda/data/open-data/fetch_nass.py.

Step 2 — Build Bronze Layer (Raw Ingestion)¶

Bronze models preserve the raw API responses with minimal transformation. Each source lands in its own table with ingestion metadata for lineage tracking.

-- examples/usda/domains/dbt/models/bronze/brz_crop_yields.sql
SELECT
    source_system,
    ingestion_timestamp,
    raw_data,
    year,
    state_fips_code,
    county_code,
    commodity_desc,
    data_item,
    value,
    cv_pct,
    load_time,
    _source_file_name,
    _source_file_timestamp
FROM {{ source('usda_raw', 'crop_yields') }}

Three Bronze tables map to the primary source systems:

Bronze Model	Source	Key Fields
`brz_crop_yields`	NASS QuickStats	`commodity_desc`, `state_fips_code`, `year`, `value`
`brz_snap_enrollment`	FNS SNAP	`state_code`, `enrollment_date`, `persons`, `benefits`
`brz_food_inspections`	FSIS	`establishment_number`, `inspection_date`, `violation_type`

Step 3 — Build Silver Layer (Cleansed and Conformed)¶

Silver models standardize commodity names, validate numeric fields, flag outliers, and compute derived metrics. The crop yields model demonstrates the pattern:

-- examples/usda/domains/dbt/models/silver/slv_crop_yields.sql (excerpt)
standardized AS (
    SELECT
        MD5(CONCAT_WS('|',
            state_fips_code, county_code,
            commodity_desc, CAST(year AS STRING), data_item
        )) AS crop_yield_sk,

        -- Commodity standardization
        CASE
            WHEN UPPER(commodity_desc) LIKE '%CORN%'
                 AND UPPER(commodity_desc) NOT LIKE '%POPCORN%'
            THEN 'CORN'
            WHEN UPPER(commodity_desc) LIKE '%SOYBEAN%' THEN 'SOYBEANS'
            WHEN UPPER(commodity_desc) LIKE '%WHEAT%'   THEN 'WHEAT'
            ELSE UPPER(TRIM(commodity_desc))
        END AS commodity,

        -- Derived yield metric
        CASE
            WHEN UPPER(data_item) LIKE '%YIELD%'
            THEN CAST(value AS DECIMAL(10,2))
        END AS yield_per_acre,

        -- Quality flag
        CASE
            WHEN value ~ '^[0-9]+\.?[0-9]*$' THEN TRUE
            ELSE FALSE
        END AS is_valid,

        CURRENT_TIMESTAMP() AS _dbt_loaded_at
    FROM base
)

Key Silver-layer transformations:

Geographic standardization — FIPS codes mapped to 2-letter state codes and county names
Commodity normalization — raw commodity_desc values collapsed into canonical names
Outlier detection — yield values flagged when they exceed 2 standard deviations from the 5-year rolling mean for the same state-commodity pair
Harvest efficiency — harvested_acres / planted_acres computed to catch data-quality issues (ratio should not exceed 110%)

Step 4 — Build Gold Layer: Crop Yield Forecasting¶

The yield forecast model computes 3-, 5-, and 10-year moving averages, year-over-year changes, trend classification, and a linear-regression-based next-year forecast with 95% confidence intervals.

-- examples/usda/domains/dbt/models/gold/gld_crop_yield_forecast.sql (excerpt)
trend_analysis AS (
    SELECT
        *,
        AVG(yield_per_acre) OVER (
            PARTITION BY state_code, commodity
            ORDER BY year ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS yield_3yr_avg,

        REGR_SLOPE(yield_per_acre, year) OVER (
            PARTITION BY state_code, commodity
            ORDER BY year ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
        ) AS yield_5yr_trend_slope
    FROM base_yields
),

forecasting AS (
    SELECT
        *,
        ROUND(yield_3yr_avg + yield_5yr_trend_slope, 2)
            AS yield_forecast_next_year,

        CONCAT(
            ROUND(yield_3yr_avg - (1.96 * yield_5yr_stddev), 1),
            ' - ',
            ROUND(yield_3yr_avg + (1.96 * yield_5yr_stddev), 1)
        ) AS yield_95pct_confidence_interval
    FROM trend_analysis
)

Forecast Method

The default forecast uses linear trend extrapolation from 5-year slopes. For production workloads, integrate Azure ML to train ARIMA or Prophet models on the Silver-layer time series and write predictions back to the Gold table via the yield_forecast_next_year column.

Step 5 — Build Gold Layer: SNAP Enrollment Trends¶

The SNAP trends model aggregates county-level enrollment to state-level monthly totals, then computes year-over-year growth, 3- and 12-month rolling averages, seasonal adjustments, and enrollment volatility.

-- Query: States with fastest-growing SNAP enrollment
SELECT
    state_code,
    enrollment_date,
    current_enrollment,
    enrollment_change_1yr,
    enrollment_trend,
    enrollment_3mo_avg,
    avg_benefits_per_person
FROM gold.gld_snap_trends
WHERE is_latest_month = TRUE
ORDER BY enrollment_change_1yr DESC
LIMIT 10;

The model classifies each state-month as INCREASING (>5% YoY growth), DECREASING (<-5%), or STABLE, and includes placeholder columns for economic correlation (unemployment rate, median income, poverty rate) to support future enrichment.

Step 6 — Build Gold Layer: Food Safety Risk Scoring¶

The risk scoring model computes a composite 0-100 score for each FSIS-inspected establishment based on three weighted components:

Component	Weight	Scoring Logic
Violation history	40%	12-month violation rate mapped to 0-100 scale
Inspection recency	30%	Days since last inspection — longer gaps increase risk
Violation severity	30%	Critical/moderate/minor breakdown with recency decay

The composite score is adjusted by industry multiplier (slaughter facilities 1.2x, retail 0.9x) and establishment size, then categorized into five tiers: CRITICAL, HIGH, MODERATE, LOW, MINIMAL.

-- Query: Establishments requiring immediate inspection
SELECT
    establishment_number,
    establishment_name,
    state_code,
    adjusted_risk_score,
    adjusted_risk_category,
    recommended_action,
    days_since_last_inspection,
    critical_violations,
    compliance_rate
FROM gold.gld_food_safety_risk_score
WHERE adjusted_risk_category IN ('CRITICAL', 'HIGH')
ORDER BY adjusted_risk_score DESC;

Score Reliability

The is_risk_score_reliable flag is TRUE only when an establishment has 3+ inspections within the last 12 months. New or infrequently inspected establishments may have artificially low scores. Filter on this flag for enforcement-priority dashboards.

Gold Layer Data Products¶

Data Product	dbt Model	Key Metrics	Refresh
Crop Yield Forecast	`gld_crop_yield_forecast`	`yield_per_acre`, `yield_3yr_avg`, `yield_forecast_next_year`, `yield_trend_5yr`	Daily
SNAP Enrollment Trends	`gld_snap_trends`	`current_enrollment`, `enrollment_change_1yr`, `enrollment_trend`, `avg_benefits_per_person`	Monthly
Food Safety Risk Score	`gld_food_safety_risk_score`	`adjusted_risk_score`, `adjusted_risk_category`, `compliance_rate`, `recommended_action`	Weekly
Agricultural Dashboard	`gld_agricultural_dashboard`	Cross-domain KPIs for executive reporting	Daily

Dashboard Design¶

The Power BI executive dashboard surfaces four analytical views:

graph TD
    subgraph "Agricultural Analytics Dashboard"
        A[Crop Production Overview<br/>Yield trends by commodity & state<br/>Forecast vs. actuals]
        B[SNAP Program Monitor<br/>Enrollment heatmap by state<br/>Benefits distribution]
        C[Food Safety Scorecard<br/>Risk distribution by category<br/>Overdue inspections]
        D[Cross-Domain Summary<br/>KPIs · Alerts · Drill-downs]
    end

    GLD_YF[gld_crop_yield_forecast] --> A
    GLD_ST[gld_snap_trends] --> B
    GLD_RS[gld_food_safety_risk_score] --> C
    GLD_AD[gld_agricultural_dashboard] --> D

Deployment¶

Cost

This vertical deploys real Azure resources. A running environment costs approximately $180-300/day (Synapse, Databricks, ADF, Storage, Key Vault). Always run teardown.sh when finished. See examples/usda/deploy/teardown.sh.

# Deploy infrastructure
az deployment group create \
  --resource-group rg-usda-analytics \
  --template-file deploy/bicep/DLZ/main.bicep \
  --parameters @examples/usda/deploy/params.dev.json

# Run dbt pipeline
cd examples/usda/domains/dbt
dbt seed && dbt run && dbt test

Two parameter files are provided: params.dev.json for commercial Azure and params.gov.json for Azure Government (IL4/IL5).

Cross-References¶

Reference implementation — examples/usda/ contains all dbt models, ingestion scripts, data contracts, seed data, and deployment parameters
Architecture deep-dive — examples/usda/ARCHITECTURE.md covers security, DR, cost optimization, and future ML integration
Geoanalytics — Combine with spatial analysis patterns for county-level crop yield mapping and precision agriculture overlays
Related verticals — EPA Environmental Analytics and NOAA Climate Analytics share geographic and environmental data that correlates with agricultural outcomes

Data Sources and API References¶

Resource	URL
NASS QuickStats API	quickstats.nass.usda.gov/api
NASS QuickStats Web Interface	quickstats.nass.usda.gov
FoodData Central API	fdc.nal.usda.gov
FNS SNAP Data (data.gov)	catalog.data.gov/dataset/snap-data
FSIS Inspection Data	www.fsis.usda.gov/science-data
USDA ERS Data Products	ers.usda.gov/data-products