
Federal Justice Analytics

End-to-end analytics across the federal criminal justice system — from crime reporting through prosecution, sentencing, and incarceration — using publicly available FBI, DOJ, USSC, and BOP data on Microsoft Fabric.


Executive Summary

The U.S. federal justice system generates vast amounts of structured data at every stage: crime reporting (FBI NIBRS), prosecution (DOJ), sentencing (USSC), and incarceration (BOP). Individually, each dataset tells a partial story. Unified on Microsoft Fabric's lakehouse architecture, these datasets enable a complete pipeline view — from offense to outcome — revealing patterns in enforcement priorities, sentencing consistency, and system-wide trends.

This use case demonstrates how to build that unified view using the Supercharge Microsoft Fabric POC's medallion architecture, PySpark transformations, and Power BI dashboards.


Analytics Domains

1. Crime Analytics (FBI NIBRS)

The FBI's National Incident-Based Reporting System (NIBRS) replaced the legacy Summary Reporting System (SRS) as the national standard for crime data collection. NIBRS captures detailed information about each criminal incident including offense type, victim/offender demographics, location, property loss, and arrest data.

Key Metrics:

  • Offense counts and rates per 100,000 population
  • Clearance rates (percentage of offenses resolved by arrest)
  • Offense type distribution and year-over-year trends
  • Geographic patterns by state, MSA, and agency
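Before running these formulas at scale in PySpark, they are easy to sanity-check in plain Python. This sketch uses hypothetical counts (the 12,500-offense figure and 5,000,000 population are illustrative, not from any dataset) to show the rate-per-100K and year-over-year calculations:

```python
# Illustrative (plain Python): the core rate formulas used in the
# PySpark jobs below, applied to hypothetical numbers.

def rate_per_100k(offense_count: int, population: int) -> float:
    """Offense rate per 100,000 residents."""
    return round(offense_count / population * 100_000, 2)

def yoy_change_pct(current_rate: float, prior_rate: float) -> float:
    """Year-over-year percentage change in the rate."""
    return round((current_rate - prior_rate) / prior_rate * 100, 2)

rate_prior = rate_per_100k(12_500, 5_000_000)    # 250.0 per 100K
rate_current = rate_per_100k(11_875, 5_000_000)  # 237.5 per 100K
print(rate_current, yoy_change_pct(rate_current, rate_prior))  # 237.5 -5.0
```

The PySpark version below computes the same quantities per state, year, and offense category, using a window function for the prior-year lookup.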

PySpark: Crime Trend Analysis

# Databricks notebook source
# MAGIC %md
# MAGIC # FBI NIBRS Crime Trend Analysis

# COMMAND ----------

from pyspark.sql import functions as F
from pyspark.sql.window import Window

df_offenses = spark.read.table("lh_silver.fbi_nibrs_offenses")
df_population = spark.read.table("lh_silver.census_population")

# Offense rates per 100K population
df_rates = (
    df_offenses
    .groupBy("state_abbr", "year", "offense_category")
    .agg(F.count("incident_id").alias("offense_count"))
    .join(df_population, on=["state_abbr", "year"])
    .withColumn(
        "rate_per_100k",
        F.round(F.col("offense_count") / F.col("population") * 100000, 2)
    )
)

# Year-over-year change
w = Window.partitionBy("state_abbr", "offense_category").orderBy("year")
df_trends = (
    df_rates
    .withColumn("prior_rate", F.lag("rate_per_100k").over(w))
    .withColumn(
        "yoy_change_pct",
        F.round(
            (F.col("rate_per_100k") - F.col("prior_rate"))
            / F.col("prior_rate") * 100, 2
        )
    )
)

df_trends.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.crime_rate_trends"
)

# COMMAND ----------

# MAGIC %md
# MAGIC ## Clearance Rate Analysis

# COMMAND ----------

df_clearance = (
    df_offenses
    .groupBy("state_abbr", "year", "offense_category")
    .agg(
        F.count("incident_id").alias("total_offenses"),
        F.sum(
            F.when(F.col("cleared_flag") == True, 1).otherwise(0)
        ).alias("cleared_count")
    )
    .withColumn(
        "clearance_rate",
        F.round(F.col("cleared_count") / F.col("total_offenses") * 100, 2)
    )
)

df_clearance.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.crime_clearance_rates"
)

2. Prosecution Pipeline

The federal prosecution pipeline tracks cases from arrest through final disposition. By joining FBI arrest data, DOJ case management data, and USSC sentencing records, we can measure pipeline throughput and identify bottlenecks.

Pipeline Stages:

Arrest (FBI/Agency)
  └── Referral to U.S. Attorney
        ├── Declined (no prosecution)
        └── Filed
              ├── Dismissed
              └── Adjudicated
                    ├── Acquittal
                    └── Conviction (Guilty Plea ~90%, or Trial Conviction)
                          └── Sentencing (USSC)
                                └── BOP Custody / Probation
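The stage-to-stage attrition through this pipeline is what the funnel metrics measure. A minimal sketch in plain Python, using hypothetical counts for one district-year (none of these numbers come from the datasets):

```python
# Illustrative (plain Python): stage-to-stage conversion through the
# prosecution pipeline, with hypothetical counts.

def stage_conversions(stages):
    """Percent of cases surviving from each stage to the next."""
    return [
        (prev_name, name, round(n / prev_n * 100, 1))
        for (prev_name, prev_n), (name, n) in zip(stages, stages[1:])
    ]

stages = [
    ("Referred", 1000),    # referrals to U.S. Attorney
    ("Filed", 780),        # 220 declined
    ("Adjudicated", 700),  # 80 dismissed
    ("Convicted", 650),    # guilty pleas + trial convictions
]

for prev_name, name, pct in stage_conversions(stages):
    print(f"{prev_name} -> {name}: {pct}%")
```

The PySpark funnel below produces the same shape of result per fiscal year and district, plus conviction and declination rates.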

PySpark: Prosecution Funnel

# COMMAND ----------

# MAGIC %md
# MAGIC # Federal Prosecution Pipeline Funnel

# COMMAND ----------

from pyspark.sql import functions as F

df_cases = spark.read.table("lh_silver.doj_federal_cases")

# Funnel metrics by fiscal year and district
df_funnel = (
    df_cases
    .groupBy("fiscal_year", "district")
    .agg(
        F.count("case_id").alias("total_referrals"),
        F.sum(F.when(F.col("status") == "Declined", 1).otherwise(0)).alias("declined"),
        F.sum(F.when(F.col("status") == "Filed", 1).otherwise(0)).alias("filed"),
        F.sum(F.when(F.col("status") == "Dismissed", 1).otherwise(0)).alias("dismissed"),
        F.sum(F.when(F.col("disposition") == "Guilty Plea", 1).otherwise(0)).alias("guilty_plea"),
        F.sum(F.when(F.col("disposition") == "Trial Conviction", 1).otherwise(0)).alias("trial_conviction"),
        F.sum(F.when(F.col("disposition") == "Acquittal", 1).otherwise(0)).alias("acquittal"),
    )
    .withColumn(
        "conviction_rate",
        F.round(
            (F.col("guilty_plea") + F.col("trial_conviction"))
            / (F.col("guilty_plea") + F.col("trial_conviction") + F.col("acquittal"))
            * 100, 2
        )
    )
    .withColumn(
        "declination_rate",
        F.round(F.col("declined") / F.col("total_referrals") * 100, 2)
    )
)

df_funnel.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.prosecution_pipeline_funnel"
)

3. Sentencing Disparity Analysis

The U.S. Sentencing Commission publishes detailed data on federal sentencing outcomes. Analyzing this data reveals patterns in sentencing consistency — or disparity — across districts, demographics, and offense types.

Key Dimensions:

  • Geographic: Sentencing variation by federal judicial district
  • Offense type: Drug trafficking, fraud, immigration, firearms, etc.
  • Departure patterns: Within-guideline vs. above/below-guideline sentences
  • Demographic factors: Analysis of outcomes across different populations
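One simple way to summarize cross-district disparity is the coefficient of variation (standard deviation divided by mean) of average sentence length. A stdlib-only sketch, with hypothetical district averages (the district labels and month values are invented for illustration):

```python
# Illustrative (plain Python, stdlib only): coefficient of variation of
# average sentence length across districts as a disparity summary.
from statistics import mean, stdev

# Hypothetical average drug-trafficking sentences (months) by district
district_avg_months = {"D-A": 62.0, "D-B": 48.0, "D-C": 71.0, "D-D": 55.0}

values = list(district_avg_months.values())
cv_pct = round(stdev(values) / mean(values) * 100, 1)
print(f"Cross-district CV: {cv_pct}%")  # higher = more dispersion
```

The PySpark job below captures the same dispersion idea per district with stddev, median, and departure counts, from which a cross-district CV can be derived in the gold layer or in Power BI.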

PySpark: Sentencing Disparity

# COMMAND ----------

# MAGIC %md
# MAGIC # Sentencing Disparity Analysis (USSC Data)

# COMMAND ----------

from pyspark.sql import functions as F

df_sentences = spark.read.table("lh_silver.ussc_individual_sentences")

# District-level sentencing variation for drug offenses
df_district_variation = (
    df_sentences
    .filter(F.col("primary_offense_category") == "Drug Trafficking")
    .groupBy("district", "fiscal_year")
    .agg(
        F.count("case_id").alias("case_count"),
        F.avg("prison_months").alias("avg_prison_months"),
        F.percentile_approx("prison_months", 0.5).alias("median_prison_months"),
        F.stddev("prison_months").alias("stddev_prison_months"),
        F.avg("guideline_min_months").alias("avg_guideline_min"),
        # Departure analysis
        F.sum(F.when(F.col("departure_type") == "Below", 1).otherwise(0)).alias("below_guideline"),
        F.sum(F.when(F.col("departure_type") == "Within", 1).otherwise(0)).alias("within_guideline"),
        F.sum(F.when(F.col("departure_type") == "Above", 1).otherwise(0)).alias("above_guideline"),
    )
    .withColumn(
        "below_guideline_rate",
        F.round(F.col("below_guideline") / F.col("case_count") * 100, 2)
    )
    .withColumn(
        "sentence_to_guideline_ratio",
        F.round(F.col("avg_prison_months") / F.col("avg_guideline_min"), 2)
    )
)

df_district_variation.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.sentencing_disparity_by_district"
)

# COMMAND ----------

# MAGIC %md
# MAGIC ## Departure Rate Trends

# COMMAND ----------

from pyspark.sql.window import Window

df_departure_trends = (
    df_sentences
    .groupBy("fiscal_year", "primary_offense_category", "departure_type")
    .agg(F.count("case_id").alias("case_count"))
    .withColumn(
        "total_in_category",
        F.sum("case_count").over(
            Window.partitionBy("fiscal_year", "primary_offense_category")
        )
    )
    .withColumn(
        "departure_pct",
        F.round(F.col("case_count") / F.col("total_in_category") * 100, 2)
    )
)

df_departure_trends.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.sentencing_departure_trends"
)

4. Drug Enforcement Trends (DEA & USSC)

DEA seizure data and USSC drug sentencing data together reveal trends in federal drug enforcement — what substances are being targeted, where seizures are concentrated, and how sentences have shifted over time.

Key Metrics:

  • Seizure volumes by drug type and region
  • Estimated street values
  • Sentencing trends by drug category
  • Mandatory minimum application rates
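The mandatory-minimum application rate is a per-group proportion, which is easy to verify in plain Python before running the Spark aggregation. The case records here are hypothetical:

```python
# Illustrative (plain Python): mandatory-minimum application rate per
# drug type, mirroring the PySpark aggregation below. Data is invented.
from collections import Counter

# Hypothetical per-case records: (drug_type, mandatory_minimum_applied)
cases = [
    ("Methamphetamine", True), ("Methamphetamine", True),
    ("Methamphetamine", False), ("Cocaine", True), ("Cocaine", False),
]

totals, with_mm = Counter(), Counter()
for drug, applied in cases:
    totals[drug] += 1
    if applied:
        with_mm[drug] += 1

rates = {d: round(with_mm[d] / totals[d] * 100, 2) for d in totals}
print(rates)  # {'Methamphetamine': 66.67, 'Cocaine': 50.0}
```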

PySpark: Drug Enforcement Analysis

# COMMAND ----------

# MAGIC %md
# MAGIC # Drug Enforcement Trends

# COMMAND ----------

from pyspark.sql import functions as F

df_seizures = spark.read.table("lh_silver.dea_drug_seizures")

# Seizure trends by drug type
df_seizure_trends = (
    df_seizures
    .groupBy("fiscal_year", "drug_category", "region")
    .agg(
        F.count("seizure_id").alias("seizure_count"),
        F.sum("quantity_kg").alias("total_quantity_kg"),
        F.sum("estimated_value_usd").alias("total_street_value"),
        F.avg("quantity_kg").alias("avg_seizure_size_kg")
    )
    .orderBy("fiscal_year", "drug_category")
)

df_seizure_trends.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.drug_seizure_trends"
)

# COMMAND ----------

# Drug sentencing: mandatory minimum usage
df_drug_sentences = spark.read.table("lh_silver.ussc_individual_sentences").filter(
    F.col("primary_offense_category") == "Drug Trafficking"
)

df_mandatory_min = (
    df_drug_sentences
    .groupBy("fiscal_year", "drug_type")
    .agg(
        F.count("case_id").alias("total_cases"),
        F.sum(
            F.when(F.col("mandatory_minimum_applied") == True, 1).otherwise(0)
        ).alias("mandatory_min_count"),
        F.avg("prison_months").alias("avg_sentence_months")
    )
    .withColumn(
        "mandatory_min_rate",
        F.round(F.col("mandatory_min_count") / F.col("total_cases") * 100, 2)
    )
)

df_mandatory_min.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.drug_mandatory_minimum_trends"
)

Implementation in Fabric

Table Inventory

Layer   Table                                    Source           Description
Bronze  bronze_fbi_nibrs_incidents               FBI CDE API      Raw incident records
Bronze  bronze_fbi_nibrs_offenses                FBI CDE API      Raw offense detail records
Bronze  bronze_doj_federal_cases                 DOJ Open Data    Raw federal case records
Bronze  bronze_ussc_sentences                    USSC Data Files  Raw individual sentencing records
Bronze  bronze_bop_population                    BOP Statistics   Raw inmate population snapshots
Bronze  bronze_dea_seizures                      DEA Reports      Raw drug seizure records
Silver  silver_fbi_nibrs_offenses                Bronze layer     Cleansed, validated, enriched with geo codes
Silver  silver_doj_federal_cases                 Bronze layer     Standardized case records with pipeline stage
Silver  silver_ussc_individual_sentences         Bronze layer     Validated sentencing with guideline ranges
Silver  silver_bop_inmate_demographics           Bronze layer     Cleansed inmate demographics and facility data
Silver  silver_dea_drug_seizures                 Bronze layer     Standardized seizure records with categories
Gold    gold_crime_rate_trends                   Silver layer     Offense rates per 100K with YoY change
Gold    gold_crime_clearance_rates               Silver layer     Clearance rates by state and category
Gold    gold_prosecution_pipeline_funnel         Silver layer     Pipeline metrics by district and year
Gold    gold_sentencing_disparity_by_district    Silver layer     District-level sentencing variation
Gold    gold_sentencing_departure_trends         Silver layer     Guideline departure rates over time
Gold    gold_drug_seizure_trends                 Silver layer     Seizure volumes by type and region
Gold    gold_drug_mandatory_minimum_trends       Silver layer     Mandatory minimum application rates

Notebook Sequence

01_bronze_fbi_nibrs_ingest.py         → Ingest FBI NIBRS incident/offense data
02_bronze_doj_cases_ingest.py         → Ingest DOJ federal case data
03_bronze_ussc_sentences_ingest.py    → Ingest USSC sentencing files
04_bronze_bop_population_ingest.py    → Ingest BOP population statistics
05_bronze_dea_seizures_ingest.py      → Ingest DEA seizure data
06_silver_crime_cleanse.py            → Cleanse and validate NIBRS data
07_silver_cases_cleanse.py            → Standardize DOJ case records
08_silver_sentences_cleanse.py        → Validate sentencing data
09_gold_crime_analytics.py            → Crime rates and clearance analysis
10_gold_prosecution_funnel.py         → Prosecution pipeline metrics
11_gold_sentencing_disparity.py       → Sentencing variation analysis
12_gold_drug_enforcement.py           → Drug seizure and sentencing trends

Power BI Visualizations

Visual                       Type           Data Source                            Purpose
Crime Rate Map               Filled map     gold_crime_rate_trends                 Geographic distribution of offense rates by state
Clearance Rate Comparison    Clustered bar  gold_crime_clearance_rates             Clearance rates by offense category
Prosecution Funnel           Funnel chart   gold_prosecution_pipeline_funnel       Referral → Filed → Convicted flow
Conviction Rate by District  Bar chart      gold_prosecution_pipeline_funnel       District comparison of conviction rates
Sentencing Box Plot          Box & whisker  gold_sentencing_disparity_by_district  Sentence distribution by district
Departure Rate Trend         Stacked area   gold_sentencing_departure_trends       Below/Within/Above guideline over time
Drug Seizure Treemap         Treemap        gold_drug_seizure_trends               Seizure volume by drug type and region
Mandatory Minimum Trend      Line chart     gold_drug_mandatory_minimum_trends     Mandatory minimum rate over time by drug type

DAX Measures

// Conviction Rate
Conviction Rate =
DIVIDE(
    CALCULATE(
        COUNTROWS('ProsecutionFunnel'),
        'ProsecutionFunnel'[Disposition] IN {"Guilty Plea", "Trial Conviction"}
    ),
    CALCULATE(
        COUNTROWS('ProsecutionFunnel'),
        'ProsecutionFunnel'[Disposition] IN {"Guilty Plea", "Trial Conviction", "Acquittal"}
    ),
    0
)

// Sentence-to-Guideline Ratio
Sentence to Guideline Ratio =
DIVIDE(
    AVERAGE('Sentences'[PrisonMonths]),
    AVERAGE('Sentences'[GuidelineMinMonths]),
    0
)

// Below-Guideline Departure Rate
Below Guideline Rate =
DIVIDE(
    CALCULATE(COUNTROWS('Sentences'), 'Sentences'[DepartureType] = "Below"),
    COUNTROWS('Sentences'),
    0
)
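Each measure above uses DAX `DIVIDE`, which returns the third argument instead of erroring when the denominator is zero or blank. A quick Python analogue of that guard (the function name `dax_divide` is ours, for illustration only):

```python
# Illustrative: DAX DIVIDE(numerator, denominator, alternate) returns the
# alternate result when the denominator is zero, instead of raising.

def dax_divide(numerator: float, denominator: float, alternate: float = 0.0) -> float:
    """Safe division mirroring DAX DIVIDE's alternate-result argument."""
    return numerator / denominator if denominator else alternate

print(dax_divide(650, 1000))  # 0.65
print(dax_divide(650, 0))     # 0.0 (no ZeroDivisionError)
```

The PySpark gold jobs rely on Spark SQL's behavior of returning null on division by zero; `DIVIDE` gives the Power BI layer the same protection with an explicit fallback value.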

Published References

Primary Data Sources

  • FBI Crime Data Explorer (CDE) — NIBRS data, offense statistics, law enforcement data https://cde.ucr.cjis.gov

  • DOJ Open Data Portal — Federal case data, grants, and agency statistics https://www.justice.gov/open/open-data

  • U.S. Sentencing Commission Research — Individual sentencing datafiles, annual reports, guideline manuals https://www.ussc.gov/research

  • Bureau of Justice Statistics (BJS) — Victimization, corrections, courts, law enforcement statistics https://bjs.ojp.gov

  • Bureau of Prisons (BOP) Statistics — Inmate population, demographics, facility data https://www.bop.gov/about/statistics/

  • DEA Data and Statistics — Drug seizure data, national threat assessments https://www.dea.gov/data-and-statistics

Research & Academic Sources

  • ICPSR National Archive of Criminal Justice Data (NACJD) https://www.icpsr.umich.edu/web/pages/NACJD/index.html

  • Vera Institute of Justice — Incarceration Trends https://github.com/vera-institute/incarceration-trends

  • DOJ Antitrust Case Filings (for white-collar crime cross-reference) https://www.justice.gov/atr/antitrust-case-filings


Microsoft Published Resources

The following Microsoft-published white papers and reference architectures support building federal justice analytics on Azure and Microsoft Fabric:

  • Azure for Government — Justice & Public Safety — Azure Government capabilities for law enforcement and justice agencies https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-overview-jps

  • The Future of Public Safety and Justice — Microsoft e-book on digital transformation in public safety https://info.microsoft.com/ww-landing-the-future-of-public-safety-and-justice.html

  • Microsoft Fabric Security White Paper — Data protection and governance for sensitive criminal justice data https://learn.microsoft.com/en-us/fabric/security/white-paper-landing-page

  • Medallion Lakehouse Architecture in Fabric — Official guidance for the Bronze/Silver/Gold pattern https://learn.microsoft.com/en-us/fabric/onelake/onelake-medallion-lakehouse-architecture

  • Real-Time Fraud Detection with Azure Stream Analytics — Applicable pattern for real-time crime alert systems https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-real-time-fraud-detection

  • Azure Synapse Analytics Security White Paper — Security architecture patterns for analytics workloads https://learn.microsoft.com/en-us/azure/synapse-analytics/guidance/security-white-paper-introduction

  • Cloud-Scale Analytics (Cloud Adoption Framework) — Enterprise data governance and data mesh patterns https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/cloud-scale-analytics/



Last Updated: 2026-04-23