Federal Justice Analytics¶
End-to-end analytics across the federal criminal justice system — from crime reporting through prosecution, sentencing, and incarceration — using publicly available FBI, DOJ, USSC, and BOP data on Microsoft Fabric.
Executive Summary¶
The U.S. federal justice system generates vast amounts of structured data at every stage: crime reporting (FBI NIBRS), prosecution (DOJ), sentencing (USSC), and incarceration (BOP). Individually, each dataset tells a partial story. Unified on Microsoft Fabric's lakehouse architecture, these datasets enable a complete pipeline view — from offense to outcome — revealing patterns in enforcement priorities, sentencing consistency, and system-wide trends.
This use case demonstrates how to build that unified view using the Supercharge Microsoft Fabric POC's medallion architecture, PySpark transformations, and Power BI dashboards.
Analytics Domains¶
1. Crime Analytics (FBI NIBRS)¶
The FBI's National Incident-Based Reporting System (NIBRS) replaced the legacy Summary Reporting System (SRS) as the national standard for crime data collection. NIBRS captures detailed information about each criminal incident including offense type, victim/offender demographics, location, property loss, and arrest data.
Key Metrics:

- Offense counts and rates per 100,000 population
- Clearance rates (percentage of offenses resolved by arrest)
- Offense type distribution and year-over-year trends
- Geographic patterns by state, MSA, and agency
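The rate and trend metrics reduce to simple arithmetic that is worth sanity-checking outside Spark. A minimal pure-Python sketch of the per-100K rate and year-over-year formulas, using illustrative figures (not real NIBRS numbers):

```python
def rate_per_100k(offense_count: int, population: int) -> float:
    """Offense rate normalized per 100,000 residents."""
    return round(offense_count / population * 100_000, 2)

def yoy_change_pct(current_rate: float, prior_rate: float) -> float:
    """Year-over-year percentage change in the offense rate."""
    return round((current_rate - prior_rate) / prior_rate * 100, 2)

# Illustrative figures: 12,500 offenses in a state of 5 million people
rate_prior = rate_per_100k(12_500, 5_000_000)    # 250.0
rate_current = rate_per_100k(13_000, 5_000_000)  # 260.0
change = yoy_change_pct(rate_current, rate_prior)  # 4.0
```

The PySpark notebook below applies the same formulas at scale, with `F.lag` over a window supplying the prior year's rate.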
PySpark: Crime Trend Analysis¶
```python
# Databricks notebook source
# MAGIC %md
# MAGIC # FBI NIBRS Crime Trend Analysis
# COMMAND ----------

from pyspark.sql import functions as F
from pyspark.sql.window import Window

df_offenses = spark.read.table("lh_silver.fbi_nibrs_offenses")
df_population = spark.read.table("lh_silver.census_population")

# Offense rates per 100K population
df_rates = (
    df_offenses
    .groupBy("state_abbr", "year", "offense_category")
    .agg(F.count("incident_id").alias("offense_count"))
    .join(df_population, on=["state_abbr", "year"])
    .withColumn(
        "rate_per_100k",
        F.round(F.col("offense_count") / F.col("population") * 100000, 2)
    )
)

# Year-over-year change
w = Window.partitionBy("state_abbr", "offense_category").orderBy("year")
df_trends = (
    df_rates
    .withColumn("prior_rate", F.lag("rate_per_100k").over(w))
    .withColumn(
        "yoy_change_pct",
        F.round(
            (F.col("rate_per_100k") - F.col("prior_rate"))
            / F.col("prior_rate") * 100, 2
        )
    )
)

df_trends.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.crime_rate_trends"
)

# COMMAND ----------
# MAGIC %md
# MAGIC ## Clearance Rate Analysis
# COMMAND ----------

df_clearance = (
    df_offenses
    .groupBy("state_abbr", "year", "offense_category")
    .agg(
        F.count("incident_id").alias("total_offenses"),
        F.sum(
            F.when(F.col("cleared_flag") == True, 1).otherwise(0)
        ).alias("cleared_count")
    )
    .withColumn(
        "clearance_rate",
        F.round(F.col("cleared_count") / F.col("total_offenses") * 100, 2)
    )
)

df_clearance.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.crime_clearance_rates"
)
```
2. Prosecution Pipeline¶
The federal prosecution pipeline tracks cases from arrest through final disposition. By joining FBI arrest data, DOJ case management data, and USSC sentencing records, we can measure pipeline throughput and identify bottlenecks.
Pipeline Stages:

```
Arrest (FBI/Agency)
└── Referral to U.S. Attorney
    ├── Declined (no prosecution)
    └── Filed
        ├── Dismissed
        └── Adjudicated
            ├── Acquittal
            └── Convicted: Guilty Plea (~90% of convictions) or Trial Conviction
                └── Sentencing (USSC)
                    └── BOP Custody / Probation
```
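The funnel stages feed two headline rates, the conviction rate and the declination rate. A minimal pure-Python sketch of that arithmetic with hypothetical counts (the formulas match the gold-table definitions in the notebook below):

```python
def conviction_rate(guilty_plea: int, trial_conviction: int, acquittal: int) -> float:
    """Convictions as a share of adjudicated cases (pleas + trial convictions + acquittals)."""
    convicted = guilty_plea + trial_conviction
    return round(convicted / (convicted + acquittal) * 100, 2)

def declination_rate(declined: int, total_referrals: int) -> float:
    """Share of referrals the U.S. Attorney declined to prosecute."""
    return round(declined / total_referrals * 100, 2)

# Hypothetical district-year counts
print(conviction_rate(870, 40, 30))   # 96.81
print(declination_rate(250, 1_000))   # 25.0
```

Note that the conviction rate is computed over adjudicated cases only; declined and dismissed cases fall out of the denominator.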
PySpark: Prosecution Funnel¶
```python
# COMMAND ----------
# MAGIC %md
# MAGIC # Federal Prosecution Pipeline Funnel
# COMMAND ----------

from pyspark.sql import functions as F

df_cases = spark.read.table("lh_silver.doj_federal_cases")

# Funnel metrics by fiscal year and district
df_funnel = (
    df_cases
    .groupBy("fiscal_year", "district")
    .agg(
        F.count("case_id").alias("total_referrals"),
        F.sum(F.when(F.col("status") == "Declined", 1).otherwise(0)).alias("declined"),
        F.sum(F.when(F.col("status") == "Filed", 1).otherwise(0)).alias("filed"),
        F.sum(F.when(F.col("status") == "Dismissed", 1).otherwise(0)).alias("dismissed"),
        F.sum(F.when(F.col("disposition") == "Guilty Plea", 1).otherwise(0)).alias("guilty_plea"),
        F.sum(F.when(F.col("disposition") == "Trial Conviction", 1).otherwise(0)).alias("trial_conviction"),
        F.sum(F.when(F.col("disposition") == "Acquittal", 1).otherwise(0)).alias("acquittal"),
    )
    .withColumn(
        "conviction_rate",
        F.round(
            (F.col("guilty_plea") + F.col("trial_conviction"))
            / (F.col("guilty_plea") + F.col("trial_conviction") + F.col("acquittal"))
            * 100, 2
        )
    )
    .withColumn(
        "declination_rate",
        F.round(F.col("declined") / F.col("total_referrals") * 100, 2)
    )
)

df_funnel.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.prosecution_pipeline_funnel"
)
```
3. Sentencing Disparity Analysis¶
The U.S. Sentencing Commission publishes detailed data on federal sentencing outcomes. Analyzing this data reveals patterns in sentencing consistency — or disparity — across districts, demographics, and offense types.
Key Dimensions:

- Geographic: Sentencing variation by federal judicial district
- Offense type: Drug trafficking, fraud, immigration, firearms, etc.
- Departure patterns: Within-guideline vs. above/below guideline sentences
- Demographic factors: Analysis of outcomes across different populations
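The `departure_type` column consumed by the notebook below is assumed to be derived in the silver layer. A minimal sketch of that classification, assuming each sentence is compared against its guideline range; the real USSC datafiles encode departures with more nuance (e.g., government-sponsored vs. judge-initiated):

```python
def classify_departure(prison_months: float,
                       guideline_min: float,
                       guideline_max: float) -> str:
    """Classify a sentence relative to its guideline range.

    Assumed silver-layer logic for illustration only; guideline_min and
    guideline_max are the range endpoints in months.
    """
    if prison_months < guideline_min:
        return "Below"
    if prison_months > guideline_max:
        return "Above"
    return "Within"

print(classify_departure(48, 57, 71))  # Below
print(classify_departure(60, 57, 71))  # Within
print(classify_departure(84, 57, 71))  # Above
```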
PySpark: Sentencing Disparity¶
```python
# COMMAND ----------
# MAGIC %md
# MAGIC # Sentencing Disparity Analysis (USSC Data)
# COMMAND ----------

from pyspark.sql import functions as F
from pyspark.sql.window import Window  # required for the departure-trend window below

df_sentences = spark.read.table("lh_silver.ussc_individual_sentences")

# District-level sentencing variation for drug offenses
df_district_variation = (
    df_sentences
    .filter(F.col("primary_offense_category") == "Drug Trafficking")
    .groupBy("district", "fiscal_year")
    .agg(
        F.count("case_id").alias("case_count"),
        F.avg("prison_months").alias("avg_prison_months"),
        F.percentile_approx("prison_months", 0.5).alias("median_prison_months"),
        F.stddev("prison_months").alias("stddev_prison_months"),
        F.avg("guideline_min_months").alias("avg_guideline_min"),
        # Departure analysis
        F.sum(F.when(F.col("departure_type") == "Below", 1).otherwise(0)).alias("below_guideline"),
        F.sum(F.when(F.col("departure_type") == "Within", 1).otherwise(0)).alias("within_guideline"),
        F.sum(F.when(F.col("departure_type") == "Above", 1).otherwise(0)).alias("above_guideline"),
    )
    .withColumn(
        "below_guideline_rate",
        F.round(F.col("below_guideline") / F.col("case_count") * 100, 2)
    )
    .withColumn(
        "sentence_to_guideline_ratio",
        F.round(F.col("avg_prison_months") / F.col("avg_guideline_min"), 2)
    )
)

df_district_variation.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.sentencing_disparity_by_district"
)

# COMMAND ----------
# MAGIC %md
# MAGIC ## Departure Rate Trends
# COMMAND ----------

df_departure_trends = (
    df_sentences
    .groupBy("fiscal_year", "primary_offense_category", "departure_type")
    .agg(F.count("case_id").alias("case_count"))
    .withColumn(
        "total_in_category",
        F.sum("case_count").over(
            Window.partitionBy("fiscal_year", "primary_offense_category")
        )
    )
    .withColumn(
        "departure_pct",
        F.round(F.col("case_count") / F.col("total_in_category") * 100, 2)
    )
)

df_departure_trends.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.sentencing_departure_trends"
)
```
4. Drug Enforcement Trends¶
DEA seizure data and USSC drug sentencing data together reveal trends in federal drug enforcement — what substances are being targeted, where seizures are concentrated, and how sentences have shifted over time.
Key Metrics:

- Seizure volumes by drug type and region
- Estimated street values
- Sentencing trends by drug category
- Mandatory minimum application rates
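The groupBy/agg pattern used for seizure trends can be mirrored in plain Python to make the aggregation explicit. The records below are toy values, not DEA figures:

```python
from collections import defaultdict

# Toy seizure records: (fiscal_year, drug_category, quantity_kg)
seizures = [
    (2024, "Methamphetamine", 12.0),
    (2024, "Methamphetamine", 8.0),
    (2024, "Fentanyl", 0.5),
]

# Group quantities by (year, category), mirroring groupBy("fiscal_year", "drug_category")
totals = defaultdict(list)
for year, category, qty in seizures:
    totals[(year, category)].append(qty)

# Per-group total and average seizure size, mirroring F.sum and F.avg
for (year, category), quantities in sorted(totals.items()):
    print(year, category, sum(quantities), round(sum(quantities) / len(quantities), 2))
```

The Spark version below performs the same reduction in parallel across partitions and also sums the estimated street values.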
PySpark: Drug Enforcement Analysis¶
```python
# COMMAND ----------
# MAGIC %md
# MAGIC # Drug Enforcement Trends
# COMMAND ----------

from pyspark.sql import functions as F

df_seizures = spark.read.table("lh_silver.dea_drug_seizures")

# Seizure trends by drug type
df_seizure_trends = (
    df_seizures
    .groupBy("fiscal_year", "drug_category", "region")
    .agg(
        F.count("seizure_id").alias("seizure_count"),
        F.sum("quantity_kg").alias("total_quantity_kg"),
        F.sum("estimated_value_usd").alias("total_street_value"),
        F.avg("quantity_kg").alias("avg_seizure_size_kg")
    )
    .orderBy("fiscal_year", "drug_category")
)

df_seizure_trends.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.drug_seizure_trends"
)

# COMMAND ----------

# Drug sentencing: mandatory minimum usage
df_drug_sentences = spark.read.table("lh_silver.ussc_individual_sentences").filter(
    F.col("primary_offense_category") == "Drug Trafficking"
)

df_mandatory_min = (
    df_drug_sentences
    .groupBy("fiscal_year", "drug_type")
    .agg(
        F.count("case_id").alias("total_cases"),
        F.sum(
            F.when(F.col("mandatory_minimum_applied") == True, 1).otherwise(0)
        ).alias("mandatory_min_count"),
        F.avg("prison_months").alias("avg_sentence_months")
    )
    .withColumn(
        "mandatory_min_rate",
        F.round(F.col("mandatory_min_count") / F.col("total_cases") * 100, 2)
    )
)

df_mandatory_min.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.drug_mandatory_minimum_trends"
)
```
Implementation in Fabric¶
Table Inventory¶
| Layer | Table | Source | Description |
|---|---|---|---|
| Bronze | bronze_fbi_nibrs_incidents | FBI CDE API | Raw incident records |
| Bronze | bronze_fbi_nibrs_offenses | FBI CDE API | Raw offense detail records |
| Bronze | bronze_doj_federal_cases | DOJ Open Data | Raw federal case records |
| Bronze | bronze_ussc_sentences | USSC Data Files | Raw individual sentencing records |
| Bronze | bronze_bop_population | BOP Statistics | Raw inmate population snapshots |
| Bronze | bronze_dea_seizures | DEA Reports | Raw drug seizure records |
| Silver | silver_fbi_nibrs_offenses | bronze_fbi_nibrs_offenses | Cleansed, validated, enriched with geo codes |
| Silver | silver_doj_federal_cases | bronze_doj_federal_cases | Standardized case records with pipeline stage |
| Silver | silver_ussc_individual_sentences | bronze_ussc_sentences | Validated sentencing with guideline ranges |
| Silver | silver_bop_inmate_demographics | bronze_bop_population | Cleansed inmate demographics and facility data |
| Silver | silver_dea_drug_seizures | bronze_dea_seizures | Standardized seizure records with categories |
| Gold | gold_crime_rate_trends | silver_fbi_nibrs_offenses | Offense rates per 100K with YoY change |
| Gold | gold_crime_clearance_rates | silver_fbi_nibrs_offenses | Clearance rates by state and category |
| Gold | gold_prosecution_pipeline_funnel | silver_doj_federal_cases | Pipeline metrics by district and year |
| Gold | gold_sentencing_disparity_by_district | silver_ussc_individual_sentences | District-level sentencing variation |
| Gold | gold_sentencing_departure_trends | silver_ussc_individual_sentences | Guideline departure rates over time |
| Gold | gold_drug_seizure_trends | silver_dea_drug_seizures | Seizure volumes by type and region |
| Gold | gold_drug_mandatory_minimum_trends | silver_ussc_individual_sentences | Mandatory minimum application rates |
Notebook Sequence¶
```
01_bronze_fbi_nibrs_ingest.py      → Ingest FBI NIBRS incident/offense data
02_bronze_doj_cases_ingest.py      → Ingest DOJ federal case data
03_bronze_ussc_sentences_ingest.py → Ingest USSC sentencing files
04_bronze_bop_population_ingest.py → Ingest BOP population statistics
05_bronze_dea_seizures_ingest.py   → Ingest DEA seizure data
06_silver_crime_cleanse.py         → Cleanse and validate NIBRS data
07_silver_cases_cleanse.py         → Standardize DOJ case records
08_silver_sentences_cleanse.py     → Validate sentencing data
09_gold_crime_analytics.py         → Crime rates and clearance analysis
10_gold_prosecution_funnel.py      → Prosecution pipeline metrics
11_gold_sentencing_disparity.py    → Sentencing variation analysis
12_gold_drug_enforcement.py        → Drug seizure and sentencing trends
```
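In Fabric this sequence is typically driven by a Data Pipeline or a driver notebook that runs each layer to completion before the next begins. A minimal sketch of that layer-ordered driver; `run_notebook` is a stub standing in for a real invocation (e.g., `notebookutils.notebook.run` in the Fabric Spark runtime), so the ordering logic is self-contained:

```python
# Notebooks grouped by medallion layer; each layer must finish before the next.
LAYERS = {
    "bronze": [
        "01_bronze_fbi_nibrs_ingest",
        "02_bronze_doj_cases_ingest",
        "03_bronze_ussc_sentences_ingest",
        "04_bronze_bop_population_ingest",
        "05_bronze_dea_seizures_ingest",
    ],
    "silver": [
        "06_silver_crime_cleanse",
        "07_silver_cases_cleanse",
        "08_silver_sentences_cleanse",
    ],
    "gold": [
        "09_gold_crime_analytics",
        "10_gold_prosecution_funnel",
        "11_gold_sentencing_disparity",
        "12_gold_drug_enforcement",
    ],
}

executed = []

def run_notebook(name: str) -> None:
    # Stub: a real driver would invoke the named notebook on the Spark session.
    executed.append(name)

# Bronze → Silver → Gold, in order
for layer in ("bronze", "silver", "gold"):
    for notebook in LAYERS[layer]:
        run_notebook(notebook)
```

Within a layer the notebooks are independent, so a production pipeline could run them in parallel; only the layer boundaries are hard ordering constraints.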
Power BI Visualizations¶
| Visual | Type | Data Source | Purpose |
|---|---|---|---|
| Crime Rate Map | Filled map | gold_crime_rate_trends | Geographic distribution of offense rates by state |
| Clearance Rate Comparison | Clustered bar | gold_crime_clearance_rates | Clearance rates by offense category |
| Prosecution Funnel | Funnel chart | gold_prosecution_pipeline_funnel | Referral → Filed → Convicted flow |
| Conviction Rate by District | Bar chart | gold_prosecution_pipeline_funnel | District comparison of conviction rates |
| Sentencing Box Plot | Box & whisker | gold_sentencing_disparity_by_district | Sentence distribution by district |
| Departure Rate Trend | Stacked area | gold_sentencing_departure_trends | Below/Within/Above guideline over time |
| Drug Seizure Treemap | Treemap | gold_drug_seizure_trends | Seizure volume by drug type and region |
| Mandatory Minimum Trend | Line chart | gold_drug_mandatory_minimum_trends | Mandatory minimum rate over time by drug type |
DAX Measures¶
```dax
// Conviction Rate
Conviction Rate =
DIVIDE(
    CALCULATE(
        COUNTROWS('ProsecutionFunnel'),
        'ProsecutionFunnel'[Disposition] IN {"Guilty Plea", "Trial Conviction"}
    ),
    CALCULATE(
        COUNTROWS('ProsecutionFunnel'),
        'ProsecutionFunnel'[Disposition] IN {"Guilty Plea", "Trial Conviction", "Acquittal"}
    ),
    0
)

// Sentence-to-Guideline Ratio
Sentence to Guideline Ratio =
DIVIDE(
    AVERAGE('Sentences'[PrisonMonths]),
    AVERAGE('Sentences'[GuidelineMinMonths]),
    0
)

// Below-Guideline Departure Rate
Below Guideline Rate =
DIVIDE(
    CALCULATE(COUNTROWS('Sentences'), 'Sentences'[DepartureType] = "Below"),
    COUNTROWS('Sentences'),
    0
)
```
Published References¶
Primary Data Sources¶
- FBI Crime Data Explorer (CDE) — NIBRS data, offense statistics, law enforcement data: https://cde.ucr.cjis.gov
- DOJ Open Data Portal — Federal case data, grants, and agency statistics: https://www.justice.gov/open/open-data
- U.S. Sentencing Commission Research — Individual sentencing datafiles, annual reports, guideline manuals: https://www.ussc.gov/research
- Bureau of Justice Statistics (BJS) — Victimization, corrections, courts, law enforcement statistics: https://bjs.ojp.gov
- Bureau of Prisons (BOP) Statistics — Inmate population, demographics, facility data: https://www.bop.gov/about/statistics/
- DEA Data and Statistics — Drug seizure data, national threat assessments: https://www.dea.gov/data-and-statistics
Research & Academic Sources¶
- ICPSR National Archive of Criminal Justice Data (NACJD): https://www.icpsr.umich.edu/web/pages/NACJD/index.html
- Vera Institute of Justice — Incarceration Trends: https://github.com/vera-institute/incarceration-trends
- DOJ Antitrust Case Filings (for white-collar crime cross-reference): https://www.justice.gov/atr/antitrust-case-filings
Microsoft Published Resources¶
The following Microsoft-published white papers and reference architectures support building federal justice analytics on Azure and Microsoft Fabric:
- Azure for Government — Justice & Public Safety — Azure Government capabilities for law enforcement and justice agencies: https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-overview-jps
- The Future of Public Safety and Justice — Microsoft e-book on digital transformation in public safety: https://info.microsoft.com/ww-landing-the-future-of-public-safety-and-justice.html
- Microsoft Fabric Security White Paper — Data protection and governance for sensitive criminal justice data: https://learn.microsoft.com/en-us/fabric/security/white-paper-landing-page
- Medallion Lakehouse Architecture in Fabric — Official guidance for the Bronze/Silver/Gold pattern: https://learn.microsoft.com/en-us/fabric/onelake/onelake-medallion-lakehouse-architecture
- Real-Time Fraud Detection with Azure Stream Analytics — Applicable pattern for real-time crime alert systems: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-real-time-fraud-detection
- Azure Synapse Analytics Security White Paper — Security architecture patterns for analytics workloads: https://learn.microsoft.com/en-us/azure/synapse-analytics/guidance/security-white-paper-introduction
- Cloud-Scale Analytics (Cloud Adoption Framework) — Enterprise data governance and data mesh patterns: https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/cloud-scale-analytics/
Last Updated: 2026-04-23