Antitrust Analytics

Leveraging Microsoft Fabric to analyze market concentration and merger activity, and to surface cartel detection patterns, using publicly available DOJ, FTC, and federal enforcement data.


Executive Summary

Antitrust enforcement is among the most data-intensive functions in federal law enforcement. The Department of Justice Antitrust Division and the Federal Trade Commission collectively review thousands of merger filings annually under the Hart-Scott-Rodino (HSR) Act, investigate price-fixing conspiracies, and monitor market concentration across every sector of the U.S. economy.

Historically, this analysis has been fragmented across agency systems, spreadsheets, and ad-hoc tools. A unified analytics platform on Microsoft Fabric enables:

  • Real-time market concentration monitoring using the Herfindahl-Hirschman Index (HHI)
  • Merger review pipeline tracking from HSR filing through final disposition
  • Cartel detection pattern recognition across bid-rigging, price-fixing, and market allocation schemes
  • Cross-agency correlation linking antitrust outcomes to small business lending (SBA), environmental compliance (EPA), and economic indicators

This use case demonstrates how the Supercharge Microsoft Fabric POC's medallion architecture, PySpark notebooks, and Power BI visualizations apply to antitrust analytics using real, publicly available data sources.


Data Sources

Primary Sources

| Source | Agency | URL | Data Available |
|---|---|---|---|
| Antitrust Case Filings | DOJ | https://www.justice.gov/atr/antitrust-case-filings | Civil and criminal case index, filing dates, outcomes |
| HSR Filing Data | FTC/DOJ | https://catalog.data.gov | Premerger notification statistics, filing volumes |
| 2023 Merger Guidelines | FTC/DOJ | https://www.justice.gov/atr/2023-merger-guidelines | Current HHI thresholds and analytical framework |
| Competition Cases | FTC | https://www.ftc.gov/legal-library/browse/cases-proceedings | Enforcement actions, consent orders, merger challenges |
| Open Data Portal | DOJ | https://www.justice.gov/open/open-data | Cross-division datasets |

Supporting Sources

| Source | Agency | URL | Use in Analytics |
|---|---|---|---|
| FBI Crime Data Explorer | FBI | https://cde.ucr.cjis.gov | White-collar crime overlay |
| USSC Research | USSC | https://www.ussc.gov/research | Antitrust sentencing outcomes |
| BJS Statistics | DOJ | https://bjs.ojp.gov | Federal prosecution pipeline |
| USAspending | Treasury | https://www.usaspending.gov | Government contract concentration |
| SBA Lending Data | SBA | https://data.sba.gov | Small business lending by market |
| SEC EDGAR | SEC | https://www.sec.gov/edgar | Public company filings, M&A disclosures |

HHI Market Concentration Analysis

Background

The Herfindahl-Hirschman Index (HHI) is the standard measure of market concentration used by the DOJ and FTC in merger review. It is calculated as the sum of the squares of market share percentages for all firms in a market.

HHI = Σ (market_share_i)²

Where market_share_i is the percentage market share of firm i.
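
Expressed in percentage points, HHI ranges from near 0 (many tiny firms) to 10,000 (a single monopolist). A quick worked example in plain Python, with illustrative shares:

```python
# Four firms holding 40%, 30%, 20%, and 10% of a market (illustrative figures)
shares = [40, 30, 20, 10]
hhi = sum(s ** 2 for s in shares)  # 1600 + 900 + 400 + 100
print(hhi)                         # 3000 -> highly concentrated
```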

2023 DOJ/FTC Merger Guidelines Thresholds

The Merger Guidelines published in December 2023 lowered the "highly concentrated" threshold to an HHI of 1,800 (from 2,500 under the 2010 Horizontal Merger Guidelines) and presume a merger illegal when it increases HHI by more than 100 points in such a market. The three-band classification below, carried over from the 2010 Guidelines, remains in wide analytical use and drives the classification logic in this POC:

| Classification | HHI Range | Enforcement Implication |
|---|---|---|
| Unconcentrated | HHI < 1,500 | Mergers unlikely to raise concerns |
| Moderately Concentrated | 1,500 ≤ HHI ≤ 2,500 | Mergers increasing HHI by > 100 warrant scrutiny |
| Highly Concentrated | HHI > 2,500 | Mergers increasing HHI by > 100 presumed anticompetitive |

Delta (Change in HHI) Thresholds

The delta (ΔHHI) measures the change in market concentration caused by a proposed merger:

ΔHHI = HHI_post_merger - HHI_pre_merger

Simplified calculation for a merger between firms with shares a and b: the merged firm contributes (a + b)² where the two firms previously contributed a² + b², so the change reduces to:

ΔHHI = 2 × a × b

Under the 2023 Merger Guidelines, the structural presumption of illegality turns on the following thresholds:

| Market Status | Delta Threshold | Presumption |
|---|---|---|
| Highly Concentrated (post-merger HHI > 1,800) | ΔHHI > 100 | Presumptively illegal; burden shifts to the merging parties |
| Any market, merged firm share > 30% | ΔHHI > 100 | Presumptively illegal |
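
A minimal pre-screening sketch of these presumption rules, with all figures illustrative:

```python
# Hypothetical merger between firms holding 12% and 8% of a market whose
# pre-merger HHI is 1,900 (illustrative numbers)
a, b = 12.0, 8.0
pre_hhi = 1900.0

delta = 2 * a * b            # (a + b)**2 - a**2 - b**2 = 2ab = 192
post_hhi = pre_hhi + delta   # 2,092

# 2023 Guidelines structural presumption: post-merger HHI > 1,800 and delta > 100
presumed_illegal = post_hhi > 1800 and delta > 100
print(delta, post_hhi, presumed_illegal)  # 192.0 2092.0 True
```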

PySpark: HHI Calculation

```python
# Databricks notebook source
# MAGIC %md
# MAGIC # HHI Market Concentration Analysis
# MAGIC Bronze → Silver → Gold pipeline for HHI computation

# COMMAND ----------

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Read market share data from Silver layer
df_market = spark.read.table("lh_silver.antitrust_market_shares")

# Calculate HHI per market per year
df_hhi = (
    df_market
    .withColumn("share_squared", F.col("market_share_pct") ** 2)
    .groupBy("market_id", "naics_code", "market_name", "year")
    .agg(
        F.sum("share_squared").alias("hhi"),
        F.count("firm_id").alias("firm_count"),
        F.max("market_share_pct").alias("top_firm_share"),
        F.collect_list(
            F.struct("firm_name", "market_share_pct")
        ).alias("firms")
    )
)

# Classify concentration levels using the familiar three bands
# (1,500 / 2,500, carried over from the 2010 Horizontal Merger Guidelines)
df_hhi_classified = (
    df_hhi
    .withColumn(
        "concentration_level",
        F.when(F.col("hhi") < 1500, "Unconcentrated")
         .when(F.col("hhi") <= 2500, "Moderately Concentrated")
         .otherwise("Highly Concentrated")
    )
    # 2023 Merger Guidelines: HHI > 1,800 marks a highly concentrated market
    # for the structural presumption, so flag at that threshold
    .withColumn("enforcement_flag", F.col("hhi") > 1800)
)
)

# Write to Gold layer
df_hhi_classified.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.antitrust_hhi_concentration"
)

# COMMAND ----------

# MAGIC %md
# MAGIC ## HHI Year-over-Year Change

# COMMAND ----------

w = Window.partitionBy("market_id").orderBy("year")

df_hhi_delta = (
    df_hhi_classified
    .withColumn("hhi_prior_year", F.lag("hhi").over(w))
    .withColumn("hhi_delta", F.col("hhi") - F.col("hhi_prior_year"))
    .withColumn(
        "trend",
        F.when(F.col("hhi_delta") > 100, "Concentrating")
         .when(F.col("hhi_delta") < -100, "Deconcentrating")
         .otherwise("Stable")
    )
)

df_hhi_delta.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.antitrust_hhi_trends"
)
```

Merger Review Pipeline

Pipeline Stages

A merger subject to HSR notification follows this pipeline:

```
HSR Filing
  └── Initial Review (30 calendar days)
        ├── Early Termination (clearance before the 30-day period expires; suspended since February 2021)
        └── Second Request (extended investigation)
              └── Extended Review (variable, often 6-12 months)
                    ├── Approved (unconditional clearance)
                    ├── Approved with Conditions (consent decree / divestiture)
                    ├── Challenged (DOJ/FTC files suit)
                    │     ├── Blocked (court injunction)
                    │     └── Settlement (negotiated remedy)
                    └── Abandoned (parties withdraw)
```

Pipeline Data Model

```
Bronze: raw_hsr_filings
  - filing_id, filing_date, acquirer, target, transaction_value,
    naics_code, agency_assigned (DOJ or FTC)

Silver: hsr_filings_cleansed
  - Standardized names, validated NAICS codes, deduplicated,
    enriched with market_id lookup

Gold: merger_review_pipeline
  - filing_id, stage, stage_entry_date, stage_exit_date,
    days_in_stage, outcome, remedy_type, divestitures
```
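
The Silver cleansing step (notebook 03_silver_antitrust_cleanse.py in the sequence below) has no code sample elsewhere on this page. A minimal sketch, assuming the Bronze columns listed above and using an illustrative 6-digit NAICS validation rule:

```python
from pyspark.sql import functions as F

df_raw = spark.read.table("lh_bronze.raw_hsr_filings")

df_cleansed = (
    df_raw
    # Standardize party names for matching
    .withColumn("acquirer", F.upper(F.trim("acquirer")))
    .withColumn("target", F.upper(F.trim("target")))
    # Keep rows with a plausible 6-digit NAICS code (illustrative check)
    .filter(F.col("naics_code").rlike(r"^\d{6}$"))
    # Deduplicate on the filing identifier
    .dropDuplicates(["filing_id"])
)

df_cleansed.write.format("delta").mode("overwrite").saveAsTable(
    "lh_silver.hsr_filings_cleansed"
)
```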

PySpark: Merger Pipeline Analytics

```python
# COMMAND ----------

from pyspark.sql import functions as F

df_mergers = spark.read.table("lh_gold.merger_review_pipeline")

# Stage duration analysis
df_stage_stats = (
    df_mergers
    .groupBy("stage", "outcome")
    .agg(
        F.count("filing_id").alias("filing_count"),
        F.avg("days_in_stage").alias("avg_days"),
        F.percentile_approx("days_in_stage", 0.5).alias("median_days"),
        F.max("days_in_stage").alias("max_days")
    )
    .orderBy("stage", "outcome")
)

# Challenge rate by NAICS sector
df_challenge_rate = (
    df_mergers
    .filter(F.col("stage") == "Decision")
    .groupBy("naics_sector")
    .agg(
        F.count("filing_id").alias("total_reviews"),
        F.sum(
            F.when(F.col("outcome").isin("Challenged", "Blocked"), 1).otherwise(0)
        ).alias("challenged_count"),
    )
    .withColumn(
        "challenge_rate",
        F.round(F.col("challenged_count") / F.col("total_reviews") * 100, 2)
    )
    .orderBy(F.desc("challenge_rate"))
)

df_challenge_rate.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.antitrust_challenge_rates"
)

# COMMAND ----------

# MAGIC %md
# MAGIC ## Second Request Analysis
# MAGIC Second Requests are the "deep dive" investigations. Tracking their
# MAGIC frequency and outcomes reveals enforcement intensity trends.

# COMMAND ----------

df_second_requests = (
    df_mergers
    .filter(F.col("stage") == "Second Request")
    .withColumn("year", F.year("stage_entry_date"))
    .groupBy("year")
    .agg(
        F.count("filing_id").alias("second_requests_issued"),
        F.avg("days_in_stage").alias("avg_review_days"),
        F.sum(
            F.when(F.col("outcome") == "Abandoned", 1).otherwise(0)
        ).alias("abandoned_after_2r")
    )
    .orderBy("year")
)

df_second_requests.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.antitrust_second_request_trends"
)
```

Cartel Detection Patterns

Types of Antitrust Conspiracies

| Pattern | Description | Detection Signals |
|---|---|---|
| Price Fixing | Competitors agree to set prices | Parallel pricing, identical bid amounts, unusual price stability |
| Bid Rigging | Competitors coordinate bids on contracts | Rotating winners, complementary bidding, cover bids with consistent spreads |
| Market Allocation | Competitors divide territories or customers | Geographic exclusivity, customer non-compete, sudden market exits |
| Group Boycott | Competitors agree to exclude a rival | Coordinated refusal to deal, simultaneous contract terminations |

Bid-Rigging Detection Analytics

```python
# COMMAND ----------

# MAGIC %md
# MAGIC # Bid-Rigging Pattern Detection
# MAGIC Identifies suspicious patterns in government contract bidding.

# COMMAND ----------

from pyspark.sql import functions as F
from pyspark.sql.window import Window

df_bids = spark.read.table("lh_silver.government_contract_bids")

# Pattern 1: Rotating winners
# If a small group of firms take turns winning in a market, flag it
w_market = Window.partitionBy("market_id", "contract_category").orderBy("award_date")

df_winners = (
    df_bids
    .filter(F.col("is_winner") == True)
    .withColumn("prev_winner", F.lag("firm_id").over(w_market))
    .withColumn("prev_prev_winner", F.lag("firm_id", 2).over(w_market))
    .withColumn(
        "rotation_flag",
        F.when(
            (F.col("firm_id") == F.col("prev_prev_winner")) &
            (F.col("firm_id") != F.col("prev_winner")),
            True
        ).otherwise(False)
    )
)

# Pattern 2: Cover bids with consistent spreads
# If losing bids are always X% above the winner, that is suspicious
df_bid_spreads = (
    df_bids
    .join(
        df_bids.filter(F.col("is_winner") == True)
        .select(
            F.col("contract_id"),
            F.col("bid_amount").alias("winning_bid")
        ),
        on="contract_id"
    )
    .filter(F.col("is_winner") == False)
    .withColumn(
        "spread_pct",
        F.round(
            (F.col("bid_amount") - F.col("winning_bid"))
            / F.col("winning_bid") * 100, 2
        )
    )
)

# Flag markets where spread variance is suspiciously low
df_suspicious_markets = (
    df_bid_spreads
    .groupBy("market_id", "firm_id")
    .agg(
        F.avg("spread_pct").alias("avg_spread"),
        F.stddev("spread_pct").alias("spread_stddev"),
        F.count("contract_id").alias("bid_count")
    )
    .filter(
        (F.col("spread_stddev") < 1.0) &  # Very consistent spreads
        (F.col("bid_count") >= 5)           # Enough data points
    )
)

df_suspicious_markets.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.antitrust_bidrigging_flags"
)

# COMMAND ----------

# MAGIC %md
# MAGIC ## Price-Fixing Detection: Parallel Pricing

# COMMAND ----------

# Pattern 3: Parallel pricing — competitors change prices
# in lockstep within a narrow time window
df_prices = spark.read.table("lh_silver.market_prices")

w_firm = Window.partitionBy("firm_id", "product_category").orderBy("effective_date")

df_price_changes = (
    df_prices
    .withColumn("prev_price", F.lag("price").over(w_firm))
    .withColumn("price_change_pct", 
        F.round((F.col("price") - F.col("prev_price")) / F.col("prev_price") * 100, 2)
    )
    .filter(F.col("price_change_pct").isNotNull())
)

# Find instances where multiple firms changed prices by similar amounts
# within a short window
df_parallel = (
    df_price_changes.alias("a")
    .join(
        df_price_changes.alias("b"),
        (F.col("a.product_category") == F.col("b.product_category")) &
        (F.col("a.firm_id") != F.col("b.firm_id")) &
        (F.abs(F.datediff(F.col("a.effective_date"), F.col("b.effective_date"))) <= 7) &
        (F.abs(F.col("a.price_change_pct") - F.col("b.price_change_pct")) < 0.5)
    )
    .select(
        F.col("a.product_category"),
        F.col("a.firm_id").alias("firm_a"),
        F.col("b.firm_id").alias("firm_b"),
        F.col("a.effective_date").alias("date_a"),
        F.col("b.effective_date").alias("date_b"),
        F.col("a.price_change_pct").alias("change_a"),
        F.col("b.price_change_pct").alias("change_b")
    )
)

df_parallel.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.antitrust_parallel_pricing_flags"
)
```
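
The same ingredients support screens for the remaining conspiracy types in the table above. A sketch for market allocation, reusing the df_bids DataFrame loaded from lh_silver.government_contract_bids earlier and treating the activity thresholds as illustrative:

```python
# Pattern 4 (sketch): market allocation — pairs of firms active in the same
# contract category that never bid in the same geographic market
df_footprint = (
    df_bids
    .groupBy("firm_id", "contract_category")
    .agg(F.collect_set("market_id").alias("markets"))
)

df_allocation_flags = (
    df_footprint.alias("x")
    .join(
        df_footprint.alias("y"),
        (F.col("x.contract_category") == F.col("y.contract_category")) &
        (F.col("x.firm_id") < F.col("y.firm_id"))   # each pair counted once
    )
    .withColumn(
        "shared_markets",
        F.size(F.array_intersect(F.col("x.markets"), F.col("y.markets")))
    )
    .filter(
        (F.size(F.col("x.markets")) >= 3) &   # both firms are active bidders...
        (F.size(F.col("y.markets")) >= 3) &
        (F.col("shared_markets") == 0)        # ...yet their footprints never overlap
    )
)
```

Zero overlap between two otherwise active bidders is only a lead; legitimate specialization produces the same signature, so these flags feed investigation rather than conclusions.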

Penalty & Sentencing Analysis

Criminal Antitrust Penalties

DOJ criminal antitrust cases carry significant penalties. Analysis of USSC sentencing data reveals patterns in enforcement outcomes.

```python
# COMMAND ----------

from pyspark.sql import functions as F

df_sentences = spark.read.table("lh_silver.ussc_antitrust_sentences")

# Penalty analysis by offense type
df_penalties = (
    df_sentences
    .groupBy("offense_type", "fiscal_year")
    .agg(
        F.count("case_id").alias("case_count"),
        F.sum("fine_amount").alias("total_fines"),
        F.avg("fine_amount").alias("avg_fine"),
        F.max("fine_amount").alias("max_fine"),
        F.avg("prison_months").alias("avg_prison_months"),
        F.sum(
            F.when(F.col("prison_months") > 0, 1).otherwise(0)
        ).alias("incarceration_count")
    )
    .withColumn(
        "incarceration_rate",
        F.round(F.col("incarceration_count") / F.col("case_count") * 100, 2)
    )
    .orderBy("fiscal_year", "offense_type")
)

df_penalties.write.format("delta").mode("overwrite").saveAsTable(
    "lh_gold.antitrust_penalty_trends"
)
```

Implementation in Fabric

Table Inventory

| Layer | Table | Source | Description |
|---|---|---|---|
| Bronze | bronze_doj_antitrust_filings | DOJ Case Index | Raw case filing records |
| Bronze | bronze_hsr_filings | HSR/Data.gov | Raw premerger notification data |
| Bronze | bronze_ftc_enforcement | FTC Cases | Raw FTC enforcement actions |
| Bronze | bronze_ussc_sentences | USSC | Raw sentencing data |
| Bronze | bronze_government_bids | USAspending | Raw contract bid data |
| Silver | silver_antitrust_cases | — | Cleansed, deduplicated case records |
| Silver | silver_hsr_filings | — | Standardized HSR filings with NAICS validation |
| Silver | silver_market_shares | — | Computed market shares from revenue/contract data |
| Silver | silver_ussc_antitrust_sentences | — | Validated sentencing records |
| Gold | gold_hhi_concentration | — | HHI scores per market per year |
| Gold | gold_hhi_trends | — | Year-over-year HHI changes |
| Gold | gold_merger_pipeline | — | Merger review stage tracking |
| Gold | gold_challenge_rates | — | Challenge rate by sector |
| Gold | gold_bidrigging_flags | — | Suspicious bidding patterns |
| Gold | gold_parallel_pricing_flags | — | Parallel pricing detections |
| Gold | gold_penalty_trends | — | Penalty and sentencing trends |

Notebook Sequence

```
01_bronze_doj_antitrust_ingest.py     → Ingest DOJ case filings
02_bronze_hsr_ingest.py               → Ingest HSR filing data
03_silver_antitrust_cleanse.py        → Cleanse and standardize cases
04_silver_market_share_compute.py     → Calculate market shares
05_gold_hhi_analysis.py               → HHI concentration & trends
06_gold_merger_pipeline.py            → Merger review pipeline analytics
07_gold_cartel_detection.py           → Bid-rigging & price-fixing patterns
08_gold_penalty_analysis.py           → Sentencing and fine trends
```
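
The sequence can be orchestrated from a Fabric pipeline or a driver notebook. A sketch of the latter, assuming Fabric's notebookutils is available and the notebooks are named as listed above:

```python
# Run the medallion notebooks in dependency order (sketch)
steps = [
    "01_bronze_doj_antitrust_ingest",
    "02_bronze_hsr_ingest",
    "03_silver_antitrust_cleanse",
    "04_silver_market_share_compute",
    "05_gold_hhi_analysis",
    "06_gold_merger_pipeline",
    "07_gold_cartel_detection",
    "08_gold_penalty_analysis",
]

for nb in steps:
    # notebookutils.notebook.run blocks until the child notebook completes
    notebookutils.notebook.run(nb, 3600)  # 1-hour timeout per notebook
```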

Power BI Visualizations

| Visual | Type | Data Source | Purpose |
|---|---|---|---|
| HHI Scatter Plot | Scatter chart | gold_hhi_concentration | X: firm count, Y: HHI, color: concentration level, size: top firm share |
| HHI Trend Lines | Line chart | gold_hhi_trends | HHI over time per sector with threshold reference lines at 1,500, 1,800, and 2,500 |
| Merger Funnel | Funnel chart | gold_merger_pipeline | Filing → Review → Second Request → Decision (with branch counts) |
| Challenge Rate Bar | Bar chart | gold_challenge_rates | Challenge rate by NAICS sector, sorted descending |
| Penalty Waterfall | Waterfall chart | gold_penalty_trends | Total fines by year showing growth/decline contributions |
| Bid-Rigging Heat Map | Matrix | gold_bidrigging_flags | Markets × firms with color intensity by flag count |
| Geographic Map | Map | gold_hhi_concentration | Regional HHI by market geography |

DAX Measures

```dax
// HHI Classification
HHI Classification =
SWITCH(
    TRUE(),
    [HHI] < 1500, "Unconcentrated",
    [HHI] <= 2500, "Moderately Concentrated",
    "Highly Concentrated"
)

// Challenge Rate
Challenge Rate =
DIVIDE(
    CALCULATE(
        COUNTROWS('MergerPipeline'),
        'MergerPipeline'[Outcome] IN {"Challenged", "Blocked"}
    ),
    COUNTROWS('MergerPipeline'),
    0
)

// YoY HHI Change
HHI Delta YoY =
VAR CurrentHHI = [HHI]
VAR PriorHHI =
    CALCULATE(
        [HHI],
        DATEADD('Calendar'[Date], -1, YEAR)
    )
RETURN
    CurrentHHI - PriorHHI
```

Cross-Domain Analysis

Antitrust × SBA Small Business Lending

Hypothesis: Market consolidation (rising HHI) in a region correlates with reduced small business loan origination.

```python
# COMMAND ----------

# Join antitrust HHI data with SBA lending data by geography and year
df_hhi = spark.read.table("lh_gold.antitrust_hhi_concentration")
df_sba = spark.read.table("lh_gold.sba_loan_origination_by_region")

df_cross = (
    df_hhi
    .join(df_sba, on=["state_fips", "year"], how="inner")
    .select(
        "state_fips", "year", "market_name",
        "hhi", "concentration_level",
        "total_loans", "total_loan_amount", "avg_loan_size"
    )
)

# Correlation analysis
from pyspark.ml.stat import Correlation
from pyspark.ml.feature import VectorAssembler

assembler = VectorAssembler(
    inputCols=["hhi", "total_loans", "total_loan_amount"],
    outputCol="features"
)
df_vector = assembler.transform(df_cross)
correlation_matrix = Correlation.corr(df_vector, "features").head()[0]
print("HHI vs Loan Count correlation:", correlation_matrix.toArray()[0][1])
```

Antitrust × EPA Compliance

Hypothesis: Industries with monopolistic market structures (HHI > 2,500) show different environmental compliance patterns than competitive markets.

```python
# COMMAND ----------

from pyspark.sql import functions as F

df_hhi = spark.read.table("lh_gold.antitrust_hhi_concentration")
df_epa = spark.read.table("lh_gold.epa_enforcement_by_naics")

df_cross_epa = (
    df_hhi
    .join(df_epa, on=["naics_code", "year"], how="inner")
    .groupBy("concentration_level")
    .agg(
        F.avg("violation_count").alias("avg_violations"),
        F.avg("penalty_amount").alias("avg_penalty"),
        F.avg("compliance_rate").alias("avg_compliance_rate")
    )
)

df_cross_epa.show()
```

Published References

Primary Regulatory Sources

  • 2023 Merger Guidelines — U.S. Department of Justice & Federal Trade Commission (December 2023) https://www.justice.gov/atr/2023-merger-guidelines

  • DOJ Antitrust Case Filings Index https://www.justice.gov/atr/antitrust-case-filings

  • FTC Competition Enforcement Cases https://www.ftc.gov/legal-library/browse/cases-proceedings

  • HSR Filing Statistics — Hart-Scott-Rodino Annual Reports https://www.ftc.gov/policy/reports/policy-reports/annual-competition-reports

Data Sources

  • DOJ Open Data Portal https://www.justice.gov/open/open-data

  • FBI Crime Data Explorer (CDE) https://cde.ucr.cjis.gov

  • U.S. Sentencing Commission Research https://www.ussc.gov/research

  • Bureau of Justice Statistics (BJS) https://bjs.ojp.gov

  • Data.gov Federal Datasets https://catalog.data.gov

  • USAspending.gov https://www.usaspending.gov



Microsoft Published Resources

The following Microsoft-published white papers and reference architectures provide foundational guidance for building antitrust analytics on Azure and Microsoft Fabric:

  • Microsoft Fabric Security White Paper — End-to-end security architecture for protecting sensitive enforcement data https://learn.microsoft.com/en-us/fabric/security/white-paper-landing-page

  • Medallion Lakehouse Architecture in Fabric — Official Bronze/Silver/Gold pattern guidance used in this POC https://learn.microsoft.com/en-us/fabric/onelake/onelake-medallion-lakehouse-architecture

  • Cloud-Scale Analytics (Cloud Adoption Framework) — Enterprise data platform patterns including data mesh and governance https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/cloud-scale-analytics/

  • Real-Time Fraud Detection with Azure Stream Analytics — Pattern applicable to cartel bid-rigging detection https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-real-time-fraud-detection

  • Advanced Analytics with Power BI — Predictive analytics and DAX patterns for enforcement trend analysis https://info.microsoft.com/advanced-analytics-with-power-bi.html?Is=Website

  • Power BI White Papers — Complete collection including security, dataflows, and deployment guidance https://learn.microsoft.com/en-us/power-bi/guidance/whitepapers



Last Updated: 2026-04-23