Tutorial 47: Financial Services - Real-Time Fraud Detection & AML¶

Overview¶

Build an end-to-end financial services analytics platform in Microsoft Fabric that handles real-time fraud detection, anti-money laundering (AML) monitoring, and regulatory reporting for a mid-tier commercial bank.

What You'll Build¶

graph LR
    A[Generate Data] --> B[Bronze Ingest]
    B --> C[Silver Enrich]
    C --> D[Gold Score]
    D --> E[Fraud Dashboard]
    D --> F[AML Alerts]
    D --> G[Risk Reports]

Learning Objectives¶

Generate PCI-compliant synthetic financial transactions
Build a medallion pipeline for fraud detection
Compute velocity features and geo-anomaly detection
Implement BSA/AML monitoring (CTR and structuring detection)
Create fraud operations dashboards with Direct Lake

Prerequisites¶

Requirement	Details
Microsoft Fabric	F64 capacity or trial
Workspace	`ws-financial-services`
Lakehouses	`lh_bronze`, `lh_silver`, `lh_gold`
Python	3.10+ (for data generation)
Environment variable	`FABRIC_POC_HASH_SALT` set for PII hashing

Step 1: Generate Synthetic Transaction Data¶

1.1 Install Dependencies¶

pip install pandas numpy faker tqdm pyarrow

1.2 Generate Transactions¶

from data_generation.generators.financial.transaction_generator import TransactionGenerator

# Create generator with reproducible seed
gen = TransactionGenerator(
    seed=42,
    num_customers=10000,
    fraud_rate=0.0015,  # 0.15% - industry average
)

# Generate 500K transactions
df = gen.generate(num_records=500000)
print(f"Total transactions: {len(df):,}")
print(f"Fraud rate: {df['is_fraud'].mean():.4%}")
print(f"Fraud patterns:\n{df[df['is_fraud']]['fraud_pattern'].value_counts()}")

# Save to landing zone
gen.to_parquet(df, "landing/financial/transactions.parquet")

# Also generate account profiles
accounts = gen.generate_accounts()
gen.to_parquet(accounts, "landing/financial/accounts.parquet")
print(f"Accounts generated: {len(accounts):,}")

1.3 Verify PCI Compliance¶

Confirm no raw PAN exists in the output:

# card_hash should be a 64-char hex string, NOT a card number
assert all(len(h) == 64 for h in df['card_hash']), "PCI violation: raw PAN detected!"
assert not any(df['card_hash'].str.match(r'^\d{13,19}$')), "PCI violation: numeric PAN!"
print("PCI DSS check passed: No raw PAN in output")

PCI DSS v4.0 Req 3.4: Render PAN unreadable anywhere it is stored. The generator uses SHA-256 hashing so only card_hash is persisted.

Step 2: Upload to Fabric¶

2.1 Upload Landing Files¶

Open your lh_bronze Lakehouse in Fabric
Navigate to Files > create folder landing/financial/
Upload transactions.parquet and accounts.parquet

2.2 Verify Upload¶

# In a Fabric notebook
files = mssparkutils.fs.ls("Files/landing/financial/")
for f in files:
    print(f"{f.name} - {f.size / 1024 / 1024:.1f} MB")

Step 3: Bronze Layer - Raw Ingestion¶

3.1 Import Notebook¶

Import notebooks/bronze/51_financial_transactions.py into your Fabric workspace.

3.2 Run the Notebook¶

The Bronze notebook: - Reads Parquet from Files/landing/financial/ - Enforces schema with explicit StructType - Adds metadata: _ingested_at, _input_file, _source_system - Validates no raw PAN exists (PCI compliance gate) - Appends to bronze_financial_transactions Delta table

3.3 Verify Bronze Output¶

SELECT COUNT(*) AS total,
       COUNT(DISTINCT acct_id) AS accounts,
       MIN(txn_timestamp) AS earliest,
       MAX(txn_timestamp) AS latest
FROM lh_bronze.bronze_financial_transactions

Expected: ~500K records, 10K unique accounts.

Step 4: Silver Layer - Enrichment¶

4.1 Import Notebook¶

Import notebooks/silver/51_financial_enriched.py.

4.2 Run the Notebook¶

The Silver notebook computes:

Feature	Description	Purpose
`txn_count_1h`	Transactions in last 1 hour	Velocity detection
`txn_count_24h`	Transactions in last 24 hours	Velocity detection
`txn_count_7d`	Transactions in last 7 days	Behavioral baseline
`avg_amount_30d`	30-day rolling average amount	Amount spike detection
`amount_zscore`	Z-score of amount vs. 30-day avg	Anomaly quantification
`geo_distance_km`	Haversine distance from last txn	Impossible travel detection
`time_since_last_txn_sec`	Seconds since previous txn	Rapid-fire detection
`merchant_risk`	MCC-based risk classification	Risk weighting

4.3 Verify Silver Output¶

SELECT ROUND(AVG(txn_count_1h), 2) AS avg_velocity_1h,
       ROUND(AVG(geo_distance_km), 1) AS avg_geo_km,
       ROUND(AVG(amount_zscore), 2) AS avg_zscore,
       SUM(CASE WHEN merchant_risk = 'high' THEN 1 ELSE 0 END) AS high_risk_count
FROM lh_silver.silver_financial_enriched

Step 5: Gold Layer - Fraud Scoring & AML¶

5.1 Import Notebook¶

Import notebooks/gold/51_financial_fraud_scoring.py.

5.2 Run the Notebook¶

The Gold notebook produces three output tables:

gold_fraud_scores¶

Each transaction gets a composite fraud score (0-1):

Score Component	Trigger	Weight
Velocity	>5 txns/1h	+0.25
Geo-anomaly	>500 km in <1h	+0.30
Amount spike	Z-score >3	+0.20
High-risk MCC	Gambling, crypto, etc.	+0.15
Card-not-present	CNP channel	+0.10

Decision logic: - Score >= 0.85 -> DECLINE - Score 0.50-0.84 -> STEP-UP AUTH (OTP, biometric) - Score < 0.50 -> APPROVE

gold_aml_alerts¶

Alert Type	Trigger	Regulation
CTR	Amount >= $10,000	31 CFR 1010.311
STRUCTURING	2+ txns $8K-$9.9K in 48h	31 CFR 1010.314

gold_daily_fraud_summary¶

Daily KPIs: fraud rate, total losses, declined count, step-up count, total volume.

5.3 Verify Gold Output¶

-- Fraud score distribution
SELECT fraud_decision, COUNT(*) AS cnt,
       ROUND(AVG(fraud_score), 4) AS avg_score
FROM lh_gold.gold_fraud_scores
GROUP BY fraud_decision

-- AML alerts
SELECT alert_type, alert_priority, COUNT(*) AS alerts
FROM lh_gold.gold_aml_alerts
GROUP BY alert_type, alert_priority

Step 6: Power BI Dashboard¶

6.1 Create Semantic Model¶

In lh_gold, select New semantic model
Include: gold_fraud_scores, gold_aml_alerts, gold_daily_fraud_summary
Connection mode: Direct Lake

6.2 Build Fraud Operations Dashboard¶

Create the following visuals:

Visual	Type	Data
Fraud Rate Trend	Line chart	`gold_daily_fraud_summary.fraud_rate` by date
Fraud by Channel	Stacked bar	`gold_fraud_scores` grouped by channel
Decision Distribution	Donut	APPROVE / STEP_UP / DECLINE counts
AML Alert Queue	Table	`gold_aml_alerts` sorted by priority
KPI Cards	Cards	Total losses, fraud rate %, alert count
Geo Heat Map	Map	Fraud locations from `gold_fraud_scores`

6.3 Configure Row-Level Security¶

// BSA Officers see all alerts
[UserRole] = "BSA_Officer"

// Analysts see only aggregated data
[UserRole] = "Analyst" && [alert_type] = BLANK()

Step 7: Validate & Test¶

7.1 Run Unit Tests¶

pytest validation/unit_tests/financial/test_transaction_generator.py -v

Expected: 6 tests passing.

7.2 Data Quality Checks¶

Check	Expected
No raw PAN in any table	0 violations
All amounts > 0	100% pass
Fraud rate 0.05%-0.50%	Within range
CTR alerts match txns > $10K	Counts match
No orphan accounts	All acct_ids link

PCI-DSS v4.0 Compliance Checklist¶

Use this checklist to validate your deployment meets PCI requirements:

Req 3.4 - PAN rendered unreadable (card_hash only, no raw PAN)
Req 3.5 - Cryptographic keys in Azure Key Vault
Req 7.1 - Access restricted via Workspace RBAC
Req 8.3 - MFA enforced via Entra ID Conditional Access
Req 10.1 - Audit logging enabled (Fabric workspace monitoring)
Req 10.4 - Time synchronization (Azure platform NTP)
Req 11.3 - Vulnerability scanning (Defender for Cloud)
Req 12.8 - Azure compliance certifications documented

Troubleshooting¶

Issue	Cause	Solution
`PCI VIOLATION` assertion	Raw PAN in data	Re-run generator; verify card_hash is 64-char hex
Empty AML alerts	No txns above $10K	Increase wire transfer volume or lower threshold for testing
Slow velocity features	Large dataset window joins	Increase Spark executor memory; partition by date
Missing `mssparkutils`	Running outside Fabric	Use Fabric notebook environment
`FABRIC_POC_HASH_SALT` error	Env var not set	Set via Fabric environment or notebook config

Architecture Summary¶

graph TB
    subgraph "Data Generation"
        G[TransactionGenerator] -->|Parquet| L[Landing Zone]
    end

    subgraph "Bronze Layer"
        L --> B[bronze_financial_transactions]
        B -->|Append-only, SOX audit trail| B
    end

    subgraph "Silver Layer"
        B --> S[silver_financial_enriched]
        S -->|Velocity, Geo, Z-score| S
    end

    subgraph "Gold Layer"
        S --> F[gold_fraud_scores]
        S --> A[gold_aml_alerts]
        S --> D[gold_daily_fraud_summary]
    end

    subgraph "Consumption"
        F --> PBI[Power BI Direct Lake]
        A --> PBI
        D --> PBI
    end

Next Steps¶

Connect real Event Hub for streaming card authorizations
Deploy ML model (LightGBM) to Fabric ML endpoint
Set up Reflex alerts for high-priority fraud decisions
Configure Data Activator for AML alert escalation
Build CCAR/DFAST stress testing pipeline

References¶

< Tutorial 46 | Tutorial 48 >

Last Updated: 2026-04-27 | Phase 14 - Wave 6 (Commercial Verticals)

← Previous46 — Commercial HealthcareRead more →Next →48 — Insurance ClaimsRead more →