Source:
examples/financial-fraud-detection/README.md— this page is rendered live from that file.
Financial Fraud Detection — Real-Time Transaction Scoring¶
Examples > Financial Fraud Detection
[!TIP] TL;DR — Real-time transaction fraud detection pipeline built on Azure Cloud Scale Analytics. Ingests payment transactions through a medallion architecture, computes velocity and anomaly features, scores each transaction with an ML model, and produces BSA/AML-compliant suspicious activity reports and daily fraud dashboards.
Table of Contents¶
- Overview
- Key Features
- Data Sources
- Architecture Overview
- Business Drivers
- Data Mesh Integration
- Prerequisites
- Quick Start
- Data Pipeline
- Bronze Layer
- Silver Layer
- Gold Layer
- Sample Analytics Scenarios
- Data Products
- Regulatory Compliance
- Data Contract
- Related Resources
- Contributing
- License
Overview¶
This example provides a production-ready financial fraud detection platform on Azure CSA. It demonstrates how financial institutions and federal payment processors can ingest transaction streams, compute real-time behavioral features, score transactions against ML models, and generate regulatory reports that satisfy BSA/AML, PCI-DSS, and FinCEN requirements.
All data in this example is synthetic and does not represent real financial transactions.
Key Features¶
- Real-Time Transaction Scoring: Sub-second fraud probability scoring on every transaction using velocity, amount, and behavioral features
- Velocity Feature Engineering: Rolling window counts and amounts per account, merchant, and channel
- Amount Anomaly Detection: Z-score computation against account-level spending baselines
- Merchant Risk Profiling: Risk categorization by MCC code, chargeback history, and geographic patterns
- SAR Generation: Automated Suspicious Activity Report aggregation for FinCEN filing
- Daily Fraud Dashboards: Operational metrics for fraud operations teams
Data Sources¶
| Source | Description | Ingestion |
|---|---|---|
| Payment Transactions | Card-present and card-not-present transactions | Event Hubs streaming |
| Account Master | Customer account profiles and demographics | Batch daily from core banking |
| Merchant Registry | Merchant category codes and risk profiles | Batch weekly |
| Fraud Labels | Confirmed fraud cases from investigations | Batch daily |
Architecture Overview¶
flowchart TB
subgraph Sources["Transaction Sources"]
S1[POS Terminals]
S2[Online Payments]
S3[ACH / Wire]
S4[Mobile Wallet]
end
subgraph Ingest["Streaming Ingestion"]
EH[Azure Event Hubs]
ASA[Azure Stream Analytics]
end
subgraph Bronze["Bronze Layer — Raw Staging"]
B1[stg_transactions]
B2[stg_accounts]
end
subgraph Silver["Silver Layer — Enriched & Features"]
SV1[fct_transactions]
SV2[dim_merchants]
SV3[dim_accounts]
end
subgraph Gold["Gold Layer — Fraud Intelligence"]
G1[rpt_fraud_scores]
G2[rpt_suspicious_activity]
G3[rpt_daily_fraud_summary]
end
subgraph Consumers["Consumers"]
C1[Fraud Ops Dashboard]
C2[Case Management]
C3[FinCEN e-Filing]
C4[Risk Committee Reports]
end
S1 --> EH
S2 --> EH
S3 --> EH
S4 --> EH
EH --> ASA
ASA --> B1
ASA --> B2
B1 --> SV1
B2 --> SV3
SV1 --> SV2
SV1 --> G1
SV2 --> G1
SV3 --> G1
SV1 --> G2
SV1 --> G3
G1 --> C1
G1 --> C2
G2 --> C3
G3 --> C4 Business Drivers¶
Regulatory Compliance¶
| Requirement | Description | How This Example Addresses It |
|---|---|---|
| BSA/AML | Bank Secrecy Act / Anti-Money Laundering | Automated SAR aggregation with FinCEN-aligned fields |
| PCI-DSS v4 | Payment Card Industry Data Security Standard | Tokenized card data, audit trails, access controls |
| FinCEN CTR | Currency Transaction Reports (>$10k) | Threshold-based detection and aggregation |
| FFIEC BSA Exam | Federal examination procedures | Documented model governance and validation |
| Reg E | Electronic Fund Transfer Act | Dispute tracking and provisional credit monitoring |
Operational Value¶
- False Positive Reduction: ML scoring reduces manual review queues by 40-60%
- Detection Latency: Transaction scoring within seconds of authorization
- SAR Filing Efficiency: Automated pre-population reduces analyst time per SAR from hours to minutes
- Model Transparency: Feature-level explanations for every score support examiner inquiries
Data Mesh Integration¶
Fraud Domain Ownership¶
The fraud detection vertical operates as a Fraud Analytics Domain within the CSA Data Mesh:
| Aspect | Details |
|---|---|
| Domain Owner | Chief Risk Officer (CRO) |
| Domain Team | Fraud Analysts, Data Scientists, BSA Officers |
| Self-Service Platform | Databricks workspace + Event Hubs |
| Governance | BSA/AML, PCI-DSS, model risk management |
Prerequisites¶
Azure Resources¶
- Azure Subscription with Event Hubs namespace
- Azure Data Lake Storage Gen2 (for medallion layers)
- Azure Databricks Workspace
- Azure Stream Analytics (for real-time ingestion)
Tools Required¶
- Azure CLI >= 2.50
- dbt-core >= 1.7 with dbt-databricks adapter
- Python >= 3.10
Permissions¶
Storage Blob Data Contributoron ADLS Gen2Event Hubs Data Receiveron the namespaceContributoron the Databricks workspace
Quick Start¶
1. Clone and Navigate¶
git clone https://github.com/your-org/csa-inabox.git
cd csa-inabox/examples/financial-fraud-detection
2. Set Up Environment¶
3. Load Synthetic Data¶
# Upload sample transactions to ADLS Bronze container
az storage blob upload-batch \
--destination bronze/transactions \
--source data/ \
--account-name csadatalakedev
4. Run dbt Models¶
5. Verify Output¶
-- Check fraud scores
SELECT * FROM gold.rpt_fraud_scores
WHERE fraud_probability > 0.7
ORDER BY fraud_probability DESC
LIMIT 20;
Data Pipeline¶
Bronze Layer — Raw Ingestion¶
Raw transaction and account data lands in ADLS Gen2 with minimal transformation. Each record preserves the source schema.
| Table | Source | Format | Refresh |
|---|---|---|---|
stg_transactions | Event Hubs via Stream Analytics | JSON/CSV | Near real-time |
stg_accounts | Core banking batch extract | CSV | Daily |
Silver Layer — Enriched & Features¶
Cleansed, deduplicated transactions enriched with velocity features, amount anomaly scores, and merchant risk metadata.
| Table | Description | Key Joins |
|---|---|---|
fct_transactions | Enriched transactions with velocity counts, amount z-scores, and channel risk flags | dim_merchants, dim_accounts |
dim_merchants | Merchant dimension with MCC-based risk categories | — |
dim_accounts | Account dimension with tenure, type, and average spend | — |
Gold Layer — Fraud Intelligence¶
Aggregated, business-ready views for fraud operations and regulatory reporting.
| Table | Description | Refresh |
|---|---|---|
rpt_fraud_scores | Per-transaction fraud probability with ML feature columns | Near real-time |
rpt_suspicious_activity | SAR-ready aggregation by account over 30-day windows | Daily |
rpt_daily_fraud_summary | Daily fraud metrics: volume, rates, amounts, channel breakdown | Daily |
Sample Analytics Scenarios¶
1. High-Risk Transactions in the Last 24 Hours¶
Identify transactions with elevated fraud scores for immediate review.
SELECT
transaction_id,
account_id,
amount,
merchant_name,
channel,
fraud_probability,
velocity_1h,
amount_zscore
FROM gold.rpt_fraud_scores
WHERE fraud_probability > 0.75
AND transaction_ts >= DATEADD(HOUR, -24, CURRENT_TIMESTAMP())
ORDER BY fraud_probability DESC;
2. Structuring Detection — Multiple Sub-$10k Transactions¶
Flag accounts that may be structuring deposits to avoid Currency Transaction Report thresholds.
SELECT
account_id,
COUNT(*) AS txn_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount
FROM silver.fct_transactions
WHERE amount BETWEEN 8000 AND 9999
AND transaction_ts >= DATEADD(DAY, -7, CURRENT_TIMESTAMP())
GROUP BY account_id
HAVING COUNT(*) >= 3 AND SUM(amount) > 25000
ORDER BY total_amount DESC;
3. Velocity Spike Detection¶
Find accounts with unusually high transaction counts in a short window.
SELECT
account_id,
MAX(velocity_1h) AS max_hourly_velocity,
MAX(velocity_24h) AS max_daily_velocity,
COUNT(*) AS flagged_txn_count
FROM silver.fct_transactions
WHERE velocity_1h > 10
AND transaction_ts >= DATEADD(DAY, -1, CURRENT_TIMESTAMP())
GROUP BY account_id
ORDER BY max_hourly_velocity DESC;
4. Cross-Channel Anomaly¶
Detect accounts transacting across multiple channels in a short window, a common fraud indicator.
SELECT
account_id,
COUNT(DISTINCT channel) AS channel_count,
COUNT(*) AS txn_count,
SUM(amount) AS total_amount,
MIN(transaction_ts) AS first_txn,
MAX(transaction_ts) AS last_txn
FROM silver.fct_transactions
WHERE transaction_ts >= DATEADD(HOUR, -4, CURRENT_TIMESTAMP())
GROUP BY account_id
HAVING COUNT(DISTINCT channel) >= 3
ORDER BY channel_count DESC;
Data Products¶
| Data Product | Description | Consumers |
|---|---|---|
| Transaction Fraud Scores | Per-transaction ML probability with feature explanations | Fraud Ops, Case Management |
| Suspicious Activity Reports | FinCEN-aligned SAR aggregation with account narratives | BSA Officer, Compliance, Regulators |
| Daily Fraud Summary | Operational metrics: approval rates, fraud rates, loss amounts | CRO, Risk Committee, Audit |
| Merchant Risk Profiles | MCC-based risk scoring with chargeback and velocity history | Merchant Acquiring, Underwriting |
Regulatory Compliance¶
BSA/AML¶
The gold-layer rpt_suspicious_activity model produces SAR-ready output aligned with FinCEN e-filing requirements. Fields include subject identification, suspicious activity characterization, and cumulative transaction amounts over the reporting window.
PCI-DSS v4¶
This example does not store raw card numbers. All synthetic data uses tokenized account identifiers. In production, card-present data should flow through a PCI-compliant tokenization service before reaching the Bronze layer.
FinCEN Currency Transaction Reports¶
The fct_transactions model flags individual transactions exceeding $10,000 and aggregates same-day transactions per account to detect structuring patterns below the CTR threshold.
See also: Compliance — PCI-DSS v4
Data Contract¶
The contracts/transactions.yml data contract defines the schema, quality thresholds, and source configuration for the transaction data product.
Key guarantees: - Freshness: Transactions available in Silver within 5 minutes of authorization - Completeness: >= 99.9% of transactions ingested without data loss - Accuracy: Fraud scores validated against labeled holdout set (AUC > 0.92) - Availability: 99.9% uptime for Gold layer query access
Related Resources¶
- Industry — Financial Services — CSA patterns for financial institutions
- Tutorial 05 — Streaming Lambda — Streaming architecture patterns
- Example — ML Lifecycle — End-to-end MLflow model management
- Compliance — PCI-DSS v4 — Payment card security standard
- FinCEN BSA/AML Resources — Federal BSA/AML regulations
- FFIEC BSA/AML Examination Manual — Examination procedures
Contributing¶
- Fork the repository
- Create a feature branch:
git checkout -b feature/fraud-detection-enhancement - Follow existing coding conventions and data contract patterns
- Submit a pull request with test evidence
License¶
This project is part of CSA-in-a-Box and follows the repository-level license.
Directory Structure¶
financial-fraud-detection/
├── contracts/ # Data product contracts (schemas, SLOs, owners)
│ └── transactions.yml
├── data/ # Synthetic sample data
│ └── sample_transactions.csv
├── domains/ # dbt models (bronze / silver / gold)
│ ├── bronze/
│ │ ├── stg_transactions.sql
│ │ └── stg_accounts.sql
│ ├── silver/
│ │ ├── fct_transactions.sql
│ │ ├── dim_merchants.sql
│ │ └── dim_accounts.sql
│ └── gold/
│ ├── rpt_fraud_scores.sql
│ ├── rpt_suspicious_activity.sql
│ └── rpt_daily_fraud_summary.sql
└── README.md # This file