Industry — Financial Services¶
Scope: Banking, capital markets, insurance, wealth management. Heavy regulator presence, high data volumes, low-latency requirements, fraud as a constant adversary.
Top scenarios¶
| Scenario | Pattern | Latency | Reference |
|---|---|---|---|
| Real-time fraud detection | Streaming + ML scoring + write-back | sub-100ms | Tutorial 05 — Streaming Lambda, Use Case — Anomaly Detection |
| AML transaction monitoring | Batch + graph + alert workflow | minutes-hours | Example — ML Lifecycle (loan default → adapt for AML) |
| Customer 360 | Medallion gold + reverse-ETL + Power BI | minutes | Reference Arch — Data Flow |
| Risk modeling (FRTB, IFRS 9) | Spark + Monte Carlo + result persistence | overnight | Tutorial 06 — AI Foundry |
| Regulatory reporting (BCBS 239, MiFID II) | dbt models + audit trail + signing | daily | Best Practices — Data Governance |
| Algorithmic trading research | Tick data + backtesting + ML | research / batch | Example — Streaming (adapt) |
| Insurance claims AI triage | RAG + agents + claims-system integration | seconds | Tutorial 08 — RAG, Tutorial 07 — Agents |
| Customer GenAI (chat, doc Q&A) | RAG + grounding + content safety | seconds | Example — AI Agents, Example — Fabric Data Agent |
Regulatory landscape¶
| Framework | Where in CSA-in-a-Box |
|---|---|
| SOC 2 Type II | Compliance — SOC 2 |
| PCI-DSS v4.0 (if handling card data) | Compliance — PCI-DSS |
| GDPR / CCPA | Compliance — GDPR |
| SOX (public companies) | Same controls as SOC 2 + financial-reporting evidence |
| GLBA (US banks) | Compliance — NIST 800-53 covers most |
| Basel III / FRTB (capital adequacy) | Out of scope for the platform; risk model results must be reproducible (use dbt + git) |
| MiFID II (EU markets) | Transaction reporting + best-execution evidence — capture in bronze, report from gold |
| DORA (EU operational resilience, 2025) | Heavy overlap with DR.md + Runbooks; also third-party risk |
Reference architecture variations¶
- Tier-1 isolation: separate DLZ subscription per LOB (retail / commercial / investment); shared DMLZ for governance
- Sub-100ms inference: Azure ML real-time endpoint behind a Premium APIM; deploy ONNX models to a dedicated GPU SKU
- Tick data: Event Hubs → Azure Data Explorer (Eventhouse in Fabric) for sub-second queries on TB+/day
- Lineage for regulators: Purview + dbt docs is the source of record for "where did this number come from?"
Why the standard CSA-in-a-Box pattern works for FSI¶
- Medallion + dbt = reproducible regulatory reports
- Bronze immutability = audit trail
- Federated identity + PIM = separation of duties (CC6.x in SOC 2; SOX-relevant)
- Defender for Cloud + Sentinel = continuous monitoring (DORA, NYDFS Part 500)
- AOAI + content filters = safe customer GenAI
What's specific to FSI¶
- Latency: real-time fraud scoring needs sub-100ms; standard batch dbt won't work. Use ML real-time endpoint + Cosmos for state.
- Right of explanation (EU AI Act, FCRA in US): every adverse decision must be explainable. Use SHAP / LIME in your training pipeline; log feature contributions per inference.
- Model risk management (SR 11-7, OCC 2011-12): formal model lifecycle — registration, validation, monitoring, retirement. Wrap MLflow + Azure ML Model Registry in a governance workflow.
- Tick / market data is the most expensive data category in any FSI platform. Azure Data Explorer / Fabric Eventhouse is purpose-built; don't try to use Synapse SQL for sub-second queries on TB-scale tick data.
Getting started¶
- Read Reference Architecture — Hub-Spoke and Identity & Secrets Flow
- Pick a starting scenario from the table above
- Walk the most-relevant tutorial end-to-end in dev
- Adapt the closest example — usually `ml-lifecycle` or `cybersecurity` is the closest fit for FSI patterns
- Review Compliance — SOC 2 and your specific regulator's framework
- Engage your model risk management team before deploying any ML model that drives a customer decision
Transaction data-flow reference architecture¶
The following diagram shows how transaction data flows through the CSA-in-a-Box landing zone into a medallion lakehouse, through ML scoring, and out to regulatory reporting. Regulatory boundaries are marked explicitly so your compliance team can scope their review.
```mermaid
flowchart TB
    subgraph Sources[Source Systems]
        Core[Core Banking<br/>accounts, balances]
        Cards[Card Processor<br/>auth + settlement]
        Wire[Wire / ACH<br/>payment rails]
        Market[Market Data<br/>tick feeds]
    end
    subgraph DLZ[Data Landing Zone]
        EH[Event Hubs<br/>streaming txns]
        ADF[ADF / Fabric Pipeline<br/>batch extracts]
    end
    subgraph Medallion[Medallion Lakehouse]
        Bronze[(Bronze<br/>raw immutable)]
        Silver[(Silver<br/>conformed + PII-tagged)]
        Gold[(Gold<br/>analytics-ready)]
    end
    subgraph ML[ML Scoring]
        RT[Real-Time Endpoint<br/>fraud scoring<br/>sub-100ms]
        Batch[Batch Scoring<br/>AML + risk models<br/>overnight]
    end
    subgraph Reporting[Reporting & Action]
        PBI[Power BI<br/>risk dashboards]
        RegRpt[Regulatory Reports<br/>BCBS 239, MiFID II]
        Alert[Alert Engine<br/>SAR + fraud cases]
    end
    Core --> ADF
    Cards --> EH
    Wire --> EH
    Market --> EH
    ADF --> Bronze
    EH --> Bronze
    Bronze --> Silver
    Silver --> Gold
    EH --> RT
    Gold --> Batch
    RT --> Alert
    Batch --> Alert
    Gold --> PBI
    Gold --> RegRpt
    style Sources fill:#f5f5f5
    style DLZ fill:#fff4cc
    style Medallion fill:#cce4ff
    style ML fill:#e4ccff
    style Reporting fill:#ccffe4
```
Note
The real-time fraud-scoring path bypasses the medallion pipeline intentionally. Transactions hit the ML endpoint directly from Event Hubs for sub-100ms latency, then land in bronze asynchronously for audit. See Patterns — Streaming & CDC for the dual-write pattern.
Regulatory deep-dive¶
Each framework listed in the regulatory landscape table above has specific implications for how you configure the platform. This section expands on Azure implementation considerations for each.
SOC 2 Type II¶
SOC 2 maps directly to the platform's security controls: CC6 (logical access) maps to Entra ID + PIM + Conditional Access, CC7 (system operations) maps to Defender for Cloud + Sentinel, and CC8 (change management) maps to IaC + GitHub PR approval workflows. The audit period is typically 6-12 months; start collecting evidence from day 1. See Compliance — SOC 2 for the full control mapping.
PCI-DSS v4.0¶
The single most important design decision is scope minimization. Tokenize card data at the payment gateway or POS terminal so that raw PANs never reach the analytics platform. If you must process PANs, isolate them in a dedicated subscription with network segmentation and encrypt at rest with customer-managed keys in Azure Key Vault. SAQ-D applies when storing card data; SAQ-A applies when card handling is fully tokenized and outsourced. See Compliance — PCI-DSS.
GDPR¶
For EU banking customers, the right to erasure (Art. 17) conflicts with AML record-retention requirements (5+ years). Document this legal-basis conflict formally. Implement pseudonymization at the silver layer using deterministic encryption so that erasure requests can be satisfied by destroying the encryption key rather than scanning every table. Data Protection Impact Assessments (DPIAs) are required for automated decisioning that affects customers (Art. 35).
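The key-destruction approach can be sketched in a few lines. This is an illustrative sketch only: a plain dict stands in for Azure Key Vault, and the per-customer key, `pseudonymize`, and `erase_customer` names are hypothetical. Deterministic HMAC keeps pseudonyms join-stable across silver tables, while destroying the key renders every derived token unrecoverable.

```python
import hmac
import hashlib

# Hypothetical in-memory key store standing in for Azure Key Vault.
# In production each customer gets a random per-customer key.
key_store = {"cust-001": b"random-32-byte-key-from-the-vault"}

def pseudonymize(customer_id: str, value: str) -> str:
    """Deterministic pseudonym: same input + same key -> same token,
    so joins across silver tables keep working after tokenization."""
    key = key_store[customer_id]
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

def erase_customer(customer_id: str) -> None:
    """GDPR Art. 17 erasure by crypto-shredding: destroy the key and
    every pseudonym derived from it becomes unrecoverable."""
    key_store.pop(customer_id, None)

token_a = pseudonymize("cust-001", "DE89370400440532013000")
token_b = pseudonymize("cust-001", "DE89370400440532013000")
assert token_a == token_b  # deterministic, so silver joins still resolve
erase_customer("cust-001")
```

The AML retention conflict is handled naturally here: transaction records retain the pseudonym for 5+ years, but once the key is gone the pseudonym no longer identifies anyone.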
SOX (Sarbanes-Oxley)¶
SOX Section 404 requires internal controls over financial reporting. In the data platform context this means: immutable bronze for audit trail, dbt version-controlled transformations for reproducibility, PIM-enforced separation of duties between data engineers and report publishers, and Power BI row-level security for financial data access. The external auditor needs evidence of these controls over a 12-month period.
GLBA (Gramm-Leach-Bliley Act)¶
GLBA's Safeguards Rule requires a written information security plan. Most of the technical controls overlap with NIST 800-53 (see Compliance — NIST 800-53). The key FSI-specific requirement is the annual privacy notice to customers explaining data sharing practices, which affects what data you can use for analytics and ML model training.
Basel III / FRTB¶
The Fundamental Review of the Trading Book (FRTB) requires banks to compute risk measures (Expected Shortfall replacing VaR) using either the standardized approach or internal models approach (IMA). The platform's role is to provide reproducible computation pipelines: store market data in bronze, compute sensitivities in silver using Spark, aggregate risk measures in gold using dbt, and version everything with git tags. BCBS 239 (risk data aggregation) requires lineage from source to report, which Purview + dbt docs provide.
MiFID II¶
MiFID II's transaction reporting obligation (Art. 26) requires T+1 reporting to the national competent authority. Best-execution evidence (Art. 27) requires capturing order timestamps, venue prices, and execution quality metrics. Capture raw execution data in bronze, compute best-execution analytics in silver, and generate RTS 25/28 reports from gold. The clock-synchronization requirement (UTC, microsecond precision) affects how you ingest timestamps from trading systems.
DORA (Digital Operational Resilience Act)¶
DORA (effective January 2025 in the EU) requires ICT risk management, incident reporting, resilience testing, and third-party risk management. The CSA-in-a-Box DR plan and runbooks cover the operational resilience aspects. Third-party risk management requires documenting Azure as a critical ICT provider (Microsoft is classified as a Critical Third-Party Provider under DORA). Maintain a register of all ICT services and conduct annual scenario-based testing.
Fraud detection patterns¶
Real-time vs batch¶
| Dimension | Real-time scoring | Batch scoring |
|---|---|---|
| Latency | sub-100ms per transaction | minutes to hours |
| Use case | Card-present auth, wire screening | AML pattern detection, account review |
| Architecture | Event Hubs → ML real-time endpoint → Cosmos DB for state | Gold tables → Spark/dbt → ML batch endpoint |
| Model update frequency | Daily champion/challenger deploy | Weekly retrain, monthly validation |
| False-positive handling | Auto-block + manual review queue | Case management workflow |
Rule-based vs ML¶
Start with rules for compliance-mandated checks (OFAC screening, velocity limits, country blocks) and layer ML for pattern detection. A common production architecture uses a rule engine as the first gate (fast, explainable, auditable) with ML as a second pass for risk scoring. Never deploy ML-only without a rule backstop — regulators expect deterministic controls for known fraud typologies.
Ensemble approaches¶
Production fraud systems rarely rely on a single model. A typical ensemble combines:
- Gradient-boosted trees (XGBoost/LightGBM) for tabular transaction features — these remain the best performers for structured fraud data
- Graph neural networks for relationship-based fraud (money mule rings, bust-out fraud) — use the transaction graph from silver
- Isolation forests or autoencoders for anomaly detection on unseen fraud patterns
- Business rules as hard constraints that override model scores
Combine scores using a calibrated meta-learner or a simple weighted average with business-rule overrides.
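A weighted-average combiner with a hard rule override might look like the sketch below. The model names and weights are illustrative assumptions; a calibrated meta-learner would replace the weighted sum, but the override semantics stay the same.

```python
def ensemble_score(scores: dict[str, float], weights: dict[str, float],
                   rule_hit: bool) -> float:
    """Weighted average of per-model scores. A business-rule hit
    overrides to the maximum score so it can never be averaged away."""
    if rule_hit:
        return 1.0
    total = sum(weights.values())
    return sum(scores[m] * weights[m] for m in scores) / total

s = ensemble_score(
    {"xgboost": 0.7, "gnn": 0.4, "isolation_forest": 0.2},
    {"xgboost": 0.6, "gnn": 0.3, "isolation_forest": 0.1},
    rule_hit=False,
)
# weighted sum: 0.7*0.6 + 0.4*0.3 + 0.2*0.1 = 0.56
```

Normalizing by the weight total keeps the combined score in [0, 1] even if the weights don't sum to exactly one.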
Velocity features¶
Velocity features are the highest-signal inputs for real-time fraud models. Key examples:
- Transaction count in the last 1 / 5 / 15 / 60 minutes per card, per device, per IP
- Cumulative amount in the last 1 / 24 hours per account
- Distinct merchant count in the last hour per card
- Geographic velocity — distance between consecutive transactions divided by time elapsed (impossible-travel detection)
- Channel switching — number of channel changes (ATM → online → mobile) in a time window
Store velocity state in Cosmos DB (or Redis) for sub-millisecond lookups during real-time scoring. Backfill from silver for model training.
Tip
Feature stores (Azure ML managed feature store or Feast on AKS) unify real-time and batch feature computation. Define features once, compute them in both streaming and batch contexts, and avoid training/serving skew.
Risk analytics¶
VaR calculation pipeline¶
Value at Risk (VaR) — and its successor Expected Shortfall (ES) under FRTB — requires Monte Carlo simulation at portfolio scale. The pipeline:
- Market data — bronze captures daily closes, curves, and volatility surfaces
- Risk factor generation — silver models produce correlated scenario paths using Cholesky decomposition on the covariance matrix
- Pricing — Spark distributes valuation of instruments across scenarios (typically 10,000+ paths)
- Aggregation — dbt models in gold compute portfolio-level VaR/ES at multiple confidence levels (97.5%, 99%) and holding periods (1-day, 10-day)
- Backtesting — compare predicted VaR against actual P&L; Basel traffic-light test determines capital add-on
Use Spark for the Monte Carlo engine (embarrassingly parallel across scenarios). Store scenario-level results in silver for regulatory drill-down; store aggregate risk measures in gold for dashboards and FRTB reporting.
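The pipeline above can be sketched in miniature. This is a toy illustration, not the Spark implementation: two risk factors with a hand-rolled 2x2 Cholesky factor and a linear portfolio stand in for the real scenario generator and pricing models.

```python
import math
import random

random.seed(7)

def simulate_pnl(n_paths: int, rho: float = 0.5) -> list[float]:
    """Correlated scenario shocks via the closed-form 2x2 Cholesky
    factor of [[1, rho], [rho, 1]]: f1 = z1, f2 = rho*z1 + sqrt(1-rho^2)*z2."""
    pnl = []
    for _ in range(n_paths):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        f1 = z1
        f2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2
        # Toy linear portfolio: exposures of 1.0 and 0.5 per unit shock
        pnl.append(1.0 * f1 + 0.5 * f2)
    return pnl

def var_es(pnl: list[float], conf: float = 0.975):
    """VaR = loss quantile; ES = mean loss beyond VaR (FRTB uses 97.5% ES)."""
    losses = sorted(-p for p in pnl)       # express losses as positives
    k = int(conf * len(losses))
    var = losses[k]
    es = sum(losses[k:]) / len(losses[k:])
    return var, es

var975, es975 = var_es(simulate_pnl(10_000))
assert es975 >= var975  # ES is always at least VaR
```

In the real pipeline, each Spark task owns a slice of the scenario paths and revalues the full instrument set; the gold-layer dbt models then perform the sort-and-tail aggregation per portfolio and confidence level.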
FRTB sensitivities¶
The FRTB standardized approach requires computing sensitivities (delta, vega, curvature) for every position across prescribed risk factors. Organize sensitivity computation as a dbt DAG: raw positions → instrument-level sensitivities → bucket-level aggregation → capital charge. Version the entire DAG with git tags so regulators can reproduce any historical calculation.
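At the position level, sensitivity computation is bump-and-revalue. A hedged sketch with a toy bond pricer (the instrument, `bond_price`, and the 1bp bump size are illustrative assumptions, not the prescribed FRTB risk-factor grid):

```python
import math

def bond_price(rate: float, cashflows, times) -> float:
    """Toy flat-curve pricer with continuous compounding."""
    return sum(c * math.exp(-rate * t) for c, t in zip(cashflows, times))

def delta(rate: float, cashflows, times, bump: float = 1e-4) -> float:
    """Central finite-difference sensitivity to a 1bp rate bump --
    the bump-and-revalue step that feeds bucket-level aggregation."""
    up = bond_price(rate + bump, cashflows, times)
    down = bond_price(rate - bump, cashflows, times)
    return (up - down) / (2 * bump)

cfs, ts = [5, 5, 105], [1, 2, 3]
d = delta(0.03, cfs, ts)
assert d < 0  # bond value falls as rates rise
```

In the dbt DAG, this per-position step is the upstream model; downstream models aggregate deltas into FRTB buckets and apply the prescribed risk weights and correlations.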
Stress testing with Spark¶
Stress testing (CCAR/DFAST in the US, EBA in the EU) requires evaluating portfolio losses under macroeconomic scenarios. Use Spark to:
- Apply scenario-specific shocks to risk factors (interest rates, credit spreads, FX, equity)
- Revalue the portfolio under each scenario
- Compute loss distributions and capital impact
- Generate the scenario narrative + quantitative results for regulator submission
Store stress-test results in gold with full lineage back to the scenario definition and portfolio snapshot. Typical run times are overnight for a large bank portfolio; Spark auto-scaling handles the burst compute.
Sample data model¶
The following entity-relationship diagram represents a core FSI schema covering accounts, transactions, fraud alerts, and investigation cases. This is a gold-layer logical model — adapt field names and types to your institution's data dictionary.
```mermaid
erDiagram
    CUSTOMER {
        string customer_id PK
        string customer_type
        date onboarding_date
        string risk_rating
        string kyc_status
        date last_review_date
    }
    ACCOUNT {
        string account_id PK
        string customer_id FK
        string account_type
        string currency
        decimal balance
        string status
        date opened_date
    }
    TRANSACTION {
        string transaction_id PK
        string account_id FK
        string counterparty_account FK
        datetime transaction_time
        decimal amount
        string currency
        string channel
        string merchant_category
        string geo_location
        decimal fraud_score
    }
    ALERT {
        string alert_id PK
        string transaction_id FK
        string alert_type
        decimal risk_score
        string model_version
        datetime created_at
        string status
        string disposition
    }
    CASE {
        string case_id PK
        string alert_id FK
        string investigator_id
        string case_status
        string outcome
        datetime opened_at
        datetime closed_at
        boolean sar_filed
    }
    CUSTOMER ||--o{ ACCOUNT : owns
    ACCOUNT ||--o{ TRANSACTION : generates
    TRANSACTION ||--o{ ALERT : triggers
    ALERT ||--o| CASE : escalates_to
```
Note
This schema intentionally separates alerts from cases. Not every alert warrants investigation — the alert-to-case ratio is a key operational metric (target < 10% for mature systems). Track this ratio in your gold-layer KPI tables.
AML and KYC analytics¶
Anti-Money Laundering (AML) and Know Your Customer (KYC) are distinct from fraud detection — they are compliance obligations with different analytical patterns.
Transaction monitoring¶
AML transaction monitoring looks for patterns that indicate money laundering, terrorist financing, or sanctions evasion. Unlike fraud (which protects the bank), AML protects the financial system. Key pattern types:
- Structuring — multiple transactions just below reporting thresholds ($10K CTR, $3K record-keeping)
- Rapid movement — funds deposited and withdrawn quickly, often across accounts or entities
- Geographic risk — transactions involving OFAC-sanctioned countries or high-risk jurisdictions
- Shell company activity — complex layering through entities with no legitimate business activity
- Unusual behavior — activity inconsistent with the customer's profile, occupation, or stated purpose
Implement as a batch pipeline (daily or intra-day) in dbt. Each pattern becomes a dbt model that scores accounts and transactions. Alerts above threshold feed the case management workflow. The key metric is SAR (Suspicious Activity Report) conversion rate — too many false alerts exhaust investigators; too few mean missed obligations.
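As a concrete example, the structuring pattern reduces to a grouped aggregation. The sketch below expresses the same logic the dbt model would run over the silver transaction table; the band width, minimum hit count, and helper name are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

CTR_THRESHOLD = 10_000
NEAR_BAND = 0.9   # flag deposits in the top 10% band below the threshold

def structuring_alerts(txns, min_hits: int = 3):
    """Flag (account, day) pairs with repeated cash deposits just
    below the CTR threshold -- the classic structuring signature."""
    hits = defaultdict(int)
    for t in txns:
        if NEAR_BAND * CTR_THRESHOLD <= t["amount"] < CTR_THRESHOLD:
            hits[(t["account_id"], t["date"])] += 1
    return {k for k, n in hits.items() if n >= min_hits}

txns = [
    {"account_id": "A1", "date": date(2025, 3, 1), "amount": 9_500},
    {"account_id": "A1", "date": date(2025, 3, 1), "amount": 9_800},
    {"account_id": "A1", "date": date(2025, 3, 1), "amount": 9_200},
    {"account_id": "A2", "date": date(2025, 3, 1), "amount": 4_000},
]
alerts = structuring_alerts(txns)
```

In dbt this becomes a `GROUP BY account_id, transaction_date` with a `HAVING` clause; the `min_hits` tuning parameter is what you adjust to manage the SAR conversion rate.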
Graph analytics for AML¶
Money laundering is inherently a network problem — funds flow through chains of accounts and entities. Graph analytics reveals patterns invisible to transaction-level analysis:
- Community detection — identify clusters of accounts that transact primarily with each other (potential mule networks)
- Shortest-path analysis — trace the path from high-risk source to destination, identifying intermediaries
- Centrality scoring — find hub accounts that connect otherwise-unrelated transaction clusters
Use Azure Cosmos DB (Gremlin API) or a graph library in Spark (GraphX, GraphFrames) for graph analytics. Store graph-derived features in the gold layer alongside transaction-monitoring alerts.
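Centrality scoring is the simplest of the three to illustrate. The sketch below uses plain degree centrality on an edge list (a stand-in for what GraphFrames or Gremlin would compute at scale); the `hub_accounts` helper and sample edges are hypothetical.

```python
from collections import defaultdict

def hub_accounts(edges, top_n: int = 1):
    """Rank accounts by the number of distinct counterparties they
    touch. High-degree hubs that bridge otherwise-unrelated clusters
    are candidate mule accounts."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    return sorted(neighbours, key=lambda a: len(neighbours[a]),
                  reverse=True)[:top_n]

# Toy transaction graph: A, B, C all route funds through H.
edges = [("A", "H"), ("B", "H"), ("C", "H"), ("A", "B")]
```

The resulting hub score lands in gold as a per-account feature, joined against the transaction-monitoring alerts so investigators see network context alongside the rule hit.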
Model risk management (SR 11-7)¶
Any ML model that drives a customer-facing decision or regulatory output must go through a formal model risk management process per Federal Reserve SR 11-7 and OCC 2011-12. The lifecycle:
| Phase | Activities | Platform component |
|---|---|---|
| Development | Feature engineering, model selection, training, initial validation | Azure ML + MLflow experiment tracking |
| Validation | Independent review of model assumptions, performance, and limitations | Separate validation environment; documented challenger testing |
| Approval | Model risk committee review and sign-off | Git PR approval workflow + documented approval |
| Deployment | Champion/challenger deployment, A/B testing | Azure ML managed endpoints + traffic splitting |
| Monitoring | Ongoing performance tracking, data drift, concept drift | Azure ML data drift monitors + custom dbt KPI models |
| Retirement | Formal decommission with documentation of replacement | MLflow model stage transition + archived artifacts |
Tip
For Fabric-native ML implementations, see Fabric Lakehouse patterns for integrating MLflow with the Fabric workspace.
Trade-offs¶
| Give | Get |
|---|---|
| Sub-100ms real-time scoring (dedicated GPU endpoints) | Higher infrastructure cost but fraud blocked at point of transaction |
| Separate subscription per LOB | Stronger isolation for regulators but more operational overhead |
| SHAP/LIME explainability on every inference | Regulatory compliance for adverse actions but 2-5x inference latency |
| Full graph analytics for AML | Better network-level detection but significant data engineering investment |
| Customer-managed keys (CMK) for encryption | Key control for compliance but operational complexity for key rotation |
FSI example¶
For a complete end-to-end walkthrough of fraud detection on this platform, see:
Example — Financial Fraud Detection
Related¶
- Use Case — Real-Time Anomaly Detection
- Use Case — Casino & Gaming Analytics (fraud patterns transfer to FSI)
- Patterns — LLMOps & Evaluation
- Patterns — Streaming & CDC