Unified Analytics Platform on Microsoft Fabric¶
Fragmented analytics environments — separate systems for ingestion, warehousing, data science, and reporting — increase operational cost, slow time-to-insight, and create governance gaps. When data is copied across ADLS Gen2 accounts, Databricks workspaces, and Power BI import datasets, lineage breaks, security policies diverge, and storage costs compound.
Microsoft Fabric consolidates these workloads into a single SaaS platform built on OneLake. This guide explains how CSA-in-a-Box patterns (medallion architecture, dbt transformations, data contracts, Purview governance) map directly to Fabric, and walks through a concrete domain migration from Databricks to Fabric.
Why Fabric for Unified Analytics¶
Fabric introduces OneLake, a tenant-wide logical data lake that serves as the single storage layer for all Fabric workloads. Key architectural properties:
- Single copy of data: OneLake stores all data in Delta Parquet format by default. Spark notebooks, SQL analytics, KQL queries, and Power BI all read from the same physical files — no import copies, no extract pipelines between engines.
- Shortcuts and mirroring: OneLake shortcuts create logical pointers to data in external storage (ADLS Gen2, S3, GCS, Dataverse) without copying bytes. Mirroring replicates operational databases (Azure SQL, Cosmos DB, Snowflake) into OneLake as Delta Parquet in near real-time.
- Zero-ETL between engines: A table written by a Spark notebook is immediately queryable from the SQL analytics endpoint and from Power BI via Direct Lake — no staging, no materialization pipeline.
- Unified capacity model: A single Fabric capacity (F-SKU) powers Spark, SQL, Data Factory pipelines, KQL, and Power BI. Capacity scales vertically and pauses when idle.
This model eliminates the data duplication problem that drives most analytics cost overruns. Instead of N copies of the same dataset across N tools, OneLake maintains one authoritative copy with multiple compute engines reading it in place.
Architecture¶
The following diagram shows how CSA-in-a-Box source systems flow through Fabric into consumption layers, with Purview governance spanning the entire estate.
graph TB
subgraph Sources
S1[HSR Filings]
S2[Case Management]
S3[Economic Data]
S4[Document Productions]
S5[D365 / ERP Systems]
end
subgraph OneLake["OneLake (Tenant-Wide Data Lake)"]
SC[Shortcuts / Mirroring]
subgraph Lakehouse["Fabric Lakehouse"]
B[Bronze Layer<br/>Raw ingestion]
SL[Silver Layer<br/>Cleaned, conformed]
G[Gold Layer<br/>Business-ready aggregates]
end
end
subgraph Compute["Fabric Compute Engines"]
DW[Fabric Data Warehouse<br/>SQL Analytics]
SP[Fabric Spark<br/>Data Science / ML]
KQL[KQL Database<br/>Real-Time Analytics]
end
subgraph Consumption
PBI[Power BI<br/>Direct Lake Mode]
API[SQL Endpoint<br/>External Tools]
NB[Notebooks<br/>Ad-hoc Analysis]
end
subgraph Governance["Microsoft Purview"]
CAT[Data Catalog]
LIN[Lineage Tracking]
DLP[Sensitivity Labels & DLP]
AUD[Audit Logging]
end
S1 --> SC
S2 --> SC
S3 --> SC
S4 --> SC
S5 --> SC
SC --> B
B --> SL
SL --> G
G --> DW
G --> SP
G --> KQL
DW --> PBI
DW --> API
SP --> NB
KQL --> PBI
Governance -.->|spans entire estate| OneLake
Governance -.-> Compute
Governance -.-> Consumption Each domain (DOJ, finance, inventory, sales) gets its own Fabric Lakehouse within a workspace, maintaining the domain-oriented ownership model from CSA-in-a-Box while sharing a common OneLake storage layer.
Mapping CSA-in-a-Box Patterns to Fabric¶
The following table maps existing CSA-in-a-Box implementation patterns to their Fabric equivalents. The medallion architecture, dbt transformations, and governance patterns transfer with minimal rework.
| CSA-in-a-Box Pattern | Current Implementation | Fabric Equivalent |
|---|---|---|
| Medallion architecture | Databricks Delta Lake | Fabric Lakehouse (Delta Parquet on OneLake) |
| Data transformation | dbt on Databricks | dbt-fabric adapter on Fabric Data Warehouse, or Spark notebooks |
| Data contracts | YAML-based contract definitions | Purview data products + OneLake Catalog entries |
| Data governance | Manual Purview integration via APIs | Native Purview integration (automatic catalog, lineage, DLP) |
| Storage | ADLS Gen2 with hierarchical namespace | OneLake (built on ADLS Gen2, Delta Parquet default) |
| Compute | Databricks interactive + job clusters | Fabric capacity (Spark pools, SQL engine, KQL engine) |
| Orchestration | Azure Data Factory / Databricks Workflows | Fabric Data Factory (native pipelines) |
| Visualization | Power BI with DirectQuery to Databricks | Power BI with Direct Lake (no import, no DirectQuery) |
| Real-time ingestion | Event Hubs → Databricks Structured Streaming | Event Hubs → Fabric Eventstream → KQL Database |
| Security | Unity Catalog + custom RBAC | Fabric workspace roles + OneLake data access roles + RLS/CLS |
CSA-in-a-Box + Fabric
CSA-in-a-Box domains (DOJ, finance, inventory, sales) use dbt with Delta Lake on Databricks today. The medallion architecture, data contracts, and governance patterns transfer directly to Fabric — OneLake uses Delta Parquet natively, and the dbt-fabric adapter runs dbt models against Fabric Data Warehouse with no model SQL changes for standard transformations.
What changes¶
- Storage paths: ADLS Gen2
abfss://paths become OneLake paths (abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/...) - Compute configuration: Databricks cluster policies become Fabric capacity settings
- Authentication: Service principal auth remains, but endpoint URLs change to
*.datawarehouse.fabric.microsoft.com
What stays the same¶
- dbt models: SQL transformations in
models/directories work with thedbt-fabricadapter - Medallion layer structure: Bronze/Silver/Gold organization is identical
- Delta format: OneLake uses Delta Parquet natively — no format conversion needed
- Purview integration: Lineage and cataloging improve (native vs. manual)
Step-by-Step: Migrating a Domain to Fabric¶
This walkthrough migrates the DOJ Antitrust domain (domains/doj/) from Databricks to Fabric. The same steps apply to any CSA-in-a-Box domain.
Step 1 — Create a Fabric Workspace¶
A Fabric workspace is the security and billing boundary for a set of related items (Lakehouses, Warehouses, notebooks, reports).
Via Azure CLI and Fabric REST API:
# Ensure you have the Fabric capacity created (F64 or higher for production)
# Workspace creation uses the Fabric REST API
# Get an access token
TOKEN=$(az account get-access-token \
--resource "https://api.fabric.microsoft.com" \
--query accessToken -o tsv)
# Create the workspace
curl -X POST "https://api.fabric.microsoft.com/v1/workspaces" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"displayName": "CSA-DOJ-Antitrust",
"description": "DOJ Antitrust domain - CSA-in-a-Box",
"capacityId": "<your-fabric-capacity-id>"
}'
Via Fabric Portal:
- Navigate to app.fabric.microsoft.com
- Select Workspaces → New workspace
- Name:
CSA-DOJ-Antitrust - Under Advanced, assign the workspace to your Fabric capacity
Capacity Sizing
For development and testing, an F2 capacity is sufficient. Production workloads with concurrent Spark jobs and Power BI queries typically require F64 or higher. Fabric capacities can be paused when not in use to control cost.
Step 2 — Create a Lakehouse¶
The Lakehouse is the OneLake-backed storage container that holds Bronze, Silver, and Gold tables.
# Create a Lakehouse via Fabric REST API
curl -X POST "https://api.fabric.microsoft.com/v1/workspaces/<workspace-id>/items" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"type": "Lakehouse",
"displayName": "doj_antitrust"
}'
OneLake organizes Lakehouse content into two top-level folders:
doj_antitrust.Lakehouse/
├── Tables/ # Managed Delta tables (queryable via SQL endpoint)
│ ├── bronze_antitrust_cases/
│ ├── silver_antitrust_cases/
│ └── gold_case_summary/
└── Files/ # Unstructured files (PDFs, images, raw exports)
├── raw_filings/
└── document_productions/
Items in Tables/ are automatically registered as Delta tables and exposed through the Lakehouse SQL analytics endpoint.
Step 3 — Set Up OneLake Shortcuts¶
Shortcuts create zero-copy logical pointers to existing data in ADLS Gen2. This allows Fabric to read CSA-in-a-Box Bronze data without moving it.
# create_shortcuts.py
# Creates OneLake shortcuts from existing ADLS Gen2 Bronze layer to Fabric Lakehouse
import requests
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default").token
WORKSPACE_ID = "<workspace-id>"
LAKEHOUSE_ID = "<lakehouse-id>"
BASE_URL = f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/items/{LAKEHOUSE_ID}/shortcuts"
headers = {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json"
}
# Define shortcuts for each Bronze table
shortcuts = [
{
"path": "Tables/bronze_antitrust_cases",
"name": "bronze_antitrust_cases",
"target": {
"adlsGen2": {
"location": "https://csadatalake.dfs.core.windows.net",
"subpath": "/bronze/doj/antitrust_cases",
"connectionId": "<adls-connection-id>"
}
}
},
{
"path": "Tables/bronze_hsr_filings",
"name": "bronze_hsr_filings",
"target": {
"adlsGen2": {
"location": "https://csadatalake.dfs.core.windows.net",
"subpath": "/bronze/doj/hsr_filings",
"connectionId": "<adls-connection-id>"
}
}
},
{
"path": "Tables/bronze_economic_data",
"name": "bronze_economic_data",
"target": {
"adlsGen2": {
"location": "https://csadatalake.dfs.core.windows.net",
"subpath": "/bronze/doj/economic_data",
"connectionId": "<adls-connection-id>"
}
}
}
]
for shortcut in shortcuts:
response = requests.post(BASE_URL, headers=headers, json=shortcut)
if response.status_code == 201:
print(f"Created shortcut: {shortcut['name']}")
else:
print(f"Failed: {shortcut['name']} — {response.status_code}: {response.text}")
Shortcut vs. Copy
Shortcuts are metadata pointers — no data is copied, no egress charges are incurred (within the same Azure region), and updates to the source are immediately visible through the shortcut. Use shortcuts for the Bronze layer to avoid duplicating raw data. Silver and Gold layers are typically materialized as managed Delta tables in the Lakehouse.
Step 4 — Run dbt on Fabric¶
The existing dbt project in domains/doj/dbt/ transforms Bronze → Silver → Gold. To run it against Fabric Data Warehouse, update the connection profile.
Install the dbt-fabric adapter:
Configure profiles.yml:
# ~/.dbt/profiles.yml (or CI/CD environment config)
csa_analytics:
target: fabric
outputs:
fabric:
type: fabric
driver: "ODBC Driver 18 for SQL Server"
server: "<workspace-guid>.datawarehouse.fabric.microsoft.com"
database: "doj_antitrust"
port: 1433
schema: "dbo"
authentication: serviceprincipal
tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"
client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
threads: 4
retries: 2
Run the dbt pipeline:
cd domains/doj/dbt
# Test the connection
dbt debug --target fabric
# Run Bronze → Silver → Gold transformations
dbt run --target fabric --select tag:silver
dbt run --target fabric --select tag:gold
# Run data quality tests
dbt test --target fabric
dbt-fabric Adapter Differences
The dbt-fabric adapter supports most dbt Core features, but there are differences from dbt-databricks:
- **Materializations**: `table`, `view`, and `incremental` are supported. `ephemeral` models work as CTEs.
- **Incremental strategies**: `append` and `delete+insert` are supported. `merge` requires Fabric Data Warehouse (not Lakehouse SQL endpoint).
- **Python models**: Not supported in dbt-fabric. Use Fabric Spark notebooks for Python-based transformations.
Test your dbt project with `dbt run` in a dev workspace before migrating production pipelines.
Step 5 — Configure Purview Governance¶
Fabric integrates natively with Microsoft Purview. When a Fabric workspace is connected to Purview, catalog entries, lineage, and sensitivity labels propagate automatically.
Enable Purview integration:
- In the Fabric Admin Portal, navigate to Governance and insights → Purview Hub
- Connect your Purview account to the Fabric tenant
- Enable automatic scanning for the
CSA-DOJ-Antitrustworkspace
Apply sensitivity labels:
# Apply sensitivity labels to a Lakehouse item via Purview REST API
curl -X PATCH "https://api.purview.azure.com/catalog/api/atlas/v2/entity/guid/<entity-guid>" \
-H "Authorization: Bearer $PURVIEW_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"entity": {
"attributes": {
"sensitivityLabel": "Confidential - Attorney Work Product"
}
}
}'
What you get automatically with Purview + Fabric:
| Capability | Manual Setup Required? | Notes |
|---|---|---|
| Data catalog entries | No | All Lakehouse tables auto-registered |
| Column-level lineage | No | Tracked through Spark notebooks and SQL pipelines |
| Sensitivity labels | Yes | Define label policies in Purview, apply to items |
| DLP policies | Yes | Configure in Microsoft 365 Compliance Center |
| Access audit logging | No | All data access logged to unified audit log |
| Data quality rules | Yes | Define in Purview data quality (preview) |
Step 6 — OneLake Security (Row/Column-Level)¶
CSA-in-a-Box handles sensitive legal data — attorney work product, privileged communications, and PII. Fabric provides row-level security (RLS) and column-level security (CLS) at the SQL analytics endpoint.
Column-level security — restrict PII access:
-- Create roles for different access levels
CREATE ROLE [analyst_general];
CREATE ROLE [analyst_privileged];
CREATE ROLE [attorney_staff];
-- Grant base table access
GRANT SELECT ON doj_antitrust.dbo.slv_antitrust_cases TO [analyst_general];
GRANT SELECT ON doj_antitrust.dbo.slv_antitrust_cases TO [analyst_privileged];
GRANT SELECT ON doj_antitrust.dbo.slv_antitrust_cases TO [attorney_staff];
-- Deny PII columns to general analysts
DENY SELECT ON doj_antitrust.dbo.slv_antitrust_cases(defendant_name) TO [analyst_general];
DENY SELECT ON doj_antitrust.dbo.slv_antitrust_cases(defendant_ssn) TO [analyst_general];
DENY SELECT ON doj_antitrust.dbo.slv_antitrust_cases(contact_email) TO [analyst_general];
-- Deny SSN even to privileged analysts (attorney-only)
DENY SELECT ON doj_antitrust.dbo.slv_antitrust_cases(defendant_ssn) TO [analyst_privileged];
Row-level security — filter by case assignment:
-- Create a function that checks the user's case assignments
CREATE FUNCTION dbo.fn_case_access(@assigned_attorney VARCHAR(256))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS access_granted
WHERE @assigned_attorney = SESSION_CONTEXT(N'attorney_upn')
OR IS_MEMBER('attorney_supervisors') = 1;
-- Apply the security policy
CREATE SECURITY POLICY doj_case_filter
ADD FILTER PREDICATE dbo.fn_case_access(assigned_attorney)
ON dbo.slv_antitrust_cases
WITH (STATE = ON);
Set session context in application code:
# When connecting from an application, set the attorney context
import pyodbc
conn = pyodbc.connect(connection_string)
cursor = conn.cursor()
cursor.execute(
"EXEC sp_set_session_context @key=N'attorney_upn', @value=?",
(current_user_upn,)
)
Direct Lake and RLS
Row-level security defined on the Fabric Data Warehouse SQL endpoint is enforced when Power BI uses DirectQuery mode. For Direct Lake mode, RLS must be defined in the Power BI semantic model using DAX expressions. Ensure your security policy covers both access paths.
Evidence: Production Deployments¶
The following table summarizes publicly documented Fabric deployments at enterprise scale. All figures are sourced from Microsoft customer stories or independent validations.
| Organization | Scale | Outcome | Source |
|---|---|---|---|
| Microsoft IDEAS | 420 PiB across 600+ teams | 50% efficiency improvement from consolidation to OneLake | Microsoft Learn, Customer Story |
| Edith Cowan University | 2,000 staff, 3.5× user growth | 50% platform cost reduction, 70% faster report development | Customer Story |
| Dentsu | Global media analytics, D365 integration | 55% faster data replication, near real-time campaign metrics | Customer Story |
| OBOS BBL | 600+ notebooks and pipelines migrated | 30% faster processing, 20% lower operational cost | Fabric Blog |
Independent Validation
Enterprise Strategy Group (ESG) independently validated Fabric Data Warehouse query performance at up to 75% faster than Azure Synapse dedicated SQL pools at comparable price points. The validation covered TPC-DS derived workloads across multiple concurrency levels. See ESG Technical Validation for methodology and results.
Government Deployment Considerations¶
Federal and state government agencies have specific compliance requirements that affect Fabric adoption. Key considerations:
FedRAMP Authorization¶
- Fabric is authorized at FedRAMP High in Azure Commercial via a Provisional Authority to Operate (P-ATO). This covers most civilian federal workloads. See Fabric FedRAMP Blog.
- Azure Government (US Gov Virginia, US Gov Arizona, US Gov Texas) availability for Fabric components is evolving. Check the Azure Government product roadmap for current status.
- For DoD IL4/IL5 workloads, confirm service-by-service availability in the FedRAMP audit scope documentation.
Data Residency¶
- OneLake stores data in the Azure region associated with the Fabric capacity. For government workloads, ensure the capacity is provisioned in a region that meets data residency requirements.
- Shortcuts to external storage respect the data residency of the source — data does not move when accessed through a shortcut.
Identity and Access¶
- Fabric uses Entra ID (Microsoft Entra ID) for authentication. Government tenants using Entra ID for Government are supported.
- Conditional Access policies, MFA, and Privileged Identity Management (PIM) apply to Fabric workspace access.
Azure Government vs. Azure Commercial
FedRAMP High in Azure Commercial and FedRAMP High in Azure Government are distinct authorization boundaries. Government workloads requiring US-sovereign data handling, US-persons operational access, or specific DoD Impact Level controls must validate Azure Government availability separately. The FedRAMP High P-ATO in Azure Commercial does not automatically satisfy Azure Government requirements.
Recommended Pre-Deployment Checklist¶
| Item | Action |
|---|---|
| FedRAMP scope | Verify each Fabric component (Lakehouse, Warehouse, Spark, Power BI) is in the FedRAMP audit scope for your authorization boundary |
| Data classification | Map data sensitivity levels to Purview sensitivity labels before ingestion |
| Network isolation | Configure Private Link for Fabric workspace if network isolation is required |
| Audit logging | Confirm unified audit log exports to your SIEM (Sentinel, Splunk) |
| Capacity region | Provision Fabric capacity in a compliant Azure region |
| Service principal governance | Register Fabric service principals in your agency's identity governance system |
Migration Checklist¶
Use this checklist when migrating a CSA-in-a-Box domain from Databricks to Fabric:
- Create Fabric workspace and assign capacity
- Create Lakehouse for the domain
- Create OneLake shortcuts to existing ADLS Gen2 Bronze data
- Install
dbt-fabricadapter and updateprofiles.yml - Run
dbt debugto validate Fabric connection - Run Silver and Gold dbt models against Fabric Data Warehouse
- Run
dbt testto validate data quality post-migration - Configure Purview integration (catalog, lineage, sensitivity labels)
- Implement row-level and column-level security policies
- Create Power BI semantic model with Direct Lake connection
- Validate RLS enforcement in Power BI reports
- Update CI/CD pipelines to target Fabric endpoints
- Run parallel validation: compare Databricks and Fabric outputs
- Decommission Databricks compute after validation period
Related Resources¶
- ADR-0010: Fabric as Strategic Target
- DOJ Antitrust: Step-by-Step Domain Build — the domain this migration guide references
- Azure Analytics: White Papers & Resources
- Microsoft Fabric Documentation
- OneLake Documentation
- dbt-fabric Adapter
- Fabric REST API Reference
- Microsoft Purview Data Governance Overview