Unified Analytics Platform on Microsoft Fabric¶

Fragmented analytics environments — separate systems for ingestion, warehousing, data science, and reporting — increase operational cost, slow time-to-insight, and create governance gaps. When data is copied across ADLS Gen2 accounts, Databricks workspaces, and Power BI import datasets, lineage breaks, security policies diverge, and storage costs compound.

Microsoft Fabric consolidates these workloads into a single SaaS platform built on OneLake. This guide explains how CSA-in-a-Box patterns (medallion architecture, dbt transformations, data contracts, Purview governance) map directly to Fabric, and walks through a concrete domain migration from Databricks to Fabric.

Why Fabric for Unified Analytics¶

Fabric introduces OneLake, a tenant-wide logical data lake that serves as the single storage layer for all Fabric workloads. Key architectural properties:

Single copy of data: OneLake stores tabular data in Delta Parquet format by default. Spark notebooks, SQL analytics, KQL queries, and Power BI all read from the same physical files, with no import copies and no extract pipelines between engines.
Shortcuts and mirroring: OneLake shortcuts create logical pointers to data in external storage (ADLS Gen2, S3, GCS, Dataverse) without copying bytes. Mirroring replicates operational databases (Azure SQL, Cosmos DB, Snowflake) into OneLake as Delta Parquet in near real-time.
Zero-ETL between engines: A table written by a Spark notebook is immediately queryable from the SQL analytics endpoint and from Power BI via Direct Lake — no staging, no materialization pipeline.
Unified capacity model: A single Fabric capacity (F-SKU) powers Spark, SQL, Data Factory pipelines, KQL, and Power BI. A capacity can be resized to a larger or smaller SKU and can be paused when idle.

This model removes the data duplication that drives many analytics cost overruns. Instead of N copies of the same dataset across N tools, OneLake maintains one authoritative copy with multiple compute engines reading it in place.

Architecture¶

The following diagram shows how CSA-in-a-Box source systems flow through Fabric into consumption layers, with Purview governance spanning the entire estate.

graph TB
    subgraph Sources
        S1[HSR Filings]
        S2[Case Management]
        S3[Economic Data]
        S4[Document Productions]
        S5[D365 / ERP Systems]
    end

    subgraph OneLake["OneLake (Tenant-Wide Data Lake)"]
        SC[Shortcuts / Mirroring]
        subgraph Lakehouse["Fabric Lakehouse"]
            B[Bronze Layer<br/>Raw ingestion]
            SL[Silver Layer<br/>Cleaned, conformed]
            G[Gold Layer<br/>Business-ready aggregates]
        end
    end

    subgraph Compute["Fabric Compute Engines"]
        DW[Fabric Data Warehouse<br/>SQL Analytics]
        SP[Fabric Spark<br/>Data Science / ML]
        KQL[KQL Database<br/>Real-Time Analytics]
    end

    subgraph Consumption
        PBI[Power BI<br/>Direct Lake Mode]
        API[SQL Endpoint<br/>External Tools]
        NB[Notebooks<br/>Ad-hoc Analysis]
    end

    subgraph Governance["Microsoft Purview"]
        CAT[Data Catalog]
        LIN[Lineage Tracking]
        DLP[Sensitivity Labels & DLP]
        AUD[Audit Logging]
    end

    S1 --> SC
    S2 --> SC
    S3 --> SC
    S4 --> SC
    S5 --> SC

    SC --> B
    B --> SL
    SL --> G

    G --> DW
    G --> SP
    G --> KQL

    DW --> PBI
    DW --> API
    SP --> NB
    KQL --> PBI

    Governance -.->|spans entire estate| OneLake
    Governance -.-> Compute
    Governance -.-> Consumption

Each domain (DOJ, finance, inventory, sales) gets its own Fabric Lakehouse within a workspace, maintaining the domain-oriented ownership model from CSA-in-a-Box while sharing a common OneLake storage layer.

Mapping CSA-in-a-Box Patterns to Fabric¶

The following table maps existing CSA-in-a-Box implementation patterns to their Fabric equivalents. The medallion architecture, dbt transformations, and governance patterns transfer with minimal rework.

CSA-in-a-Box Pattern	Current Implementation	Fabric Equivalent
Medallion architecture	Databricks Delta Lake	Fabric Lakehouse (Delta Parquet on OneLake)
Data transformation	dbt on Databricks	dbt-fabric adapter on Fabric Data Warehouse, or Spark notebooks
Data contracts	YAML-based contract definitions	Purview data products + OneLake Catalog entries
Data governance	Manual Purview integration via APIs	Native Purview integration (automatic catalog, lineage, DLP)
Storage	ADLS Gen2 with hierarchical namespace	OneLake (built on ADLS Gen2, Delta Parquet default)
Compute	Databricks interactive + job clusters	Fabric capacity (Spark pools, SQL engine, KQL engine)
Orchestration	Azure Data Factory / Databricks Workflows	Fabric Data Factory (native pipelines)
Visualization	Power BI with DirectQuery to Databricks	Power BI with Direct Lake (no import, no DirectQuery)
Real-time ingestion	Event Hubs → Databricks Structured Streaming	Event Hubs → Fabric Eventstream → KQL Database
Security	Unity Catalog + custom RBAC	Fabric workspace roles + OneLake data access roles + RLS/CLS

CSA-in-a-Box + Fabric

CSA-in-a-Box domains (DOJ, finance, inventory, sales) use dbt with Delta Lake on Databricks today. The medallion architecture, data contracts, and governance patterns transfer directly to Fabric: OneLake uses Delta Parquet natively, and the dbt-fabric adapter runs dbt models against Fabric Data Warehouse. Models written in standard SQL typically need no changes; models that rely on Spark SQL-specific functions must be ported to T-SQL.

What changes¶

Storage paths: ADLS Gen2 abfss:// paths become OneLake paths (abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/...)
Compute configuration: Databricks cluster policies become Fabric capacity settings
Authentication: Service principal auth remains, but endpoint URLs change to *.datawarehouse.fabric.microsoft.com
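
As a rough illustration of the storage-path change, the helper below assembles a OneLake URI from the pattern shown above. The function name `onelake_uri` and its parameters are hypothetical and not part of any Fabric SDK; the function only formats a string and makes no network calls.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, lakehouse: str, relative_path: str = "") -> str:
    """Build an abfss:// URI for a path inside a Fabric Lakehouse.

    Follows the pattern abfss://<workspace>@<host>/<lakehouse>.Lakehouse/<path>.
    """
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse"
    relative_path = relative_path.strip("/")
    return f"{base}/{relative_path}" if relative_path else base


# Example: the Bronze table used later in this guide
print(onelake_uri("CSA-DOJ-Antitrust", "doj_antitrust", "Tables/bronze_antitrust_cases"))
```

Keeping path construction in one place makes it easier to switch a pipeline between the existing ADLS Gen2 account and OneLake during a phased migration.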

What stays the same¶

dbt models: SQL transformations in models/ directories work with the dbt-fabric adapter
Medallion layer structure: Bronze/Silver/Gold organization is identical
Delta format: OneLake uses Delta Parquet natively — no format conversion needed
Purview integration: Lineage and cataloging improve (native vs. manual)

Step-by-Step: Migrating a Domain to Fabric¶

This walkthrough migrates the DOJ Antitrust domain (domains/doj/) from Databricks to Fabric. The same steps apply to any CSA-in-a-Box domain.

Step 1 — Create a Fabric Workspace¶

A Fabric workspace is the security and billing boundary for a set of related items (Lakehouses, Warehouses, notebooks, reports).

Via Azure CLI and Fabric REST API:

# Ensure you have the Fabric capacity created (F64 or higher for production)
# Workspace creation uses the Fabric REST API

# Get an access token
TOKEN=$(az account get-access-token \
  --resource "https://api.fabric.microsoft.com" \
  --query accessToken -o tsv)

# Create the workspace
curl -X POST "https://api.fabric.microsoft.com/v1/workspaces" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "CSA-DOJ-Antitrust",
    "description": "DOJ Antitrust domain - CSA-in-a-Box",
    "capacityId": "<your-fabric-capacity-id>"
  }'

Via Fabric Portal:

Navigate to app.fabric.microsoft.com
Select Workspaces → New workspace
Name: CSA-DOJ-Antitrust
Under Advanced, assign the workspace to your Fabric capacity

Capacity Sizing

For development and testing, an F2 capacity is sufficient. Production workloads with concurrent Spark jobs and Power BI queries typically require F64 or higher. Fabric capacities can be paused when not in use to control cost.
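
Pausing can be scripted, for example from a nightly job. The sketch below builds, but does not send, an Azure Resource Manager request for a capacity. It assumes the `Microsoft.Fabric/capacities` resource exposes `suspend` and `resume` actions and that API version `2023-11-01` is valid; confirm both against the ARM reference before use. The function name and parameters are hypothetical, and the bearer token must be issued for `https://management.azure.com`, not for the Fabric API.

```python
import urllib.request

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2023-11-01"  # assumption: verify the current API version


def capacity_action_request(subscription_id: str, resource_group: str,
                            capacity_name: str, action: str,
                            arm_token: str) -> urllib.request.Request:
    """Build a POST request that suspends or resumes a Fabric capacity."""
    if action not in ("suspend", "resume"):
        raise ValueError(f"unsupported action: {action}")
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Fabric/capacities/{capacity_name}"
        f"/{action}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=b"",  # the action carries no request body
        method="POST",
        headers={"Authorization": f"Bearer {arm_token}"},
    )


# Placeholder values; nothing is sent until the request is passed to urlopen()
request = capacity_action_request("<subscription-id>", "<resource-group>",
                                  "<capacity-name>", "suspend", "<arm-token>")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request, timeout=30)`.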

Step 2 — Create a Lakehouse¶

The Lakehouse is the OneLake-backed storage container that holds Bronze, Silver, and Gold tables.

# Create a Lakehouse via Fabric REST API
curl -X POST "https://api.fabric.microsoft.com/v1/workspaces/<workspace-id>/items" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Lakehouse",
    "displayName": "doj_antitrust"
  }'
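
For CI/CD pipelines, the same call can be expressed in Python. This is a minimal sketch that mirrors the curl request above using only the standard library; the helper names `create_item_request` and `send` are hypothetical, and the token is assumed to come from the earlier `az account get-access-token` step.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def create_item_request(workspace_id: str, item_type: str,
                        display_name: str, token: str) -> urllib.request.Request:
    """Build the POST request that creates a Fabric item in a workspace."""
    body = json.dumps({"type": item_type, "displayName": display_name})
    return urllib.request.Request(
        f"{FABRIC_API}/workspaces/{workspace_id}/items",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> tuple:
    """Send the request and return (HTTP status, raw response text)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status, response.read().decode("utf-8")


# Build the Lakehouse request; call send(request) to execute it
request = create_item_request("<workspace-id>", "Lakehouse", "doj_antitrust", "<token>")
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload easy to unit test without network access.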

OneLake organizes Lakehouse content into two top-level folders:

doj_antitrust.Lakehouse/
├── Tables/          # Managed Delta tables (queryable via SQL endpoint)
│   ├── bronze_antitrust_cases/
│   ├── silver_antitrust_cases/
│   └── gold_case_summary/
└── Files/           # Unstructured files (PDFs, images, raw exports)
    ├── raw_filings/
    └── document_productions/

Delta-format folders in Tables/ are automatically registered as Delta tables and exposed through the Lakehouse SQL analytics endpoint.

Step 3 — Set Up OneLake Shortcuts¶

Shortcuts create zero-copy logical pointers to existing data in ADLS Gen2. This allows Fabric to read CSA-in-a-Box Bronze data without moving it.

# create_shortcuts.py
# Creates OneLake shortcuts from existing ADLS Gen2 Bronze layer to Fabric Lakehouse

import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default").token

WORKSPACE_ID = "<workspace-id>"
LAKEHOUSE_ID = "<lakehouse-id>"
BASE_URL = f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/items/{LAKEHOUSE_ID}/shortcuts"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Define shortcuts for each Bronze table.
# "path" is the parent folder inside the Lakehouse and "name" is the shortcut
# itself, so each shortcut is created at Tables/<name>.
shortcuts = [
    {
        "path": "Tables",
        "name": "bronze_antitrust_cases",
        "target": {
            "adlsGen2": {
                "location": "https://csadatalake.dfs.core.windows.net",
                "subpath": "/bronze/doj/antitrust_cases",
                "connectionId": "<adls-connection-id>"
            }
        }
    },
    {
        "path": "Tables",
        "name": "bronze_hsr_filings",
        "target": {
            "adlsGen2": {
                "location": "https://csadatalake.dfs.core.windows.net",
                "subpath": "/bronze/doj/hsr_filings",
                "connectionId": "<adls-connection-id>"
            }
        }
    },
    {
        "path": "Tables",
        "name": "bronze_economic_data",
        "target": {
            "adlsGen2": {
                "location": "https://csadatalake.dfs.core.windows.net",
                "subpath": "/bronze/doj/economic_data",
                "connectionId": "<adls-connection-id>"
            }
        }
    }
]

for shortcut in shortcuts:
    # A timeout keeps the script from hanging if the API is unreachable
    response = requests.post(BASE_URL, headers=headers, json=shortcut, timeout=30)
    if response.status_code == 201:
        print(f"Created shortcut: {shortcut['name']}")
    else:
        print(f"Failed: {shortcut['name']} ({response.status_code}): {response.text}")

Shortcut vs. Copy

Shortcuts are metadata pointers — no data is copied, no egress charges are incurred (within the same Azure region), and updates to the source are immediately visible through the shortcut. Use shortcuts for the Bronze layer to avoid duplicating raw data. Silver and Gold layers are typically materialized as managed Delta tables in the Lakehouse.
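
When a domain has many Bronze tables, the shortcut payloads can be generated rather than written by hand. The sketch below is one way to do that, assuming the `bronze_<table>` naming and `/bronze/<domain>/<table>` folder layout used in this guide, and assuming that `path` names the parent folder while `name` names the shortcut. The helper names are hypothetical.

```python
def adls_shortcut(name: str, subpath: str, location: str,
                  connection_id: str, parent: str = "Tables") -> dict:
    """Return one shortcut payload pointing at an ADLS Gen2 folder."""
    return {
        "path": parent,  # parent folder inside the Lakehouse
        "name": name,    # shortcut (and table) name
        "target": {
            "adlsGen2": {
                "location": location,
                "subpath": subpath,
                "connectionId": connection_id,
            }
        },
    }


def bronze_shortcuts(tables: list, domain: str, location: str,
                     connection_id: str) -> list:
    """Build payloads for every Bronze table of a domain."""
    return [
        adls_shortcut(f"bronze_{table}", f"/bronze/{domain}/{table}",
                      location, connection_id)
        for table in tables
    ]


payloads = bronze_shortcuts(
    ["antitrust_cases", "hsr_filings", "economic_data"],
    domain="doj",
    location="https://csadatalake.dfs.core.windows.net",
    connection_id="<adls-connection-id>",
)
print([p["name"] for p in payloads])
```

Each payload can then be posted to the shortcuts endpoint in the same loop as `create_shortcuts.py`.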

Step 4 — Run dbt on Fabric¶

The existing dbt project in domains/doj/dbt/ transforms Bronze → Silver → Gold. To run it against Fabric Data Warehouse, update the connection profile.

Install the dbt-fabric adapter:

pip install dbt-fabric

Configure profiles.yml:

# ~/.dbt/profiles.yml (or CI/CD environment config)
csa_analytics:
    target: fabric
    outputs:
        fabric:
            type: fabric
            driver: "ODBC Driver 18 for SQL Server"
            server: "<workspace-guid>.datawarehouse.fabric.microsoft.com"
            database: "doj_antitrust"
            port: 1433
            schema: "dbo"
            authentication: serviceprincipal
            tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"
            client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
            client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
            threads: 4
            retries: 2
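
Because the profile reads its credentials from environment variables, a missing variable only surfaces when dbt tries to connect. A small preflight check, sketched below, can fail earlier with a clearer message. The function name `missing_env_vars` is hypothetical; the variable names are the ones referenced in the profile above.

```python
import os

REQUIRED_ENV_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_env_vars(env=None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# Report what is missing in the current environment, if anything
missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Service principal environment looks complete")
```

Running this as the first CI step avoids spending pipeline time on a run that cannot authenticate.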

Run the dbt pipeline:

cd domains/doj/dbt

# Test the connection
dbt debug --target fabric

# Run Bronze → Silver → Gold transformations
dbt run --target fabric --select tag:silver
dbt run --target fabric --select tag:gold

# Run data quality tests
dbt test --target fabric
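
In an orchestrated run, the commands above should stop at the first failure so that Gold models are never built on a broken Silver layer. The following sketch wraps the same four commands in Python; `dbt_steps` and `run_steps` are hypothetical helper names, and the working directory is the project path used in this guide.

```python
import subprocess


def dbt_steps(target: str = "fabric") -> list:
    """Return the dbt commands from this step, in execution order."""
    return [
        ["dbt", "debug", "--target", target],
        ["dbt", "run", "--target", target, "--select", "tag:silver"],
        ["dbt", "run", "--target", target, "--select", "tag:gold"],
        ["dbt", "test", "--target", target],
    ]


def run_steps(steps: list, runner=subprocess.run, cwd: str = "domains/doj/dbt"):
    """Run each command in order; return the first failing command, or None."""
    for argv in steps:
        result = runner(argv, cwd=cwd)
        if result.returncode != 0:
            return argv
    return None


# Usage (requires dbt on PATH and a configured profile):
# failed = run_steps(dbt_steps("fabric"))
# if failed:
#     raise SystemExit("dbt step failed: " + " ".join(failed))
```

The `runner` parameter exists so the sequencing logic can be tested with a stub instead of invoking dbt.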

dbt-fabric Adapter Differences

The dbt-fabric adapter supports most dbt Core features, but there are differences from dbt-databricks:

- **Materializations**: `table`, `view`, and `incremental` are supported. `ephemeral` models work as CTEs.
- **Incremental strategies**: `append` and `delete+insert` are supported. `merge` requires Fabric Data Warehouse (not Lakehouse SQL endpoint).
- **Python models**: Not supported in dbt-fabric. Use Fabric Spark notebooks for Python-based transformations.

Test your dbt project with `dbt run` in a dev workspace before migrating production pipelines.

Step 5 — Configure Purview Governance¶

Fabric integrates natively with Microsoft Purview. When a Fabric workspace is connected to Purview, catalog entries, lineage, and sensitivity labels propagate automatically.

Enable Purview integration:

In the Fabric Admin Portal, navigate to Governance and insights → Purview Hub
Connect your Purview account to the Fabric tenant
Enable automatic scanning for the CSA-DOJ-Antitrust workspace

Apply sensitivity labels:

# Apply sensitivity labels to a Lakehouse item via Purview REST API
curl -X PATCH "https://api.purview.azure.com/catalog/api/atlas/v2/entity/guid/<entity-guid>" \
  -H "Authorization: Bearer $PURVIEW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "entity": {
      "attributes": {
        "sensitivityLabel": "Confidential - Attorney Work Product"
      }
    }
  }'

What you get automatically with Purview + Fabric:

Capability	Manual Setup Required?	Notes
Data catalog entries	No	All Lakehouse tables auto-registered
Column-level lineage	No	Tracked through Spark notebooks and SQL pipelines
Sensitivity labels	Yes	Define label policies in Purview, apply to items
DLP policies	Yes	Configure in the Microsoft Purview compliance portal (formerly Microsoft 365 compliance center)
Access audit logging	No	All data access logged to unified audit log
Data quality rules	Yes	Define in Purview data quality (preview)

Step 6 — OneLake Security (Row/Column-Level)¶

CSA-in-a-Box handles sensitive legal data — attorney work product, privileged communications, and PII. Fabric provides row-level security (RLS) and column-level security (CLS) at the SQL analytics endpoint.

Column-level security — restrict PII access:

-- Run in the doj_antitrust database. GRANT and DENY take schema-qualified
-- object names, not three-part names.

-- Create roles for different access levels
CREATE ROLE [analyst_general];
CREATE ROLE [analyst_privileged];
CREATE ROLE [attorney_staff];

-- Grant base table access
GRANT SELECT ON dbo.slv_antitrust_cases TO [analyst_general];
GRANT SELECT ON dbo.slv_antitrust_cases TO [analyst_privileged];
GRANT SELECT ON dbo.slv_antitrust_cases TO [attorney_staff];

-- Deny PII columns to general analysts
DENY SELECT ON dbo.slv_antitrust_cases(defendant_name) TO [analyst_general];
DENY SELECT ON dbo.slv_antitrust_cases(defendant_ssn) TO [analyst_general];
DENY SELECT ON dbo.slv_antitrust_cases(contact_email) TO [analyst_general];

-- Deny SSN even to privileged analysts (attorney-only)
DENY SELECT ON dbo.slv_antitrust_cases(defendant_ssn) TO [analyst_privileged];
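
As the number of roles and sensitive columns grows, it can help to keep the access matrix as data and generate the statements from it. The sketch below shows one way to do that; `DENIED_COLUMNS` and `column_security_sql` are hypothetical names, and the role and column names are taken from the example above. Identifiers are interpolated directly, so the mapping must come from trusted, version-controlled configuration.

```python
# Role -> columns that role must not read (empty list = full access)
DENIED_COLUMNS = {
    "analyst_general": ["defendant_name", "defendant_ssn", "contact_email"],
    "analyst_privileged": ["defendant_ssn"],
    "attorney_staff": [],
}


def column_security_sql(table: str, denied_columns: dict) -> list:
    """Generate GRANT and column-level DENY statements for each role."""
    statements = []
    for role, columns in denied_columns.items():
        statements.append(f"GRANT SELECT ON {table} TO [{role}];")
        for column in columns:
            statements.append(f"DENY SELECT ON {table}({column}) TO [{role}];")
    return statements


for statement in column_security_sql("dbo.slv_antitrust_cases", DENIED_COLUMNS):
    print(statement)
```

Reviewing a change to the mapping in a pull request is usually easier than reviewing a hand-edited list of DENY statements.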

Row-level security — filter by case assignment:

-- Create a function that checks the user's case assignments.
-- CREATE FUNCTION must run in its own batch, before the security policy is created.
CREATE FUNCTION dbo.fn_case_access(@assigned_attorney VARCHAR(256))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS access_granted
-- SESSION_CONTEXT returns sql_variant, so cast it to the column's type
WHERE @assigned_attorney = CAST(SESSION_CONTEXT(N'attorney_upn') AS VARCHAR(256))
   OR IS_MEMBER('attorney_supervisors') = 1;

-- Apply the security policy
CREATE SECURITY POLICY doj_case_filter
ADD FILTER PREDICATE dbo.fn_case_access(assigned_attorney)
ON dbo.slv_antitrust_cases
WITH (STATE = ON);

Set session context in application code:

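# When connecting from an application, set the attorney context
import pyodbc


def connect_as_attorney(connection_string: str, current_user_upn: str) -> pyodbc.Connection:
    """Open a connection and bind the caller's UPN to the session for the RLS predicate."""
    conn = pyodbc.connect(connection_string)
    cursor = conn.cursor()
    # @read_only=1 prevents later statements in the session from changing the value
    cursor.execute(
        "EXEC sp_set_session_context @key=N'attorney_upn', @value=?, @read_only=1",
        (current_user_upn,),
    )
    return conn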
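
To reason about what the filter predicate returns, it can be useful to model it outside the database. The sketch below is a plain-Python analogue of `fn_case_access`, not code that runs in Fabric: a row is visible when its assigned attorney matches the session's UPN or when the caller is a supervisor. The `Case` type and `visible_cases` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: str
    assigned_attorney: str


def visible_cases(cases, attorney_upn, is_supervisor: bool = False) -> list:
    """Return the cases the RLS predicate would let this caller see."""
    return [
        case for case in cases
        if is_supervisor or case.assigned_attorney == attorney_upn
    ]


cases = [
    Case("C-001", "alice@example.gov"),
    Case("C-002", "bob@example.gov"),
    Case("C-003", "alice@example.gov"),
]

print([c.case_id for c in visible_cases(cases, "alice@example.gov")])
print([c.case_id for c in visible_cases(cases, None)])
print([c.case_id for c in visible_cases(cases, "carol@example.gov", is_supervisor=True)])
```

Note the middle case: a session with no `attorney_upn` set sees no rows, so an application that forgets to set the context fails closed rather than open.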

Direct Lake and RLS

Row-level security defined on the Fabric Data Warehouse SQL endpoint is enforced when Power BI uses DirectQuery mode. For Direct Lake mode, RLS must be defined in the Power BI semantic model using DAX expressions. Ensure your security policy covers both access paths.

Evidence: Production Deployments¶

The following table summarizes publicly documented Fabric deployments at enterprise scale. All figures are sourced from Microsoft customer stories or independent validations.

Organization	Scale	Outcome	Source
Microsoft IDEAS	420 PiB across 600+ teams	50% efficiency improvement from consolidation to OneLake	Microsoft Learn, Customer Story
Edith Cowan University	2,000 staff, 3.5× user growth	50% platform cost reduction, 70% faster report development	Customer Story
Dentsu	Global media analytics, D365 integration	55% faster data replication, near real-time campaign metrics	Customer Story
OBOS BBL	600+ notebooks and pipelines migrated	30% faster processing, 20% lower operational cost	Fabric Blog

Independent Validation

Enterprise Strategy Group (ESG) independently validated Fabric Data Warehouse query performance at up to 75% faster than Azure Synapse dedicated SQL pools at comparable price points. The validation covered TPC-DS derived workloads across multiple concurrency levels. See ESG Technical Validation for methodology and results.

Government Deployment Considerations¶

Federal and state government agencies have specific compliance requirements that affect Fabric adoption. Key considerations:

FedRAMP Authorization¶

Fabric is authorized at FedRAMP High in Azure Commercial via a Provisional Authority to Operate (P-ATO). This covers most civilian federal workloads. See Fabric FedRAMP Blog.
Azure Government (US Gov Virginia, US Gov Arizona, US Gov Texas) availability for Fabric components is evolving. Check the Azure Government product roadmap for current status.
For DoD IL4/IL5 workloads, confirm service-by-service availability in the FedRAMP audit scope documentation.

Data Residency¶

OneLake stores data in the Azure region associated with the Fabric capacity. For government workloads, ensure the capacity is provisioned in a region that meets data residency requirements.
Shortcuts to external storage respect the data residency of the source — data does not move when accessed through a shortcut.

Identity and Access¶

Fabric uses Microsoft Entra ID for authentication. Government tenants using Entra ID for Government are supported.
Conditional Access policies, MFA, and Privileged Identity Management (PIM) apply to Fabric workspace access.

Azure Government vs. Azure Commercial

FedRAMP High in Azure Commercial and FedRAMP High in Azure Government are distinct authorization boundaries. Government workloads requiring US-sovereign data handling, US-persons operational access, or specific DoD Impact Level controls must validate Azure Government availability separately. The FedRAMP High P-ATO in Azure Commercial does not automatically satisfy Azure Government requirements.

Recommended Pre-Deployment Checklist¶

Item	Action
FedRAMP scope	Verify each Fabric component (Lakehouse, Warehouse, Spark, Power BI) is in the FedRAMP audit scope for your authorization boundary
Data classification	Map data sensitivity levels to Purview sensitivity labels before ingestion
Network isolation	Configure Private Link for Fabric workspace if network isolation is required
Audit logging	Confirm unified audit log exports to your SIEM (Sentinel, Splunk)
Capacity region	Provision Fabric capacity in a compliant Azure region
Service principal governance	Register Fabric service principals in your agency's identity governance system

Migration Checklist¶

Use this checklist when migrating a CSA-in-a-Box domain from Databricks to Fabric:

Related Resources¶

ADR-0010: Fabric as Strategic Target
DOJ Antitrust: Step-by-Step Domain Build — the domain this migration guide references
Azure Analytics: White Papers & Resources
Microsoft Fabric Documentation
OneLake Documentation
dbt-fabric Adapter
Fabric REST API Reference
Microsoft Purview Data Governance Overview
