Home > Tutorials > Databricks to Fabric Migration

🧱 Tutorial 42: Databricks → Microsoft Fabric Migration¶

Last Updated: 2026-04-27 | Phase: 14 (Wave 4) | Companion to Tutorial 41 (Synapse) Status: ✅ Final | Maintainer: Platform Team

Third-party references — publicly sourced, good-faith comparison

This page references non-Microsoft products and services. That information is drawn from each vendor's publicly available documentation and is offered for honest, good-faith comparison only. This is a personal project written from a Microsoft Fabric and Azure perspective; it does not claim expertise in, or authority over, any third-party product, and nothing here is an official statement by, or endorsed by, those vendors. Capabilities, pricing, and features change often — always verify against the vendor's current official documentation. Where a third-party offering is the stronger choice, we say so plainly.

🧱 Tutorial 42: Databricks → Microsoft Fabric Migration¶


Difficulty	⭐⭐⭐ Advanced
Time	⏱️ 200-260 minutes
Focus	Workspace + Unity Catalog + Delta tables + Workflows + MLflow + DBSQL → Fabric Workspace, OneLake Catalog, Lakehouse Delta, Fabric Pipelines, Fabric MLflow, Fabric Warehouse SQL endpoint

📊 Progress Tracker¶

Tutorial	Status
00 — Environment Setup → 38 — DOJ Justice	✅ Complete (Phases 1-13)
41 — Synapse → Fabric	✅ Complete
42 — Databricks → Fabric (this tutorial)	🔵 YOU ARE HERE
43 — Redshift → Fabric	⬜
44 — BigQuery → Fabric	⬜
45 — On-Prem SSAS/SSIS/SSRS → Fabric	⬜

Navigation
⬅️ Previous	41 — Synapse → Fabric
➡️ Next	43 — Redshift → Fabric

📖 Overview¶

Databricks → Fabric tends to be a relatively smooth cloud-to-cloud migration because both platforms speak Delta Lake natively. Databricks is a mature, best-in-class lakehouse and Spark platform with deep ML/AI tooling, and for many teams it is the right long-term home. This tutorial is for teams that have decided to consolidate onto a Microsoft platform that bundles BI, real-time, governance, and ML alongside Spark, with a single capacity-based commercial model and tighter Microsoft 365 integration — not an argument that one platform is better than the other.

Comparing Databricks and Fabric¶

The table below summarizes differences based on each vendor's publicly available documentation (as of this doc's date). It is descriptive, not a ranking — Databricks' Photon engine, DLT, and Unity Catalog are genuine strengths, and the most cost-effective choice depends entirely on your workload mix.

Concern	Databricks Behavior	Fabric Behavior
Compute model	DBUs per cluster, billed per workspace	Capacity Units (CUs) shared across Spark, SQL, Power BI, RTI, ML
Storage	Customer-managed ADLS / S3 / GCS	OneLake unified, but ADLS shortcuts work too (no copy)
BI integration	Power BI via DBSQL connector or DirectQuery	Direct Lake (no copy, no refresh) on Delta — read-from-OneLake
Real-time	Structured Streaming + DLT (Delta Live Tables)	Eventstream + Eventhouse + Materialized Lake Views
Governance	Unity Catalog (Databricks-native, mature)	OneLake Catalog + Microsoft Purview (org-wide)
MLOps	MLflow (managed) + Model Serving	Fabric MLflow + AutoML + Real-Time Endpoints
DBSQL Warehouses	Photon-accelerated SQL warehouses (separately billed)	Fabric Warehouse + SQL endpoint over Lakehouse
Notebook UX	Databricks notebooks	Fabric notebooks (similar; `mssparkutils` API differences)
Workflows	Databricks Workflows / Jobs	Fabric Data Pipelines
DLT (Declarative)	Delta Live Tables	Materialized Lake Views (Wave 9)
Cost shape	Variable (per-DBU + storage + serverless DBSQL surcharge)	Fixed monthly F-SKU — predictability differs by workload pattern
Microsoft estate fit	Bring your own M365 / Power BI / Purview	Native, single tenant, single bill

What This Tutorial Covers¶

Inventory + complexity scoring of your Databricks workspace
Mapping Unity Catalog → OneLake Catalog (governance migration)
Moving Delta tables to OneLake (in-place via shortcut, OR rewrite for V-Order)
Notebook conversion: dbutils → mssparkutils, magic command parity, library management
Databricks Workflows → Fabric Data Pipelines
Delta Live Tables → Materialized Lake Views (or scheduled notebooks)
MLflow registry export + import to Fabric MLflow
DBSQL → Fabric Warehouse SQL endpoint
DBU → CU sizing
Three-tier validation (row count + hash + ML model parity)
Coexistence pattern (both platforms read same OneLake/ADLS Delta)
Cutover + decommission

🎯 Learning Objectives¶

By the end of this tutorial, you will be able to:

🏗️ Reference Architecture¶

flowchart LR
    subgraph DB["🧱 Source: Databricks Workspace"]
        DBWS[Databricks Workspace]
        UC[(Unity Catalog)]
        DBT[(Delta Tables on ADLS/S3)]
        DBWF[Workflows / Jobs]
        DLT[Delta Live Tables]
        DBSQL[(DBSQL Warehouses)]
        DBNB[Notebooks]
        DBML[MLflow Registry]
        DBCP[Cluster Pools]
    end

    subgraph Fabric["⚡ Target: Microsoft Fabric"]
        FWS[Fabric Workspace]
        OLC[(OneLake Catalog)]
        LH[(Lakehouse Delta in OneLake)]
        FPIPE[Fabric Pipelines]
        MLV[Materialized Lake Views]
        FWH[(Fabric Warehouse / SQL endpoint)]
        FNB[Fabric Notebooks]
        FML[Fabric MLflow]
        FCAP[F-SKU Capacity]
    end

    subgraph Migration["🔄 Migration Layer"]
        Assess[Assessment Script]
        DeltaCompat[Delta Compatibility Check]
        WFConvert[Workflow → Pipeline Converter]
        MLExport[MLflow Registry Export]
        Validate[Validation Suite]
    end

    DBWS --> Assess
    UC --> Assess
    DBT --> DeltaCompat
    DBWF --> WFConvert
    DBML --> MLExport
    DBNB --> Assess

    Assess --> FWS
    UC -.governance map.-> OLC
    DeltaCompat -->|shortcut OR rewrite| LH
    WFConvert --> FPIPE
    DLT -.declarative map.-> MLV
    MLExport --> FML
    DBSQL -.SQL endpoint.-> FWH
    DBNB -.dbutils shim.-> FNB
    DBCP -.DBU→CU.-> FCAP

    LH --> Validate
    FPIPE --> Validate
    FML --> Validate

Component Mapping (the canonical reference)¶

Databricks Component	Fabric Equivalent	Migration Approach
Databricks Workspace	Fabric Workspace	New workspace per source workspace; preserve naming
Unity Catalog (catalog.schema.table)	OneLake Catalog + Lakehouse schemas	Direct map — three-part name preserved
Hive Metastore (legacy)	Lakehouse default schema	Bulk register tables to Lakehouse
Delta tables on ADLS Gen2	Delta in OneLake	Shortcut (zero-copy) OR rewrite for V-Order
Delta tables on S3 / GCS	Delta in OneLake	OneLake S3/GCS shortcut OR copy
Iceberg tables (UniForm)	OneLake Iceberg shortcut	Direct — see Iceberg Interop
Databricks Notebooks	Fabric Notebooks	Port code; `dbutils` → `mssparkutils` shim
Databricks Workflows / Jobs	Fabric Data Pipelines	Activity translation (see Step 5)
Delta Live Tables (DLT)	Materialized Lake Views	Declarative SQL maps directly; Python DLT rewrites
DBR Runtime (e.g., 14.3 LTS)	Fabric Spark Runtime 1.3 / 2.0	Most code portable; see runtime migration
Cluster pools	Fabric Spark capacity (CUs)	No 1:1 — Fabric autoscales within F-SKU
Photon engine	Fabric V-Order + Native Execution Engine	Different code path, similar perf for BI workloads
MLflow Tracking Server	Fabric MLflow (built-in)	Re-point experiments; export-import models
MLflow Model Registry	Fabric Model Registry	`mlflow export-import` tool (Step 7)
Model Serving	Fabric Real-Time Endpoints	Re-deploy registered model
DBSQL Warehouses	Fabric Warehouse + SQL endpoint	DDL conversion + V-Order; Power BI repoints
Unity Catalog Volumes	Lakehouse Files section	Copy non-tabular files
Secret Scopes	Azure Key Vault + Workspace Identity	Recreate; reference via OAP
Cluster init scripts	Fabric Environment files	Repackage as conda/pip env
Repos (Git folders)	Fabric Git Integration	Connect workspace to same repo

📋 Prerequisites¶

🚀 Step-by-Step¶

Step 1 — Assess Your Databricks Workspace¶

Inventory notebooks, jobs, clusters, Unity Catalog tables, and MLflow models.

cd tutorials/42-databricks-to-fabric/
python 01_assessment.py \
    --databricks-host "https://adb-1234567890123456.1.azuredatabricks.net" \
    --databricks-token "$DATABRICKS_TOKEN" \
    --output-dir "./assessment-output"

✅ Verification: Output directory contains: - notebooks_inventory.csv — every notebook with size, last-run, language, dependencies - jobs_inventory.csv — Workflows, schedules, cluster config, last-run status - clusters_inventory.csv — interactive + job clusters with DBR runtime, node type, autoscale ranges - unity_catalog_inventory.csv — every table with catalog.schema.table, format (Delta/Iceberg/Parquet), size, last-modified - mlflow_models_inventory.csv — every registered model + production-stage version - dlt_pipelines_inventory.csv — Delta Live Tables pipelines with mode (triggered/continuous) - dbsql_warehouses_inventory.csv — DBSQL warehouses with t-shirt size + Photon flag - dbu_consumption_30d.csv — last 30 days DBU usage by workspace + workload type - complexity_scores.csv — per-asset Easy/Medium/Hard/VeryHard rating - unsupported_features.md — items with no 1:1 Fabric equivalent (Photon-only SQL functions, Workflows-only triggers, etc.)

💡 Tip: Run assessment monthly during the migration project. New tables, dropped models, and shifting cluster usage all surface here.

⚠️ Gotcha: If your Databricks workspace was created before Unity Catalog (legacy hive_metastore), the assessment will report tables under hive_metastore.<schema>.<table>. These need explicit promotion to a UC catalog first or direct Lakehouse registration. Don't skip this — Hive metastore tables can have weaker schema enforcement.

Step 2 — Plan Migration Wave Order¶

Use the dependency graph to plan migration waves. Migrate leaves (no consumers) first, roots (most-consumed) last.

python 01_assessment.py --command wave-plan \
    --inventory ./assessment-output/ \
    --max-wave-effort-days 10

The tool produces migration-waves.md with:

Wave 1 — Independent reference tables (dimensions with no consumers)
Wave 2 — Bronze tables (raw ingestion → Fabric Bronze Lakehouse)
Wave 3 — Silver / cleansed dimension tables
Wave 4 — Gold / fact tables + aggregations
Wave 5 — DLT pipelines → Materialized Lake Views
Wave 6 — MLflow models + serving endpoints
Wave 7 — DBSQL workloads + Power BI rebinding
Wave 8 — Decommission Databricks workspace

⚠️ Gotcha: Cross-workspace Delta Sharing creates hidden dependencies. If Workspace A consumes a share from Workspace B, migrate the producer first, then the consumer. The assessment script checks for system.information_schema.shares and surfaces these.

Step 3 — Migrate Delta Tables to OneLake¶

This is the easiest part of the migration. Both Databricks and Fabric speak Delta Lake natively. You have two options:

Option A — OneLake Shortcut (zero-copy, recommended)¶

If your Databricks Delta tables live on ADLS Gen2 (Azure) or S3 (AWS), create a OneLake shortcut. No data movement.

# ADLS Gen2 shortcut
az fabric onelake-shortcut create \
    --workspace-id "$WS_ID" \
    --lakehouse-id "$LH_ID" \
    --name "databricks-bronze" \
    --path "Tables/databricks_bronze" \
    --target-adls-uri "abfss://databricks-data@adlsgendb01.dfs.core.windows.net/bronze/"

Or via Bicep:

resource shortcut 'Microsoft.Fabric/workspaces/items/shortcuts@2024-01-01' = {
  name: 'databricks-bronze'
  properties: {
    target: {
      adlsGen2: {
        location: 'https://${storageAccount}.dfs.core.windows.net/'
        path: '/databricks-data/bronze'
        subpath: '/'
      }
    }
  }
}

Why shortcuts are huge here: - Petabyte-scale Delta tables are not copied - Source ADLS / S3 retains full history - Both Databricks and Fabric read the same Delta files in place - Zero downtime, true bidirectional during coexistence

Option B — Rewrite for V-Order¶

If your top BI tables benefit from V-Order (Power BI Direct Lake), rewrite them through Fabric Spark with delta.parquet.vorder.default=true:

# In a Fabric notebook
from delta.tables import DeltaTable

source_path = "abfss://databricks-data@adlsgendb01.dfs.core.windows.net/gold/fact_slot_revenue"
target_table = "lh_gold.fact_slot_revenue"

(spark.read.format("delta").load(source_path)
    .write
    .format("delta")
    .mode("overwrite")
    .option("delta.parquet.vorder.default", "true")
    .saveAsTable(target_table))

# Then OPTIMIZE for full V-Order benefit
spark.sql(f"OPTIMIZE {target_table} VORDER")

OPTIMIZE / Z-ORDER Differences¶

Databricks	Fabric
`OPTIMIZE table ZORDER BY (col)`	`OPTIMIZE table ZORDER BY (col)` (supported)
`OPTIMIZE table` (file compaction only)	`OPTIMIZE table VORDER` (V-Order + compaction)
`VACUUM table RETAIN N HOURS`	`VACUUM table RETAIN N HOURS` (same syntax)
Liquid clustering (`CLUSTER BY`)	Liquid clustering supported in Fabric Runtime 1.3+
Photon-only functions	NEE-supported subset; check assessment output

✅ Verification:
# In Fabric notebook
df = spark.read.format("delta").load("Tables/databricks-bronze/fact_slot_revenue")
print(f"Row count: {df.count()}")

# Compare against Databricks notebook output
# Row count must match exactly
💡 Tip: Don't blindly rewrite every table. Shortcut first; only rewrite the top 10-20 BI hot paths where V-Order pays for itself in CU savings.

⚠️ Gotcha: Databricks-managed Delta tables (no explicit LOCATION) live in DBFS root, which is not accessible to Fabric shortcuts. You must ALTER TABLE ... SET LOCATION to ADLS first, or copy via Spark.

Step 4 — Convert Databricks Notebooks¶

Databricks and Fabric notebooks are very similar. Most port with mechanical changes.

Conversion Checklist¶

Databricks Pattern	Fabric Pattern
`dbutils.fs.ls(...)`	`mssparkutils.fs.ls(...)`
`dbutils.fs.cp(...)`	`mssparkutils.fs.cp(...)`
`dbutils.secrets.get(scope=..., key=...)`	`mssparkutils.credentials.getSecret(akv_uri, secret_name)`
`dbutils.notebook.run(...)`	`mssparkutils.notebook.run(...)` (same API)
`dbutils.widgets.text(...)`	`mssparkutils.notebook.exit(...)` + Pipeline parameters
`%pip install pkg`	`%pip install pkg` (works) — prefer Environment file
`%conda install pkg`	Environment file conda_dependencies block
`%sql` magic	`%%sql` magic (note double percent)
`%md` magic	`%%md` magic (double percent)
`%run ./other_notebook`	`%run ./other_notebook` (works)
`display(df)`	`display(df)` (works in Fabric)
`spark.conf.set("spark.databricks.delta...")`	Map to `spark.conf.set("spark.microsoft.delta...")` where applicable
Cluster init script	Environment custom library or pip requirements
`/Workspace/Repos/...` paths	Fabric Git Integration paths
`/dbfs/...` paths	OneLake `Files/` paths

Code-Level Shim¶

Add this small shim at the top of each migrated notebook for backwards-compatible code that uses dbutils:

# Fabric notebook compatibility shim
try:
    dbutils  # Already defined in Databricks
except NameError:
    from notebookutils import mssparkutils
    # Lightweight dbutils proxy
    class _DBUtilsShim:
        fs = mssparkutils.fs
        notebook = mssparkutils.notebook
        secrets = mssparkutils.credentials  # API differences — see migration script
    dbutils = _DBUtilsShim()

✅ Verification: Run the migrated notebook end-to-end. Output must match Databricks output (row counts exact; aggregations within float tolerance).

💡 Tip: Run the conversion script in batch mode. It rewrites known patterns automatically and produces a list of manual-review hotspots:
python 02_notebook_migration.py \
    --source-export ./databricks-export/Workspace/ \
    --target-dir ./fabric-notebooks/ \
    --apply-shim
⚠️ Gotcha: dbutils.widgets and Fabric notebook parameters are not 1:1. Fabric uses notebook parameter cells (tagged with parameters) and pipeline activity parameters. The conversion script flags every widget and recommends a parameter-cell rewrite.

Step 5 — Migrate Databricks Workflows → Fabric Data Pipelines¶

Workflows JSON definitions translate to Fabric Pipeline JSON via the conversion script. Most task types have direct Fabric activity equivalents.

python 03_workflow_migration.py \
    --source-jobs-folder ./databricks-export/Jobs/ \
    --target-workspace "$FABRIC_WS_ID" \
    --output-dir ./fabric-pipelines/

Task / Activity Compatibility¶

Databricks Task Type	Fabric Activity	Notes
`notebook_task`	Notebook	Re-point notebook reference
`python_wheel_task`	Spark Job Definition	Repackage wheel
`spark_jar_task`	Spark Job Definition	Repackage JAR
`spark_submit_task`	Spark Job Definition	Translate args
`sql_task` (DBSQL query)	Script (T-SQL) or Lookup	Re-point to Fabric Warehouse / SQL endpoint
`pipeline_task` (DLT)	Materialized Lake Views or chained notebooks	Manual rewrite (Step 6)
`dbt_task`	Notebook with `dbt-fabric`	See dbt integration
`run_job_task`	Invoke Pipeline	Direct map
`condition_task` (If/Else)	If Condition	Direct map
`for_each_task`	ForEach	Direct map
Job parameters	Pipeline parameters / Variable Library	Promote to Variable Library where reused
Job triggers (cron)	Pipeline schedule trigger	Direct map
Job triggers (file arrival)	Storage event trigger / Eventstream	Re-author
Job clusters	Fabric Spark capacity (autoscale)	No explicit cluster — Fabric manages

⚠️ Gotcha: Continuous job triggers in Databricks (always-on streaming jobs) become Fabric streaming jobs running against Eventstream — not Pipelines. The conversion script flags these and points to Tutorial 04 (Real-Time Analytics) for re-authoring.

💡 Tip: Promote any parameter referenced by ≥ 3 jobs to the Variable Library. Cleanup opportunity during migration.

Step 6 — Migrate Delta Live Tables → Materialized Lake Views¶

DLT pipelines are declarative. Pure-SQL DLT maps cleanly to Materialized Lake Views (MLV — Wave 9 feature). Python DLT pipelines need rewrite.

SQL DLT → Materialized Lake View¶

-- Databricks DLT (Python or SQL):
CREATE OR REFRESH STREAMING TABLE bronze_slot_telemetry
AS SELECT * FROM cloud_files("/mnt/raw/slots/", "json");

CREATE OR REFRESH LIVE TABLE silver_slot_telemetry
COMMENT "Cleansed slot telemetry"
AS SELECT
    machine_id,
    CAST(event_time AS TIMESTAMP) AS event_time,
    coin_in,
    coin_out
FROM LIVE.bronze_slot_telemetry
WHERE coin_in IS NOT NULL;

-- Fabric Materialized Lake View:
CREATE MATERIALIZED LAKE VIEW silver_slot_telemetry
AS SELECT
    machine_id,
    CAST(event_time AS TIMESTAMP) AS event_time,
    coin_in,
    coin_out
FROM bronze_slot_telemetry
WHERE coin_in IS NOT NULL;

Python DLT → Scheduled Notebook¶

If the DLT pipeline uses @dlt.table decorators with custom Python logic, rewrite as a scheduled Fabric notebook with explicit MERGE INTO:

# Databricks DLT:
# import dlt
# @dlt.table
# @dlt.expect_or_drop("valid_amount", "coin_in > 0")
# def silver_slot_telemetry():
#     return dlt.read_stream("bronze_slot_telemetry")...

# Fabric scheduled notebook with quality enforcement:
df = (spark.readStream.format("delta")
      .table("lh_bronze.bronze_slot_telemetry")
      .filter("coin_in > 0"))  # equivalent to expect_or_drop

(df.writeStream
   .format("delta")
   .outputMode("append")
   .option("checkpointLocation", "Files/checkpoints/silver_slot_telemetry")
   .trigger(availableNow=True)
   .toTable("lh_silver.silver_slot_telemetry"))

✅ Verification: Compare last 7 days of output between DLT pipeline and the migrated MLV / notebook. Row counts and aggregations must match.

⚠️ Gotcha: DLT Expectations with expect_or_fail action have no direct MLV equivalent. Migrate as Great Expectations checks in the upstream notebook, or as a CHECK constraint when the table is rewritten.

Step 7 — Migrate MLflow Models¶

Databricks MLflow → Fabric MLflow is a registry export-import.

# Step 7a: Export from Databricks
pip install mlflow-export-import

mlflow-export-import export-models \
    --output-dir ./mlflow-export/ \
    --models "fraud_detection,player_ltv,slot_anomaly" \
    --stages "Production,Staging" \
    --notebook-formats SOURCE \
    --databricks-host "$DATABRICKS_HOST" \
    --databricks-token "$DATABRICKS_TOKEN"

# Step 7b: Import to Fabric MLflow
mlflow-export-import import-models \
    --input-dir ./mlflow-export/ \
    --tracking-uri "$FABRIC_MLFLOW_URI" \
    --experiment-name-prefix "migrated_"

Item	Databricks	Fabric	Migration
Tracking server	Managed MLflow	Fabric MLflow (per workspace)	Re-point `MLFLOW_TRACKING_URI`
Registered models	UC `models` schema	Fabric Model Registry	`mlflow-export-import`
Model versions	Stages (Staging, Production, Archived)	Stages (preserved)	Stages map directly
Run artifacts	DBFS / UC Volumes	OneLake `Files/mlflow-artifacts/`	Bulk copy during export
Model Serving endpoints	Databricks Model Serving	Fabric Real-Time Endpoints	Re-deploy registered model
Feature Store	UC Feature Store	Fabric Lakehouse + Semantic Link	Republish features

ML Model Parity Validation¶

Score the same hold-out dataset on both platforms; predictions must match within tolerance.

# Validate model parity
import mlflow

src_model = mlflow.pyfunc.load_model("models:/fraud_detection/Production",
                                     tracking_uri=DATABRICKS_TRACKING_URI)
tgt_model = mlflow.pyfunc.load_model("models:/migrated_fraud_detection/Production",
                                     tracking_uri=FABRIC_TRACKING_URI)

src_pred = src_model.predict(holdout_df)
tgt_pred = tgt_model.predict(holdout_df)

# Classification: exact match expected
assert (src_pred == tgt_pred).all(), "Prediction mismatch!"
# Regression: within 1e-6 tolerance
# np.allclose(src_pred, tgt_pred, atol=1e-6)

⚠️ Gotcha: Models trained on Photon-specific UDFs may produce slightly different predictions in Fabric Spark due to floating-point operation ordering. Tolerance ≤ 1e-6 is normal; > 1e-3 is a red flag — investigate.

Step 8 — Migrate DBSQL → Fabric Warehouse / SQL Endpoint¶

Two paths depending on workload:

DBSQL Workload	Recommended Target	Why
Read-only BI queries on Delta tables	Fabric SQL endpoint over Lakehouse	Zero ETL — just query the same Delta
Heavy stored procedures, multi-statement transactions	Fabric Warehouse	Full T-SQL DDL + DML
Power BI Direct Query	Direct Lake (preferred)	Read-from-OneLake, no refresh
Tabular semantic models	Fabric semantic model	Re-author or use Power BI desktop migrate

DDL conversion via the helper:

python 04_dbsql_ddl_conversion.py \
    --source-host "$DBSQL_HOST" \
    --target-warehouse "wh-analytics-prod" \
    --target-workspace "$FABRIC_WS_ID" \
    --output-ddl ./output-ddl/

DBSQL Type	Fabric Warehouse Type	Notes
`BIGINT`	`BIGINT`	Direct
`INT`	`INT`	Direct
`DECIMAL(p,s)`	`DECIMAL(p,s)`	Direct
`STRING`	`VARCHAR(8000)` (sized)	Choose explicit length; `VARCHAR(MAX)` for unbounded
`BOOLEAN`	`BIT`	Direct
`TIMESTAMP`	`DATETIME2(6)`	Recommended
`DATE`	`DATE`	Direct
`BINARY`	`VARBINARY(MAX)`	Direct
`ARRAY<T>`	NOT SUPPORTED in Warehouse	Move to Lakehouse + Spark; or normalize
`MAP<K,V>`	NOT SUPPORTED	Convert to JSON `VARCHAR(MAX)`
`STRUCT<...>`	NOT SUPPORTED	Convert to JSON `VARCHAR(MAX)`

💡 Tip: Most BI dashboards backed by DBSQL don't need a Warehouse — point them at the Lakehouse SQL endpoint instead. Same Delta tables, no ETL.

Step 9 — Iceberg Interoperability (UniForm Tables)¶

If your Databricks tables use Delta UniForm (write Delta, expose as Iceberg), Fabric reads them directly via OneLake Iceberg shortcuts. No conversion required.

# Create an Iceberg shortcut
az fabric onelake-shortcut create \
    --workspace-id "$WS_ID" \
    --lakehouse-id "$LH_ID" \
    --name "databricks-iceberg" \
    --type "iceberg" \
    --path "Tables/databricks_iceberg" \
    --target-uri "abfss://uniform@adlsdb01.dfs.core.windows.net/iceberg-tables/"

✅ Verification:
df = spark.read.format("iceberg").load("Tables/databricks-iceberg/fact_table")
print(df.count())  # Match Databricks Iceberg query
💡 Tip: For read-only consumption from Snowflake, BigQuery, or Trino in parallel, UniForm + Iceberg shortcut means one copy of data, three engines.

Step 10 — Photon vs Fabric Spark — Performance Considerations¶

Concern	Databricks Photon	Fabric Spark
Vectorized engine	Photon (C++ runtime)	Native Execution Engine (NEE)
Format optimization	Delta + Photon-aware caching	Delta + V-Order + NEE caching
BI workload latency	Sub-second on hot cache	Sub-second on Direct Lake (no cache needed)
Function coverage	Wide; some Photon-only	Growing; check NEE coverage matrix
Separate billing	Photon cluster surcharge	None — included in F-SKU

⚠️ Gotcha: Some Photon-accelerated SQL functions (e.g., regexp_extract_all) may run on JVM Spark in Fabric (not NEE) — slower for big datasets. Assessment script flags Photon-heavy workloads. For these, plan an extra perf-tuning pass post-migration.

Step 11 — Unity Catalog → OneLake Catalog (Governance Migration)¶

Unity Catalog organizes objects as catalog.schema.table with grants and lineage. OneLake Catalog provides equivalent capability inside Fabric, with Microsoft Purview for org-wide governance.

Unity Catalog Concept	Fabric Equivalent	Migration Approach
`catalog`	Fabric domain or workspace	Map 1:1 (domain for orgs; workspace for teams)
`schema`	Lakehouse / Warehouse	Direct map
`table`	Delta table	Direct map
`view`	View (T-SQL or Spark)	DDL conversion
`volume`	Lakehouse Files section	Copy non-tabular files
`function`	Spark UDF / T-SQL function	Re-register
`model`	MLflow model	See Step 7
GRANT `SELECT` ON TABLE	OneLake Security row/column policies	Re-author
GRANT `MODIFY`	Workspace role (Contributor)	Map principals to roles
Lineage	Microsoft Purview lineage	Auto-captured once Purview connected
Tags / labels	Sensitivity labels (M365)	Re-apply via Purview

python 05_unity_catalog_export.py \
    --metastore-id "$UC_METASTORE_ID" \
    --output-dir ./uc-export/

# Outputs:
#   uc-export/catalogs.json
#   uc-export/schemas.json
#   uc-export/tables.json
#   uc-export/grants.json   ← critical for security migration
#   uc-export/lineage.json  ← reference for Purview re-bootstrap

💡 Tip: Connect Microsoft Purview to the Fabric tenant before migration. Lineage captures from day one and you don't have to backfill.

Step 12 — Size Fabric Capacity from DBU Consumption¶

python 01_assessment.py --command capacity-recommendation \
    --inventory ./assessment-output/ \
    --dbu-30d ./assessment-output/dbu_consumption_30d.csv

Output: F-SKU recommendation with rationale.

DBU → CU Rough Sizing (Starting Point — Validate Empirically)¶

Databricks Workload (avg active DBU/hr)	Approximate Fabric F-SKU	Notes
~1 DBU/hr (small dev)	F8	One small team
~4 DBU/hr	F16	Single project, light Power BI
~10 DBU/hr	F32	Mid-size team, Bronze/Silver/Gold
~25 DBU/hr	F64	POC/production baseline
~50 DBU/hr	F128	Multi-team, RTI workloads
~100 DBU/hr	F256	Enterprise scale
~200 DBU/hr	F512	Multi-LOB
~400 DBU/hr	F1024	Federation / very large estate
~800 DBU/hr	F2048	Top-tier enterprise

⚠️ Gotcha: This table is a starting point only. Photon-DBU and serverless-SQL-DBU consume different ratios in Databricks; in Fabric all draw from the same CU pool plus Power BI users. Always measure during coexistence and adjust before cutover.

💡 Tip: Use capacity planning best practices for the full sizing methodology and CU smoothing patterns.

Step 13 — Validate Migration¶

Validation runs four categories of checks: row counts, content hashes, query parity, and ML model parity.

python 06_validation_suite.py \
    --source-databricks-host "$DATABRICKS_HOST" \
    --source-databricks-token "$DATABRICKS_TOKEN" \
    --target-workspace "$FABRIC_WS_ID" \
    --tables-yaml ./tables-to-validate.yaml \
    --models-yaml ./models-to-validate.yaml \
    --output-report ./validation-report.html

Four-Tier Validation¶

Tier 1: ROW COUNT       ──▶  count(*) match per table
Tier 2: CONTENT HASH    ──▶  sha2(concat_ws('|', columns)) match per row
Tier 3: QUERY PARITY    ──▶  representative aggregations match within tolerance
Tier 4: MODEL PARITY    ──▶  predictions on hold-out match within tolerance

Check	Pass Threshold	Action on Fail
Row count	exact match	Re-run incremental load, check filter predicates
Content hash	100% match	Investigate type conversion / NULL handling / collation
Aggregation parity	within 0.001%	Type precision review
Query latency	within 2× source p95	Tune V-Order, partitioning, NEE-eligible rewrites
Model prediction parity (classification)	exact match	Investigate Photon-only function in featurization
Model prediction parity (regression)	within 1e-6	Check float ordering; usually safe

✅ Verification: All four tiers pass for ≥ 95% of in-scope assets before declaring migration complete.

Step 14 — Coexistence Period (2-4 Weeks)¶

During coexistence: - Both Databricks and Fabric read the same OneLake-shortcut Delta tables - Writes go to one truth — pick Fabric early; replicate read-only mirror to Databricks if needed - BI repoints to Fabric (Direct Lake); Databricks SQL warehouses suspended for cost savings - Daily validation report confirms parity

💡 Tip: Pause expensive Databricks job clusters and DBSQL warehouses during coexistence to halve coexistence-period cost. They're not running production traffic — Fabric is.

Step 15 — Cutover¶

When validation has been green for 7 consecutive days:

Freeze Databricks — pause Workflows, set Unity Catalog tables to read-only
Final delta sync — capture last incremental changes
Switch BI — repoint Power BI workspaces to Fabric semantic models / Direct Lake
Switch consumers — apps, APIs, dbt projects to Fabric endpoints
Switch ML serving — re-route inference traffic to Fabric Real-Time Endpoints
Monitor for 48 hours — close on-call bridge after no incidents
Decommission (Step 16)

Step 16 — Decommission Databricks¶

# After 30-day grace period with all validation green:

# 1. Disable all Workflows
databricks jobs list --output JSON | jq -r '.jobs[].job_id' \
  | xargs -I {} databricks jobs delete --job-id {}

# 2. Stop and pin DBSQL warehouses (no auto-restart)
databricks warehouses list --output JSON | jq -r '.warehouses[].id' \
  | xargs -I {} databricks warehouses stop --id {}

# 3. Backup workspace metadata for audit
python 07_backup_databricks_metadata.py \
    --databricks-host "$DATABRICKS_HOST" \
    --databricks-token "$DATABRICKS_TOKEN" \
    --output ./databricks-backup/

# 4. After 90 more days, delete clusters + DBSQL warehouses
# 5. After 180 days total, delete workspace if no other workloads remain

⚠️ Gotcha: ADLS Gen2 / S3 storage referenced by OneLake shortcuts is NOT decommissioned — Fabric reads in place. Only the Databricks compute + control plane is removed. If you also want to retire the storage account, copy data to OneLake first via Step 3 Option B.

✅ Verification (Final Checklist)¶

🧹 Cleanup¶

This tutorial creates assessment artifacts in ./assessment-output/, conversion artifacts in ./fabric-notebooks/ and ./fabric-pipelines/, and a metadata backup in ./databricks-backup/. Retain for audit; archive to long-term storage.

If your assessment ran against a real Databricks workspace, the read-only API calls leave no trace. No rollback needed.

🚦 Next Steps¶

Tutorial 43 — Redshift → Fabric — for shops also retiring AWS Redshift
Tutorial 44 — BigQuery → Fabric — for GCP exits
Migration Patterns Best Practices — cross-source patterns
Spark Runtime Migration — DBR → Fabric Spark Runtime details
OneLake Iceberg Interop — UniForm and cross-engine reads
dbt Fabric Integration — for shops migrating dbt projects
fabric-cicd Deployment — automate the deployment of migrated artifacts

🛠️ Troubleshooting¶

Issue	Cause	Resolution
OneLake shortcut shows 0 rows from ADLS	Workspace Identity lacks Storage Blob Data Reader	Grant role on storage account or container
`dbutils` is undefined in Fabric notebook	Databricks-specific module	Apply shim from Step 4 OR replace with `mssparkutils`
`%conda install` fails in Fabric	`%conda` not supported	Use Environment file conda dependencies, or `%pip install`
Photon-only function fails on Fabric	Function not in NEE coverage	Rewrite using portable Spark SQL; flag in assessment
MLflow import errors on `models/<n>/MLmodel`	Stage encoding differences across MLflow versions	Pin `mlflow-export-import` to same major MLflow version on both sides
DLT `expect_or_fail` has no equivalent	DLT-specific quality control	Add Great Expectations checkpoint upstream; or `CHECK` constraint
Row counts off by ~1% post-migration	Time-zone drift on partitioned timestamps	Cast partition columns to `TIMESTAMP_NTZ` explicitly
Direct Lake fallback to DirectQuery	Schema not V-Ordered or unsupported types	Run `OPTIMIZE table VORDER`; check VARCHAR(MAX) usage
Cluster init script broke on migration	Init scripts not portable	Repackage as Fabric Environment custom library
F-SKU throttles after cutover	Concurrent BI users + Spark + ML draw same CU pool	Scale F-SKU; review throttling runbook

📁 Key Files Referenced¶

Step	File
1, 2, 12	`tutorials/42-databricks-to-fabric/01_assessment.py` (planned, batch 4b)
3	`tutorials/42-databricks-to-fabric/01_delta_compatibility.py` (planned, batch 4b)
4	`tutorials/42-databricks-to-fabric/02_notebook_migration.py` (planned, batch 4b)
5	`tutorials/42-databricks-to-fabric/02_workflow_migration.md` (planned, batch 4b)
7	`tutorials/42-databricks-to-fabric/03_mlflow_migration.py` (planned, batch 4b)
8	`tutorials/42-databricks-to-fabric/04_dbsql_ddl_conversion.py` (planned, batch 4b)
11	`tutorials/42-databricks-to-fabric/05_unity_catalog_export.py` (planned, batch 4b)
13	`tutorials/42-databricks-to-fabric/06_validation_suite.py` (planned, batch 4b)
16	`tutorials/42-databricks-to-fabric/07_backup_databricks_metadata.py` (planned, batch 4b)
All	`data_generation/generators/migration/databricks_workload_inventory.py` (planned)

📚 References¶

Databricks Documentation¶

Microsoft Documentation¶

⬆️ Back to Top | 📚 Tutorial Index | 🏠 Home

← Previous41 — Synapse → FabricRead more →Next →Workflow MigrationRead more →

🧱 Tutorial 42: Databricks → Microsoft Fabric Migration¶

🧱 Tutorial 42: Databricks → Microsoft Fabric Migration¶

📊 Progress Tracker¶

📖 Overview¶

Comparing Databricks and Fabric¶

What This Tutorial Covers¶

🎯 Learning Objectives¶

🏗️ Reference Architecture¶

Component Mapping (the canonical reference)¶

📋 Prerequisites¶

🚀 Step-by-Step¶

Step 1 — Assess Your Databricks Workspace¶

Step 2 — Plan Migration Wave Order¶

Step 3 — Migrate Delta Tables to OneLake¶

Option A — OneLake Shortcut (zero-copy, recommended)¶

Option B — Rewrite for V-Order¶

OPTIMIZE / Z-ORDER Differences¶

Step 4 — Convert Databricks Notebooks¶

Conversion Checklist¶

Code-Level Shim¶

Step 5 — Migrate Databricks Workflows → Fabric Data Pipelines¶

Task / Activity Compatibility¶

Step 6 — Migrate Delta Live Tables → Materialized Lake Views¶

SQL DLT → Materialized Lake View¶

Python DLT → Scheduled Notebook¶

Step 7 — Migrate MLflow Models¶

ML Model Parity Validation¶

Step 8 — Migrate DBSQL → Fabric Warehouse / SQL Endpoint¶

Step 9 — Iceberg Interoperability (UniForm Tables)¶

Step 10 — Photon vs Fabric Spark — Performance Considerations¶

Step 11 — Unity Catalog → OneLake Catalog (Governance Migration)¶

Step 12 — Size Fabric Capacity from DBU Consumption¶

DBU → CU Rough Sizing (Starting Point — Validate Empirically)¶

Step 13 — Validate Migration¶

Four-Tier Validation¶

Step 14 — Coexistence Period (2-4 Weeks)¶

Step 15 — Cutover¶

Step 16 — Decommission Databricks¶

✅ Verification (Final Checklist)¶

🧹 Cleanup¶

🚦 Next Steps¶

🛠️ Troubleshooting¶

📁 Key Files Referenced¶

📚 References¶

Databricks Documentation¶

Microsoft Documentation¶

Related Tutorials & Docs¶