🤝 Tutorial 57: Databricks Better Together with Fabric
Last Updated: 2026-05-28 | Version: 1.0 | Status: ✅ Final | Maintainer: Documentation Team
Third-party references — publicly sourced, good-faith comparison
This page references non-Microsoft products and services. That information is drawn from each vendor's publicly available documentation and is offered for honest, good-faith comparison only. This is a personal project written from a Microsoft Fabric and Azure perspective; it does not claim expertise in, or authority over, any third-party product, and nothing here is an official statement by, or endorsed by, those vendors. Capabilities, pricing, and features change often — always verify against the vendor's current official documentation. Where a third-party offering is the stronger choice, we say so plainly.
🤝 Tutorial 57: Databricks Better Together — Mirroring, Security, and the Defense-in-Depth Story

| Difficulty | ⭐⭐⭐⭐ Advanced |
|---|---|
| Time | ⏱️ 3–4 hours (longer if you also build the Power BI reports) |
| Focus | Cross-platform integration + multi-layer security |

| Navigation | |
|---|---|
| ⬅️ Previous | 56 — Informatica → Fabric |
| ➡️ Next | (end of tutorial set — see Notebook Guides) |
📖 Overview
This is not another "migrate from Databricks to Fabric" tutorial — that's already in tutorial 42. This tutorial covers the opposite posture: you want to keep both platforms, with Databricks owning Unity-Catalog–governed transformation and Fabric owning analytics + BI on the same data, without copying anything.
You'll learn how to:
- 🔄 Register a Databricks mirror in all three configurations (full / inclusion list / exclusion list).
- 🧭 Use the new Hitchhiker's Guide to Fabric notebooks as a connectivity reference for ADLS / S3 / GCS / on-prem / Snowflake / etc.
- 🛡️ Apply defense-in-depth security across all eight Fabric layers, with one notebook that automates the lot.
- 📊 Publish a Direct Lake semantic model with three RLS roles and build three Power BI reports demonstrating progressively stricter access.
🎯 Learning Objectives
By the end you will be able to:
- Choose between the four "Databricks ↔ Fabric" integration shapes (mirroring, shortcuts, Iceberg, Open Mirroring).
- Register a Mirrored Azure Databricks Catalog item programmatically.
- Configure OneLake RLS + CLS (Preview) against a mirrored item.
- Explain when to use OneLake security vs Warehouse security vs semantic-model security.
- Avoid the Direct Lake → DirectQuery fallback trap.
- Automate every defense-in-depth layer from a single notebook.
🗺️ Architecture

```mermaid
flowchart LR
    subgraph DBX["🧱 Databricks (existing)"]
        UC[(Unity Catalog<br/><b>better_together</b>)]
        STG["retail_raw<br/>retail_curated<br/>retail_secure"]
        UC --> STG
    end
    subgraph FAB["🏠 Fabric Workspace"]
        MIR["Mirrored Databricks<br/>Catalog item<br/>(zero-copy shortcut)"]
        LH[("lh_btfabric_gold<br/>Lakehouse")]
        WH[(wh_btfabric_gold<br/>Warehouse)]
        SM["Direct Lake<br/>Semantic Model"]
        RP1["📊 Regional Sales"]
        RP2["📊 Finance Performance"]
        RP3["📊 Executive Scorecard"]
        MIR --> LH
        LH --> SM
        WH --> SM
        SM --> RP1
        SM --> RP2
        SM --> RP3
    end
    UC --> MIR
    classDef good fill:#1B5E20,stroke:#fff,color:#fff
    classDef secure fill:#0D47A1,stroke:#fff,color:#fff
    class MIR,LH,WH,SM,RP1,RP2,RP3 good
    class UC,STG secure
```

The mirror is shortcut-backed, so no data is copied. The semantic model uses Direct Lake: there is no import and no DAX→SQL translation, as long as you avoid the Warehouse RLS/CLS fallback trap (see the defense-in-depth doc).
📋 Prerequisites
- Completed Tutorial 00: Environment Setup.
- Completed Tutorial 08: Database Mirroring (background only).
- Azure Databricks workspace (Premium SKU) with Unity Catalog enabled and a metastore on which you can run `CREATE CATALOG`.
- Fabric workspace on an F SKU capacity or a trial capacity.
- Ability to create Entra ID groups (you'll need delegated `Group.ReadWrite.All`, or a service principal with `Group.Create`).
- Azure Key Vault for storing the Databricks connection ID and the service principal's client secret.

💡 You do not need to deploy a fresh Databricks workspace. The included Bicep deploys the workspace module only when `deployDatabricks` is `true`, and the parameter defaults to `false`. See `infra/main.bicep`.
🧭 Tutorial map
00. Generate sample data on the host
01. Deploy Azure scaffolding (Key Vault + landing storage; DBW optional)
02. Set up Databricks Unity Catalog estate (notebooks/setup/00)
03. Load sample data into UC (notebooks/setup/01)
04. Register the Fabric mirror — Pattern A (notebooks/mirroring/01)
05. Register the Fabric mirror — Patterns B & C (notebooks/mirroring/02)
06. Query the mirror from Spark, T-SQL, sempy (notebooks/mirroring/03)
07. Compare with shortcuts / Iceberg / Open Mirroring (notebooks/mirroring/04)
08. Build the Gold star schema (notebooks/gold/01)
09. Publish the Direct Lake semantic model (semantic-model/)
10. Run the defense-in-depth automation (notebooks/security/01)
11. Build the three Power BI reports (semantic-model/README.md)
12. Validate using the test plan (TEST_PLAN.md)
🚀 Step 1 — Generate sample data
Run `scripts/generate_sample_data.py` from this tutorial's folder on your local machine. Outputs land in `sample-data/57-better-together/`:
- `retail/`: five Parquet files (customers, products, orders, order_lines, returns)
- `personas/`: `users.csv` and `groups.csv`, used later by the security automation

Upload `retail/` to a Unity Catalog volume of your choice. We recommend `/Volumes/better_together/retail_raw/landing/retail/`, which exists only after Step 3 creates the `landing` volume.
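Before uploading, it can help to confirm that all five Parquet files exist and to see where each one will land. The sketch below uses only the standard library and assumes the generator writes one file per entity named `<entity>.parquet`. That naming is an assumption, so adjust it to the actual output. `plan_upload` is a hypothetical helper, not part of the tutorial's files. The upload itself is left to whichever tool you prefer (workspace UI, Databricks CLI, or SDK).

```python
from pathlib import Path, PurePosixPath

# The five entities from Step 1. The "<entity>.parquet" file naming is an assumption.
EXPECTED_ENTITIES = ["customers", "products", "orders", "order_lines", "returns"]

# The recommended Unity Catalog volume target from the text above.
VOLUME_ROOT = PurePosixPath("/Volumes/better_together/retail_raw/landing/retail")


def plan_upload(local_dir: Path) -> dict[str, str]:
    """Map each expected local Parquet file to its target path in the UC volume.

    Raises FileNotFoundError naming every missing entity, so a partial
    sample-data run is caught before anything is uploaded.
    """
    missing = [e for e in EXPECTED_ENTITIES if not (local_dir / f"{e}.parquet").is_file()]
    if missing:
        raise FileNotFoundError(f"missing Parquet files for: {', '.join(missing)}")
    return {
        str(local_dir / f"{e}.parquet"): str(VOLUME_ROOT / f"{e}.parquet")
        for e in EXPECTED_ENTITIES
    }
```

Call it as `plan_upload(Path("sample-data/57-better-together/retail"))`, then pass each source/target pair to your upload tool.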
🏗️ Step 2 — Deploy Azure scaffolding

```bash
az login --use-device-code --tenant <your-tenant-id>
az account set --subscription <your-subscription-id>
az deployment sub create \
  --location eastus2 \
  --template-file tutorials/57-databricks-better-together/infra/main.bicep \
  --parameters tutorials/57-databricks-better-together/infra/dev.bicepparam
```

💡 Run the same `az deployment sub create` command with `--what-if` appended first, to preview the blast radius before anything is created.

The dev parameter file sets `deployDatabricks = false`. Set it to `true` only if you don't already have a Premium Azure Databricks workspace.
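If you script the deployment (for example from CI), a small wrapper keeps the preview and the real run identical apart from a single flag. The sketch below is a hypothetical helper, not part of the tutorial's files. It builds the argument list for the `az deployment sub create` command above and runs it only when asked, so `--what-if` and the real deployment always use the same template and parameters.

```python
import subprocess

TUTORIAL = "tutorials/57-databricks-better-together"


def az_deploy_args(location: str = "eastus2", what_if: bool = True) -> list[str]:
    """Build the `az deployment sub create` argv from Step 2.

    With what_if=True the Azure CLI previews the changes instead of applying them.
    """
    args = [
        "az", "deployment", "sub", "create",
        "--location", location,
        "--template-file", f"{TUTORIAL}/infra/main.bicep",
        "--parameters", f"{TUTORIAL}/infra/dev.bicepparam",
    ]
    if what_if:
        args.append("--what-if")
    return args


def deploy(what_if: bool = True) -> int:
    """Run the command; needs the Azure CLI on PATH and a prior `az login`."""
    return subprocess.run(az_deploy_args(what_if=what_if), check=False).returncode
```

Call `deploy()` to preview, review the output, then call `deploy(what_if=False)` to apply.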
🧱 Step 3 — Unity Catalog estate
Open `notebooks/setup/00_create_unity_catalog.py` in your Databricks workspace and run it on a UC-enabled cluster. It creates:
- Catalog: `better_together`
- Schemas: `retail_raw`, `retail_curated`, `retail_secure`
- Volume: `retail_raw.landing`
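The estate is small enough to express as a handful of idempotent DDL statements. The sketch below shows what a notebook like `00_create_unity_catalog.py` might run; it is not that notebook's actual contents. It generates `CREATE ... IF NOT EXISTS` statements for the catalog, schemas, and volume listed above, in dependency order. `uc_ddl` is a hypothetical helper name.

```python
CATALOG = "better_together"
SCHEMAS = ["retail_raw", "retail_curated", "retail_secure"]
VOLUME = ("retail_raw", "landing")  # (schema, volume name)


def uc_ddl(catalog: str = CATALOG) -> list[str]:
    """Return idempotent Unity Catalog DDL: catalog first, then schemas, then the volume."""
    stmts = [f"CREATE CATALOG IF NOT EXISTS {catalog}"]
    stmts += [f"CREATE SCHEMA IF NOT EXISTS {catalog}.{s}" for s in SCHEMAS]
    schema, vol = VOLUME
    stmts.append(f"CREATE VOLUME IF NOT EXISTS {catalog}.{schema}.{vol}")
    return stmts


# In a Databricks notebook, where `spark` is predefined:
# for stmt in uc_ddl():
#     spark.sql(stmt)
```

If your metastore has no metastore-level storage root, `CREATE CATALOG` may also need a `MANAGED LOCATION` clause.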
📥 Step 4 — Load sample data
Run notebooks/setup/01_load_sample_data.py — populates the five raw tables and creates two dynamic views in retail_secure that filter by region via is_account_group_member(). These are the same views the inclusion-list mirror will surface in Fabric.
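A region-filtering dynamic view of this kind typically ORs together one predicate per region, each pairing a region value with a group-membership check. The sketch below renders such a view. The view name, the source table (`retail_raw.orders`), the `region` column, and the group names are all hypothetical placeholders. Match them to whatever `01_load_sample_data.py` actually creates.

```python
from typing import Dict, Optional


def region_view_sql(
    view: str,
    source: str,
    region_groups: Dict[str, str],
    admin_group: Optional[str] = None,
) -> str:
    """Render a Unity Catalog dynamic view that filters rows by region.

    A row is visible when the caller belongs to the account group mapped to
    that row's region, or to admin_group. Values are interpolated verbatim,
    so pass trusted configuration only, never user input.
    """
    predicates = [
        f"(region = '{region}' AND is_account_group_member('{group}'))"
        for region, group in sorted(region_groups.items())
    ]
    if admin_group:
        predicates.append(f"is_account_group_member('{admin_group}')")
    if not predicates:
        raise ValueError("need at least one region mapping or an admin group")
    where = "\n   OR ".join(predicates)
    return f"CREATE OR REPLACE VIEW {view} AS\nSELECT * FROM {source}\nWHERE {where}"
```

For example, `region_view_sql("better_together.retail_secure.v_orders_by_region", "better_together.retail_raw.orders", {"East": "bt_region_east", "West": "bt_region_west"}, admin_group="bt_admins")` produces a view that shows East rows only to `bt_region_east` members, West rows only to `bt_region_west` members, and every row to `bt_admins`.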
🔄 Step 5 — Register the Databricks mirror in Fabric
Four notebooks, run in order in Fabric. The first two register mirror items; the last two only read from the mirrors and compare options.

| File | Pattern | Outcome |
|---|---|---|
| notebooks/mirroring/01_register_full_catalog_mirror.py | Full catalog | All schemas + tables surfaced. |
| notebooks/mirroring/02_register_partial_mirror.py | Inclusion and exclusion list | Two more mirror items for comparison. |
| notebooks/mirroring/03_query_mirror_from_spark.py | (no registration) | Read the mirror from Spark, T-SQL, sempy. |
| notebooks/mirroring/04_compare_mirror_vs_shortcut_vs_iceberg.py | (reference) | Side-by-side decision matrix. |
⚠️ Per Microsoft's publicly documented mirroring behavior, Unity Catalog row filters / column masks are not carried through the mirror — you re-author security in Fabric. That's exactly what Step 8 does. (Verify against the current Databricks and Fabric docs, as both evolve.)
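The difference between the full, inclusion-list, and exclusion-list patterns in the table above comes down to which tables end up in scope. The sketch below models that selection over a list of `schema.table` names, so you can predict what each mirror item should surface. It illustrates the three postures as this tutorial describes them. It is not the Fabric REST payload, and `mirror_scope` and its mode strings are hypothetical.

```python
from typing import Iterable, List, Optional


def mirror_scope(
    tables: Iterable[str],
    mode: str,
    selection: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the tables a mirror would surface for a given posture.

    tables    -- "schema.table" names in the Unity Catalog catalog
    mode      -- "full", "include", or "exclude"
    selection -- schema names ("retail_secure") and/or table names ("retail_raw.orders")
    """
    chosen = set(selection or [])

    def matches(name: str) -> bool:
        # A table matches if it is listed itself or its whole schema is listed.
        schema = name.split(".", 1)[0]
        return name in chosen or schema in chosen

    all_tables = sorted(set(tables))
    if mode == "full":
        return all_tables
    if mode == "include":
        return [t for t in all_tables if matches(t)]
    if mode == "exclude":
        return [t for t in all_tables if not matches(t)]
    raise ValueError(f"unknown mode: {mode!r}")
```

Comparing `mirror_scope(tables, "include", [...])` against what actually appears in the mirrored item is a quick sanity check after running `02_register_partial_mirror.py`.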
🏗️ Step 6 — Gold star schema
`notebooks/gold/01_gold_star_schema.py` builds the following, all as Delta tables in `lh_btfabric_gold`:
- Dimensions: `dim_region`, `dim_customer` (PII-stripped), `dim_product`, `dim_date`
- Facts: `fact_sales` (order-line grain), `fact_returns` (same grain)
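Two common gold-layer steps here are stripping PII from the customer dimension and deriving an integer `date_key` (yyyymmdd) that joins facts to `dim_date`. The sketch below shows both on plain Python records so the logic is easy to test. The PII column names are assumptions about the sample data. In the actual Spark notebook the same ideas would be expressed with DataFrame operations such as `drop` and `date_format`.

```python
from datetime import date
from typing import Dict, List

# Columns treated as PII. This list is an assumption; adjust it to the real customers schema.
PII_COLUMNS = {"first_name", "last_name", "email", "phone", "street_address"}


def to_dim_customer(rows: List[Dict]) -> List[Dict]:
    """Drop PII columns, keeping analytic attributes such as customer_id and region."""
    return [{k: v for k, v in row.items() if k not in PII_COLUMNS} for row in rows]


def date_key(d: date) -> int:
    """Integer yyyymmdd surrogate key, e.g. date(2026, 5, 28) -> 20260528."""
    return d.year * 10000 + d.month * 100 + d.day
```

An integer key keeps the `fact_sales` → `dim_date` relationship cheap and unambiguous in the Direct Lake model.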
📐 Step 7 — Publish the semantic model
Open semantic-model/model.tmdl in Power BI Desktop (March 2026+) → File → Open report → select the parent folder. Save as a .pbip project, then publish to your Fabric workspace.
See semantic-model/README.md for the three reports + the fixed-identity refresh pattern.
🛡️ Step 8 — Apply defense-in-depth
Run notebooks/security/01_apply_defense_in_depth.py from a Fabric notebook with sufficient permissions (Graph Group.ReadWrite.All, Fabric workspace Admin, Lakehouse OneLake-security writer, Warehouse db_owner).
The notebook is idempotent, so it is safe to re-run: existing groups, roles, and assignments are detected and skipped.
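Idempotency of this kind usually reduces to detect-then-create: look up what already exists, create only what is missing, and report the rest as skipped. The sketch below shows that pattern with an in-memory set standing in for the real lookups (Microsoft Graph for groups, Fabric APIs for role assignments). `ensure_all` and the group names are hypothetical; they are not taken from the tutorial's notebook.

```python
from typing import Callable, Dict, Iterable, List


def ensure_all(
    desired: Iterable[str],
    exists: Callable[[str], bool],
    create: Callable[[str], None],
) -> Dict[str, List[str]]:
    """Create each desired item only if it is missing; safe to call repeatedly."""
    result: Dict[str, List[str]] = {"created": [], "skipped": []}
    for name in desired:
        if exists(name):
            result["skipped"].append(name)
        else:
            create(name)
            result["created"].append(name)
    return result


# In-memory stand-in for a directory lookup (hypothetical group names).
existing_groups = {"bt-finance"}
first = ensure_all(["bt-finance", "bt-region-east"], existing_groups.__contains__, existing_groups.add)
second = ensure_all(["bt-finance", "bt-region-east"], existing_groups.__contains__, existing_groups.add)
```

The first call creates only `bt-region-east`. The second call skips both names, which is what makes a re-run harmless.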
It configures:
- Entra ID groups for every persona (Microsoft Graph)
- Fabric workspace role assignments
- OneLake security roles — RLS (GA) + CLS (Preview)
- Warehouse RLS + DDM
- Fixed identity instructions for semantic model refresh
- The `rls_user_region_map` Delta table that powers the dynamic role
See docs/best-practices/security/onelake-defense-in-depth.md for the conceptual model + the OneLake-vs-everything comparison.
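The dynamic role works by looking up the signed-in user in `rls_user_region_map` and keeping only the rows for regions mapped to that user. The sketch below emulates the lookup in plain Python to make the semantics concrete. The column names (`user_upn`, `region`) are assumptions. In the semantic model itself, the filter is written in DAX, typically comparing against `USERPRINCIPALNAME()`.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_region_map(map_rows: List[Dict]) -> Dict[str, Set[str]]:
    """Index rls_user_region_map rows as {lower-cased user_upn: {regions}}.

    Lower-casing mimics DAX's case-insensitive string comparison.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for row in map_rows:
        index[row["user_upn"].lower()].add(row["region"])
    return dict(index)


def visible_rows(fact_rows: List[Dict], upn: str, region_map: Dict[str, Set[str]]) -> List[Dict]:
    """Rows the user may see; a user with no mapping sees nothing (deny by default)."""
    allowed = region_map.get(upn.lower(), set())
    return [r for r in fact_rows if r["region"] in allowed]
```

Deny-by-default is the important property: a user missing from the map sees no rows, rather than all rows.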
✅ Step 9 — Validate
Walk the TEST_PLAN.md checklist. Every step has expected output you can verify before declaring victory.
🧠 Big ideas
- "Databricks mirroring" is catalog-shaped, not table-shaped. Pick a full, inclusion-list, or exclusion-list posture per mirrored catalog item, not per table.
- Mirror data is zero-copy. Fabric reads the Delta files in place through shortcuts, so there is no replication lag beyond metadata sync, and storage stays billed on the Databricks-side storage account rather than in OneLake.
- UC security does NOT survive the mirror. Re-author it in Fabric.
- OneLake security is the only cross-engine layer. RLS is GA; CLS is in Preview.
- Warehouse RLS/CLS triggers Direct Lake → DirectQuery fallback. If you want Direct Lake performance, push enforcement down to OneLake.
- Service principals (SPNs) cannot be RLS/OLS role members. Use a fixed identity for refresh.
- `notebookutils` is the namespace to use as of 2026. `mssparkutils` still works but is deprecated; `dbutils` does not exist in Fabric.
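Because this tutorial's notebooks run in both Databricks and Fabric, shared helper code sometimes needs to know which utilities object is available. The sketch below is a hypothetical pattern, not an official API. It assumes the runtime injects these objects into the notebook's global namespace, and it checks for `notebookutils` first, then the deprecated `mssparkutils`, then Databricks' `dbutils`, in a namespace you pass in.

```python
from typing import Any, Optional, Tuple

# Preference order: current Fabric name, deprecated Fabric alias, Databricks.
_CANDIDATES = ("notebookutils", "mssparkutils", "dbutils")


def resolve_notebook_utils(namespace: dict) -> Tuple[Optional[str], Optional[Any]]:
    """Return (name, object) for the first notebook-utilities object found.

    `namespace` is typically globals() from the calling notebook. Returns
    (None, None) outside a notebook runtime, e.g. in local unit tests.
    """
    for name in _CANDIDATES:
        obj = namespace.get(name)
        if obj is not None:
            return name, obj
    return None, None
```

Call it as `name, utils = resolve_notebook_utils(globals())`. If your Fabric runtime exposes `notebookutils` only through an import, run `import notebookutils` before the call.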
🗂️ Files in this tutorial

```text
tutorials/57-databricks-better-together/
  README.md              ← you are here
  TEST_PLAN.md           ← end-to-end manual test checklist
  infra/
    main.bicep           ← subscription-scope orchestrator
    dev.bicepparam       ← dev environment parameters
  notebooks/
    setup/
      00_create_unity_catalog.py
      01_load_sample_data.py
    mirroring/
      01_register_full_catalog_mirror.py
      02_register_partial_mirror.py
      03_query_mirror_from_spark.py
      04_compare_mirror_vs_shortcut_vs_iceberg.py
    gold/
      01_gold_star_schema.py
    security/
      01_apply_defense_in_depth.py
  semantic-model/
    model.tmdl
    README.md
  scripts/
    generate_sample_data.py
```
Companion files outside this directory:
- `docs/best-practices/security/onelake-defense-in-depth.md`: defense map
- `notebooks/hitchhikers-guide/`: six persona-based cheat-sheet notebooks
- `data_generation/generators/better_together/`: sample-data generators
- `infra/modules/databricks/databricks-workspace.bicep`: Azure Databricks workspace module
- `infra/modules/security/key-vault.bicep`: Key Vault module
📓 Open the notebooks
The docs site renders this page, not the `.py` notebooks. The links below open each notebook's source on GitHub. Download or copy the source, then import it into the right workspace: Databricks for setup, and Fabric for mirroring, gold, and security.

| Notebook | Runs in | Purpose |
|---|---|---|
| setup/00_create_unity_catalog.py | Databricks | UC catalog + schemas + volume |
| setup/01_load_sample_data.py | Databricks | Load 5 Delta tables + secure views |
| mirroring/01_register_full_catalog_mirror.py | Fabric | Full catalog mirror (REST) |
| mirroring/02_register_partial_mirror.py | Fabric | Inclusion + exclusion-list mirrors |
| mirroring/03_query_mirror_from_spark.py | Fabric | Read the mirror (Spark / T-SQL / sempy) |
| mirroring/04_compare_mirror_vs_shortcut_vs_iceberg.py | Fabric | Decision matrix |
| gold/01_gold_star_schema.py | Fabric | Direct-Lake star schema from the mirror |
| security/01_apply_defense_in_depth.py | Fabric | Entra groups, OneLake RLS/CLS, Warehouse RLS/DDM |