🧪 Tutorial 57 — End-to-End Test Plan
Walk this checklist to validate both content paths end-to-end: (1) the Databricks Better Together tutorial, and (2) the Hitchhiker's Guide notebooks. A third path covers static checks that run from your dev machine.
Each step has explicit expected output so you can stop early if something diverges.
Third-party references — publicly sourced, good-faith comparison
This page references non-Microsoft products and services. That information is drawn from each vendor's publicly available documentation and is offered for honest, good-faith comparison only. This is a personal project written from a Microsoft Fabric and Azure perspective; it does not claim expertise in, or authority over, any third-party product, and nothing here is an official statement by, or endorsed by, those vendors. Capabilities, pricing, and features change often — always verify against the vendor's current official documentation. Where a third-party offering is the stronger choice, we say so plainly.
Path 1 — Tutorial 57 happy path
Phase 0 — Sample data
- Run `python tutorials/57-databricks-better-together/scripts/generate_sample_data.py`.
- Expect: `sample-data/57-better-together/retail/*.parquet` (5 files) + `sample-data/57-better-together/personas/{users,groups}.csv`.
- Expect: deterministic output; re-running produces byte-identical files (seed=57).
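The byte-identical expectation can be checked by hashing every generated file before and after a re-run. The following is a minimal sketch that assumes only the output directory listed above; `snapshot` and `changed_files` are hypothetical helper names, not part of the tutorial scripts.

```python
import hashlib
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 hex digest."""
    base = Path(root)
    return {
        str(p.relative_to(base)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, removed, or whose digest differs."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

Take `snapshot("sample-data/57-better-together")` once, re-run the generator, take it again, and pass both results to `changed_files`. An empty list means the run was deterministic.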
Phase 1 — Azure scaffolding
- `az login --use-device-code --tenant <tenant-id>`.
- `az account set --subscription <sub-id>`.
- `az deployment sub what-if --location eastus2 --template-file infra/main.bicep --parameters infra/dev.bicepparam`.
- Expect: planned resources = 1 RG, 1 KV, 1 storage account; 0 Databricks workspaces (because `deployDatabricks=false`).
- If satisfied, `az deployment sub create ...` with the same args.
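The planned-resource counts can be tallied from the what-if result instead of read by eye. This is a sketch under assumptions: the result was saved as JSON (for example with `--no-pretty-print` redirected to a file) and has a top-level `changes` list whose entries carry `resourceId` and `changeType`. Verify that shape against your CLI version. `load_what_if`, `resource_type`, and `count_planned` are hypothetical helpers.

```python
import json
from collections import Counter


def load_what_if(path: str) -> dict:
    """Load a what-if result previously saved as JSON (assumed format)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resource_type(resource_id: str) -> str:
    """Derive an ARM resource type such as Microsoft.KeyVault/vaults from an ID."""
    parts = resource_id.strip("/").split("/")
    lowered = [p.lower() for p in parts]
    if "providers" not in lowered:
        # IDs like /subscriptions/<id>/resourceGroups/<name> have no provider segment.
        if "resourcegroups" in lowered:
            return "Microsoft.Resources/resourceGroups"
        return "unknown"
    idx = len(lowered) - 1 - lowered[::-1].index("providers")  # last "providers"
    namespace = parts[idx + 1]
    type_segments = parts[idx + 2 :: 2]  # type/name pairs: keep the types
    return "/".join([namespace, *type_segments])


def count_planned(what_if: dict, change_type: str = "Create") -> Counter:
    """Count resources per type for one change type (assumed field names)."""
    return Counter(
        resource_type(change["resourceId"])
        for change in what_if.get("changes", [])
        if change.get("changeType") == change_type
    )
```

On a first deployment, `count_planned(load_what_if("whatif.json"))` should show one entry each for the resource group, Key Vault, and storage account types, and `["Microsoft.Databricks/workspaces"]` should be 0. The file name `whatif.json` is illustrative.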
Phase 2 — Unity Catalog
- In Databricks: open `notebooks/setup/00_create_unity_catalog.py` on a UC-enabled cluster, run all cells.
- Expect: `SHOW SCHEMAS IN better_together` lists `retail_raw`, `retail_curated`, `retail_secure`.
- Expect: `SHOW VOLUMES IN better_together.retail_raw` lists `landing`.
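Both expectations reduce to checking that a set of expected names appears in a listing. A minimal sketch follows; `missing_names` is a hypothetical helper, and the commented lines assume a notebook with a live `spark` session where the first column of the `SHOW SCHEMAS` result holds the schema name.

```python
def missing_names(expected: set[str], found: list[str]) -> list[str]:
    """Return expected names absent from found, compared case-insensitively."""
    seen = {name.lower() for name in found}
    return sorted(name for name in expected if name.lower() not in seen)


EXPECTED_SCHEMAS = {"retail_raw", "retail_curated", "retail_secure"}

# In a notebook (assumption: `spark` is available), something like:
#   rows = spark.sql("SHOW SCHEMAS IN better_together").collect()
#   found = [row[0] for row in rows]
#   assert not missing_names(EXPECTED_SCHEMAS, found)
```

The check only looks for missing names, so extra schemas that the catalog may contain do not cause a failure.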
Phase 3 — Load data
- Upload `sample-data/57-better-together/retail/*.parquet` to `/Volumes/better_together/retail_raw/landing/retail/`.
- Run `notebooks/setup/01_load_sample_data.py`.
- Expect: five tables in `retail_raw` (`customers`, `products`, `orders`, `order_lines`, `returns`).
- Expect: two views in `retail_secure` (`orders_by_region`, `audit_revenue_summary`).
- As a logged-in user not in any `grp-sales-mgr-*`: `SELECT COUNT(*) FROM retail_secure.orders_by_region` → 0.
Phase 4 — Mirroring
- In Fabric: import `notebooks/mirroring/*.py` into a workspace bound to F-SKU capacity.
- Make sure the Databricks connection ID is stored in Key Vault as `fabric-databricks-connection-id`.
- Run `01_register_full_catalog_mirror.py`.
- Expect: HTTP 201 or 202; new item `MirrorDBX_FullCatalog` appears in the workspace; tables visible under `Tables/MirroredAzureDatabricksCatalog/retail_raw/*`.
- Run `02_register_partial_mirror.py`.
- Expect: two more items: `MirrorDBX_Inclusion` (only the `retail_secure.*` views) and `MirrorDBX_Exclusion` (everything except `retail_raw.customers`).
- Run `03_query_mirror_from_spark.py` → confirm row counts match what you saw in Databricks.
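The final row-count comparison is easier to audit when both sides are recorded as table-to-count mappings and diffed. A minimal sketch; `count_mismatches` is a hypothetical helper and the counts in the example are made-up placeholders, not values from the tutorial data.

```python
from typing import Optional


def count_mismatches(
    source: dict[str, int], mirror: dict[str, int]
) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Return {table: (source_count, mirror_count)} for every table that differs.

    A table missing on one side is reported with None for that side.
    """
    return {
        table: (source.get(table), mirror.get(table))
        for table in sorted(source.keys() | mirror.keys())
        if source.get(table) != mirror.get(table)
    }


# Placeholder counts for illustration only.
databricks_counts = {"customers": 100, "orders": 250}
mirror_counts = {"customers": 100, "orders": 249}
print(count_mismatches(databricks_counts, mirror_counts))
```

An empty dictionary means the mirror matches the source for every table you recorded.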
Phase 5 — Gold + semantic model
- Create `lh_btfabric_gold` lakehouse in the same workspace.
- Run `notebooks/gold/01_gold_star_schema.py` (attach the gold lakehouse).
- Expect: six tables in `lh_btfabric_gold` (`dim_region`, `dim_customer`, `dim_product`, `dim_date`, `fact_sales`, `fact_returns`).
- Expect: the assert at the end of the notebook passes ("OK — fact_sales → dim_customer integrity holds").
- In Power BI Desktop: open the `semantic-model/` folder as a `.pbip` project; publish to the workspace.
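The integrity assert in the gold notebook presumably verifies that every customer key in `fact_sales` also exists in `dim_customer`. The sketch below shows that kind of check in plain Python. It is not the notebook's actual code, and it assumes the join key is `customer_id`. `orphan_keys` is a hypothetical helper.

```python
def orphan_keys(fact_rows: list[dict], dim_rows: list[dict], key: str = "customer_id") -> list:
    """Return fact-side key values that have no matching dimension row."""
    dim_keys = {row[key] for row in dim_rows}
    return sorted({row[key] for row in fact_rows} - dim_keys)


# Tiny illustrative rows, not tutorial data.
dim_customer = [{"customer_id": 1}, {"customer_id": 2}]
fact_sales = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
assert orphan_keys(fact_sales, dim_customer) == []

# With Spark DataFrames, a left anti join expresses the same idea (assumed names):
#   orphans = fact_sales_df.join(dim_customer_df, on="customer_id", how="left_anti")
#   assert orphans.count() == 0
```

If the check fails, the returned keys point directly at the fact rows to investigate.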
Phase 6 — Defense-in-depth automation
- Upload `sample-data/57-better-together/personas/*.csv` to `Files/57-better-together/personas/` in the gold lakehouse.
- Run `notebooks/security/01_apply_defense_in_depth.py`.
- Expect: ~13 Entra groups created (or reused), workspace role assignments logged, OneLake data access roles PUT returns 200, `rls_user_region_map` Delta table populated with ~8 rows.
- Manual step: set the Fixed Identity on the published semantic model.
Phase 7 — Power BI persona testing
For each of the three reports, log in as one of the synthetic-user UPNs (or use View as → Other user) and verify that the visible rows match the table below.
| Persona | Report | Expected `fact_sales` rows |
|---|---|---|
| `grp-sales-mgr-us-east` user | Regional Sales | only US-EAST rows |
| `grp-sales-mgr-emea` user | Regional Sales | only EMEA rows |
| `grp-finance` user | Finance Performance | all rows, `customer_id` not visible |
| `grp-exec` user | Executive Scorecard | all rows, all columns |
| `grp-audit` user | (any) | aggregate views only |
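For the regional personas, the expected result amounts to filtering `fact_sales` through a user-to-region map such as `rls_user_region_map`. The sketch below simulates that filter so you can reason about what each persona should see. The column names `user_upn` and `region`, the UPNs, and the helper `visible_rows` are all hypothetical.

```python
def visible_rows(user_upn: str, region_map: list[dict], fact_rows: list[dict]) -> list[dict]:
    """Return the fact rows whose region is mapped to the given user."""
    allowed = {
        entry["region"]
        for entry in region_map
        if entry["user_upn"].lower() == user_upn.lower()
    }
    return [row for row in fact_rows if row["region"] in allowed]


# Illustrative data only; real UPNs and regions come from the personas CSVs.
region_map = [
    {"user_upn": "east.mgr@example.com", "region": "US-EAST"},
    {"user_upn": "emea.mgr@example.com", "region": "EMEA"},
]
fact_sales = [
    {"order_id": 1, "region": "US-EAST"},
    {"order_id": 2, "region": "EMEA"},
    {"order_id": 3, "region": "US-EAST"},
]

east = visible_rows("east.mgr@example.com", region_map, fact_sales)
assert {row["region"] for row in east} == {"US-EAST"}
```

A user with no entry in the map sees no rows, which mirrors the Phase 3 expectation that a user outside every `grp-sales-mgr-*` group gets a count of 0.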
Path 2 — Hitchhiker's Guide validation
Each Hitchhiker's notebook is a flat list of recipes. The validation goal is syntax correctness + runnability on a clean workspace, not exhaustive output comparison.
- Import all 7 `notebooks/hitchhikers-guide/*.py` files into a Fabric workspace.
- Open `00_index.py` → run the "Pre-flight" cells. Expect: a workspace name, a notebook ID, a Spark version printed.
For each subsequent notebook, run the cells that don't require external infrastructure (sections marked with 🔗 link to Learn but the snippet itself is self-contained):
| Notebook | Cells to run | Cells to skip |
|---|---|---|
| `01_connectivity.py` | A (ADLS mount), N (Fabric REST /workspaces) | cells that hit Snowflake / on-prem / mirror items, unless those exist in your environment |
| `02_lakehouse_warehouse_ops.py` | All A, B, D, F, G | I (Warehouse from Spark) unless you have a Warehouse |
| `03_security_identity.py` | F, G, H, I (token & secret cells) | A (PUT data access roles) unless you have a target lakehouse |
| `04_admin_governance.py` | A (boilerplate), I (tenant settings read) | C, F, G unless you have a fresh workspace to mutate |
| `05_automation_utilities.py` | All (every cell is pure utility) | none |
| `06_troubleshooting.py` | All self-diagnostic cells | none |
Expected: every executed cell returns without an unhandled exception.
Path 3 — Static validation (run from the host)
These checks run from your dev machine and don't touch Fabric/Databricks at all.
```bash
# 1. Generators produce identical output across runs (determinism)
python tutorials/57-databricks-better-together/scripts/generate_sample_data.py
md5sum sample-data/57-better-together/retail/*.parquet > /tmp/run1.md5
rm -rf sample-data/57-better-together
python tutorials/57-databricks-better-together/scripts/generate_sample_data.py
md5sum sample-data/57-better-together/retail/*.parquet > /tmp/run2.md5
diff /tmp/run1.md5 /tmp/run2.md5   # expect empty diff

# 2. Bicep static build
az bicep build --file infra/modules/databricks/databricks-workspace.bicep
az bicep build --file infra/modules/security/key-vault.bicep
az bicep build --file tutorials/57-databricks-better-together/infra/main.bicep
# expect: warnings about a new Bicep release only; no errors

# 3. Notebook Python syntax check (parses every .py as Python, ignoring magics)
python -m py_compile \
  tutorials/57-databricks-better-together/notebooks/setup/00_create_unity_catalog.py \
  tutorials/57-databricks-better-together/notebooks/setup/01_load_sample_data.py \
  tutorials/57-databricks-better-together/notebooks/mirroring/*.py \
  tutorials/57-databricks-better-together/notebooks/gold/*.py \
  tutorials/57-databricks-better-together/notebooks/security/*.py \
  notebooks/hitchhikers-guide/*.py
# expect: silent (0 exit)

# 4. Existing test suite still passes
pytest validation/unit_tests/ -q
# expect: same count as before this PR; no regressions

# 5. Docs site builds without broken links
mkdocs build --strict
# expect: 0 warnings, 0 errors
```
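Where shell globbing is unavailable (for example on Windows without a POSIX shell), the syntax check in step 3 can be driven from Python with the same standard-library module. A minimal sketch; `syntax_errors` is a hypothetical helper, not a script in the repository.

```python
import glob
import py_compile


def syntax_errors(patterns: list[str]) -> dict[str, str]:
    """Byte-compile every file matching the glob patterns.

    Returns {path: error message} for files that fail to compile.
    Patterns that match nothing are skipped silently.
    """
    failures = {}
    for pattern in patterns:
        for path in sorted(glob.glob(pattern, recursive=True)):
            try:
                py_compile.compile(path, doraise=True)
            except py_compile.PyCompileError as exc:
                failures[path] = exc.msg
    return failures
```

Calling `syntax_errors(["notebooks/hitchhikers-guide/*.py"])` from the repository root should return an empty dictionary when every notebook parses. Because unmatched patterns are skipped, confirm separately that each pattern matches at least one file.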
Acceptance criteria
This tutorial is accepted when:
- All Path 3 static checks pass.
- At least one persona per role (Regional Manager, Finance, Exec) has been validated via "View as" in Power BI with the expected row filtering.
- The defense-in-depth doc cross-links resolve under `mkdocs build --strict`.
- The PR has no merge conflicts with `main`.