🧪 Tutorial 57 — End-to-End Test Plan

Walk this checklist to validate the tutorial end-to-end along three paths: (1) the Databricks Better Together tutorial, (2) the Hitchhiker's Guide notebooks, and (3) static checks run from your dev machine.

Each step has explicit expected output so you can stop early if something diverges.

Third-party references — publicly sourced, good-faith comparison

This page references non-Microsoft products and services. That information is drawn from each vendor's publicly available documentation and is offered for honest, good-faith comparison only. This is a personal project written from a Microsoft Fabric and Azure perspective; it does not claim expertise in, or authority over, any third-party product, and nothing here is an official statement by, or endorsed by, those vendors. Capabilities, pricing, and features change often — always verify against the vendor's current official documentation. Where a third-party offering is the stronger choice, we say so plainly.

Path 1 — Tutorial 57 happy path

Phase 0 — Sample data

Run python tutorials/57-databricks-better-together/scripts/generate_sample_data.py.
Expect: sample-data/57-better-together/retail/*.parquet (5 files) + sample-data/57-better-together/personas/{users,groups}.csv.
Expect: deterministic output — re-running produces byte-identical files (seed=57).
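
The two Phase 0 expectations (file layout and byte-identical reruns) can also be checked from Python. The following is a minimal sketch, not part of the tutorial's own scripts: `snapshot` and `layout_problems` are hypothetical helper names, and the expected counts are copied from the checklist above.

```python
import hashlib
from pathlib import Path

# Expected layout, copied from the Phase 0 checklist (assumed to stay in sync).
EXPECTED_PARQUET_COUNT = 5
EXPECTED_PERSONA_FILES = ("users.csv", "groups.csv")


def snapshot(root: Path) -> dict[str, str]:
    """Map every file under root (relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def layout_problems(root: Path) -> list[str]:
    """Return readable problems; an empty list means the layout matches."""
    problems = []
    parquet = list((root / "retail").glob("*.parquet"))
    if len(parquet) != EXPECTED_PARQUET_COUNT:
        problems.append(
            f"expected {EXPECTED_PARQUET_COUNT} parquet files, found {len(parquet)}"
        )
    for name in EXPECTED_PERSONA_FILES:
        if not (root / "personas" / name).is_file():
            problems.append(f"missing personas/{name}")
    return problems
```

To test determinism, call `snapshot(Path("sample-data/57-better-together"))` once, delete the directory, rerun the generator, and compare the second snapshot to the first with `==`.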

Phase 1 — Azure scaffolding

az login --use-device-code --tenant <tenant-id>.
az account set --subscription <sub-id>.
az deployment sub what-if --location eastus2 --template-file infra/main.bicep --parameters infra/dev.bicepparam.
Expect: planned resources = 1 RG, 1 KV, 1 storage account; 0 Databricks workspaces (because deployDatabricks=false).
If satisfied, az deployment sub create ... with the same args.
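
Rather than reading the what-if output by eye, you can count the planned resources with a short script. This sketch assumes the what-if result was saved as JSON (for example with `--no-pretty-print --output json`) and that it has a top-level `changes` list whose entries carry `resourceId` and `changeType`. Check that shape against your CLI version. `resource_type` and `planned_creates` are hypothetical helper names.

```python
from collections import Counter


def resource_type(resource_id: str) -> str:
    """Derive an ARM resource type from a resource ID.

    '/subscriptions/<id>/resourceGroups/<name>'
        -> 'Microsoft.Resources/resourceGroups'
    '.../providers/Microsoft.KeyVault/vaults/<name>'
        -> 'Microsoft.KeyVault/vaults'
    """
    parts = resource_id.strip("/").split("/")
    lowered = [p.lower() for p in parts]
    if "providers" not in lowered:
        # Assumption: an ID without a provider segment is a resource group.
        return "Microsoft.Resources/resourceGroups"
    # Use the last 'providers' segment so extension resources resolve correctly.
    idx = len(lowered) - 1 - lowered[::-1].index("providers")
    namespace, rest = parts[idx + 1], parts[idx + 2:]
    # After the namespace, segments alternate type/name/type/name.
    return "/".join([namespace] + rest[0::2])


def planned_creates(what_if: dict) -> Counter:
    """Count resources the deployment would create, keyed by resource type."""
    return Counter(
        resource_type(change["resourceId"])
        for change in what_if.get("changes", [])
        if change.get("changeType") == "Create"
    )
```

Load the saved file with `json.load` and pass the result to `planned_creates`. For the dev parameters, the checklist above implies a count of 1 each for `Microsoft.Resources/resourceGroups`, `Microsoft.KeyVault/vaults`, and `Microsoft.Storage/storageAccounts`, and 0 for `Microsoft.Databricks/workspaces`. Child resources, if the template declares any, appear under their own types.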

Phase 2 — Unity Catalog

In Databricks: open notebooks/setup/00_create_unity_catalog.py on a UC-enabled cluster, run all cells.
Expect: SHOW SCHEMAS IN better_together lists retail_raw, retail_curated, retail_secure.
Expect: SHOW VOLUMES IN better_together.retail_raw lists landing.

Phase 3 — Load data

Upload sample-data/57-better-together/retail/*.parquet to /Volumes/better_together/retail_raw/landing/retail/.
Run notebooks/setup/01_load_sample_data.py.
Expect: five tables in retail_raw (customers, products, orders, order_lines, returns).
Expect: two views in retail_secure (orders_by_region, audit_revenue_summary).
As a logged-in user not in any grp-sales-mgr-*: SELECT COUNT(*) FROM retail_secure.orders_by_region → 0.
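
The object-level expectations from Phases 2 and 3 can be written down once and compared against whatever your environment reports. The sketch below is illustrative only: `EXPECTED` restates the checklist, `missing_objects` is a hypothetical helper, and collecting the actual names (for example from the `SHOW SCHEMAS`, `SHOW VOLUMES`, `SHOW TABLES`, and `SHOW VIEWS` output) is left to you.

```python
# Names copied from the Phase 2 and Phase 3 checklist items.
EXPECTED = {
    "schemas": {"retail_raw", "retail_curated", "retail_secure"},
    "volumes in retail_raw": {"landing"},
    "tables in retail_raw": {
        "customers", "products", "orders", "order_lines", "returns",
    },
    "views in retail_secure": {"orders_by_region", "audit_revenue_summary"},
}


def missing_objects(actual: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return expected names absent from `actual`, grouped by the same keys."""
    gaps = {}
    for kind, expected in EXPECTED.items():
        missing = expected - actual.get(kind, set())
        if missing:
            gaps[kind] = missing
    return gaps
```

An empty result means every expected object was found. Extra objects in the catalog are ignored on purpose, since a shared workspace may hold unrelated schemas.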

Phase 4 — Mirroring

In Fabric: import notebooks/mirroring/*.py into a workspace bound to F-SKU capacity.
Make sure the Databricks connection ID is stored in Key Vault as fabric-databricks-connection-id.
Run 01_register_full_catalog_mirror.py.
Expect: HTTP 201 or 202; new item MirrorDBX_FullCatalog appears in the workspace; tables visible under Tables/MirroredAzureDatabricksCatalog/retail_raw/*.
Run 02_register_partial_mirror.py.
Expect: two more items — MirrorDBX_Inclusion (only the retail_secure.* views) and MirrorDBX_Exclusion (everything except retail_raw.customers).
Run 03_query_mirror_from_spark.py → confirm row counts match what you saw in Databricks.
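
Two of the Phase 4 checks are mechanical: the registration status code and the row-count comparison. Here is a small sketch of both, assuming you have collected per-table counts from each side into plain dictionaries. `registration_ok` and `count_mismatches` are hypothetical names, not functions from the mirroring notebooks.

```python
# Status codes the checklist treats as success for mirror registration.
ACCEPTED_STATUS = {201, 202}


def registration_ok(status_code: int) -> bool:
    """True when the registration call returned one of the accepted codes."""
    return status_code in ACCEPTED_STATUS


def count_mismatches(
    source: dict[str, int], mirror: dict[str, int]
) -> dict[str, tuple]:
    """Tables whose counts differ or that exist on one side only.

    Values are (source_count, mirror_count); None marks a missing table.
    """
    return {
        table: (source.get(table), mirror.get(table))
        for table in sorted(source.keys() | mirror.keys())
        if source.get(table) != mirror.get(table)
    }
```

Compare against the full-catalog mirror. The exclusion mirror is expected to lack `retail_raw.customers`, so it would show up as a mismatch here by design.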

Phase 5 — Gold + semantic model

Create lh_btfabric_gold lakehouse in the same workspace.
Run notebooks/gold/01_gold_star_schema.py (attach the gold lakehouse).
Expect: six tables in lh_btfabric_gold (dim_region, dim_customer, dim_product, dim_date, fact_sales, fact_returns).
Expect: the assert at the end of the notebook passes ("OK — fact_sales → dim_customer integrity holds").
In Power BI Desktop: open the semantic-model/ folder as a .pbip project; publish to the workspace.
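
The integrity assert at the end of the gold notebook checks that every `fact_sales` row points at an existing `dim_customer` row. The notebook's own code is not reproduced here. The sketch below shows the same idea in plain Python over lists of dictionaries, assuming `customer_id` is the join key. In Spark, a left anti join between the two tables is a common way to express the same check.

```python
def orphan_keys(fact_rows, dim_rows, key="customer_id"):
    """Return sorted key values used by fact rows but absent from the dimension."""
    dim_keys = {row[key] for row in dim_rows}
    return sorted({row[key] for row in fact_rows if row[key] not in dim_keys})


def assert_integrity(fact_rows, dim_rows, key="customer_id"):
    """Raise AssertionError listing orphaned keys; return None when clean."""
    orphans = orphan_keys(fact_rows, dim_rows, key)
    assert not orphans, f"fact rows reference unknown {key} values: {orphans}"
```

Returning the orphaned keys, rather than only a boolean, makes a failure easier to trace back to the rows that caused it.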

Phase 6 — Defense-in-depth automation

Upload sample-data/57-better-together/personas/*.csv to Files/57-better-together/personas/ in the gold lakehouse.
Run notebooks/security/01_apply_defense_in_depth.py.
Expect: ~13 Entra groups created (or reused), workspace role assignments logged, OneLake data access roles PUT returns 200, rls_user_region_map Delta table populated with ~8 rows.
Manual step: set the Fixed Identity on the published semantic model.

Phase 7 — Power BI persona testing

For each of the three reports, log in as one of the synthetic-user UPNs (or use View as → Other user) and verify that the rows and columns the persona can see match the table below.

Persona	Report	Expected `fact_sales` visibility
`grp-sales-mgr-us-east` user	Regional Sales	only US-EAST rows
`grp-sales-mgr-emea` user	Regional Sales	only EMEA rows
`grp-finance` user	Finance Performance	all rows, `customer_id` not visible
`grp-exec` user	Executive Scorecard	all rows, all columns
`grp-audit` user	(any)	aggregate views only
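
If you export the rows a persona can see (for example from a table visual), the row-level expectations above can be checked mechanically. This is a sketch under stated assumptions: the `region` column name and the exported row shape are hypothetical, and `grp-audit` is left out because its expectation concerns which views are reachable, not row filtering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expectation:
    regions: Optional[frozenset]              # None means no region filter
    hidden_columns: frozenset = frozenset()   # columns that must not appear


# Restates the persona table; group names are taken from it verbatim.
EXPECTATIONS = {
    "grp-sales-mgr-us-east": Expectation(frozenset({"US-EAST"})),
    "grp-sales-mgr-emea": Expectation(frozenset({"EMEA"})),
    "grp-finance": Expectation(None, frozenset({"customer_id"})),
    "grp-exec": Expectation(None),
}


def violations(group: str, visible_rows: list) -> list:
    """Return readable problems for rows (dicts) visible to the persona."""
    exp = EXPECTATIONS[group]
    problems = []
    for i, row in enumerate(visible_rows):
        if exp.regions is not None and row.get("region") not in exp.regions:
            problems.append(f"row {i}: unexpected region {row.get('region')!r}")
        leaked = sorted(exp.hidden_columns.intersection(row))
        if leaked:
            problems.append(f"row {i}: hidden columns visible {leaked}")
    return problems
```

An empty list means the exported rows are consistent with the table. It does not prove that no rows are missing, so pair it with a total row count when that matters.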

Path 2 — Hitchhiker's Guide validation

Each Hitchhiker's notebook is a flat list of recipes. The validation goal is syntax correctness + runnability on a clean workspace, not exhaustive output comparison.

Import all 7 notebooks/hitchhikers-guide/*.py files into a Fabric workspace.
Open 00_index.py → run the "Pre-flight" cells. Expect: a workspace name, a notebook ID, a Spark version printed.

For each subsequent notebook, run the cells that don't require external infrastructure. Sections marked with 🔗 link out to Microsoft Learn, but the snippets themselves are self-contained:

Notebook	Cells to run	Skip cells that
`01_connectivity.py`	A (ADLS mount), N (Fabric REST `/workspaces`)	hit Snowflake / on-prem / mirror items unless those exist in your environment
`02_lakehouse_warehouse_ops.py`	All A, B, D, F, G	I (Warehouse from Spark) unless you have a Warehouse
`03_security_identity.py`	F, G, H, I — token & secret cells	A (PUT data access roles) unless you have a target lakehouse
`04_admin_governance.py`	A (boilerplate), I (tenant settings read)	C, F, G unless you have a fresh workspace to mutate
`05_automation_utilities.py`	All — every cell is pure utility	none
`06_troubleshooting.py`	All self-diagnostic cells	none

Expected: every executed cell returns without an unhandled exception.

Path 3 — Static validation (run from the host)

These checks run from your dev machine and don't touch Fabric/Databricks at all.

# 1. Generators produce identical output across runs (determinism)
python tutorials/57-databricks-better-together/scripts/generate_sample_data.py
md5sum sample-data/57-better-together/retail/*.parquet  > /tmp/run1.md5
rm -rf sample-data/57-better-together
python tutorials/57-databricks-better-together/scripts/generate_sample_data.py
md5sum sample-data/57-better-together/retail/*.parquet  > /tmp/run2.md5
diff /tmp/run1.md5 /tmp/run2.md5   # expect empty diff

# 2. Bicep static build
az bicep build --file infra/modules/databricks/databricks-workspace.bicep
az bicep build --file infra/modules/security/key-vault.bicep
az bicep build --file tutorials/57-databricks-better-together/infra/main.bicep
# expect: warnings about a new Bicep release only; no errors

# 3. Notebook Python syntax check (parses every .py as Python, ignoring magics)
python -m py_compile \
  tutorials/57-databricks-better-together/notebooks/setup/00_create_unity_catalog.py \
  tutorials/57-databricks-better-together/notebooks/setup/01_load_sample_data.py \
  tutorials/57-databricks-better-together/notebooks/mirroring/*.py \
  tutorials/57-databricks-better-together/notebooks/gold/*.py \
  tutorials/57-databricks-better-together/notebooks/security/*.py \
  notebooks/hitchhikers-guide/*.py
# expect: silent (0 exit)

# 4. Existing test suite still passes
pytest validation/unit_tests/ -q
# expect: same count as before this PR; no regressions

# 5. Docs site builds without broken links
mkdocs build --strict
# expect: 0 warnings, 0 errors
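
Check 3 depends on shell globbing. Where that is unavailable, a rough Python equivalent is sketched below. It uses the built-in `compile()` instead of `py_compile`, so it checks syntax without writing `.pyc` files. `syntax_errors` is a hypothetical helper name.

```python
from pathlib import Path


def syntax_errors(paths) -> dict:
    """Compile each file in memory; return {path: message} for failures."""
    failures = {}
    for path in paths:
        source = Path(path).read_text(encoding="utf-8")
        try:
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            failures[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return failures
```

For example, `syntax_errors(Path("notebooks/hitchhikers-guide").glob("*.py"))` should return an empty dictionary when every notebook parses.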

Acceptance criteria

This tutorial is accepted when:

All Path 3 static checks pass.
At least one persona per role (Regional Manager, Finance, Exec) has been validated via "View as" in Power BI with the expected row filtering.
The defense-in-depth doc cross-links resolve when running mkdocs build --strict.
The PR has no merge conflicts with main.
