Day 3 — Transform & Lakehouse & Warehouse (Commercial CoE)¶
Track: 5-Day Commercial CoE Workshop · Day 3 of 5 · Transform & Lakehouse & Warehouse
Day 3 builds a medallion (Bronze → Silver → Gold) on the chosen workload, runs SQL over the lakehouse via a Databricks SQL Warehouse (Photon), and explores real-time data with KQL over ADX.
Azure-native by default
Transform = Databricks notebooks + dbt; warehouse = Databricks SQL Warehouse (Photon) on Commercial, or Synapse dedicated pool; real-time = Azure Data Explorer. No Fabric Warehouse, no OneLake.
Learning objectives¶
- Build a medallion pipeline on a real workload.
- Use Databricks notebooks + dbt with Git source control.
- Query Gold via Databricks SQL Warehouse (Photon).
- Explore streaming data with KQL over ADX.
- Schedule transforms as recurring jobs.
Facilitator guide¶
Timing (8-hour day)¶
| Time | Activity | Mode |
|---|---|---|
| 09:00 | Day-2 recap + medallion architecture | Lecture |
| 09:30 | Bronze → Silver (notebook + dbt) | Lab |
| 10:45 | Break | — |
| 11:00 | Silver → Gold (dbt business aggregates) | Lab |
| 12:00 | Lunch | — |
| 13:00 | Databricks SQL Warehouse query over Gold (Photon) | Lab |
| 14:00 | KQL exploration over the IoT stream (ADX) | Lab |
| 15:00 | Break | — |
| 15:15 | Schedule the medallion as a recurring job | Lab |
| 16:15 | Commit to Git + wrap-up | Plenary |
Talking points¶
- Medallion on Azure-native: same Bronze/Silver/Gold Delta pattern as the Federal track. On Commercial, Photon accelerates the SQL Warehouse reads over Gold.
- dbt + Git: transformations as code; identical to the Fabric forward- migration artifact (Day 5 — demonstrated live on Commercial).
- Warehouse choice: Databricks SQL Warehouse (Photon) is the commercial default; Synapse dedicated pool remains available. No Fabric Warehouse needed.
Exercises¶
- Pairs review each other's Silver layer for correct vertical data handling (PHI/PCI masking before Gold).
- Group compares a Photon vs non-Photon query on the same Gold table.
Common pitfalls¶
- SQL Warehouse stopped/auto-terminated → resume it (or rely on auto-start) and retry.
- Un-partitioned Gold → slow reads; partition by query dimension.
Participant lab — build the medallion¶
- Bronze → Silver. In Notebook (
/notebook), readbronze.sensor_readings, cleanse, conform timestamps, writesilver.sensor_readings_clean. Commit to Git. - Silver → Gold (dbt). Add a dbt model →
gold.device_hourly; rundbt run+dbt test. - Warehouse query. In Warehouse (
/warehouse), run a top-10 query overgold.device_hourlyon the Databricks SQL Warehouse; confirm Photon plan. - KQL over ADX. In Realtime hub (
/realtime-hub), run a KQL timechart over the ingested IoT events. - Schedule. Create a recurring nightly job for the medallion; confirm the schedule.
Validation (Day-3 done): Silver + Gold exist, dbt tests pass, a Photon warehouse query returns Gold rows, a KQL chart renders, medallion scheduled.
Datasets¶
- Synthetic IoT — medallion source.
- Synthetic financial transactions or synthetic clinical encounters — vertical-specific alternates.
Homework¶
- Commit transform notebooks + dbt models to the customer Git repo.
Commercial-specific emphasis¶
- Databricks SQL Warehouse (Photon) as the warehouse backend.
- Vertical masking: PHI (HIPAA), PCI, GDPR-controlled fields masked before Gold promotion.
Slide deck¶
make loom-decks DECK=docs/fiab/workshops/5-day-commercial-coe/day-3-transform.md.
Related¶
- ← Day 2 · Day 4 — BI & AI →
- Federal CoE Day 3 — sibling variant
- Lakehouse workload