Tutorial 16: dbt Development Workflow¶
Estimated Time: 2-3 hours
Difficulty: Intermediate
Build a complete dbt project from scratch on the CSA-in-a-Box platform. You will initialize a new dbt project, connect it to Databricks, create models following the medallion architecture (Bronze staging, Silver facts/dims, Gold reports), add data quality tests, generate documentation, and wire up CI/CD with GitHub Actions. By the end you will have a production-ready dbt workflow that mirrors the patterns used throughout the CSA-in-a-Box domain examples.
Prerequisites¶
Before starting, ensure you have the following installed and configured:
- Completed Tutorial 01: Foundation Platform -- Databricks workspace and ADLS storage deployed
- Python 3.11+ -- python.org
- Databricks Personal Access Token -- created in Tutorial 01, Step 8b
- Git -- git-scm.com
- GitHub CLI (`gh`) -- Install guide (for Step 10)
- (Optional) VS Code with the dbt Power User extension
Verify your tools:
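python --version   # expect 3.11 or newer
git --version
gh --version       # optional until Step 10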
Architecture Diagram¶
graph LR
subgraph Sources["Source Data"]
RAW[("Raw CSV / Parquet<br/>in ADLS Bronze")]
end
subgraph Init["dbt Project"]
DBT_INIT["dbt init"]
PROFILES["profiles.yml<br/>(Databricks connection)"]
end
subgraph Bronze["Bronze Layer (Staging)"]
STG["stg_orders<br/>stg_customers<br/>stg_products"]
end
subgraph Silver["Silver Layer (Core)"]
FCT["fct_order_items"]
DIM_C["dim_customers"]
DIM_P["dim_products"]
end
subgraph Gold["Gold Layer (Reports)"]
RPT["rpt_daily_sales<br/>rpt_product_performance"]
end
subgraph Quality["Data Quality"]
TESTS["Schema Tests<br/>Custom Tests<br/>Freshness Checks"]
end
subgraph Docs["Documentation"]
DBT_DOCS["dbt docs generate<br/>dbt docs serve"]
end
subgraph CICD["CI/CD"]
PR["PR: dbt build --select state:modified+"]
MERGE["Merge: dbt run --target prod"]
end
RAW --> DBT_INIT
DBT_INIT --> PROFILES
PROFILES --> STG
STG --> FCT
STG --> DIM_C
STG --> DIM_P
FCT --> RPT
DIM_C --> RPT
DIM_P --> RPT
RPT --> TESTS
RPT --> DBT_DOCS
RPT --> CICD
style Sources fill:#fff3e6,stroke:#ff8c00
style Bronze fill:#e6f0ff,stroke:#0078d4
style Silver fill:#f0e6ff,stroke:#5c2d91
style Gold fill:#e6ffe6,stroke:#107c10
style Quality fill:#ffe6e6,stroke:#d13438
style Docs fill:#f5f5f5,stroke:#737373
style CICD fill:#e6f0ff,stroke:#0078d4
Step 1: Install dbt-core and the Databricks Adapter¶
Create a virtual environment and install dbt with the Databricks adapter.
cd csa-inabox
python -m venv .venv
# Linux / macOS
source .venv/bin/activate
# Windows PowerShell
# .venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install dbt-core dbt-databricks
Verify the installation:
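dbt --version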
Expected Output
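A representative result (exact versions and update hints will vary):
Core:
  - installed: 1.8.x
Plugins:
  - databricks: 1.8.x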
Troubleshooting¶
| Symptom | Cause | Fix |
|---|---|---|
| `pip install` fails with SSL error | Corporate proxy | Use `pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org` |
| `dbt: command not found` | dbt not on PATH | Activate the virtual environment or use `python -m dbt` |
| Dependency conflict | Existing packages | Use a fresh virtual environment |
Step 2: Initialize a New dbt Project¶
Scaffold a new dbt project that follows CSA-in-a-Box conventions.
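# The project name matches the profile configured in Step 3
dbt init csa_tutorial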
When prompted:
- Which database: select `databricks`
- host: enter your Databricks workspace URL (without `https://`)
- http_path: enter your SQL Warehouse or cluster HTTP path
- token: enter your Databricks PAT
- catalog: enter `hive_metastore` (or your Unity Catalog name)
- schema: enter `tutorial_dev`
- threads: enter `4`
Generated project structure
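A typical `dbt init` scaffold (exact contents vary slightly by dbt version):
csa_tutorial/
├── analyses/
├── macros/
├── models/
│   └── example/
├── seeds/
├── snapshots/
├── tests/
├── dbt_project.yml
└── README.md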
Tip
Compare the generated dbt_project.yml with the existing domain examples at domains/sales/dbt/dbt_project.yml to see CSA-in-a-Box conventions like medallion-layer schema routing and shared macro paths.
Step 3: Configure profiles.yml for Databricks¶
The dbt init command creates ~/.dbt/profiles.yml. Verify and tune it for your environment.
It should look like this (adjust values for your workspace):
csa_tutorial:
target: dev
outputs:
dev:
type: databricks
catalog: hive_metastore
schema: tutorial_dev
host: adb-1234567890.12.azuredatabricks.net
http_path: /sql/1.0/warehouses/abcdef1234567890
token: "{{ env_var('DATABRICKS_TOKEN') }}"
threads: 4
prod:
type: databricks
catalog: hive_metastore
schema: tutorial_prod
host: adb-1234567890.12.azuredatabricks.net
http_path: /sql/1.0/warehouses/abcdef1234567890
token: "{{ env_var('DATABRICKS_TOKEN') }}"
threads: 8
Tip
Use {{ env_var('DATABRICKS_TOKEN') }} instead of hardcoding tokens. Set the environment variable before running dbt: export DATABRICKS_TOKEN="dapi...".
Test the connection:
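dbt debug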
Expected Output
Configuration:
profiles.yml file [OK found and valid]
dbt_project.yml file [OK found and valid]
profile: csa_tutorial [OK found]
target: dev [OK found]
Required dependencies:
- git [OK found]
Connection:
host: adb-1234567890.12.azuredatabricks.net
http_path: /sql/1.0/warehouses/abcdef1234567890
catalog: hive_metastore
schema: tutorial_dev
Connection test: [OK connection ok]
All checks passed!
Troubleshooting¶
| Symptom | Cause | Fix |
|---|---|---|
| `Connection test: FAILED` | Wrong host, path, or token | Verify workspace URL and regenerate PAT if expired |
| `Catalog does not exist` | Unity Catalog not enabled or wrong catalog name | Use `hive_metastore` for non-UC workspaces |
| Cluster terminated | Idle cluster auto-terminated | Start the cluster in the Databricks UI and retry |
Step 4: Create Source Definitions¶
Define the raw data sources that dbt will read from. Remove the generated example models first.
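# Drop the dbt init example models and create the medallion folders
rm -rf models/example
mkdir -p models/bronze models/silver models/gold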
Create models/bronze/sources.yml:
version: 2
sources:
- name: raw_tutorial
description: >
Raw tutorial data. In dev, loaded via dbt seed.
In production, ingested by ADF to ADLS Bronze.
schema: seeds
loaded_at_field: _ingested_at
freshness:
warn_after: { count: 24, period: hour }
error_after: { count: 48, period: hour }
tables:
- name: raw_orders
description: Raw order records with customer and product references.
columns:
- name: order_id
description: Unique order identifier.
tests:
- not_null
- unique
- name: customer_id
description: Foreign key to customers.
- name: product_id
description: Foreign key to products.
- name: quantity
description: Units ordered.
- name: unit_price
description: Price per unit in USD.
- name: order_date
description: Date the order was placed.
- name: _ingested_at
description: Ingestion timestamp (set by seed or ADF).
- name: raw_customers
description: Customer master data.
columns:
- name: customer_id
tests:
- not_null
- unique
- name: customer_name
- name: region
- name: segment
- name: raw_products
description: Product catalog.
columns:
- name: product_id
tests:
- not_null
- unique
- name: product_name
- name: category
- name: list_price
Create seed CSV files so you can develop without external data:
Create seeds/raw_orders.csv:
order_id,customer_id,product_id,quantity,unit_price,order_date,_ingested_at
1001,C001,P001,5,29.99,2024-01-15,2024-01-15T10:00:00
1002,C002,P003,2,49.99,2024-01-16,2024-01-16T10:00:00
1003,C001,P002,1,199.99,2024-01-17,2024-01-17T10:00:00
1004,C003,P001,10,29.99,2024-01-18,2024-01-18T10:00:00
1005,C002,P004,3,14.99,2024-01-19,2024-01-19T10:00:00
1006,C004,P002,2,199.99,2024-01-20,2024-01-20T10:00:00
1007,C005,P003,1,49.99,2024-01-21,2024-01-21T10:00:00
1008,C001,P005,4,9.99,2024-01-22,2024-01-22T10:00:00
Create seeds/raw_customers.csv:
customer_id,customer_name,region,segment
C001,Acme Corp,West,Enterprise
C002,Beta LLC,East,SMB
C003,Gamma Inc,Central,Enterprise
C004,Delta Co,West,Government
C005,Epsilon Ltd,East,SMB
Create seeds/raw_products.csv:
product_id,product_name,category,list_price
P001,Widget A,Widgets,29.99
P002,Gadget Pro,Gadgets,199.99
P003,Widget B,Widgets,49.99
P004,Accessory X,Accessories,14.99
P005,Accessory Y,Accessories,9.99
Load the seed data:
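dbt seed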
Expected Output
Running with dbt=1.8.x
Found 0 models, 0 tests, 0 snapshots, 3 seeds
Concurrency: 4 threads (target='dev')
1 of 3 OK loaded seed file tutorial_dev.raw_orders ........... [INSERT 8 in 2.34s]
2 of 3 OK loaded seed file tutorial_dev.raw_customers ........ [INSERT 5 in 1.56s]
3 of 3 OK loaded seed file tutorial_dev.raw_products ......... [INSERT 5 in 1.23s]
Completed successfully.
Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
Step 5: Build Bronze Staging Models¶
Bronze models are thin wrappers over raw sources. They cast types, rename columns, and add metadata -- but do not apply business logic.
Create models/bronze/stg_orders.sql:
-- stg_orders.sql
-- Bronze staging: type-cast and add lineage metadata.
-- Pattern reference: domains/sales/dbt/models/bronze/brz_sales_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
file_format='delta'
)
}}
with source as (
select * from {{ source('raw_tutorial', 'raw_orders') }}
),
staged as (
select
cast(order_id as int) as order_id,
cast(customer_id as string) as customer_id,
cast(product_id as string) as product_id,
cast(quantity as int) as quantity,
cast(unit_price as decimal(10,2)) as unit_price,
cast(order_date as date) as order_date,
cast(_ingested_at as timestamp) as _ingested_at,
current_timestamp() as _dbt_loaded_at
from source
{% if is_incremental() %}
where _ingested_at > (select max(_ingested_at) from {{ this }})
{% endif %}
)
select * from staged
Create models/bronze/stg_customers.sql:
{{
config(
materialized='table',
file_format='delta'
)
}}
select
cast(customer_id as string) as customer_id,
cast(customer_name as string) as customer_name,
cast(region as string) as region,
cast(segment as string) as segment,
current_timestamp() as _dbt_loaded_at
from {{ source('raw_tutorial', 'raw_customers') }}
Create models/bronze/stg_products.sql:
{{
config(
materialized='table',
file_format='delta'
)
}}
select
cast(product_id as string) as product_id,
cast(product_name as string) as product_name,
cast(category as string) as category,
cast(list_price as decimal(10,2)) as list_price,
current_timestamp() as _dbt_loaded_at
from {{ source('raw_tutorial', 'raw_products') }}
Run the bronze models:
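# Path-based selection: build only the bronze layer
dbt run --select models/bronze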
Step 6: Build Silver Fact and Dimension Models¶
Silver models apply business logic, join sources, handle deduplication, and enforce data contracts.
Create models/silver/dim_customers.sql:
-- dim_customers.sql
-- Silver dimension: enriched customer dimension with derived attributes.
{{
config(
materialized='table',
file_format='delta'
)
}}
with customers as (
select * from {{ ref('stg_customers') }}
)
select
customer_id,
customer_name,
region,
segment,
case
when segment = 'Enterprise' then 'Tier 1'
when segment = 'Government' then 'Tier 1'
when segment = 'SMB' then 'Tier 2'
else 'Tier 3'
end as customer_tier,
current_timestamp() as _dbt_loaded_at
from customers
Create models/silver/dim_products.sql:
{{
config(
materialized='table',
file_format='delta'
)
}}
with products as (
select * from {{ ref('stg_products') }}
)
select
product_id,
product_name,
category,
list_price,
case
when list_price >= 100 then 'Premium'
when list_price >= 25 then 'Standard'
else 'Value'
end as price_tier,
current_timestamp() as _dbt_loaded_at
from products
Create models/silver/fct_order_items.sql:
-- fct_order_items.sql
-- Silver fact: order line items with computed total and joined dimensions.
{{
config(
materialized='incremental',
unique_key='order_id',
file_format='delta',
incremental_strategy='merge'
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
where _dbt_loaded_at > (select max(_dbt_loaded_at) from {{ this }})
{% endif %}
),
customers as (
select * from {{ ref('dim_customers') }}
),
products as (
select * from {{ ref('dim_products') }}
)
select
o.order_id,
o.customer_id,
o.product_id,
c.customer_name,
c.region,
c.customer_tier,
p.product_name,
p.category,
p.price_tier,
o.quantity,
o.unit_price,
(o.quantity * o.unit_price) as line_total,
o.order_date,
o._ingested_at,
current_timestamp() as _dbt_loaded_at
from orders o
left join customers c on o.customer_id = c.customer_id
left join products p on o.product_id = p.product_id
Run the silver models:
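dbt run --select models/silver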
Step 7: Build Gold Report Models¶
Gold models aggregate data for consumption by dashboards, APIs, and analysts. They are the external-facing contract.
Create models/gold/rpt_daily_sales.sql:
-- rpt_daily_sales.sql
-- Gold report: daily sales summary for executive dashboards.
{{
config(
materialized='table',
file_format='delta'
)
}}
with order_items as (
select * from {{ ref('fct_order_items') }}
)
select
order_date,
region,
count(distinct order_id) as total_orders,
sum(quantity) as total_units,
sum(line_total) as total_revenue,
avg(line_total) as avg_order_value,
count(distinct customer_id) as unique_customers
from order_items
group by order_date, region
Create models/gold/rpt_product_performance.sql:
-- rpt_product_performance.sql
-- Gold report: product performance metrics for merchandising.
{{
config(
materialized='table',
file_format='delta'
)
}}
with order_items as (
select * from {{ ref('fct_order_items') }}
)
select
product_id,
product_name,
category,
price_tier,
count(distinct order_id) as times_ordered,
sum(quantity) as total_units_sold,
sum(line_total) as total_revenue,
avg(unit_price) as avg_selling_price,
count(distinct customer_id) as unique_buyers
from order_items
group by product_id, product_name, category, price_tier
Run all models end-to-end:
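dbt run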
Expected Output
Running with dbt=1.8.x
Found 8 models, 0 tests, 0 snapshots, 3 seeds
Concurrency: 4 threads (target='dev')
1 of 8 OK created sql incremental model tutorial_dev.stg_orders ........ [OK in 4.56s]
2 of 8 OK created sql table model tutorial_dev.stg_customers ........... [OK in 3.21s]
3 of 8 OK created sql table model tutorial_dev.stg_products ............ [OK in 2.98s]
4 of 8 OK created sql table model tutorial_dev.dim_customers ........... [OK in 3.45s]
5 of 8 OK created sql table model tutorial_dev.dim_products ............ [OK in 2.87s]
6 of 8 OK created sql incremental model tutorial_dev.fct_order_items ... [OK in 5.12s]
7 of 8 OK created sql table model tutorial_dev.rpt_daily_sales ......... [OK in 3.67s]
8 of 8 OK created sql table model tutorial_dev.rpt_product_performance . [OK in 3.52s]
Finished running 6 table models, 2 incremental models in 31.86s.
Completed successfully.
Done. PASS=8 WARN=0 ERROR=0 SKIP=0 TOTAL=8
Step 8: Add Data Tests¶
dbt supports schema tests (declared in YAML), custom singular tests (SQL files), and source freshness checks.
8a. Schema Tests¶
Create models/silver/schema.yml:
version: 2
models:
- name: fct_order_items
description: Fact table of order line items with joined dimensions.
columns:
- name: order_id
description: Unique order identifier.
tests:
- not_null
- unique
- name: customer_id
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- name: product_id
tests:
- not_null
- relationships:
to: ref('dim_products')
field: product_id
- name: line_total
tests:
- not_null
- name: quantity
tests:
- not_null
- accepted_values:
values: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
severity: warn
- name: dim_customers
description: Customer dimension with derived tier.
columns:
- name: customer_id
tests:
- not_null
- unique
- name: customer_tier
tests:
- accepted_values:
values: ["Tier 1", "Tier 2", "Tier 3"]
- name: dim_products
description: Product dimension with price tier classification.
columns:
- name: product_id
tests:
- not_null
- unique
- name: price_tier
tests:
- accepted_values:
values: ["Premium", "Standard", "Value"]
8b. Custom Singular Test¶
Create tests/assert_no_negative_revenue.sql:
-- Custom test: ensure no order line has negative revenue.
-- dbt treats any rows returned by a test as failures.
select
order_id,
line_total
from {{ ref('fct_order_items') }}
where line_total < 0
8c. Source Freshness¶
Source freshness was already declared in sources.yml (Step 4). Check it:
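dbt source freshness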
8d. Run All Tests¶
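# Runs schema tests and custom singular tests in one pass
dbt test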
Expected Output
Running with dbt=1.8.x
Found 8 models, 14 tests, 0 snapshots, 3 seeds, 3 sources
Concurrency: 4 threads (target='dev')
1 of 14 PASS not_null_fct_order_items_order_id ............. [PASS in 1.23s]
2 of 14 PASS unique_fct_order_items_order_id ............... [PASS in 1.45s]
3 of 14 PASS not_null_fct_order_items_customer_id .......... [PASS in 1.12s]
4 of 14 PASS relationships_fct_order_items_customer_id ..... [PASS in 1.67s]
5 of 14 PASS not_null_fct_order_items_product_id ........... [PASS in 1.10s]
6 of 14 PASS relationships_fct_order_items_product_id ...... [PASS in 1.54s]
7 of 14 PASS not_null_fct_order_items_line_total ........... [PASS in 1.08s]
8 of 14 WARN accepted_values_fct_order_items_quantity ...... [WARN in 1.32s]
9 of 14 PASS not_null_dim_customers_customer_id ............ [PASS in 0.98s]
10 of 14 PASS unique_dim_customers_customer_id .............. [PASS in 1.02s]
11 of 14 PASS accepted_values_dim_customers_customer_tier ... [PASS in 0.89s]
12 of 14 PASS not_null_dim_products_product_id .............. [PASS in 0.91s]
13 of 14 PASS unique_dim_products_product_id ................ [PASS in 0.95s]
14 of 14 PASS assert_no_negative_revenue .................... [PASS in 1.15s]
Finished running 14 tests in 18.41s.
Done. PASS=13 WARN=1 ERROR=0 SKIP=0 TOTAL=14
Troubleshooting¶
| Symptom | Cause | Fix |
|---|---|---|
| `relationships` test fails | Seed data missing referenced IDs | Verify FK values in seed CSVs match PK values |
| `accepted_values` warns but does not fail | Severity set to `warn` | Change to `severity: error` for strict enforcement |
| Source freshness errors | `loaded_at_field` column missing | Add `_ingested_at` to your seed CSV and source definition |
Step 9: Generate and Serve Documentation¶
dbt auto-generates a documentation website from your YAML descriptions, model lineage, and SQL.
# Generate the docs catalog
dbt docs generate
# Serve locally (opens http://localhost:8080)
dbt docs serve --port 8080
What you will see
The documentation site includes:
- **Model lineage graph** showing Bronze --> Silver --> Gold flow
- **Column-level descriptions** from your `schema.yml` files
- **Test coverage** for each model and column
- **Source freshness** status
- **Compiled SQL** for each model (click any model to see the compiled query)
Press `Ctrl+C` to stop the local server.
Tip
For production, deploy the docs site to GitHub Pages. Add dbt docs generate to your CI pipeline and publish the target/ folder as a static site.
Step 10: Set Up CI/CD Pipeline¶
Automate dbt runs with GitHub Actions: validate on pull requests, deploy on merge to main.
10a. Update dbt_project.yml for Layer Routing¶
Edit dbt_project.yml to route models to medallion schemas (matching the CSA-in-a-Box convention used in domains/sales/dbt/dbt_project.yml):
name: "csa_tutorial"
version: "1.0.0"
config-version: 2
require-dbt-version: [">=1.7.0", "<2.0.0"]
profile: "csa_tutorial"
model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
analysis-paths: ["analyses"]
vars:
file_format: "delta"
incremental_strategy: "merge"
models:
csa_tutorial:
bronze:
+materialized: incremental
+file_format: delta
+schema: bronze
+tags: ["bronze"]
silver:
+materialized: incremental
+file_format: delta
+schema: silver
+tags: ["silver"]
+incremental_strategy: merge
gold:
+materialized: table
+file_format: delta
+schema: gold
+tags: ["gold"]
seeds:
csa_tutorial:
+schema: seeds
10b. Create GitHub Actions Workflow¶
Create .github/workflows/dbt-ci.yml:
name: dbt CI/CD
on:
pull_request:
paths:
- "models/**"
- "seeds/**"
- "macros/**"
- "tests/**"
- "dbt_project.yml"
push:
branches: [main]
paths:
- "models/**"
- "seeds/**"
- "macros/**"
- "tests/**"
- "dbt_project.yml"
env:
DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
jobs:
dbt-pr-check:
if: github.event_name == 'pull_request'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install dbt
run: pip install dbt-core dbt-databricks
- name: Configure profiles
run: |
mkdir -p ~/.dbt
cat > ~/.dbt/profiles.yml << 'EOF'
csa_tutorial:
target: ci
outputs:
ci:
type: databricks
catalog: hive_metastore
schema: tutorial_ci_${{ github.event.pull_request.number }}
host: ${{ env.DATABRICKS_HOST }}
http_path: /sql/1.0/warehouses/${{ secrets.DATABRICKS_SQL_WAREHOUSE_ID }}
token: ${{ env.DATABRICKS_TOKEN }}
threads: 4
EOF
- name: dbt deps
run: dbt deps
- name: dbt seed
run: dbt seed
- name: dbt build (models + tests)
run: dbt build
- name: Upload docs artifact
run: |
dbt docs generate
echo "Documentation generated successfully."
dbt-deploy:
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install dbt
run: pip install dbt-core dbt-databricks
- name: Configure profiles
run: |
mkdir -p ~/.dbt
cat > ~/.dbt/profiles.yml << 'EOF'
csa_tutorial:
target: prod
outputs:
prod:
type: databricks
catalog: hive_metastore
schema: tutorial_prod
host: ${{ env.DATABRICKS_HOST }}
http_path: /sql/1.0/warehouses/${{ secrets.DATABRICKS_SQL_WAREHOUSE_ID }}
token: ${{ env.DATABRICKS_TOKEN }}
threads: 8
EOF
- name: dbt deps
run: dbt deps
- name: dbt seed
run: dbt seed
- name: dbt run (production)
run: dbt run
- name: dbt test (production)
run: dbt test
10c. Add Repository Secrets¶
# Set secrets for the GitHub Actions workflow
gh secret set DATABRICKS_HOST --body "adb-1234567890.12.azuredatabricks.net"
gh secret set DATABRICKS_TOKEN --body "dapi..."
gh secret set DATABRICKS_SQL_WAREHOUSE_ID --body "abcdef1234567890"
Testing the pipeline
# Create a feature branch and make a change
git checkout -b feature/add-dbt-models
git add .
git commit -m "feat: add dbt tutorial models"
git push -u origin feature/add-dbt-models
# Open a PR -- this triggers the CI check
gh pr create --title "Add dbt tutorial models" --body "Tutorial 16 dbt workflow"
# Watch the CI run
gh run watch
Troubleshooting¶
| Symptom | Cause | Fix |
|---|---|---|
| CI fails with `ConnectionError` | Secrets not set | Run `gh secret list` to verify secrets exist |
| `SchemaAlreadyExists` in CI | Stale PR schema | Schema names include the PR number for isolation; drop the leftover `tutorial_ci_*` schema and re-run |
| Prod deploy runs on PR | Workflow trigger misconfigured | Check the `if:` conditions on each job |
Step 11: Advanced -- dbt Packages, Macros, and Hooks¶
11a. Install dbt Packages¶
Create packages.yml in the project root:
packages:
- package: dbt-labs/dbt_utils
version: [">=1.1.0", "<2.0.0"]
- package: calogica/dbt_expectations
version: [">=0.10.0", "<1.0.0"]
Run `dbt deps` to install the packages, then use `dbt_utils` in your models:
-- Example: generate a surrogate key
select
{{ dbt_utils.generate_surrogate_key(['order_id', 'product_id']) }} as order_item_key,
*
from {{ ref('stg_orders') }}
11b. Custom Macros¶
Create a reusable macro at macros/cents_to_dollars.sql:
{% macro cents_to_dollars(column_name) %}
round(cast({{ column_name }} as decimal(18,2)) / 100, 2)
{% endmacro %}
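Call it like a function inside any model. The model and `amount_cents` column below are hypothetical, for illustration only:
-- Hypothetical usage; this tutorial's schema has no amount_cents column
select
    payment_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('stg_payments') }}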
The CSA-in-a-Box platform includes shared macros at domains/shared/dbt/macros/ that you can reference by adding the path to your dbt_project.yml:
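For example (the relative path below is an assumption -- adjust it to wherever the shared macros live relative to your project root):
macro-paths: ["macros", "../../domains/shared/dbt/macros"]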
Available shared macros include:
| Macro | Purpose |
|---|---|
| `generate_schema_name.sql` | Routes models to medallion schemas |
| `incremental_filter.sql` | Standardized incremental filtering |
| `tenant_filter.sql` | Multi-tenant row-level filtering |
| `data_quality.sql` | Reusable data quality checks |
| `scd_type2.sql` | Slowly changing dimension Type 2 |
11c. Pre/Post Hooks¶
Add hooks to dbt_project.yml for logging and optimization:
models:
csa_tutorial:
gold:
+post-hook:
- "OPTIMIZE {{ this }}"
- "ANALYZE TABLE {{ this }} COMPUTE STATISTICS"
Tip
OPTIMIZE compacts small files in Delta tables for faster queries; add a ZORDER BY clause if you want Z-ordering on specific columns. ANALYZE TABLE updates Databricks statistics for the query optimizer.
Completion Checklist¶
- dbt-core and dbt-databricks installed and `dbt debug` passes
- Project initialized with `dbt init`
- `profiles.yml` configured for Databricks (dev and prod targets)
- Source definitions created with freshness checks
- Seed data loaded with `dbt seed`
- Bronze staging models built (`stg_` prefix)
- Silver fact and dimension models built (`fct_`, `dim_` prefixes)
- Gold report models built (`rpt_` prefix)
- Schema tests and custom tests passing (`dbt test`)
- Documentation generated (`dbt docs generate`)
- CI/CD pipeline configured (GitHub Actions)
- `dbt_utils` package installed and working
Next Steps¶
- Tutorial 01: Foundation Platform -- if you need to set up Databricks first
- Tutorial 15: Multi-Tenant Onboarding -- isolate dbt schemas per tenant
- Tutorial 12: Monitoring & Observability -- add dbt run monitoring and alerting
See the Tutorial Index for all available paths.