Tutorial: Migrate a dbt-Snowflake Project to dbt-Databricks or dbt-Fabric¶
Status: Authored 2026-04-30
Audience: dbt developers and data engineers with existing dbt-snowflake projects
Prerequisites: Working dbt project on Snowflake, Databricks workspace with Unity Catalog (or Fabric Lakehouse), Git access to dbt project
What you will build¶
By the end of this tutorial, you will have:
- Converted a `dbt-snowflake` project to `dbt-databricks` (or `dbt-fabric`)
- Fixed all SQL dialect differences
- Translated Snowflake-specific materializations to Delta Lake equivalents
- Validated data parity between Snowflake and Databricks
- Set up CI/CD for the migrated project
Step 1: Assess your dbt project¶
Before touching code, understand the scope of the migration.
1.1 Count models by materialization¶
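One way to produce these counts (a sketch; it assumes materializations are configured with single quotes — adjust the pattern if your project uses double quotes):

# Tally models by configured materialization
grep -rho "materialized='[a-z]*'" models/ --include="*.sql" | sort | uniq -c | sort -rn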
Typical output:
45 materialized='view'
32 materialized='table'
18 materialized='incremental'
5 materialized='ephemeral'
1.2 Identify Snowflake-specific SQL¶
# Find Snowflake-specific function calls
grep -rn "IFF\|DATEADD\|DATEDIFF.*day\|TRY_TO_\|ARRAY_AGG\|OBJECT_CONSTRUCT\|VARIANT\|::" models/ --include="*.sql"
# Find Snowflake-specific materializations
grep -rn "dynamic_table\|secure_view" models/ --include="*.sql"
# Find Snowflake-specific macros
grep -rn "snowflake__\|target.type.*snowflake" macros/ --include="*.sql"
1.3 Document sources¶
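Inventory every source the project reads so each one can be mapped to a Unity Catalog catalog and schema in Step 5. dbt can list them directly:

# List all sources defined in the project
dbt ls --resource-type source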
Step 2: Swap the dbt adapter¶
2.1 Update requirements¶
# requirements.txt (before)
dbt-snowflake==1.7.0
# requirements.txt (after -- choose one)
dbt-databricks==1.7.0
# OR
dbt-fabric==1.7.0
2.2 Update profiles.yml¶
Snowflake profile (before):
my_project:
  outputs:
    prod:
      type: snowflake
      account: ACMEGOV.us-gov-west-1.snowflake-gov
      user: DBT_SVC
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: DATA_ENGINEER
      database: ANALYTICS_DB
      warehouse: TRANSFORM_WH
      schema: MARTS
      threads: 8
    dev:
      type: snowflake
      account: ACMEGOV.us-gov-west-1.snowflake-gov
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: DATA_ENGINEER
      database: ANALYTICS_DEV
      warehouse: DEV_WH
      schema: "dev_{{ env_var('USER') }}"
      threads: 4
  target: dev
Databricks profile (after):
my_project:
  outputs:
    prod:
      type: databricks
      host: adb-acmegov-analytics.12.databricks.azure.us
      http_path: /sql/1.0/warehouses/abc123def456
      catalog: analytics_prod
      schema: marts
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 8
    dev:
      type: databricks
      host: adb-acmegov-analytics.12.databricks.azure.us
      http_path: /sql/1.0/warehouses/dev789ghi012
      catalog: analytics_dev
      schema: "dev_{{ env_var('USER') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 4
  target: dev
Fabric profile (alternative):
my_project:
  outputs:
    prod:
      type: fabric
      driver: "ODBC Driver 18 for SQL Server"
      server: "acmegov-analytics.datawarehouse.fabric.microsoft.com"
      database: "analytics_lakehouse"
      schema: marts
      authentication: ServicePrincipal
      tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"
      client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
      client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
      threads: 8
  target: prod
2.3 Update dbt_project.yml¶
# dbt_project.yml changes
# Remove Snowflake-specific configs

# Before:
# models:
#   my_project:
#     +transient: true   # Snowflake-specific

# After (Databricks):
models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: incremental
      +incremental_strategy: merge
      +file_format: delta
Step 3: Fix SQL dialect differences¶
3.1 Automated fixes (safe to batch)¶
Create a migration script for common changes:
#!/bin/bash
# migrate-sql-dialect.sh
# Run from dbt project root
echo "Migrating SQL dialect from Snowflake to Databricks..."
# IFF() -> IF()
find models/ -name "*.sql" -exec sed -i 's/\bIFF(/IF(/g' {} +
# DATEADD(unit, amount, date) -> DATE_ADD(date, amount) for day unit
# NOTE: This handles the simple case. Complex DATEADD with non-day units needs manual review.
echo "WARNING: DATEADD conversions need manual review for non-day units"
# TRY_TO_NUMBER -> TRY_CAST
# NOTE: the \([^)]*\) capture stops at the first ')', so nested calls
# like TRY_TO_NUMBER(TRIM(x)) will be mangled -- review the diff
find models/ -name "*.sql" -exec sed -i 's/TRY_TO_NUMBER(\([^)]*\))/TRY_CAST(\1 AS NUMERIC)/g' {} +
# TRY_TO_DATE -> TRY_CAST (same nested-parentheses caveat)
find models/ -name "*.sql" -exec sed -i 's/TRY_TO_DATE(\([^)]*\))/TRY_CAST(\1 AS DATE)/g' {} +
# ARRAY_AGG -> COLLECT_LIST
find models/ -name "*.sql" -exec sed -i 's/\bARRAY_AGG(/COLLECT_LIST(/g' {} +
echo "Done. Review changes before committing."
echo "Run: git diff models/"
3.2 Manual fixes (require review)¶
DATEADD / DATEDIFF¶
-- Snowflake
DATEADD(day, 7, order_date)
DATEADD(month, -3, CURRENT_TIMESTAMP())
DATEDIFF(day, start_date, end_date)
-- Databricks
DATE_ADD(order_date, 7)
ADD_MONTHS(CURRENT_TIMESTAMP(), -3)
DATEDIFF(end_date, start_date) -- Note: argument order reversed!
Warning: DATEDIFF argument order is reversed between Snowflake and Databricks. This is the most common source of bugs during migration.
VARIANT / semi-structured data¶
-- Snowflake: VARIANT column with colon notation
SELECT
    raw_json:customer.name::STRING AS customer_name,
    raw_json:order.items[0].price::NUMBER AS first_item_price
FROM raw.events;

-- Databricks: STRUCT/MAP with dot notation
SELECT
    raw_json.customer.name AS customer_name,
    raw_json.order.items[0].price AS first_item_price
FROM analytics_prod.raw.events;

-- If raw_json is a STRING containing JSON:
SELECT
    GET_JSON_OBJECT(raw_json, '$.customer.name') AS customer_name,
    GET_JSON_OBJECT(raw_json, '$.order.items[0].price') AS first_item_price
FROM analytics_prod.raw.events;

-- Or parse with FROM_JSON:
SELECT
    FROM_JSON(raw_json, 'STRUCT<customer: STRUCT<name: STRING>, order: STRUCT<items: ARRAY<STRUCT<price: DOUBLE>>>>') AS parsed
FROM analytics_prod.raw.events;
OBJECT_CONSTRUCT¶
-- Snowflake
SELECT OBJECT_CONSTRUCT(
    'name', customer_name,
    'email', customer_email,
    'total', total_revenue
) AS customer_json
FROM marts.customer_summary;

-- Databricks
SELECT TO_JSON(NAMED_STRUCT(
    'name', customer_name,
    'email', customer_email,
    'total', total_revenue
)) AS customer_json
FROM analytics_prod.marts.customer_summary;
QUALIFY¶
-- Snowflake (QUALIFY is supported)
SELECT *
FROM raw.events
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY created_at DESC) = 1;
-- Databricks (QUALIFY is supported in Runtime 13+)
-- Same syntax works! No change needed.
SELECT *
FROM analytics_prod.raw.events
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY created_at DESC) = 1;
3.3 Complete SQL translation reference¶
| Snowflake | Databricks | Notes |
|---|---|---|
| `IFF(cond, a, b)` | `IF(cond, a, b)` | Function name |
| `DATEADD(day, n, date)` | `DATE_ADD(date, n)` | Arg order; day-only shortcut |
| `DATEADD(month, n, date)` | `ADD_MONTHS(date, n)` | Month-specific function |
| `DATEADD(hour, n, ts)` | `ts + INTERVAL n HOURS` | Interval syntax |
| `DATEDIFF(day, a, b)` | `DATEDIFF(b, a)` | Args reversed |
| `TRY_TO_NUMBER(x)` | `TRY_CAST(x AS NUMERIC)` | Use TRY_CAST |
| `TRY_TO_DATE(x)` | `TRY_CAST(x AS DATE)` | Use TRY_CAST |
| `TRY_TO_TIMESTAMP(x)` | `TRY_CAST(x AS TIMESTAMP)` | Use TRY_CAST |
| `ARRAY_AGG(x)` | `COLLECT_LIST(x)` | Function name |
| `OBJECT_CONSTRUCT(...)` | `TO_JSON(NAMED_STRUCT(...))` | Two functions |
| `col:field::TYPE` | `col.field` or `GET_JSON_OBJECT` | Notation differs |
| `FLATTEN(...)` | `EXPLODE(...)` | Array/object expansion |
| `PARSE_JSON(x)` | `FROM_JSON(x, schema)` | Schema required |
| `TYPEOF(x)` | `TYPEOF(x)` | Same (Runtime 13+) |
| `LISTAGG(x, ',')` | `CONCAT_WS(',', COLLECT_LIST(x))` | Two functions |
| `QUALIFY` | `QUALIFY` | Same (Runtime 13+) |
| `CURRENT_TIMESTAMP()` | `CURRENT_TIMESTAMP()` | Same |
| `CURRENT_DATE()` | `CURRENT_DATE()` | Same |
| `TO_VARCHAR(x, fmt)` | `DATE_FORMAT(x, fmt)` | Format strings differ |
| `SPLIT_PART(x, d, n)` | `SPLIT(x, d)[n-1]` | 0-indexed array |
| `REGEXP_SUBSTR(x, p)` | `REGEXP_EXTRACT(x, p)` | Function name |
| `$1, $2` (stage columns) | Named columns | No positional refs |
Step 4: Convert materializations¶
4.1 Dynamic Tables to incremental models¶
-- Snowflake dynamic table (before)
-- This was defined in Snowflake directly, not in dbt
-- dbt incremental model (after)
-- models/marts/fct_daily_revenue.sql
{{ config(
    materialized='incremental',
    unique_key='revenue_date',
    incremental_strategy='merge',
    tblproperties={
        'delta.autoOptimize.autoCompact': 'true',
        'delta.autoOptimize.optimizeWrite': 'true'
    }
) }}

SELECT
    DATE_TRUNC('day', order_date) AS revenue_date,
    region,
    SUM(amount) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(*) AS order_count
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE order_date > (SELECT MAX(revenue_date) - INTERVAL 1 DAY FROM {{ this }})
{% endif %}
GROUP BY DATE_TRUNC('day', order_date), region
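After converting, a one-time full refresh backfills the history the dynamic table previously maintained (shown against dev, using the model name from the example above):

dbt run --target dev --select fct_daily_revenue --full-refresh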
4.2 Transient tables¶
# Snowflake: transient tables (no fail-safe, lower cost)
# dbt_project.yml
# models:
#   +transient: true   # Snowflake-specific, remove this

# Databricks: all Delta tables have configurable retention
# dbt_project.yml
models:
  my_project:
    staging:
      +materialized: view   # views for staging (no storage cost)
    marts:
      +materialized: incremental
4.3 Secure views¶
-- Snowflake: secure view
{{ config(materialized='view', secure=true) }}
-- Databricks: secure views don't exist as a concept; security is
-- enforced via Unity Catalog grants instead
{{ config(materialized='view') }}
-- Apply row filters and column masks to the underlying tables with
-- Unity Catalog ALTER TABLE commands (see the sketch below)
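If the secure view was providing row-level security, a Unity Catalog row filter on the underlying table is one replacement. A minimal sketch, assuming a region column and an admin group — the function and group names are placeholders:

-- Row filter: members of 'admin' see everything, everyone else only US rows
CREATE OR REPLACE FUNCTION us_region_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), TRUE, region = 'US');

ALTER TABLE analytics_prod.marts.customer_summary
SET ROW FILTER us_region_filter ON (region);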
Step 5: Update sources¶
5.1 Update source definitions¶
# models/sources.yml (before)
sources:
  - name: raw
    database: ANALYTICS_DB
    schema: RAW
    tables:
      - name: orders
      - name: customers
      - name: products

# models/sources.yml (after -- Databricks)
sources:
  - name: raw
    database: analytics_prod  # catalog name
    schema: raw
    tables:
      - name: orders
      - name: customers
      - name: products
5.2 Handle case sensitivity¶
Snowflake folds unquoted identifiers to uppercase. Databricks identifiers are case-insensitive: Unity Catalog preserves the casing you created, but name resolution ignores case.
# If your Snowflake sources use uppercase:
sources:
  - name: raw
    database: analytics_prod
    schema: raw
    tables:
      - name: orders
        # If the actual Delta table is lowercase, this works as-is.
        # To reference a table carried over with its uppercase Snowflake name:
        # identifier: ORDERS   # explicit override
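To verify the casing Unity Catalog actually stores, list the schema's tables from a SQL editor:

SHOW TABLES IN analytics_prod.raw;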
Step 6: Run and validate¶
6.1 Compile first (catch syntax errors)¶
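# Compile all models without executing anything
dbt compile --target dev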
Fix any compilation errors. Most will be SQL dialect issues from Step 3.
6.2 Run against dev¶
# Run all models
dbt run --target dev
# Run a specific model to debug
dbt run --target dev --select stg_orders
# Run with full refresh (rebuild all incremental models)
dbt run --target dev --full-refresh
6.3 Run tests¶
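# Run all tests
dbt test --target dev

# Test a single model while debugging
dbt test --target dev --select stg_orders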
6.4 Data reconciliation¶
Create a reconciliation model that compares Snowflake and Databricks outputs:
-- models/reconciliation/recon_orders.sql
-- Run this during parallel-run phase
WITH snowflake_counts AS (
    SELECT
        DATE_TRUNC('day', order_date) AS dt,
        COUNT(*) AS sf_count,
        SUM(amount) AS sf_total
    FROM {{ source('snowflake_bridge', 'orders') }}
    GROUP BY 1
),

databricks_counts AS (
    SELECT
        DATE_TRUNC('day', order_date) AS dt,
        COUNT(*) AS db_count,
        SUM(amount) AS db_total
    FROM {{ ref('stg_orders') }}
    GROUP BY 1
)

SELECT
    COALESCE(s.dt, d.dt) AS dt,
    s.sf_count,
    d.db_count,
    s.sf_count - d.db_count AS count_diff,
    s.sf_total,
    d.db_total,
    ABS(s.sf_total - d.db_total) AS total_diff,
    CASE
        WHEN ABS(s.sf_total - d.db_total) / NULLIF(s.sf_total, 0) > 0.005 THEN 'FAIL'
        ELSE 'PASS'
    END AS status
FROM snowflake_counts s
FULL OUTER JOIN databricks_counts d ON s.dt = d.dt
ORDER BY dt DESC
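The snowflake_bridge source above must resolve to something Databricks can read during the parallel run. A sketch of its definition, assuming the legacy Snowflake data is surfaced as a foreign catalog (for example via Lakehouse Federation) named snowflake_bridge:

# models/reconciliation/sources.yml
sources:
  - name: snowflake_bridge
    database: snowflake_bridge  # foreign catalog name -- an assumption
    schema: raw
    tables:
      - name: orders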
Step 7: Set up CI/CD¶
7.1 GitHub Actions workflow¶
# .github/workflows/dbt-ci.yml
name: dbt CI

on:
  pull_request:
    paths:
      - "models/**"
      - "macros/**"
      - "tests/**"
      - "dbt_project.yml"

jobs:
  dbt-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install dbt-databricks==1.7.0

      - name: dbt deps
        run: dbt deps

      - name: dbt compile
        run: dbt compile --target ci
        env:
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

      # NOTE: state:modified+ needs a previous manifest to compare against.
      # Fetch your production manifest into prod-artifacts/ (example path)
      # before these steps, or drop --state to run everything.
      - name: dbt run (changed models only)
        run: dbt run --target ci --select state:modified+ --state prod-artifacts/
        env:
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

      - name: dbt test
        run: dbt test --target ci --select state:modified+ --state prod-artifacts/
        env:
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
7.2 Contract validation¶
Wire to csa-inabox contract validation:
# Add to .github/workflows/validate-contracts.yml
- name: Validate dbt contracts
  run: |
    dbt compile --target ci
    python scripts/validate_contracts.py --dbt-manifest target/manifest.json
Step 8: Production deployment¶
8.1 Deploy to production¶
# Full refresh for initial production deployment
dbt run --target prod --full-refresh
# Run tests
dbt test --target prod
# Generate docs
dbt docs generate --target prod
8.2 Schedule production runs¶
Set up a Databricks Job to run dbt on schedule:
{
  "name": "dbt-production-daily",
  "schedule": {
    "quartz_cron_expression": "0 0 6 * * ?",
    "timezone_id": "America/New_York"
  },
  "tasks": [
    {
      "task_key": "dbt-run",
      "dbt_task": {
        "commands": [
          "dbt deps",
          "dbt run --target prod",
          "dbt test --target prod"
        ],
        "project_directory": "/Repos/production/dbt-project"
      }
    }
  ],
  "email_notifications": {
    "on_failure": ["data-engineering@agency.gov"]
  }
}
Common pitfalls and solutions¶
| Pitfall | Symptom | Solution |
|---|---|---|
| DATEDIFF arg order | Negative values where positive expected | Swap arguments: DATEDIFF(end, start) on Databricks |
| Case sensitivity | Table not found errors | Check catalog/schema/table name casing |
| VARIANT column access | Syntax errors with : notation | Use dot notation or GET_JSON_OBJECT |
| Snowflake-specific macros | Compilation errors in macros/ | Rewrite macros for Databricks SQL dialect |
| Transient table config | Warning about unknown config | Remove +transient: true from dbt_project.yml |
| Warehouse not found | Connection errors | Verify http_path in profiles.yml |
Related documents¶
- Warehouse Migration -- compute sizing for dbt workloads
- Streams & Tasks Migration -- Task to dbt Job conversion
- Feature Mapping -- full SQL function translation
- Best Practices -- parallel-run and reconciliation guidance
- Master playbook -- Section 4 for the original worked example
Last updated: 2026-04-30
Maintainers: CSA-in-a-Box core team