Tutorial: Migrate a dbt-Snowflake Project to dbt-Databricks or dbt-Fabric

Status: Authored 2026-04-30
Audience: dbt developers and data engineers with existing dbt-snowflake projects
Prerequisites: Working dbt project on Snowflake, Databricks workspace with Unity Catalog (or Fabric Lakehouse), Git access to dbt project


What you will build

By the end of this tutorial, you will have:

  1. Converted a dbt-snowflake project to dbt-databricks (or dbt-fabric)
  2. Fixed all SQL dialect differences
  3. Translated Snowflake-specific materializations to Delta Lake equivalents
  4. Validated data parity between Snowflake and Databricks
  5. Set up CI/CD for the migrated project

Step 1: Assess your dbt project

Before touching code, understand the scope of the migration.

1.1 Count models by materialization

# In your dbt project directory
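# Note: this catches config() blocks in .sql files; materializations may also be
# set in dbt_project.yml and schema .yml files as +materialized: -- check those too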
grep -r "materialized=" models/ | sort | uniq -c | sort -rn

Typical output:

  45 materialized='view'
  32 materialized='table'
  18 materialized='incremental'
   5 materialized='ephemeral'

1.2 Identify Snowflake-specific SQL

# Find Snowflake-specific function calls
grep -rn "IFF\|DATEADD\|DATEDIFF.*day\|TRY_TO_\|ARRAY_AGG\|OBJECT_CONSTRUCT\|VARIANT\|::" models/ --include="*.sql"

# Find Snowflake-specific materializations
grep -rn "dynamic_table\|secure_view" models/ --include="*.sql"

# Find Snowflake-specific macros
grep -rn "snowflake__\|target.type.*snowflake" macros/ --include="*.sql"

1.3 Document sources

# List all sources
grep -rn "source(" models/ --include="*.sql" | sort -u

Step 2: Swap the dbt adapter

2.1 Update requirements

# requirements.txt (before)
dbt-snowflake==1.7.0

# requirements.txt (after -- choose one)
dbt-databricks==1.7.0
# OR
dbt-fabric==1.7.0

# Then install the new adapter (choose one):
pip install dbt-databricks
# OR
pip install dbt-fabric

2.2 Update profiles.yml

Snowflake profile (before):

my_project:
    outputs:
        prod:
            type: snowflake
            account: ACMEGOV.us-gov-west-1.snowflake-gov
            user: DBT_SVC
            password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
            role: DATA_ENGINEER
            database: ANALYTICS_DB
            warehouse: TRANSFORM_WH
            schema: MARTS
            threads: 8
        dev:
            type: snowflake
            account: ACMEGOV.us-gov-west-1.snowflake-gov
            user: "{{ env_var('SNOWFLAKE_USER') }}"
            password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
            role: DATA_ENGINEER
            database: ANALYTICS_DEV
            warehouse: DEV_WH
            schema: "dev_{{ env_var('USER') }}"
            threads: 4
    target: dev

Databricks profile (after):

my_project:
    outputs:
        prod:
            type: databricks
            host: adb-acmegov-analytics.12.databricks.azure.us
            http_path: /sql/1.0/warehouses/abc123def456
            catalog: analytics_prod
            schema: marts
            token: "{{ env_var('DATABRICKS_TOKEN') }}"
            threads: 8
        dev:
            type: databricks
            host: adb-acmegov-analytics.12.databricks.azure.us
            http_path: /sql/1.0/warehouses/dev789ghi012
            catalog: analytics_dev
            schema: "dev_{{ env_var('USER') }}"
            token: "{{ env_var('DATABRICKS_TOKEN') }}"
            threads: 4
    target: dev

Fabric profile (alternative):

my_project:
    outputs:
        prod:
            type: fabric
            driver: "ODBC Driver 18 for SQL Server"
            server: "acmegov-analytics.datawarehouse.fabric.microsoft.com"
            database: "analytics_lakehouse"
            schema: marts
            authentication: ServicePrincipal
            tenant_id: "{{ env_var('AZURE_TENANT_ID') }}"
            client_id: "{{ env_var('AZURE_CLIENT_ID') }}"
            client_secret: "{{ env_var('AZURE_CLIENT_SECRET') }}"
            threads: 8
    target: prod

2.3 Update dbt_project.yml

# dbt_project.yml changes

# Remove Snowflake-specific configs
# Before:
# models:
#   my_project:
#     +transient: true  # Snowflake-specific

# After (Databricks):
models:
    my_project:
        staging:
            +materialized: view
        marts:
            +materialized: incremental
            +incremental_strategy: merge
            +file_format: delta

Step 3: Fix SQL dialect differences

3.1 Automated fixes (safe to batch)

Create a migration script for common changes:

#!/bin/bash
# migrate-sql-dialect.sh
# Run from dbt project root
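# NOTE: assumes GNU sed (-i with no suffix argument, \b word boundaries).
# On macOS/BSD, install gnu-sed and use gsed, or adjust the flags.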

echo "Migrating SQL dialect from Snowflake to Databricks..."

# IFF() -> IF()
find models/ -name "*.sql" -exec sed -i 's/\bIFF(/IF(/g' {} +

# DATEADD(unit, amount, date) -> DATE_ADD(date, amount) for day unit
# NOTE: This handles the simple case. Complex DATEADD with non-day units needs manual review.
echo "WARNING: DATEADD conversions need manual review for non-day units"

# TRY_TO_NUMBER -> TRY_CAST
find models/ -name "*.sql" -exec sed -i 's/TRY_TO_NUMBER(\([^)]*\))/TRY_CAST(\1 AS NUMERIC)/g' {} +

# TRY_TO_DATE -> TRY_CAST
find models/ -name "*.sql" -exec sed -i 's/TRY_TO_DATE(\([^)]*\))/TRY_CAST(\1 AS DATE)/g' {} +

# ARRAY_AGG -> COLLECT_LIST
find models/ -name "*.sql" -exec sed -i 's/\bARRAY_AGG(/COLLECT_LIST(/g' {} +

echo "Done. Review changes before committing."
echo "Run: git diff models/"

3.2 Manual fixes (require review)

DATEADD / DATEDIFF

-- Snowflake
DATEADD(day, 7, order_date)
DATEADD(month, -3, CURRENT_TIMESTAMP())
DATEDIFF(day, start_date, end_date)

-- Databricks
DATE_ADD(order_date, 7)
ADD_MONTHS(CURRENT_TIMESTAMP(), -3)
DATEDIFF(end_date, start_date)  -- Note: argument order reversed!

Warning: DATEDIFF argument order is reversed between Snowflake and Databricks. This is the most common source of bugs during migration.
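For sub-day units there is no single-function shortcut on Databricks; interval arithmetic replaces DATEADD. The column name and offsets below are illustrative:

-- Snowflake
DATEADD(hour, 6, event_ts)
DATEADD(minute, -30, event_ts)

-- Databricks
event_ts + INTERVAL 6 HOURS
event_ts - INTERVAL 30 MINUTES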

VARIANT / semi-structured data

-- Snowflake: VARIANT column with colon notation
SELECT
    raw_json:customer.name::STRING AS customer_name,
    raw_json:order.items[0].price::NUMBER AS first_item_price
FROM raw.events;

-- Databricks: STRUCT/MAP with dot notation
SELECT
    raw_json.customer.name AS customer_name,
    raw_json.order.items[0].price AS first_item_price
FROM analytics_prod.raw.events;

-- If raw_json is a STRING containing JSON:
SELECT
    GET_JSON_OBJECT(raw_json, '$.customer.name') AS customer_name,
    GET_JSON_OBJECT(raw_json, '$.order.items[0].price') AS first_item_price
FROM analytics_prod.raw.events;

-- Or parse with FROM_JSON:
SELECT
    FROM_JSON(raw_json, 'STRUCT<customer: STRUCT<name: STRING>, order: STRUCT<items: ARRAY<STRUCT<price: DOUBLE>>>>') AS parsed
FROM analytics_prod.raw.events;

OBJECT_CONSTRUCT

-- Snowflake
SELECT OBJECT_CONSTRUCT(
    'name', customer_name,
    'email', customer_email,
    'total', total_revenue
) AS customer_json
FROM marts.customer_summary;

-- Databricks
SELECT TO_JSON(NAMED_STRUCT(
    'name', customer_name,
    'email', customer_email,
    'total', total_revenue
)) AS customer_json
FROM analytics_prod.marts.customer_summary;

QUALIFY

-- Snowflake (QUALIFY is supported)
SELECT *
FROM raw.events
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY created_at DESC) = 1;

-- Databricks (QUALIFY is supported in Runtime 13+)
-- Same syntax works! No change needed.
SELECT *
FROM analytics_prod.raw.events
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY created_at DESC) = 1;
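If a workspace runs an older runtime without QUALIFY support, the portable fallback is a subquery filter. A sketch of the same deduplication:

SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY created_at DESC) AS rn
    FROM analytics_prod.raw.events
) t
WHERE rn = 1;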

3.3 Complete SQL translation reference

| Snowflake | Databricks | Notes |
| --- | --- | --- |
| IFF(cond, a, b) | IF(cond, a, b) | Function name |
| DATEADD(day, n, date) | DATE_ADD(date, n) | Arg order; day-only shortcut |
| DATEADD(month, n, date) | ADD_MONTHS(date, n) | Month-specific function |
| DATEADD(hour, n, ts) | ts + INTERVAL n HOURS | Interval syntax |
| DATEDIFF(day, a, b) | DATEDIFF(b, a) | Args reversed |
| TRY_TO_NUMBER(x) | TRY_CAST(x AS NUMERIC) | Use TRY_CAST |
| TRY_TO_DATE(x) | TRY_CAST(x AS DATE) | Use TRY_CAST |
| TRY_TO_TIMESTAMP(x) | TRY_CAST(x AS TIMESTAMP) | Use TRY_CAST |
| ARRAY_AGG(x) | COLLECT_LIST(x) | Function name |
| OBJECT_CONSTRUCT(...) | TO_JSON(NAMED_STRUCT(...)) | Two functions |
| col:field::TYPE | col.field or GET_JSON_OBJECT | Notation differs |
| FLATTEN(...) | EXPLODE(...) | Array/object expansion |
| PARSE_JSON(x) | FROM_JSON(x, schema) | Schema required |
| TYPEOF(x) | TYPEOF(x) | Same (Runtime 13+) |
| LISTAGG(x, ',') | CONCAT_WS(',', COLLECT_LIST(x)) | Two functions |
| QUALIFY | QUALIFY | Same (Runtime 13+) |
| CURRENT_TIMESTAMP() | CURRENT_TIMESTAMP() | Same |
| CURRENT_DATE() | CURRENT_DATE() | Same |
| TO_VARCHAR(x, fmt) | DATE_FORMAT(x, fmt) | Format strings differ |
| SPLIT_PART(x, d, n) | SPLIT(x, d)[n-1] | 0-indexed array |
| REGEXP_SUBSTR(x, p) | REGEXP_EXTRACT(x, p) | Function name |
| $1, $2 (stage columns) | Named columns | No positional refs |
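The FLATTEN-to-EXPLODE row deserves a fuller example, since the clause structure changes rather than just the function name. A sketch, assuming raw_json.items is an array of structs with a sku field (the column names are hypothetical):

-- Snowflake
SELECT f.value:sku::STRING AS sku
FROM raw.events,
LATERAL FLATTEN(input => raw_json:items) f;

-- Databricks
SELECT item.sku AS sku
FROM analytics_prod.raw.events
LATERAL VIEW EXPLODE(raw_json.items) exploded AS item;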

Step 4: Convert materializations

4.1 Dynamic Tables to incremental models

-- Snowflake dynamic table (before)
-- This was defined in Snowflake directly, not in dbt

-- dbt incremental model (after)
-- models/marts/fct_daily_revenue.sql
{{ config(
    materialized='incremental',
    unique_key='revenue_date',
    incremental_strategy='merge',
    tblproperties={
        'delta.autoOptimize.autoCompact': 'true',
        'delta.autoOptimize.optimizeWrite': 'true'
    }
) }}

SELECT
    DATE_TRUNC('day', order_date) AS revenue_date,
    region,
    SUM(amount) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(*) AS order_count
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE order_date > (SELECT MAX(revenue_date) - INTERVAL 1 DAY FROM {{ this }})
{% endif %}
GROUP BY DATE_TRUNC('day', order_date), region

4.2 Transient tables

# Snowflake: transient tables (no fail-safe, lower cost)
# dbt_project.yml
# models:
#   +transient: true  # Snowflake-specific, remove this

# Databricks: all Delta tables have configurable retention
# dbt_project.yml
models:
    my_project:
        staging:
            +materialized: view # Views for staging (no storage cost)
        marts:
            +materialized: incremental

4.3 Secure views

-- Snowflake: secure view
{{ config(materialized='view', secure=true) }}

-- Databricks: views with row filters and column masks
-- Secure views don't exist as a concept; security is enforced via UC
{{ config(materialized='view') }}
-- Apply row filters and column masks to the underlying tables via Unity Catalog
-- ALTER TABLE commands; control access to the view itself with GRANT
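A sketch of the Unity Catalog replacement for a secure view that filtered rows by region; the function names, group names, and masking rules here are hypothetical:

-- Row filter: members of 'admins' see all rows, everyone else only US rows
CREATE OR REPLACE FUNCTION analytics_prod.marts.region_filter(region STRING)
RETURN IS_ACCOUNT_GROUP_MEMBER('admins') OR region = 'US';

ALTER TABLE analytics_prod.marts.customer_summary
SET ROW FILTER analytics_prod.marts.region_filter ON (region);

-- Column mask: redact email for users outside 'pii_readers'
CREATE OR REPLACE FUNCTION analytics_prod.marts.mask_email(email STRING)
RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('pii_readers') THEN email ELSE '***' END;

ALTER TABLE analytics_prod.marts.customer_summary
ALTER COLUMN email SET MASK analytics_prod.marts.mask_email;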

Step 5: Update sources

5.1 Update source definitions

# models/sources.yml (before)
sources:
  - name: raw
    database: ANALYTICS_DB
    schema: RAW
    tables:
      - name: orders
      - name: customers
      - name: products

# models/sources.yml (after -- Databricks)
sources:
  - name: raw
    database: analytics_prod  # catalog name
    schema: raw
    tables:
      - name: orders
      - name: customers
      - name: products

5.2 Handle case sensitivity

Snowflake uppercases all unquoted identifiers. Databricks preserves case.

# If your Snowflake sources use uppercase:
sources:
    - name: raw
      database: analytics_prod
      schema: raw
      tables:
          - name: orders
            # If the actual Delta table is lowercase, this works as-is
            # If you need to reference an uppercase table from Snowflake:
            # identifier: ORDERS  # explicit override
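When a model fails with a table-not-found error, confirm how the objects are actually named on the Databricks side before adding identifier overrides:

-- List tables as Unity Catalog knows them
SHOW TABLES IN analytics_prod.raw;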

Step 6: Run and validate

6.1 Compile first (catch syntax errors)

dbt compile --target dev

Fix any compilation errors. Most will be SQL dialect issues from Step 3.

6.2 Run against dev

# Run all models
dbt run --target dev

# Run a specific model to debug
dbt run --target dev --select stg_orders

# Run with full refresh (rebuild all incremental models)
dbt run --target dev --full-refresh

6.3 Run tests

dbt test --target dev

6.4 Data reconciliation

Create a reconciliation model that compares Snowflake and Databricks outputs. The snowflake_bridge source below assumes the Snowflake table is still reachable from Databricks during the parallel-run phase (for example via Lakehouse Federation or a periodic export):

-- models/reconciliation/recon_orders.sql
-- Run this during parallel-run phase

WITH snowflake_counts AS (
    SELECT
        DATE_TRUNC('day', order_date) AS dt,
        COUNT(*) AS sf_count,
        SUM(amount) AS sf_total
    FROM {{ source('snowflake_bridge', 'orders') }}
    GROUP BY 1
),
databricks_counts AS (
    SELECT
        DATE_TRUNC('day', order_date) AS dt,
        COUNT(*) AS db_count,
        SUM(amount) AS db_total
    FROM {{ ref('stg_orders') }}
    GROUP BY 1
)
SELECT
    COALESCE(s.dt, d.dt) AS dt,
    s.sf_count,
    d.db_count,
    s.sf_count - d.db_count AS count_diff,
    s.sf_total,
    d.db_total,
    ABS(s.sf_total - d.db_total) AS total_diff,
    CASE WHEN ABS(s.sf_total - d.db_total) / NULLIF(s.sf_total, 0) > 0.005
         THEN 'FAIL' ELSE 'PASS' END AS status
FROM snowflake_counts s
FULL OUTER JOIN databricks_counts d ON s.dt = d.dt
ORDER BY dt DESC

Step 7: Set up CI/CD

7.1 GitHub Actions workflow

# .github/workflows/dbt-ci.yml
name: dbt CI

on:
    pull_request:
        paths:
            - "models/**"
            - "macros/**"
            - "tests/**"
            - "dbt_project.yml"

jobs:
    dbt-test:
        runs-on: ubuntu-latest
        steps:
            - uses: actions/checkout@v4

            - name: Set up Python
              uses: actions/setup-python@v5
              with:
                  python-version: "3.11"

            - name: Install dependencies
              run: pip install dbt-databricks==1.7.0

            - name: dbt deps
              run: dbt deps

            - name: dbt compile
              run: dbt compile --target ci
              env:
                  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

            - name: dbt run (changed models only)
              run: dbt run --target ci --select state:modified+
              env:
                  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

            - name: dbt test
              run: dbt test --target ci --select state:modified+
              env:
                  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

7.2 Contract validation

Wire into CSA-in-a-Box contract validation:

# Add to .github/workflows/validate-contracts.yml
- name: Validate dbt contracts
  run: |
      dbt compile --target ci
      python scripts/validate_contracts.py --dbt-manifest target/manifest.json

Step 8: Production deployment

8.1 Deploy to production

# Full refresh for initial production deployment
dbt run --target prod --full-refresh

# Run tests
dbt test --target prod

# Generate docs
dbt docs generate --target prod

8.2 Schedule production runs

Set up a Databricks Job to run dbt on schedule:

{
    "name": "dbt-production-daily",
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "America/New_York"
    },
    "tasks": [
        {
            "task_key": "dbt-run",
            "dbt_task": {
                "commands": [
                    "dbt deps",
                    "dbt run --target prod",
                    "dbt test --target prod"
                ],
                "project_directory": "/Repos/production/dbt-project"
            }
        }
    ],
    "email_notifications": {
        "on_failure": ["data-engineering@agency.gov"]
    }
}

Common pitfalls and solutions

| Pitfall | Symptom | Solution |
| --- | --- | --- |
| DATEDIFF arg order | Negative values where positive expected | Swap arguments: DATEDIFF(end, start) on Databricks |
| Case sensitivity | Table not found errors | Check catalog/schema/table name casing |
| VARIANT column access | Syntax errors with : notation | Use dot notation or GET_JSON_OBJECT |
| Snowflake-specific macros | Compilation errors in macros/ | Rewrite macros for Databricks SQL dialect |
| Transient table config | Warning about unknown config | Remove +transient: true from dbt_project.yml |
| Warehouse not found | Connection errors | Verify http_path in profiles.yml |


Last updated: 2026-04-30
Maintainers: CSA-in-a-Box core team