Snowflake Warehouse Migration Guide¶
Status: Authored 2026-04-30
Audience: Data engineers, platform engineers, DBAs managing Snowflake virtual warehouses
Scope: Multi-cluster warehouses to Databricks SQL Warehouses / Fabric capacity, sizing, auto-scaling, query optimization, resource monitors
1. Architecture comparison¶
Snowflake virtual warehouses¶
Snowflake virtual warehouses are named compute clusters that execute SQL queries:
- Fixed sizes from X-Small (1 credit/hour) to 6X-Large (512 credits/hour)
- Multi-cluster mode scales out by cloning the entire warehouse
- Auto-suspend stops billing after configurable idle period (minimum 60 seconds)
- Auto-resume starts the warehouse when a query arrives
- Each warehouse is isolated -- no resource sharing between warehouses
- Query queue management per warehouse
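For concreteness, this is the shape of warehouse definition the rest of this guide translates. A minimal sketch; the name and all settings are illustrative:

```sql
-- Illustrative Snowflake warehouse combining the features above
CREATE WAREHOUSE reporting_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3          -- multi-cluster scale-out
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 300             -- seconds of idle before suspending
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
```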
Databricks SQL Warehouses¶
Databricks SQL Warehouses are the direct replacement:
- Sizes from 2X-Small (4 DBU/hour) to 4X-Large (320 DBU/hour)
- Auto-scaling adds or removes clusters within a configured min/max range
- Auto-stop halts billing after configurable idle period (1 minute on classic, 10 minutes on serverless)
- Serverless warehouses spin up in under 10 seconds
- Photon engine accelerates scan-heavy queries automatically
- Query queuing with automatically managed concurrency per cluster
Fabric SQL analytics endpoint¶
For teams moving to Microsoft Fabric rather than Databricks:
- Fabric Lakehouse SQL analytics endpoint provides T-SQL interface over Delta Lake
- Fabric capacity (CU/hour) shared across all Fabric workloads
- Direct Lake mode for Power BI eliminates import/export
- No per-warehouse sizing -- capacity is pool-level
2. Warehouse size mapping¶
Direct size translation¶
| Snowflake size | Credits/hr | Databricks SQL size | DBU/hr | Fabric capacity (approximate) |
|---|---|---|---|---|
| X-Small | 1 | 2X-Small | 4 | F4 |
| Small | 2 | X-Small | 6 | F8 |
| Medium | 4 | Small | 12 | F16 |
| Large | 8 | Medium | 24 | F32-F64 |
| X-Large | 16 | Large | 40 | F64-F128 |
| 2X-Large | 32 | X-Large | 80 | F128-F256 |
| 3X-Large | 64 | 2X-Large | 144 | F256-F512 |
| 4X-Large | 128 | 3X-Large | 240 | F512-F1024 |
| 5X-Large | 256 | 4X-Large | 320 | F1024-F2048 |
| 6X-Large | 512 | 4X-Large (multi) | 640 | F2048 |
Right-sizing methodology¶
Do not blindly translate sizes. Snowflake and Databricks have different performance characteristics at each tier.
Step 1: Profile current Snowflake usage
```sql
-- Snowflake: warehouse load and credit consumption, trailing 3 months.
-- Load metrics come from WAREHOUSE_LOAD_HISTORY; credits from WAREHOUSE_METERING_HISTORY.
WITH load AS (
    SELECT
        warehouse_name,
        AVG(avg_running) AS avg_concurrent_queries,
        MAX(avg_running) AS peak_concurrent_queries,
        AVG(avg_queued_load) AS avg_queue_depth
    FROM snowflake.account_usage.warehouse_load_history
    WHERE start_time >= DATEADD(month, -3, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
), credits AS (
    SELECT
        warehouse_name,
        SUM(credits_used) AS total_credits,
        COUNT(DISTINCT DATE_TRUNC('day', start_time)) AS active_days
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD(month, -3, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
)
SELECT l.*, c.total_credits, c.active_days
FROM load l
JOIN credits c ON c.warehouse_name = l.warehouse_name
ORDER BY c.total_credits DESC;
```
Step 2: Analyze query patterns
```sql
-- Snowflake: query duration and resource consumption
SELECT
    warehouse_name,
    query_type,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY total_elapsed_time) AS p50_ms,
    PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY total_elapsed_time) AS p90_ms,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY total_elapsed_time) AS p99_ms,
    AVG(bytes_scanned) / POWER(1024, 3) AS avg_gb_scanned,
    COUNT(*) AS query_count
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(month, -3, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, query_type
ORDER BY query_count DESC;
```
Step 3: Start one size smaller on Databricks
The Photon engine and Delta Lake file pruning often deliver equivalent or better performance at a smaller size. Start one tier smaller and benchmark before scaling up.
Step 4: Monitor and adjust
```sql
-- Databricks: query warehouse performance from the system tables
-- (verify column names against system.query.history in your workspace)
SELECT
    compute.warehouse_id,
    COUNT(*) AS query_count,
    AVG(total_duration_ms) AS avg_duration_ms,
    PERCENTILE(total_duration_ms, 0.90) AS p90_duration_ms,
    SUM(total_task_duration_ms) AS total_compute_ms
FROM system.query.history
WHERE start_time >= CURRENT_DATE - INTERVAL 7 DAYS
GROUP BY compute.warehouse_id;
```
3. Multi-cluster warehouse translation¶
Snowflake multi-cluster behavior¶
Snowflake multi-cluster warehouses scale by cloning the entire warehouse:
- Economy mode: queues queries until load justifies a new cluster
- Standard mode: starts a new cluster immediately when queries queue
- Min/max clusters: configurable range (e.g., 1 min, 10 max)
- Each cluster is a full copy at the configured warehouse size, so credits multiply linearly
Databricks auto-scaling¶
Databricks SQL Warehouses also scale out, with a different configuration surface:
- Scaling adds or removes clusters of the configured warehouse size within a min/max range (e.g., 1 to 10 clusters)
- Scaling decisions are driven by query queue depth
- There is no Economy/Standard choice; the scaling policy is platform-managed
- As in Snowflake, each additional cluster multiplies cost linearly
Translation rules:
| Snowflake config | Databricks equivalent |
|---|---|
| Multi-cluster: min 1, max 1 | Scaling: min 1, max 1 (fixed) |
| Multi-cluster: min 1, max 3 | Scaling: min 1, max 3 clusters |
| Multi-cluster: min 1, max 10 | Scaling: min 1, max 10 clusters; benchmark whether a larger size with fewer clusters is cheaper |
| Multi-cluster: min 3, max 10 | Scaling: min 3, max 10 clusters; consider splitting workloads across warehouses |
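Before applying these rules, inventory the current multi-cluster settings. A sketch using SHOW WAREHOUSES and RESULT_SCAN (the SHOW output exposes lowercase column names, hence the quoting):

```sql
-- Snowflake: capture multi-cluster settings to translate
SHOW WAREHOUSES;
SELECT "name", "size", "min_cluster_count", "max_cluster_count",
       "scaling_policy", "auto_suspend"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
ORDER BY "name";
```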
Concurrency management¶
| Snowflake | Databricks |
|---|---|
| 8 concurrent queries per cluster (default MAX_CONCURRENCY_LEVEL) | Roughly 10 concurrent queries per cluster, managed automatically |
| Queue depth triggers multi-cluster scale-out | Queue depth triggers cluster scale-out |
| Economy vs. Standard scaling mode | Single platform-managed scaling policy |
| Separate warehouses for isolation | Separate warehouses for isolation; query tags for cost attribution |
4. Auto-suspend to auto-stop translation¶
Snowflake auto-suspend¶
- Configurable in seconds (minimum 60 seconds, default 600 seconds)
- Warehouse keeps running (and billing) until the idle timer expires
- Resume takes 2-30 seconds depending on warehouse size and cold/warm state
- Suspended warehouses retain cached data (warm resume)
Databricks auto-stop¶
Classic SQL Warehouses:
- Configurable in minutes (minimum 1 minute, default 10 minutes)
- Auto-stop completely deallocates the warehouse
- Restart takes 30-120 seconds
- No cached data retention after stop
Serverless SQL Warehouses:
- Minimum 10 minutes auto-stop
- Restart in under 10 seconds (near-instant)
- Serverless billing is per-second while the warehouse runs, so idle cost is minimal
- Recommended for interactive/ad-hoc workloads
Translation guidance¶
| Snowflake auto-suspend | Databricks recommendation |
|---|---|
| 60 seconds (aggressive) | Serverless warehouse (instant resume) |
| 300 seconds (moderate) | Classic with 5-min auto-stop |
| 600 seconds (default) | Classic with 10-min auto-stop |
| 3600 seconds (conservative) | Classic with 15-min auto-stop; consider reservation |
| Never (always on) | Always-on classic warehouse with reserved capacity |
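To pick a row in this table from data rather than habit, measure the actual idle gaps between consecutive queries. A sketch against ACCOUNT_USAGE; the 30-day window and 0.9 percentile are illustrative choices:

```sql
-- Snowflake: distribution of idle gaps between consecutive queries per warehouse
SELECT
    warehouse_name,
    AVG(gap_seconds) AS avg_gap_s,
    APPROX_PERCENTILE(gap_seconds, 0.9) AS p90_gap_s
FROM (
    SELECT
        warehouse_name,
        DATEDIFF(
            second,
            LAG(end_time) OVER (PARTITION BY warehouse_name ORDER BY start_time),
            start_time
        ) AS gap_seconds
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
      AND warehouse_name IS NOT NULL
) AS gaps
WHERE gap_seconds > 0
GROUP BY warehouse_name
ORDER BY p90_gap_s DESC;
```

If the p90 gap is far below the current auto-suspend setting, a shorter auto-stop (or serverless) will cut idle spend.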
5. Query optimization differences¶
Snowflake optimizations that translate directly¶
| Snowflake optimization | Databricks equivalent | Notes |
|---|---|---|
| Micro-partition pruning | Delta file pruning | Automatic based on file statistics |
| Clustering keys | Z-ORDER / liquid clustering | OPTIMIZE table ZORDER BY (col) or liquid clustering |
| Result cache | SQL Warehouse result cache | Automatic; same query returns cached result |
| Metadata cache | Delta file metadata cache | Automatic |
| Materialized views | Materialized views (Databricks) | GA in Runtime 13+; syntax slightly different |
Snowflake optimizations that require changes¶
| Snowflake optimization | Databricks approach | Migration action |
|---|---|---|
| Search Optimization Service | Z-ORDER + liquid clustering | Apply Z-ORDER on lookup columns; evaluate liquid clustering |
| Query acceleration | Photon engine (automatic) | No action needed; Photon is included |
| Automatic clustering | Liquid clustering (Runtime 14+) | Enable liquid clustering on frequently-queried tables |
| Warehouse-level query timeout | SQL Warehouse statement timeout | Configure via warehouse settings |
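A sketch of the first two migration actions above; table and column names are illustrative, and a given table uses either Z-ORDER or liquid clustering, not both:

```sql
-- Databricks: Z-ORDER on common lookup/filter columns
OPTIMIZE sales.orders ZORDER BY (customer_id, order_date);

-- Or adopt liquid clustering (DBR 13.3+) and let OPTIMIZE cluster incrementally
ALTER TABLE sales.orders CLUSTER BY (customer_id, order_date);
OPTIMIZE sales.orders;
```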
Query tuning checklist¶
After migrating each warehouse:
- Run the top 20 queries by frequency and compare execution plans
- Check scan sizes -- Delta file pruning should match or beat micro-partition pruning
- Apply Z-ORDER on columns used in WHERE/JOIN clauses for large tables
- Enable Photon on the SQL Warehouse (usually default)
- Set appropriate statement timeout to catch runaway queries
- Monitor spill-to-disk -- if excessive, consider a larger warehouse or query rewrite
- Compare result cache hit rates between Snowflake and Databricks
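Two checklist items as SQL. The timeout uses the STATEMENT_TIMEOUT parameter (value in seconds); the table name is illustrative:

```sql
-- Databricks SQL: session guardrail while benchmarking
-- (can also be configured at the warehouse level)
SET STATEMENT_TIMEOUT = 1800;

-- Confirm file pruning in the plan of a migrated query
EXPLAIN SELECT COUNT(*) FROM sales.orders WHERE order_date >= '2026-01-01';
```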
6. Resource monitor translation¶
Snowflake resource monitors¶
Snowflake resource monitors track credit consumption and can:
- Send notifications at configurable thresholds (50%, 75%, 100%)
- Suspend warehouse at a threshold
- Suspend and kill running queries at a threshold
- Apply per-warehouse or account-wide
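For reference, the Snowflake pattern being replaced; monitor name, quota, and warehouse are illustrative:

```sql
-- Snowflake: resource monitor with notify + suspend thresholds
CREATE RESOURCE MONITOR finance_wh_monitor
  WITH CREDIT_QUOTA = 500
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 50 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE finance_wh SET RESOURCE_MONITOR = finance_wh_monitor;
```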
Azure cost governance¶
Azure provides multiple layers:
| Layer | Tool | Equivalent to |
|---|---|---|
| Budget alerts | Azure Cost Management | Resource monitor notifications |
| Warehouse auto-stop | Databricks SQL Warehouse config | Resource monitor suspend |
| Hard kill-switch | scripts/deploy/teardown-platform.sh | Resource monitor suspend + kill |
| Tag-based tracking | Azure resource tags | Resource monitor per-warehouse tracking |
| Anomaly detection | Azure Cost Management anomaly alerts | No Snowflake equivalent |
Setup example (Azure Cost Management budget):
```json
{
  "name": "finance-warehouse-monthly",
  "amount": 5000,
  "timeGrain": "Monthly",
  "timePeriod": {
    "startDate": "2026-05-01"
  },
  "notifications": {
    "notification50pct": {
      "enabled": true,
      "operator": "GreaterThanOrEqualTo",
      "threshold": 50,
      "contactEmails": ["data-platform-team@agency.gov"]
    },
    "notification90pct": {
      "enabled": true,
      "operator": "GreaterThanOrEqualTo",
      "threshold": 90,
      "contactEmails": [
        "data-platform-team@agency.gov",
        "cfo-office@agency.gov"
      ]
    }
  }
}
```
7. Migration execution steps¶
Per-warehouse migration procedure¶
- Profile the Snowflake warehouse (queries, sizes, schedules, consumers)
- Create the Databricks SQL Warehouse one size smaller
- Configure auto-stop, scaling, and concurrency limits
- Migrate the dbt models or queries that use this warehouse (see dbt tutorial)
- Benchmark the top 20 queries; adjust warehouse size if needed
- Apply Z-ORDER on key tables
- Configure cost budgets and alerts
- Run parallel for 2 weeks minimum; reconcile results
- Cutover consumers to the Databricks warehouse
- Decommission the Snowflake warehouse (suspend, then drop after 30 days)
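A minimal sketch of the final decommission step, assuming an illustrative warehouse named reporting_wh:

```sql
-- Snowflake: suspend and fence off the warehouse at cutover
ALTER WAREHOUSE reporting_wh SUSPEND;
ALTER WAREHOUSE reporting_wh SET AUTO_RESUME = FALSE;  -- block accidental resume

-- After the 30-day grace period with no issues raised:
DROP WAREHOUSE reporting_wh;
```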
Parallel-run monitoring¶
During the parallel-run phase, track these metrics daily:
| Metric | Snowflake source | Databricks source |
|---|---|---|
| Query count | QUERY_HISTORY view | system.query.history table |
| P50/P90/P99 latency | QUERY_HISTORY view | system.query.history table |
| Total compute cost | WAREHOUSE_METERING_HISTORY | Databricks usage logs |
| Error rate | QUERY_HISTORY (error queries) | system.query.history (failed) |
| Data freshness | Source table timestamps | Source table timestamps |
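The Databricks side of this daily check can come from one query. A sketch against system.query.history; verify the column names in your workspace, as the system table schema can differ by release:

```sql
-- Databricks: daily query count, latency percentiles, and error rate
SELECT
    DATE(start_time) AS day,
    COUNT(*) AS query_count,
    PERCENTILE(total_duration_ms, 0.50) AS p50_ms,
    PERCENTILE(total_duration_ms, 0.90) AS p90_ms,
    PERCENTILE(total_duration_ms, 0.99) AS p99_ms,
    AVG(CASE WHEN execution_status = 'FAILED' THEN 1 ELSE 0 END) AS error_rate
FROM system.query.history
WHERE start_time >= CURRENT_DATE - INTERVAL 14 DAYS
GROUP BY DATE(start_time)
ORDER BY day;
```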
8. Common pitfalls¶
Pitfall 1: Over-sizing Databricks warehouses¶
The Photon engine is typically more efficient per compute unit than Snowflake's engine for scan-heavy workloads. Start one size smaller and scale up only if benchmarks justify it.
Pitfall 2: Ignoring auto-stop configuration¶
The default auto-stop on Databricks classic warehouses is 10 minutes. For development warehouses, set it to 1 minute. For serverless warehouses, the minimum is 10 minutes, but near-instant restart and per-second billing keep idle cost low.
Pitfall 3: Not applying Z-ORDER¶
Snowflake's micro-partition pruning works automatically based on natural data ordering. Delta Lake benefits from explicit Z-ORDER on high-cardinality columns used in filters. Skipping this step can result in slower queries.
Pitfall 4: Carrying over multi-cluster patterns¶
Snowflake multi-cluster warehouses and Databricks SQL Warehouse scaling both add whole clusters, but per-cluster concurrency, scaling triggers, and pricing differ. Do not copy min/max cluster settings 1:1; benchmark concurrency on the new platform and tune the cluster range and warehouse size to match.
Pitfall 5: Ignoring serverless warehouses¶
For interactive and ad-hoc workloads, serverless SQL warehouses provide near-instant startup (under 10 seconds) at a higher per-DBU rate. The total cost is often lower because you eliminate idle billing.
9. Fabric capacity alternative¶
For teams choosing Microsoft Fabric over Databricks:
Capacity mapping¶
| Snowflake warehouse count | Recommended Fabric capacity |
|---|---|
| 1-3 small warehouses | F32-F64 |
| 3-5 mixed warehouses | F64-F128 |
| 5-10 mixed warehouses | F128-F256 |
| 10+ warehouses or heavy workloads | F256-F512 |
Key differences from Databricks¶
- Fabric capacity is shared across all Fabric workloads (data engineering, data science, BI, real-time analytics)
- No per-warehouse sizing -- capacity is pool-level
- Fabric capacity can be paused for dev/test (scale-to-zero)
- Direct Lake mode eliminates data import for Power BI
- T-SQL interface via SQL analytics endpoint (familiar for SQL Server teams)
When to choose Fabric over Databricks¶
- Power BI is the primary consumption layer
- T-SQL skills are stronger than Spark/Python skills
- Unified platform is preferred over best-of-breed components
- Simpler capacity model is valued over per-warehouse control
When to choose Databricks over Fabric¶
- Complex Spark workloads (ML, streaming, large-scale ETL)
- Unity Catalog is needed for fine-grained access control
- Multi-cloud strategy (Databricks runs on Azure, AWS, GCP)
- Existing Databricks skills and infrastructure
Related documents¶
- Feature Mapping -- all 66 features mapped
- Tutorial: dbt Migration -- step-by-step dbt adapter swap
- Best Practices -- warehouse-by-warehouse migration strategy
- TCO Analysis -- cost comparison by warehouse tier
- Benchmarks -- performance comparison data
- Master playbook -- Section 4.5 for the original warehouse sizing table
Last updated: 2026-04-30
Maintainers: CSA-in-a-Box core team