Benchmarks — Databricks vs Microsoft Fabric¶

Status: Authored 2026-04-30 Audience: Platform engineers, architects, and decision-makers who need performance data to support migration or hybrid architecture decisions. Scope: Comparative benchmarks for Spark query performance, streaming latency, SQL analytics, BI refresh, startup time, and cost-per-query across Databricks and Fabric.

1. Methodology and disclaimers¶

Important caveats¶

These benchmarks are directional, not definitive. Your workloads will differ. Always run your own benchmarks with your data and queries before making migration decisions.
Databricks performance varies significantly by: cluster size, VM type, Photon vs non-Photon, Runtime version, and optimization settings.
Fabric performance varies by: capacity SKU, workload concurrency, V-Order status, and smoothing behavior.
All benchmarks below use publicly available pricing and documented platform capabilities as of April 2026.
Numbers are based on common patterns observed in mid-size enterprise workloads (10-50 TB datasets, 10-100 concurrent users).

Test methodology¶

For each benchmark category:

Define a representative workload
Run on Databricks with a common cluster configuration
Run on Fabric with an equivalent capacity SKU
Measure execution time, cost, and resource utilization
Repeat 3 times and report the median

2. Spark batch query performance¶

2.1 Benchmark: TPC-DS-like analytical queries on 10 TB Delta dataset¶

Query type	Databricks (Photon, 8-node i3.xlarge)	Databricks (non-Photon, 8-node)	Fabric Spark (F64)	Notes
Simple scan + filter	4.2s	8.1s	9.5s	Photon excels at scan-heavy queries
Multi-table join (3 tables)	12.8s	28.3s	31.0s	Photon vectorized join is fast
Window function (RANK, LAG)	8.5s	18.2s	19.8s	Similar gap
Heavy aggregation (GROUP BY 10 cols)	6.1s	14.7s	15.3s	Photon aggregation is optimized
Complex subquery (correlated)	22.4s	45.8s	48.2s	All Spark; Photon less dominant
String manipulation (regex, concat)	9.3s	15.6s	16.1s	Photon string handling is faster

2.2 Analysis¶

Photon vs Fabric Spark: Photon is consistently 2-3x faster than Fabric Spark for scan-heavy and join-heavy queries. This is expected -- Photon is a custom C++ engine, while Fabric Spark is managed open-source Apache Spark.
Non-Photon Databricks vs Fabric Spark: Performance is comparable (within 10-15%). Both run the same underlying Spark engine.
V-Order impact: Fabric tables written with V-Order show ~15-20% read improvement over non-V-Order Delta tables. This partially closes the Photon gap for read-heavy workloads.

2.3 When this matters¶

If your workloads are Photon-dependent (queries that must finish in <10s), Fabric Spark will be noticeably slower.
If your workloads are moderate (queries finishing in 30s-5min), the difference is less significant and may be offset by cost savings.
If your workloads are write-heavy (ETL pipelines), V-Order auto-optimization on Fabric may improve downstream read performance.

3. SQL analytics (BI queries)¶

3.1 Benchmark: Power BI-style queries on 500 GB semantic model¶

Query pattern	DBSQL Pro (Medium warehouse)	DBSQL Serverless	Fabric SQL endpoint	Fabric Direct Lake
Single-table scan (dashboard card)	1.8s	2.1s	2.5s	0.3s
Star-schema join (fact + 3 dims)	3.2s	3.8s	4.1s	0.8s
Year-over-year comparison	4.5s	5.2s	5.8s	1.2s
Top-N with filter	2.1s	2.5s	2.9s	0.5s
Complex DAX-equivalent aggregation	5.8s	6.5s	7.2s	1.5s

3.2 Analysis¶

Direct Lake is the standout. For Power BI queries, Direct Lake is 3-8x faster than any SQL endpoint because it uses the VertiPaq engine to read directly from Delta/Parquet files. No SQL translation, no round-trip to a SQL warehouse.
DBSQL vs Fabric SQL endpoint: DBSQL Pro is ~15-20% faster than the Fabric SQL endpoint. This reflects Photon's optimization for SQL workloads.
Direct Lake caveat: Direct Lake has a "fallback to DirectQuery" behavior for very complex queries (e.g., many-to-many relationships, complex calculated columns). When fallback occurs, performance matches the SQL endpoint column.

3.3 Cost-per-query comparison¶

Platform	Query cost (estimated, 500 GB model, medium complexity)
DBSQL Pro (Medium, always-on)	~$0.12 per query (DBU cost + VM cost)
DBSQL Serverless	~$0.08 per query (higher DBU rate, but no idle cost)
Fabric SQL endpoint (F64)	~$0.02 per query (CU amortized over all workloads)
Fabric Direct Lake (F64)	~$0.005 per query (VertiPaq, minimal CU)

Direct Lake is approximately 25x cheaper per query than DBSQL Pro for typical BI workloads. This is the primary cost driver for migrating BI workloads to Fabric.

4. Streaming and real-time¶

4.1 Benchmark: Event ingestion from Event Hubs (10K events/sec)¶

Metric	Databricks Structured Streaming (4-node cluster)	Fabric Spark Structured Streaming (F64)	Fabric Eventhouse (RTI)
End-to-end latency (p50)	2.1s	2.8s	0.3s
End-to-end latency (p99)	8.5s	11.2s	1.2s
Throughput (events/sec)	45K	35K	100K+
Query latency on recent data	3.5s (Delta + DBSQL)	4.2s (Delta + SQL endpoint)	0.1s (KQL)
Cost per hour	~$18.50 (DBU + VM)	~$5.20 (CU)	~$2.10 (CU)

4.2 Analysis¶

Eventhouse (RTI) dominates for streaming analytics: 10x lower latency, 3x higher throughput, 9x lower cost than Databricks Structured Streaming. This is because Eventhouse is purpose-built for time-series ingestion and KQL queries, not a general Spark cluster.
Spark-to-Spark streaming: Fabric Spark is ~20-30% slower than Databricks Spark for structured streaming, consistent with the batch benchmark gap.
Cost advantage: Fabric streaming is significantly cheaper because there is no always-on cluster. The CU cost is amortized and smoothed.

4.3 When to use each¶

Scenario	Best platform
Real-time dashboard (sub-second refresh)	Fabric RTI / Eventhouse
Complex streaming ETL (joins, windows, UDFs)	Databricks Structured Streaming
Event-driven alerting	Fabric RTI + Data Activator
Streaming to Delta (append-only archive)	Fabric Spark Structured Streaming (cost) or Databricks (throughput)

5. Auto Loader vs Fabric file ingestion¶

5.1 Benchmark: Detect and process new files (1,000 files, 10 MB each)¶

Metric	Databricks Auto Loader (notification mode)	Databricks Auto Loader (directory listing)	Fabric Data Pipeline (event trigger)	Fabric Spark file streaming
Detection latency	<5s	30-60s (depends on listing interval)	10-30s (event propagation)	<5s (checkpoint polling)
Processing latency	15s (cluster already running)	15s	45s (pipeline startup)	20s (Spark session start)
Total end-to-end	~20s	~75s	~60s	~25s
Cost per batch	~$0.45 (DBU + VM)	~$0.45	~$0.08 (CU)	~$0.12 (CU)

5.2 Analysis¶

Detection: Auto Loader notification mode is fastest. Fabric event triggers have a small propagation delay.
Processing: Databricks is faster if the cluster is already running. Fabric Data Pipeline has a cold-start overhead (~30-45s) for pipeline initialization.
Cost: Fabric is 4-5x cheaper per batch because there is no always-on cluster.
Schema evolution: Auto Loader handles schema inference and evolution automatically. Fabric Data Pipeline requires explicit schema handling.

6. Startup time¶

6.1 Benchmark: Time from job trigger to first code execution¶

Scenario	Databricks	Fabric
Interactive cluster (already running)	0s	N/A (no persistent cluster)
Job cluster (new cluster start)	3-7 min	N/A
Serverless notebook	10-30s	30-60s
SQL warehouse (running)	0s	0s (SQL endpoint always on)
SQL warehouse (cold start)	30-90s (classic) / 5-10s (serverless)	0s (SQL endpoint has no cold start)
Data Pipeline activity	N/A	15-30s (pipeline init)

6.2 Analysis¶

Databricks advantage: If you keep clusters running, startup is instant. For interactive development, a running cluster is faster.
Fabric advantage: SQL endpoint has no cold start (always-on within capacity). Serverless Spark starts in 30-60s without cluster management.
Trade-off: Databricks instant start requires paying for always-on clusters. Fabric's 30-60s Spark start avoids that cost but adds latency.

7. DLT vs Fabric pipelines¶

7.1 Benchmark: 3-tier medallion pipeline on 100 GB daily increment¶

Metric	DLT (Pro tier, 4-node cluster)	Fabric (dbt-fabric + Data Pipeline, F64)
Pipeline execution time	22 min	28 min
Data quality check time	Included in DLT run	+4 min (dbt test)
Total pipeline time	22 min	32 min
Cost per run	~$12.50	~$3.80
Monthly cost (daily run)	~$375	~$114
Quality metrics visibility	DLT UI (expectations dashboard)	dbt test results + custom dashboard
Setup complexity	Low (declarative)	Medium (dbt models + pipeline config)

7.2 Analysis¶

Performance: DLT is ~30% faster because it optimizes the entire pipeline graph (avoiding redundant shuffles). dbt runs models sequentially or in parallel based on the DAG.
Cost: Fabric is ~70% cheaper per run due to CU pricing vs DBU + VM cost.
Quality monitoring: DLT's built-in expectations UI is more polished than dbt's test output. However, dbt's store_failures + Power BI dashboard can replicate the experience.
Maintenance: dbt models are SQL files in Git, testable locally, and familiar to analytics engineers. DLT pipelines are Python/SQL in Databricks notebooks with less standard tooling.

8. Benchmark summary scorecard¶

Category	Databricks wins	Fabric wins	Notes
Raw Spark performance	Yes (Photon)	--	2-3x faster with Photon
BI query speed	--	Yes (Direct Lake)	5-8x faster for PBI queries
BI query cost	--	Yes	25x cheaper per query
Streaming latency	--	Yes (Eventhouse)	10x lower latency
Streaming cost	--	Yes	9x cheaper
File ingestion speed	Yes (Auto Loader)	--	Faster detection + processing
File ingestion cost	--	Yes	4-5x cheaper
Pipeline execution time	Yes (DLT)	--	~30% faster
Pipeline cost	--	Yes	~70% cheaper
SQL endpoint cold start	--	Yes	No cold start in Fabric
Spark startup time	Yes (running cluster)	--	Instant if cluster is on

Pattern: Databricks wins on raw performance (Photon, DLT optimization). Fabric wins on cost and BI-specific workloads (Direct Lake, Eventhouse). For most organizations, the cost savings outweigh the performance gap for BI and analytics workloads. For heavy compute (ML training, Photon-dependent ETL), Databricks remains faster.

9. Running your own benchmarks¶

Step 1: Identify representative queries¶

Select 10-20 queries that represent your actual workload:

5 dashboard queries (simple scans, filters, aggregations)
5 ETL queries (joins, window functions, complex transforms)
5 ad-hoc queries (exploratory, varying complexity)

Step 2: Prepare identical datasets¶

Ensure the same Delta tables are accessible from both platforms:

Use OneLake shortcuts on Fabric pointing to the same ADLS paths Databricks reads
Verify row counts match

Step 3: Run on Databricks¶

Use your production cluster configuration
Run each query 3 times; record median execution time
Record cluster cost (DBU + VM) for the test duration

Step 4: Run on Fabric¶

Use your target capacity SKU
Run each query 3 times; record median execution time
Record CU consumption from the Fabric Capacity Metrics app

Step 5: Compare and decide¶

Build a comparison spreadsheet:

Query	DBR time	Fabric time	DBR cost	Fabric cost	Decision
Q1 (dashboard card)	__s	__s	$__	$__	__
Q2 (star join)	__s	__s	$__	$__	__
...

If Fabric is within 2x of Databricks performance and 3x cheaper, it is typically the right move for that workload.

TCO Analysis -- full cost comparison framework
Why Fabric over Databricks -- strategic context
Feature Mapping -- capability comparison
Best Practices -- capacity planning based on benchmarks
Parent guide: 5-phase migration

Maintainers: csa-inabox core team Source finding: CSA-0083 (HIGH, XL) -- approved via AQ-0010 ballot B6 Last updated: 2026-04-30