ADF vs Databricks Workflows vs Fabric Data Pipelines

TL;DR

Use Azure Data Factory for hybrid/enterprise orchestration with 100+ connectors; Databricks Workflows for Spark-centric DAGs and ML pipelines; Fabric Data Pipelines for Fabric-native workloads with OneLake-first simplicity.

When this question comes up

  • A data platform needs an orchestration layer and the team is choosing between native Azure, Databricks, or Fabric tooling.
  • Existing ADF pipelines are being evaluated for migration to Fabric or Databricks.
  • The workload mixes on-prem/hybrid sources with cloud-native Spark transformations.
  • Cost or licensing consolidation is driving a "one orchestrator" decision.

Decision tree

flowchart TD
    start["Where do workloads run?"] -->|All in Microsoft Fabric| q_fabric
    start -->|Spark / ML-heavy on Databricks| q_spark
    start -->|Hybrid / on-prem + cloud mix| q_hybrid

    q_fabric{"Need orchestration beyond<br/>Fabric items (external APIs,<br/>on-prem SFTP)?"}
    q_fabric -->|No — Fabric-native| rec_fp["**Recommend:** Fabric<br/>Data Pipelines"]
    q_fabric -->|Yes — external sources| q_connector

    q_connector{"Need >100 connectors or<br/>SHIR for on-prem?"}
    q_connector -->|Yes| rec_adf["**Recommend:** Azure Data Factory"]
    q_connector -->|No — few external calls| rec_fp

    q_spark{"Need DAG orchestration<br/>across notebooks + JAR tasks<br/>+ model serving?"}
    q_spark -->|Yes — Spark-centric DAGs| rec_dbw["**Recommend:** Databricks Workflows"]
    q_spark -->|No — simple notebook trigger| q_existing

    q_existing{"Already using ADF for<br/>broader orchestration?"}
    q_existing -->|Yes| rec_adf
    q_existing -->|No| rec_dbw

    q_hybrid{"On-prem sources via SHIR<br/>or VNet-managed endpoints?"}
    q_hybrid -->|Yes — SHIR required| rec_adf
    q_hybrid -->|No — cloud-to-cloud| q_engine

    q_engine{"Primary compute engine?"}
    q_engine -->|Databricks| rec_dbw
    q_engine -->|Fabric| rec_fp
    q_engine -->|Mixed / no preference| rec_adf

Per-recommendation detail

Recommend: Azure Data Factory

When: Hybrid/enterprise orchestration spanning on-prem, multi-cloud, and Azure-native services; need for 100+ built-in connectors, Self-Hosted Integration Runtime (SHIR), or Mapping Data Flows.

Why: Broadest connector catalog on Azure; SHIR bridges on-prem SQL Server, Oracle, SAP, and file shares; mature CI/CD via ARM/Bicep export; integrates with Databricks and Fabric as downstream compute.

Tradeoffs:

  • Cost — per-activity-run charges plus DIU-hours for copy activities.
  • Latency — minutes for pipeline triggers, seconds for activity dispatch.
  • Compliance — FedRAMP High, IL5 in Azure Gov.
  • Skill — low-code authoring in the portal; JSON pipeline definitions.

Anti-patterns:

  • Pure Spark/notebook DAGs with no external sources — Databricks Workflows is more native and avoids ADF overhead.
  • All-Fabric estate with no hybrid needs — Fabric Data Pipelines is simpler and included in capacity.
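Since ADF pipelines are authored as JSON, a minimal Copy activity pulling an on-prem table into ADLS looks roughly like the sketch below. All names (`CopyOnPremSqlToAdls`, the dataset references) are hypothetical; the `SqlServerSource`/`ParquetSink` types are standard ADF copy types, and the SHIR binding lives on the linked service behind the source dataset, not in the pipeline itself.

```json
{
  "name": "CopyOnPremSqlToAdls",
  "properties": {
    "activities": [
      {
        "name": "CopyOrdersTable",
        "type": "Copy",
        "inputs": [
          { "referenceName": "OnPremSqlOrders", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "AdlsRawOrders", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "SqlServerSource" },
          "sink": { "type": "ParquetSink" }
        }
      }
    ]
  }
}
```

This is the same JSON that ARM/Bicep export captures for CI/CD, which is one reason ADF promotion between environments is well-trodden.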

Linked example: ADF Setup Guide | ADR-0001: ADF + dbt over Airflow

Recommend: Databricks Workflows

When: Spark-centric DAGs orchestrating notebooks, Python/JAR tasks, Delta Live Tables, and ML model training/serving within Databricks.

Why: Native task orchestration inside Databricks with job clusters that spin up and down per run; multi-task DAGs with dependencies, retries, and conditional logic; MLflow integration for experiment tracking and the model registry.

Tradeoffs:

  • Cost — DBU-based billing per job cluster.
  • Latency — cluster cold start of 2–5 minutes (mitigated with pools).
  • Compliance — FedRAMP High; IL4/IL5 with qualifying SKUs.
  • Skill — Spark + Python and Databricks workspace familiarity.

Anti-patterns:

  • Orchestrating non-Databricks services (Azure SQL, Blob copy, SFTP) as primary pattern — ADF has better connectors.
  • Cost-sensitive workloads with simple scheduling needs — Fabric Pipelines or ADF Mapping Data Flows may be cheaper.
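The multi-task DAG pattern maps to a Jobs API job definition like the sketch below: two tasks sharing one ephemeral job cluster, with `depends_on` expressing the edge. Job name, notebook path, Python file, Spark version, and node type are all assumptions for illustration.

```json
{
  "name": "nightly-feature-pipeline",
  "job_clusters": [
    {
      "job_cluster_key": "etl_cluster",
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_D4ds_v5",
        "num_workers": 2
      }
    }
  ],
  "tasks": [
    {
      "task_key": "ingest",
      "job_cluster_key": "etl_cluster",
      "notebook_task": { "notebook_path": "/Repos/data/ingest" }
    },
    {
      "task_key": "train",
      "depends_on": [ { "task_key": "ingest" } ],
      "job_cluster_key": "etl_cluster",
      "spark_python_task": { "python_file": "dbfs:/jobs/train_model.py" }
    }
  ]
}
```

Because the job cluster is created per run and torn down afterward, you pay DBUs only while tasks execute; instance pools reduce the cold-start penalty noted above.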

Linked example: Databricks Guide

Recommend: Fabric Data Pipelines

When: All workloads live inside Microsoft Fabric (lakehouses, warehouses, notebooks, dataflows); need simple orchestration without leaving the Fabric control plane.

Why: Included in Fabric capacity (no per-pipeline billing); familiar ADF-like authoring UX; native OneLake integration eliminates copy-activity overhead for Fabric-to-Fabric moves; Copy job for high-scale ingestion.

Tradeoffs:

  • Cost — consumed from F-SKU capacity; no separate billing.
  • Latency — comparable to ADF for copy and notebook activities.
  • Compliance — Commercial GA only (Azure Gov pending).
  • Skill — low; ADF experience transfers directly.

Anti-patterns:

  • Hybrid on-prem sources requiring SHIR — Fabric Pipelines lacks SHIR support today; use ADF.
  • Complex multi-cloud orchestration with 50+ diverse connectors — ADF connector catalog is broader.
  • Azure Government workloads — Fabric is not yet GA in Gov (2026-Q2).
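Because Fabric pipelines reuse the ADF-style authoring model, a definition chaining two Fabric notebook runs looks roughly like the sketch below. This is an assumption-heavy illustration: the activity type name and property names reflect the exported JSON format as I understand it and may differ, and the `<...>` placeholders stand in for real Fabric item IDs.

```json
{
  "name": "RefreshGoldLayer",
  "properties": {
    "activities": [
      {
        "name": "TransformSilver",
        "type": "TridentNotebook",
        "typeProperties": {
          "notebookId": "<notebook-item-id>",
          "workspaceId": "<workspace-id>"
        }
      },
      {
        "name": "BuildGoldTables",
        "type": "TridentNotebook",
        "dependsOn": [
          { "activity": "TransformSilver", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "notebookId": "<second-notebook-item-id>",
          "workspaceId": "<workspace-id>"
        }
      }
    ]
  }
}
```

The `dependsOn` / `dependencyConditions` pattern is the same one ADF uses, which is why ADF experience transfers directly.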

Linked example: Fabric vs. Databricks vs. Synapse | ADR-0001: ADF + dbt over Airflow