ADF vs Databricks Workflows vs Fabric Data Pipelines

TL;DR

Use Azure Data Factory for hybrid/enterprise orchestration with 100+ connectors; Databricks Workflows for Spark-centric DAGs and ML pipelines; Fabric Data Pipelines for Fabric-native workloads with OneLake-first simplicity.

When this question comes up

  • A data platform needs an orchestration layer and the team is choosing between native Azure, Databricks, or Fabric tooling.
  • Existing ADF pipelines are being evaluated for migration to Fabric or Databricks.
  • The workload mixes on-prem/hybrid sources with cloud-native Spark transformations.
  • Cost or licensing consolidation is driving a "one orchestrator" decision.

Decision tree

flowchart TD
    start["Where do workloads run?"] -->|All in Microsoft Fabric| q_fabric
    start -->|Spark / ML-heavy on Databricks| q_spark
    start -->|Hybrid / on-prem + cloud mix| q_hybrid

    q_fabric{"Need orchestration beyond<br/>Fabric items (external APIs,<br/>on-prem SFTP)?"}
    q_fabric -->|No — Fabric-native| rec_fp["**Recommend:** Fabric<br/>Data Pipelines"]
    q_fabric -->|Yes — external sources| q_connector

    q_connector{"Need >100 connectors or<br/>SHIR for on-prem?"}
    q_connector -->|Yes| rec_adf["**Recommend:** Azure Data Factory"]
    q_connector -->|No — few external calls| rec_fp

    q_spark{"Need DAG orchestration<br/>across notebooks + JAR tasks<br/>+ model serving?"}
    q_spark -->|Yes — Spark-centric DAGs| rec_dbw["**Recommend:** Databricks Workflows"]
    q_spark -->|No — simple notebook trigger| q_existing

    q_existing{"Already using ADF for<br/>broader orchestration?"}
    q_existing -->|Yes| rec_adf
    q_existing -->|No| rec_dbw

    q_hybrid{"On-prem sources via SHIR<br/>or VNet-managed endpoints?"}
    q_hybrid -->|Yes — SHIR required| rec_adf
    q_hybrid -->|No — cloud-to-cloud| q_engine

    q_engine{"Primary compute engine?"}
    q_engine -->|Databricks| rec_dbw
    q_engine -->|Fabric| rec_fp
    q_engine -->|Mixed / no preference| rec_adf

Per-recommendation detail

Recommend: Azure Data Factory

When: Hybrid/enterprise orchestration spanning on-prem, multi-cloud, and Azure-native services; need for 100+ built-in connectors, Self-Hosted Integration Runtime (SHIR), or Mapping Data Flows.

Why: Broadest connector catalog on Azure; SHIR bridges on-prem SQL Server, Oracle, SAP, and file shares; mature CI/CD via ARM/Bicep export; integrates with Databricks and Fabric as downstream compute.

Tradeoffs:

  • Cost — per-activity-run charges plus DIU-hours for copy activities.
  • Latency — minutes for pipeline triggers, seconds for activity dispatch.
  • Compliance — FedRAMP High, IL5 in Azure Gov.
  • Skill — low-code authoring in the portal; JSON pipeline definitions.

Anti-patterns:

  • Pure Spark/notebook DAGs with no external sources — Databricks Workflows is more native and avoids ADF overhead.
  • All-Fabric estate with no hybrid needs — Fabric Data Pipelines is simpler and included in capacity.
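Since ADF pipelines are authored as JSON, a minimal Copy activity pulling an on-prem table into ADLS looks roughly like the sketch below. All names (`CopyOnPremSqlToAdls`, the dataset references) are hypothetical; the `SqlServerSource`/`ParquetSink` types are standard ADF copy types, and the SHIR binding lives on the linked service behind the source dataset, not in the pipeline itself.

```json
{
  "name": "CopyOnPremSqlToAdls",
  "properties": {
    "activities": [
      {
        "name": "CopyOrdersTable",
        "type": "Copy",
        "inputs": [
          { "referenceName": "OnPremSqlOrders", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "AdlsRawOrders", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "SqlServerSource" },
          "sink": { "type": "ParquetSink" }
        }
      }
    ]
  }
}
```

This is the same JSON that ARM/Bicep export captures for CI/CD, which is one reason ADF promotion between environments is well-trodden.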

Linked example: ADF Setup Guide | ADR-0001: ADF + dbt over Airflow

Recommend: Databricks Workflows

When: Spark-centric DAGs orchestrating notebooks, Python/JAR tasks, Delta Live Tables, and ML model training/serving within Databricks.

Why: Native task orchestration inside Databricks with job clusters that spin up and down per run; multi-task DAGs with dependencies, retries, and conditional logic; MLflow integration for experiment tracking and the model registry.

Tradeoffs:

  • Cost — DBU-based billing per job cluster.
  • Latency — cluster cold start of 2–5 minutes (mitigated with pools).
  • Compliance — FedRAMP High; IL4/IL5 with qualifying SKUs.
  • Skill — Spark + Python and Databricks workspace familiarity.

Anti-patterns:

  • Orchestrating non-Databricks services (Azure SQL, Blob copy, SFTP) as primary pattern — ADF has better connectors.
  • Cost-sensitive workloads with simple scheduling needs — Fabric Pipelines or ADF Mapping Data Flows may be cheaper.
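The multi-task DAG pattern maps to a Jobs API job definition like the sketch below: two tasks sharing one ephemeral job cluster, with `depends_on` expressing the edge. Job name, notebook path, Python file, Spark version, and node type are all assumptions for illustration.

```json
{
  "name": "nightly-feature-pipeline",
  "job_clusters": [
    {
      "job_cluster_key": "etl_cluster",
      "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_D4ds_v5",
        "num_workers": 2
      }
    }
  ],
  "tasks": [
    {
      "task_key": "ingest",
      "job_cluster_key": "etl_cluster",
      "notebook_task": { "notebook_path": "/Repos/data/ingest" }
    },
    {
      "task_key": "train",
      "depends_on": [ { "task_key": "ingest" } ],
      "job_cluster_key": "etl_cluster",
      "spark_python_task": { "python_file": "dbfs:/jobs/train_model.py" }
    }
  ]
}
```

Because the job cluster is created per run and torn down afterward, you pay DBUs only while tasks execute; instance pools reduce the cold-start penalty noted above.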

Linked example: Databricks Guide

Recommend: Fabric Data Pipelines

When: All workloads live inside Microsoft Fabric (lakehouses, warehouses, notebooks, dataflows); need simple orchestration without leaving the Fabric control plane.

Why: Included in Fabric capacity (no per-pipeline billing); familiar ADF-like authoring UX; native OneLake integration eliminates copy-activity overhead for Fabric-to-Fabric moves; Copy job for high-scale ingestion.

Tradeoffs:

  • Cost — consumed from F-SKU capacity; no separate billing.
  • Latency — comparable to ADF for copy and notebook activities.
  • Compliance — Commercial GA only (Azure Gov pending).
  • Skill — low; ADF experience transfers directly.

Anti-patterns:

  • Hybrid on-prem sources requiring SHIR — Fabric Pipelines lacks SHIR support today; use ADF.
  • Complex multi-cloud orchestration with 50+ diverse connectors — ADF connector catalog is broader.
  • Azure Government workloads — Fabric is not yet GA in Gov (2026-Q2).
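Because Fabric pipelines reuse the ADF-style authoring model, a definition chaining two Fabric notebook runs looks roughly like the sketch below. This is an assumption-heavy illustration: the activity type name and property names reflect the exported JSON format as I understand it and may differ, and the `<...>` placeholders stand in for real Fabric item IDs.

```json
{
  "name": "RefreshGoldLayer",
  "properties": {
    "activities": [
      {
        "name": "TransformSilver",
        "type": "TridentNotebook",
        "typeProperties": {
          "notebookId": "<notebook-item-id>",
          "workspaceId": "<workspace-id>"
        }
      },
      {
        "name": "BuildGoldTables",
        "type": "TridentNotebook",
        "dependsOn": [
          { "activity": "TransformSilver", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "notebookId": "<second-notebook-item-id>",
          "workspaceId": "<workspace-id>"
        }
      }
    ]
  }
}
```

The `dependsOn` / `dependencyConditions` pattern is the same one ADF uses, which is why ADF experience transfers directly.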

Linked example: Fabric vs. Databricks vs. Synapse | ADR-0001: ADF + dbt over Airflow