Skip to content

Complete Feature Mapping: Cloudera to Azure

Every Cloudera component -- CDH, CDP Private Cloud, and CDP Public Cloud -- mapped to its Azure equivalent with migration complexity, CSA-in-a-Box evidence paths, and practical notes.


How to use this document

Each section below maps a Cloudera component or capability to its Azure-native equivalent. The migration complexity rating uses a three-tier scale:

Rating Meaning
Low Configuration change or near-direct replacement; minimal code changes
Medium Requires code modification, schema conversion, or workflow redesign
High Fundamental redesign required; no direct equivalent exists

1. Storage layer

HDFS

Cloudera component Azure equivalent Migration complexity Notes
HDFS (NameNode + DataNodes) ADLS Gen2 + OneLake Medium Directory structure maps to container/folder hierarchy. No NameNode HA to manage. Redundancy handled by LRS/ZRS/GRS.
HDFS Federation Multiple ADLS Gen2 storage accounts Low Each HDFS namespace maps to a storage account or container.
HDFS Snapshots ADLS Gen2 blob versioning + soft delete Low Enable versioning on storage account; no manual snapshot management.
HDFS Encryption Zones ADLS Gen2 encryption (SSE + customer-managed keys) Low All data encrypted at rest by default. Customer-managed keys via Key Vault.
HDFS Erasure Coding ADLS Gen2 storage tiers Low Erasure coding for storage efficiency replaced by hot/cool/archive tiering.
WebHDFS / HttpFS ADLS Gen2 REST API / abfss:// driver Low Standard REST API; Hadoop-compatible abfss:// filesystem driver.

Kudu

Cloudera component Azure equivalent Migration complexity Notes
Apache Kudu Delta Lake on ADLS Gen2 Medium Kudu's fast-insert mutable storage maps to Delta Lake ACID transactions, MERGE operations, and time travel. See Impala Migration for Kudu-to-Delta conversion.

2. SQL and query engines

Hive

Cloudera component Azure equivalent Migration complexity Notes
Hive on Tez Databricks SQL + dbt models Medium HiveQL ports to Spark SQL with minor syntax changes. See playbook Section 6.
Hive LLAP Databricks SQL Warehouse (Serverless) Medium LLAP's caching behavior replaced by Photon engine + result caching.
Hive Metastore (HMS) Unity Catalog Medium HMS schemas export to Unity Catalog. Three-level namespace: catalog.schema.table.
Hive ACID tables Delta tables Medium Hive ACID transactions replaced by Delta Lake ACID. MERGE, UPDATE, DELETE supported natively.
Hive UDFs (Java) Python UDFs / pandas_udf / built-in functions High Java UDFs must be rewritten. Budget 30% of workload migration effort. See playbook Section 6.3.
Hive SerDes Spark format readers / Delta Lake Medium Custom SerDes replaced by Spark's built-in format support or custom readers.
Hive Views Databricks SQL views / dbt models Low Views port directly; consider converting to dbt models for lineage.
Beeline CLI Databricks SQL CLI / Azure Data Studio Low Direct replacement for interactive SQL access.

Impala

Cloudera component Azure equivalent Migration complexity Notes
Impala (interactive SQL) Databricks SQL Warehouse Medium Impala SQL is close to Spark SQL. See Impala Migration.
Impala COMPUTE STATS Databricks ANALYZE TABLE Low Syntax change only.
Impala metadata caching Databricks result caching + Photon Low Photon + adaptive query execution replace Impala's catalog caching.
Impala Parquet reader Delta Lake (Parquet-native) Low Delta Lake reads Parquet natively with additional features (time travel, Z-ordering).
Impala shell Databricks SQL CLI / JDBC Low Connection string change.

3. Compute and processing

Spark

Cloudera component Azure equivalent Migration complexity Notes
Spark on YARN Azure Databricks (Jobs + SQL) Low-Medium PySpark/Scala Spark code is highly portable. Remove YARN configs, update paths. See playbook Section 7.
Spark Streaming (DStreams) Databricks Structured Streaming Medium DStreams deprecated; rewrite to Structured Streaming API.
Spark Structured Streaming Databricks Structured Streaming Low Near-direct port; update source/sink configurations.
spark-submit scripts Databricks Jobs API / Workflows Low Submit scripts become Job definitions (JSON/YAML). See playbook Section 7.3.
Spark History Server Databricks Spark UI / Azure Monitor Low Built-in Spark UI per cluster; historical data in Azure Monitor.
Spark Thrift Server Databricks SQL Warehouse Low JDBC/ODBC endpoint with Photon acceleration.

MapReduce

Cloudera component Azure equivalent Migration complexity Notes
MapReduce jobs Databricks Spark jobs High MapReduce must be rewritten to Spark. No direct equivalent.
Streaming MapReduce Spark Structured Streaming High Complete rewrite required.

YARN

Cloudera component Azure equivalent Migration complexity Notes
YARN ResourceManager Databricks cluster autoscaling Low Managed by Databricks; no user-managed resource manager.
YARN queues / Capacity Scheduler Databricks cluster policies Low Queue-based isolation becomes policy-based isolation.
YARN NodeManager Databricks worker nodes Low Managed by Databricks auto-scaling.
YARN ApplicationMaster Databricks driver node Low Transparent; Databricks manages driver lifecycle.

4. Data ingestion and integration

NiFi

Cloudera component Azure equivalent Migration complexity Notes
Apache NiFi Azure Data Factory + Logic Apps Medium-High Different paradigm. See NiFi Migration for processor mapping.
NiFi Registry ADF Git integration (Azure DevOps / GitHub) Low Version control model is different but functionally equivalent.
NiFi clustering ADF Integration Runtime scaling Low ADF handles scaling internally.
NiFi Site-to-Site ADF Self-Hosted Integration Runtime Medium SHIR provides secure on-prem to cloud data movement.
MiNiFi (edge agents) Azure IoT Edge + ADF Medium Edge data collection and forwarding.

Sqoop

Cloudera component Azure equivalent Migration complexity Notes
Sqoop import (RDBMS to HDFS) ADF Copy Activity Low Direct replacement with more connectors and better parallelism.
Sqoop export (HDFS to RDBMS) ADF Copy Activity (reverse) Low Same activity, different direction.
Sqoop incremental import ADF tumbling window trigger + watermark Low ADF handles incremental patterns natively with watermarking.

Flume

Cloudera component Azure equivalent Migration complexity Notes
Flume agents (source/channel/sink) Event Hubs + Azure Functions Medium Event Hubs replaces the channel; Functions replace sink logic.
Flume interceptors Event Hubs event processing + Functions Medium Transform logic moves to Functions or Databricks Structured Streaming.
Flume to HDFS sink Event Hubs Capture (to ADLS Gen2) Low Event Hubs Capture writes Avro/Parquet directly to ADLS.

5. Messaging and streaming

Kafka

Cloudera component Azure equivalent Migration complexity Notes
Kafka brokers Azure Event Hubs (Kafka endpoint) Low Kafka wire-protocol compatible. Config change only. See ADR-0005.
Kafka Connect ADF connectors / Event Hubs connectors Medium Reimplement connectors using ADF or custom Functions.
Kafka Streams Databricks Structured Streaming / Azure Stream Analytics Medium Rewrite Kafka Streams apps to Spark Streaming or ASA.
Schema Registry Azure Schema Registry (Event Hubs) Low Schema Registry built into Event Hubs namespace.
Kafka MirrorMaker Event Hubs geo-DR / Event Hubs Capture Low Built-in geo-replication and capture.
Streams Messaging Manager (SMM) Azure Monitor + Event Hubs metrics Low Monitoring and alerting via Azure Monitor dashboards.
Kafka topics (retention) Event Hubs retention (1-90 days, or capture to ADLS) Low Configure retention per Event Hub; long-term via Capture.

6. Orchestration

Oozie

Cloudera component Azure equivalent Migration complexity Notes
Oozie Workflow ADF Pipeline / Databricks Workflows Medium See playbook Section 8 for conversion patterns.
Oozie Coordinator ADF Schedule/Tumbling Window Trigger Low Time and data triggers map directly.
Oozie Bundle ADF Execute Pipeline (nested) Low Group related pipelines into a parent.
Oozie Fork/Join ADF parallel activities Low Native parallel execution in ADF.
Oozie Decision node ADF If Condition / Switch Low Expression-based branching.
Oozie Shell action ADF Custom Activity / Azure Batch Medium Arbitrary scripts via Azure Batch.
Oozie Email action Logic App (triggered by ADF) Low ADF triggers Logic App for notifications.
Oozie SLA monitoring ADF monitoring + alerts Low Azure Monitor alerts on pipeline duration/failure.

7. Security and governance

Ranger

Cloudera component Azure equivalent Migration complexity Notes
Ranger (database/table access) Unity Catalog GRANT Medium See playbook Section 9.1 for policy decomposition.
Ranger column masking Unity Catalog column masks Medium Masking functions + ALTER TABLE SET COLUMN MASK.
Ranger row-level filtering Unity Catalog row filters Medium Filter functions + ALTER TABLE SET ROW FILTER.
Ranger HDFS policies ADLS Gen2 RBAC + ACLs Medium Azure IAM role assignments on containers/folders.
Ranger Kafka policies Event Hubs RBAC Low Entra ID roles: Data Sender / Data Receiver.
Ranger tag-based policies Purview classifications + sensitivity labels Medium Purview auto-classification replaces Atlas tags + Ranger tag policies.
Ranger KMS Azure Key Vault Low Centralized key management with HSM backing.
Ranger audit Azure Monitor + Log Analytics Low Unified audit trail across all services.

Atlas

Cloudera component Azure equivalent Migration complexity Notes
Atlas metadata catalog Microsoft Purview Medium Business glossary, classifications, data lineage. See ADR-0006.
Atlas lineage tracking Purview lineage + ADF lineage + Unity Catalog lineage Low Automatic lineage from ADF pipelines and Databricks queries.
Atlas classifications/tags Purview classifications + sensitivity labels Medium Auto-classification scans replace manual Atlas tagging.
Atlas business glossary Purview business glossary Low Term-level mapping is straightforward.
Atlas REST API Purview REST API / Purview SDK Low API-based catalog access with Python SDK.

Kerberos / Authentication

Cloudera component Azure equivalent Migration complexity Notes
Kerberos KDC Entra ID Medium Cloud-managed identity; no on-prem KDC.
Keytab files Service principals + managed identities Medium Managed identities preferred for Azure service-to-service auth.
kinit in scripts MSAL token acquisition / managed identity Medium Remove kinit calls; use DefaultAzureCredential.
Sentry (legacy) Entra ID RBAC Low Sentry roles map cleanly to Entra ID groups + Unity Catalog grants.

Knox

Cloudera component Azure equivalent Migration complexity Notes
Apache Knox gateway Azure API Management Medium Knox topology-based URL rewriting becomes APIM policy-based routing.
Knox SSO Entra ID SSO Low Enterprise SSO with SAML/OIDC.

8. Cluster management and monitoring

Cloudera Manager

Cloudera component Azure equivalent Migration complexity Notes
Cloudera Manager Azure Portal + Azure Monitor Low Service health, metrics, and alerting via Azure Monitor.
CM host health checks Azure Monitor VM insights Low Built-in VM and service monitoring.
CM service monitoring Azure Monitor + Databricks Admin Console Low Per-service dashboards and alerts.
CM configuration management Bicep IaC / Terraform Low Infrastructure as Code replaces CM configuration profiles.
CM rolling upgrades Managed by Azure services Low No manual upgrade orchestration.
CM HDFS reports ADLS Gen2 storage metrics + Azure Monitor Low Built-in storage analytics.
CM YARN reports Databricks cluster metrics Low Cluster utilization dashboards in Databricks admin console.

Hue

Cloudera component Azure equivalent Migration complexity Notes
Hue SQL editor Databricks SQL Editor / Azure Data Studio Low Direct replacement for interactive SQL.
Hue job browser Databricks Workflows UI / ADF Monitor Low Built-in job monitoring per service.
Hue file browser Azure Storage Explorer / Azure Portal Low GUI-based storage browsing.
Hue Oozie editor ADF Pipeline editor (visual) Low Visual pipeline design in ADF Studio.

9. CDP-specific components

CDP Data Engineering (CDE)

Cloudera component Azure equivalent Migration complexity Notes
CDE virtual clusters Databricks workspaces Low Workspace-level isolation replaces virtual cluster isolation.
CDE Spark jobs Databricks Jobs Low Spark job definitions map directly.
CDE Airflow Databricks Workflows / ADF Medium Airflow DAGs convert to Databricks multi-task jobs or ADF pipelines.
CDE CLI Databricks CLI / REST API Low CLI tooling for job management.
CDE job monitoring Databricks Jobs UI + Azure Monitor Low Built-in monitoring and alerting.

For detailed CDE migration patterns, see CDP Data Engineering Guide.

CDP Machine Learning (CML)

Cloudera component Azure equivalent Migration complexity Notes
CML Sessions Databricks Notebooks / Azure ML Compute Low Jupyter-compatible environments on both targets.
CML Experiments MLflow on Databricks / Azure ML Experiments Low MLflow is available on both platforms.
CML Models (serving) Databricks Model Serving / Azure ML Endpoints Medium Model packaging and serving configuration differs.
CML Applied ML Prototypes Databricks Solution Accelerators Low Template-based quick-start patterns.
CML Spark integration Databricks native Spark Low Tighter integration on Databricks.

CDP Data Warehouse (CDW)

Cloudera component Azure equivalent Migration complexity Notes
CDW Hive Virtual Warehouse Databricks SQL Warehouse Medium HiveQL to Spark SQL conversion.
CDW Impala Virtual Warehouse Databricks SQL Warehouse Medium See Impala Migration.
CDW auto-scaling Databricks SQL Serverless auto-scaling Low Serverless scaling on Databricks is more granular.

10. Infrastructure services

ZooKeeper

Cloudera component Azure equivalent Migration complexity Notes
ZooKeeper Managed by Azure services internally Low No user-managed ZooKeeper. Event Hubs, Databricks, and Cosmos DB handle coordination internally.

Miscellaneous

Cloudera component Azure equivalent Migration complexity Notes
Cloudera Navigator (legacy) Microsoft Purview Medium Legacy governance tool; mapped to Purview.
Cloudera Data Steward Studio Purview Data Catalog Low Data stewardship and quality monitoring.
Cloudera Replication Manager ADF Copy Activity / ADLS geo-replication Low Data replication and DR.
Cloudera Workload XM Databricks Overwatch / Azure Monitor Low Workload performance analysis.
HBase Azure Cosmos DB (NoSQL or Table API) High Wide-column key-value; requires schema remapping.
Phoenix (SQL on HBase) Cosmos DB SQL API / Azure SQL High SQL layer over key-value store; redesign likely.
Solr (Cloudera Search) Azure AI Search Medium Full-text search; index schema conversion required.

Migration complexity summary

Complexity Component count Examples
Low 28 YARN, ZooKeeper, Sqoop, Kafka, Hue, Beeline, Knox SSO, CM monitoring
Medium 15 HDFS, Hive, Impala, NiFi, Ranger, Atlas, Kerberos, Oozie, CDE Airflow
High 5 Hive UDFs, MapReduce, HBase, Phoenix, NiFi (complex flows)

Takeaway: The majority of Cloudera components have low-to-medium complexity migrations. The highest-effort items are Hive UDFs, HBase, and complex NiFi flows. Plan accordingly and staff UDF rewrites early.


Last updated: 2026-04-30 Maintainers: CSA-in-a-Box core team