ADR 0005 — Event Hubs over open-source Kafka for streaming ingestion¶
Context and Problem Statement¶
Several vertical examples (IoT streaming, EPA air-quality, NOAA weather, casino analytics) require a durable, high-throughput streaming buffer in front of Bronze. We must pick a default streaming broker that works in Azure Government on day one, survives burst ingestion, integrates with our Purview/Unity Catalog governance, and does not force customers to operate a stateful distributed system.
Decision Drivers¶
- Azure Government availability — broker must be a Gov-GA PaaS with FedRAMP High inheritance.
- Operational burden — a managed broker eliminates ZooKeeper/KRaft, broker patching, and partition rebalancing as customer responsibilities.
- Kafka protocol compatibility — customers with existing Kafka clients must be able to connect without code rewrites.
- Native integration with Stream Analytics, Azure Functions, ADX, and Databricks Structured Streaming.
- Cost predictability — throughput-unit or auto-inflate pricing should be simpler to forecast than self-hosted clusters or third-party managed equivalents (Confluent Cloud; MSK on AWS).
Considered Options¶
- Azure Event Hubs (chosen) — Managed PaaS, Gov-GA, Kafka-protocol endpoint for drop-in clients, native connectors to Stream Analytics, ADX, Functions, and Databricks.
- Self-hosted Apache Kafka on AKS — Full control, open-source, but customer-owned cluster management.
- Confluent Cloud on Azure — Managed Kafka with full Kafka ecosystem (Connect, Schema Registry, ksqlDB).
- Azure Service Bus — Reliable queuing/pub-sub for business messaging but not a data-stream broker.
Decision Outcome¶
Chosen: Option 1 — Event Hubs as the default streaming broker, with a Kafka-protocol endpoint enabled so Kafka-client code connects without changes. Event Hubs Capture writes directly to Bronze (ADLS Gen2) as Avro/Parquet. A Service Bus lane is reserved for transactional/business messaging and is not considered for data streaming.
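The "connects without changes" claim rests on Event Hubs exposing a Kafka endpoint on port 9093 with SASL_SSL/PLAIN, where the username is the literal string `$ConnectionString` and the password is the namespace connection string. A minimal sketch of what changes for an existing kafka-python client (only the connection config; namespace and topic names here are illustrative):

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build the kwargs an existing kafka.KafkaProducer/KafkaConsumer
    needs to point at Event Hubs instead of a self-hosted broker.

    Event Hubs speaks the Kafka protocol on port 9093; authentication is
    SASL PLAIN with the fixed username "$ConnectionString" and the
    namespace (or hub-scoped) connection string as the password.
    """
    return {
        "bootstrap_servers": f"{namespace}.servicebus.windows.net:9093",
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": "$ConnectionString",
        "sasl_plain_password": connection_string,
    }

# Usage — the producer code itself is unchanged, only the config differs:
#   from kafka import KafkaProducer
#   producer = KafkaProducer(**event_hubs_kafka_config("contoso-ns", conn_str))
#   producer.send("telemetry", b'{"device": "sensor-1", "temp_c": 21.4}')
```

The same config shape applies to librdkafka-based clients and Spark's `kafka` source; the event hub name plays the role of the Kafka topic.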
Consequences¶
- Positive: Managed PaaS in Azure Gov; FedRAMP High inheritance; no broker operations.
- Positive: Kafka-protocol endpoint means existing producers/consumers (librdkafka, kafka-python, Spark kafka source) work unmodified.
- Positive: Event Hubs Capture gives zero-code Bronze ingestion — no bespoke consumer needed for archival.
- Positive: First-class integration with ADX (for hot-path analytics), Databricks Structured Streaming, and Stream Analytics.
- Negative: Event Hubs is a narrower subset of Kafka — no Kafka Connect, no Kafka Streams, no KSQL, no broker-side transactions across topics.
- Negative: Partition counts are fixed at creation (premium tier relaxes this) — capacity planning matters.
- Negative: Schema Registry is available (Event Hubs Schema Registry) but less mature than Confluent's; we pair it with dbt contracts where possible.
- Neutral: If a customer requires the full Kafka ecosystem, Confluent Cloud on Azure remains a viable alternate path.
Pros and Cons of the Options¶
Option 1 — Event Hubs¶
- Pros: Managed PaaS; Gov-GA; Kafka-protocol compatible; Capture to ADLS; native ADX/Databricks integration; predictable throughput-unit pricing.
- Cons: Subset of Kafka features; no Kafka Connect; fixed partition counts on standard tier.
Option 2 — Self-hosted Kafka on AKS¶
- Pros: Full Kafka feature set; Kafka Connect ecosystem; no vendor markup.
- Cons: Customer-owned cluster operations, upgrades, and HA; stateful workload on AKS is operationally expensive.
Option 3 — Confluent Cloud on Azure¶
- Pros: Managed full Kafka; Schema Registry; Kafka Connect; ksqlDB.
- Cons: Third-party service; Gov-GA story is weaker; additional vendor procurement; cross-account networking complexity.
Option 4 — Service Bus¶
- Pros: Strong business-messaging semantics (FIFO, sessions, DLQ).
- Cons: Not a streaming broker; no partitioned append log; wrong tool for high-throughput ingest.
Validation¶
We will know this decision is right if:
- All streaming vertical examples ingest with zero custom broker-management code.
- Event Hubs Capture covers >90% of Bronze-archival use cases.
- If two or more customers are blocked on Kafka Connect or transactional cross-topic writes, we revisit this decision with Confluent Cloud as the alternate.
References¶
- Decision tree: Kafka vs. Event Hubs vs. Service Bus
- Decision tree: Batch vs. Streaming
- Related code: `examples/iot-streaming/`, `examples/noaa/`, `examples/epa/`, `examples/casino-analytics/` (streaming ingestion patterns)
- Framework controls: NIST 800-53 SC-7 (boundary protection via Private Endpoints on the Event Hubs namespace), AU-2 (diagnostic logs to Log Analytics), SC-8 (TLS in transit). See `governance/compliance/nist-800-53-rev5.yaml`.
- Discussion: CSA-0087