📨 Azure Event Hubs¶
See also: CSA-in-a-Box platform guide
This is the generic Azure reference for Azure Event Hubs. For how CSA-in-a-Box specifically deploys, configures, and integrates this service, see the platform guide: Azure Event Hubs guide.
Big data streaming platform and event ingestion service for millions of events per second.
🌟 Service Overview¶
Azure Event Hubs is a fully managed, real-time data ingestion service that can stream millions of events per second from any source. It provides a distributed streaming platform with low latency and seamless integration with Azure and third-party services.
🔥 Key Value Propositions¶
- Massive Scale: Ingest millions of events per second with elastic throughput
- Kafka Compatibility: Drop-in replacement for Apache Kafka with native protocol support (see the producer sketch after this list)
- Auto-Capture: Automatically capture streaming data to Azure Data Lake or Blob Storage
- Global Distribution: Multi-region replication with geo-disaster recovery
- Enterprise Security: Advanced authentication, encryption, and network isolation
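Because an Event Hubs namespace exposes a Kafka-compatible endpoint on port 9093, most Kafka clients need only connection changes. A minimal producer sketch, assuming the confluent-kafka package; the namespace name and connection string are placeholders:
from confluent_kafka import Producer

# Event Hubs' Kafka endpoint uses SASL_SSL/PLAIN; the username is the literal string "$ConnectionString"
producer = Producer({
    "bootstrap.servers": "my-namespace.servicebus.windows.net:9093",  # placeholder namespace
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "$ConnectionString",
    "sasl.password": "Endpoint=sb://my-namespace.servicebus.windows.net/;...",  # full connection string
})

# The Kafka topic name maps to the Event Hub name
producer.produce("my-eventhub", key="sensor-123", value=b"hello from a Kafka client")
producer.flush()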
🏗️ Architecture Overview¶
graph TB
    subgraph "Event Producers"
        IoT[IoT Devices]
        Apps[Applications]
        Logs[Log Collectors]
        APIs[API Services]
    end
    subgraph "Azure Event Hubs"
        subgraph "Event Hub Instance"
            P1[Partition 1]
            P2[Partition 2]
            P3[Partition 3]
            P4[Partition N]
        end
        Schema[Schema Registry]
        Capture[Event Hub Capture]
    end
    subgraph "Event Consumers"
        SA[Stream Analytics]
        Spark[Databricks/Synapse]
        Functions[Azure Functions]
        Apps2[Custom Applications]
    end
    subgraph "Storage"
        ADLS[Data Lake Gen2]
        Blob[Blob Storage]
    end
    IoT --> P1
    Apps --> P2
    Logs --> P3
    APIs --> P4
    P1 --> SA
    P2 --> Spark
    P3 --> Functions
    P4 --> Apps2
    Capture --> ADLS
    Capture --> Blob
    Schema -.-> P1
    Schema -.-> P2
💰 Pricing Tiers¶
🥉 Standard Tier¶
Best For: Development, testing, and variable production workloads
Features:
- Throughput Units (TUs): 1-20, auto-inflate capable
- Retention: 1-7 days configurable
- Consumer Groups: Up to 20 per Event Hub
- Partitions: Up to 32 per Event Hub
- Kafka Support: ✅ Native protocol support
- Capture: ✅ To Data Lake or Blob Storage
- Schema Registry: ✅ Included
Pricing Model:
- Base charge per Throughput Unit
- Ingress events (per million)
- Capture charge (per GB stored)
🥇 Premium Tier¶
Best For: Production workloads with predictable performance requirements
Features:
- Processing Units (PUs): 1-16 dedicated capacity
- Retention: Up to 90 days
- Consumer Groups: Unlimited
- Partitions: Up to 100 per Event Hub
- Performance Isolation: Dedicated resources
- Enhanced Security: Private Link, customer-managed keys
- Larger Messages: Up to 1 MB message size
Additional Benefits:
- Guaranteed capacity and latency
- Network isolation with Private Link
- Customer-managed encryption keys
- Multi-region disaster recovery
🏆 Dedicated Tier¶
Best For: Mission-critical enterprise workloads with extreme scale requirements
Features:
- Capacity Units (CUs): Single-tenant deployments
- Retention: Up to 90 days
- Throughput: Multiple GB/sec per CU
- Event Hubs: Unlimited namespaces and Event Hubs
- Complete Isolation: Physical hardware isolation
- Bring Your Own Key (BYOK): Full encryption control
Ideal For:
- Multi-tenant SaaS platforms
- Extremely high-volume scenarios (>100 MB/sec)
- Compliance requirements needing physical isolation
- Predictable monthly costs for large-scale operations
🎯 Core Concepts¶
Throughput Units (Standard Tier)¶
A throughput unit controls capacity for Event Hubs:
- Ingress: Up to 1 MB/sec or 1,000 events/sec per TU
- Egress: Up to 2 MB/sec or 4,096 events/sec per TU
- Auto-inflate: Automatically scales TUs up as demand grows (scale-down is manual)
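As a worked example: a stream ingesting 5 MB/sec of 2 KB events (about 2,500 events/sec) needs 5 TUs, because the 1 MB/sec bandwidth limit (5 TUs) binds before the 1,000 events/sec limit (3 TUs).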
# Enable auto-inflate for an Event Hub namespace
az eventhubs namespace update \
--resource-group myResourceGroup \
--name myNamespace \
--enable-auto-inflate true \
--maximum-throughput-units 20
Partitions¶
Partitions are ordered sequences of events within an Event Hub:
- Purpose: Enable parallel processing and scaling
- Count: 1-32 (Standard), up to 100 (Premium)
- Partition Keys: Route related events to same partition
- Ordering: Guaranteed within a partition, not across partitions
# Send an event with a partition key to preserve ordering
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="your_connection_string",
    eventhub_name="your_eventhub"
)

# Events with the same partition key always land on the same partition
event_data = EventData("Sensor reading: 23.5°C")
producer.send_event(event_data, partition_key="sensor-123")
producer.close()
Consumer Groups¶
Consumer groups enable multiple applications to read from the same Event Hub independently:
- Default: The $Default consumer group is always available
- Isolation: Each consumer group maintains its own offset
- Limit: Up to 20 (Standard), unlimited (Premium)
# Read from a specific consumer group
from azure.eventhub import EventHubConsumerClient

def on_event(partition_context, event):
    print(f"Partition {partition_context.partition_id}: {event.body_as_str()}")

consumer = EventHubConsumerClient.from_connection_string(
    conn_str="your_connection_string",
    consumer_group="analytics-team",
    eventhub_name="your_eventhub"
)
with consumer:
    consumer.receive(on_event=on_event)
📊 Use Cases¶
📱 IoT Telemetry Ingestion¶
Scenario: Ingest millions of sensor readings per second
graph LR
    Devices[IoT Devices] -->|HTTPS/AMQP| EventHub[Event Hubs]
    EventHub --> Stream[Stream Analytics]
    EventHub --> Capture[Capture to ADLS]
    Stream --> Alerts[Real-time Alerts]
    Capture --> Analytics[Batch Analytics]
📊 Application Logging & Monitoring¶
Scenario: Centralized logging for distributed applications
# Send application logs to Event Hubs
import json
import os
from datetime import datetime

from azure.eventhub import EventHubProducerClient, EventData

def send_log_event(level, message, metadata):
    # NOTE: create the producer once and reuse it in production; shown inline for brevity
    producer = EventHubProducerClient.from_connection_string(
        conn_str=os.getenv("EVENTHUB_CONNECTION_STRING"),
        eventhub_name="application-logs"
    )
    log_event = {
        "timestamp": datetime.utcnow().isoformat(),
        "level": level,
        "message": message,
        "metadata": metadata
    }
    event_data = EventData(json.dumps(log_event))
    producer.send_event(event_data)
    producer.close()
🔄 Change Data Capture (CDC)¶
Scenario: Stream database changes to Event Hubs for downstream processing
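The Event Hubs SDK has no built-in CDC connector; tools such as Debezium typically publish change events through the Kafka endpoint. A minimal sketch of the consuming side, assuming Debezium-style JSON envelopes (the op, before, and after fields and the database-changes hub name are assumptions):
import json
from azure.eventhub import EventHubConsumerClient

def on_change_event(partition_context, event):
    # Debezium-style envelope: "op" is c/u/d for insert/update/delete
    change = json.loads(event.body_as_str())
    if change.get("op") == "d":
        print(f"Row deleted: {change.get('before')}")
    else:
        print(f"Row upserted: {change.get('after')}")
    partition_context.update_checkpoint(event)

consumer = EventHubConsumerClient.from_connection_string(
    conn_str="your_connection_string",
    consumer_group="$Default",
    eventhub_name="database-changes"  # hypothetical hub receiving CDC events
)
with consumer:
    consumer.receive(on_event=on_change_event)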
📈 Real-time Analytics Pipeline¶
Scenario: Process streaming data with Stream Analytics and visualize in Power BI
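In a full pipeline, the aggregation would live in a Stream Analytics query feeding a Power BI dataset. Purely to illustrate the shape of that computation, here is a sketch of a one-minute tumbling average implemented with the Python consumer (the temperature field is an assumption):
import json
import time
from collections import defaultdict
from azure.eventhub import EventHubConsumerClient

window = defaultdict(list)  # minute bucket -> temperature readings

def on_event(partition_context, event):
    reading = json.loads(event.body_as_str())
    bucket = int(time.time() // 60)  # 1-minute tumbling window
    window[bucket].append(reading["temperature"])  # assumed field name
    # Emit the average for any window that has closed
    for closed in [b for b in window if b < bucket]:
        values = window.pop(closed)
        print(f"Window {closed}: avg temperature {sum(values) / len(values):.1f}")

consumer = EventHubConsumerClient.from_connection_string(
    conn_str="your_connection_string",
    consumer_group="$Default",
    eventhub_name="telemetry-events"
)
with consumer:
    consumer.receive(on_event=on_event)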
🚀 Quick Start¶
Create Event Hub Namespace and Hub¶
# Create resource group
az group create --name rg-eventhub-demo --location eastus
# Create Event Hubs namespace (Standard tier)
az eventhubs namespace create \
--name eventhub-demo-ns \
--resource-group rg-eventhub-demo \
--location eastus \
--sku Standard \
--enable-auto-inflate true \
--maximum-throughput-units 10
# Create Event Hub with 4 partitions
az eventhubs eventhub create \
--name telemetry-events \
--namespace-name eventhub-demo-ns \
--resource-group rg-eventhub-demo \
--partition-count 4 \
--message-retention 3
# Create consumer group
az eventhubs eventhub consumer-group create \
--eventhub-name telemetry-events \
--namespace-name eventhub-demo-ns \
--resource-group rg-eventhub-demo \
--name analytics-consumers
Send Events (Python)¶
from azure.eventhub import EventHubProducerClient, EventData
from datetime import datetime
import json

# Initialize producer
producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://eventhub-demo-ns.servicebus.windows.net/;...",
    eventhub_name="telemetry-events"
)

# Create batch and send events
try:
    event_batch = producer.create_batch()
    for i in range(100):
        event_data = {
            "sensor_id": f"sensor-{i % 10}",
            "temperature": 20 + (i % 15),
            "humidity": 50 + (i % 30),
            "timestamp": datetime.utcnow().isoformat()
        }
        # add() raises ValueError once the batch reaches its size limit
        event_batch.add(EventData(json.dumps(event_data)))
    producer.send_batch(event_batch)
    print(f"Sent batch of {len(event_batch)} events")
finally:
    producer.close()
Receive Events (Python)¶
from azure.eventhub import EventHubConsumerClient

def on_event_batch(partition_context, events):
    for event in events:
        print(f"Received event from partition {partition_context.partition_id}")
        print(f"Event data: {event.body_as_str()}")
    # Checkpoint once per batch; marks the batch's last event as processed
    partition_context.update_checkpoint()

# Initialize consumer
consumer = EventHubConsumerClient.from_connection_string(
    conn_str="Endpoint=sb://eventhub-demo-ns.servicebus.windows.net/;...",
    consumer_group="$Default",
    eventhub_name="telemetry-events"
)

# Start receiving (blocks until interrupted)
try:
    with consumer:
        consumer.receive_batch(
            on_event_batch=on_event_batch,
            starting_position="-1"  # start from the beginning of each partition
        )
except KeyboardInterrupt:
    print("Stopped receiving")
🔗 Related Topics¶
📚 Deep Dive Guides¶
- Event Streaming Basics - Fundamental concepts and patterns
- Kafka Compatibility - Using Event Hubs as Kafka replacement
- Capture to Storage - Automatic archival to Data Lake
- Schema Registry - Schema management for Avro data
🛠️ Integration Scenarios¶
🎯 Best Practices¶
- Performance Optimization
- Security Configuration
- Cost Optimization
Last Updated: 2025-01-28 · Service Version: General Availability · Documentation Status: Complete