📨 Kafka on HDInsight

Deploy and manage Kafka on HDInsight. Learn cluster setup, topic management, and high availability.

🎯 Learning Objectives

  • Create a Kafka cluster on HDInsight
  • Configure topics and partitions
  • Implement producers and consumers
  • Ensure high availability
  • Monitor cluster health

📋 Prerequisites

  • Azure subscription
  • Understanding of messaging systems
  • Event streaming concepts

🚀 Create Kafka Cluster

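# Azure CLI
# A real deployment typically also passes credentials and storage options
# such as --http-password, --ssh-password, and --storage-account.
az hdinsight create \
  --name kafka-cluster \
  --resource-group rg-kafka \
  --type kafka \
  --component-version Kafka=2.4 \
  --cluster-tier standard \
  --workernode-count 3 \
  --workernode-data-disks-per-node 2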

📡 Topic Management

# List brokers (worker nodes host the Kafka brokers)
grep wn /etc/hosts

# Create topic
kafka-topics.sh --create \
  --topic orders \
  --partitions 3 \
  --replication-factor 2 \
  --bootstrap-server wn0:9092

# Describe topic
kafka-topics.sh --describe \
  --topic orders \
  --bootstrap-server wn0:9092
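
To make the `--partitions 3 --replication-factor 2` settings above concrete, the following sketch spreads replicas across three brokers with a plain round-robin. It is a simplified illustration, not Kafka's actual assignment algorithm, which also randomizes the starting broker and can be rack-aware. The `assign_replicas` helper and the broker names `wn0` to `wn2` are assumptions made for this example.

```python
def assign_replicas(brokers, partitions, replication_factor):
    """Return {partition: [replica brokers]} using plain round-robin.

    Simplified for illustration; real Kafka also randomizes the start
    broker and can take racks into account.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    layout = {}
    for p in range(partitions):
        # The first broker in each list plays the role of preferred leader.
        layout[p] = [brokers[(p + r) % len(brokers)]
                     for r in range(replication_factor)]
    return layout


layout = assign_replicas(["wn0", "wn1", "wn2"], partitions=3, replication_factor=2)
for partition, replicas in layout.items():
    print(f"orders-{partition}: leader={replicas[0]} replicas={replicas}")
```

With a replication factor of 2, every partition still has a copy on another broker if any single broker goes down.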

💻 Python Producer/Consumer

# Producer
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=['broker1:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

producer.send('orders', {'order_id': 1, 'amount': 99.99})
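producer.flush()  # block until buffered records are actually sent

# Consumer (runs as a separate script, so it needs its own imports)
import json

from kafka import KafkaConsumer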

consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['broker1:9092'],
    auto_offset_reset='earliest',
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

for message in consumer:
    print(message.value)
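
The `value_serializer` and `value_deserializer` lambdas above are plain functions, so the encoding can be checked without a running cluster. A minimal sketch, assuming the same JSON-over-UTF-8 encoding as the producer and consumer; the names `serialize` and `deserialize` are introduced here for illustration only.

```python
import json


def serialize(value):
    """Encode a Python object as UTF-8 JSON bytes (mirrors value_serializer)."""
    return json.dumps(value).encode("utf-8")


def deserialize(raw):
    """Decode UTF-8 JSON bytes back to a Python object (mirrors value_deserializer)."""
    return json.loads(raw.decode("utf-8"))


order = {"order_id": 1, "amount": 99.99}
payload = serialize(order)

# Kafka transports the value as bytes; the consumer must reverse the encoding.
assert isinstance(payload, bytes)
assert deserialize(payload) == order
print(payload)
```

These functions can be passed directly as `value_serializer=serialize` and `value_deserializer=deserialize`, which keeps both sides of the topic on one shared definition of the message format.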

Last Updated: January 2025