🌊 Kafka Streaming on HDInsight
Build real-time streaming pipelines with Kafka on HDInsight. This guide covers creating topics, running producers and consumers, and processing the stream with Spark Structured Streaming.
🎯 Learning Objectives
- Create Kafka topics
- Implement producers and consumers
- Process streams with Spark Structured Streaming
- Handle exactly-once semantics
- Monitor and troubleshoot streaming jobs
📋 Prerequisites
- A running HDInsight Kafka cluster
- Working knowledge of Event Hubs
- Familiarity with Spark Structured Streaming
📡 Kafka Basics
# Create a topic with 3 partitions, each replicated to 2 brokers
kafka-topics.sh --create \
  --topic events \
  --partitions 3 \
  --replication-factor 2 \
  --bootstrap-server broker1:9092

# Produce messages (each line typed on stdin becomes one message)
kafka-console-producer.sh \
  --topic events \
  --bootstrap-server broker1:9092

# Consume messages, starting from the earliest available offset
kafka-console-consumer.sh \
  --topic events \
  --from-beginning \
  --bootstrap-server broker1:9092
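When topic creation is scripted rather than typed by hand, it can help to build the same `kafka-topics.sh --create` invocation in code and check the settings first. The sketch below is one possible way to do that. The helper name `create_topic_command` and its `broker_count` parameter are hypothetical and not part of Kafka or HDInsight. The check reflects the fact that Kafka rejects a replication factor larger than the number of available brokers.

```python
import shlex


def create_topic_command(topic, partitions, replication_factor,
                         bootstrap_server, broker_count=None):
    """Build the argv list for `kafka-topics.sh --create` (hypothetical helper)."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # broker_count is an optional, caller-supplied value (an assumption of this
    # sketch): each replica of a partition must live on a different broker.
    if broker_count is not None and replication_factor > broker_count:
        raise ValueError("replication_factor cannot exceed the number of brokers")
    return [
        "kafka-topics.sh", "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        "--bootstrap-server", bootstrap_server,
    ]


# Same settings as the shell example above.
cmd = create_topic_command("events", 3, 2, "broker1:9092", broker_count=3)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `kafka-topics.sh` is on the `PATH`.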
🔥 Spark Streaming
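# Read from Kafka (assumes `spark` is an existing SparkSession)
df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "broker1:9092") \
    .option("subscribe", "events") \
    .load()

# Process stream: Kafka values arrive as bytes, so cast them to strings
query = df.selectExpr("CAST(value AS STRING)") \
    .writeStream \
    .format("console") \
    .start()

# Block until the query stops; without this a standalone script exits immediately
query.awaitTermination()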
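In practice the Kafka value usually carries structured data, and the query needs a checkpoint location so that a restarted query can resume from the offsets it had recorded. The following is a minimal sketch under several assumptions: the messages are JSON objects with the fields listed in `EVENT_FIELDS` (the original does not define a payload format), the helper names are hypothetical, and the Spark Kafka connector is available on the cluster. `pyspark` is imported inside `start_query` so the option helpers can be inspected without Spark installed.

```python
# Assumed JSON payload fields; replace with the real event schema.
EVENT_FIELDS = [("event_id", "string"), ("event_type", "string"), ("amount", "double")]


def kafka_source_options(brokers, topic, starting_offsets="latest"):
    """Options for spark.readStream.format("kafka") (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": ",".join(brokers),
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def schema_ddl(fields=EVENT_FIELDS):
    """Render the field list as a DDL string, which from_json accepts as a schema."""
    return ", ".join(f"{name} {dtype.upper()}" for name, dtype in fields)


def start_query(spark, brokers, topic, checkpoint_dir):
    """Start a streaming query that parses JSON values and prints them."""
    from pyspark.sql.functions import col, from_json

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(brokers, topic))
        .load()
    )
    # Cast the binary value to a string, parse it, and flatten the struct.
    events = raw.select(
        from_json(col("value").cast("string"), schema_ddl()).alias("event")
    ).select("event.*")
    return (
        events.writeStream.format("console")
        .outputMode("append")
        # Offsets and progress are tracked here between restarts.
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

A call might look like `start_query(spark, ["broker1:9092"], "events", "/tmp/checkpoints/events").awaitTermination()`, where the checkpoint path is a placeholder. The console sink is intended for debugging; end-to-end delivery guarantees depend on the sink that replaces it.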
📚 Resources
Last Updated: January 2025