🌊 Kafka Streaming on HDInsight

Build real-time streaming pipelines with Kafka on HDInsight. This guide covers topics, producers, consumers, and integration with Spark Structured Streaming.

🎯 Learning Objectives

  • Create Kafka topics
  • Implement producers and consumers
  • Process streams with Spark Structured Streaming
  • Handle exactly-once semantics
  • Monitor and troubleshoot

📋 Prerequisites

  • A running HDInsight Kafka cluster
  • Familiarity with Event Hubs
  • Familiarity with Spark Structured Streaming

📡 Kafka Basics

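# Create a topic named "events" with 3 partitions, each replicated to 2 brokers.
# broker1:9092 is a placeholder; substitute one of your cluster's broker hosts.
kafka-topics.sh --create \
  --topic events \
  --partitions 3 \
  --replication-factor 2 \
  --bootstrap-server broker1:9092

# Produce messages: each line typed on stdin is sent as one message (Ctrl+C to exit)
kafka-console-producer.sh \
  --topic events \
  --bootstrap-server broker1:9092

# Consume messages, starting from the earliest retained offset
kafka-console-consumer.sh \
  --topic events \
  --from-beginning \
  --bootstrap-server broker1:9092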
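
The console tools above are handy for smoke tests, but application code usually produces and consumes programmatically. The following is a minimal sketch of that, not a definitive implementation. It assumes the third-party kafka-python package, which this guide does not otherwise use; the function names, the group name `events-demo`, and the sample event are hypothetical, and `broker1:9092` is the same placeholder as above.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event dict to UTF-8 JSON bytes (Kafka values are raw bytes)."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def decode_event(raw: bytes) -> dict:
    """Inverse of encode_event."""
    return json.loads(raw.decode("utf-8"))


def send_events(producer, topic: str, events) -> int:
    """Send each event to `topic`, then flush so nothing stays buffered.

    `producer` is any object exposing send(topic, value=...) and flush(),
    which is the interface kafka-python's KafkaProducer provides.
    """
    count = 0
    for event in events:
        producer.send(topic, value=encode_event(event))
        count += 1
    producer.flush()
    return count


def run_demo(servers: str = "broker1:9092") -> None:
    """Produce one event and read the topic back. Requires a reachable cluster."""
    # Assumption: kafka-python is installed (pip install kafka-python).
    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(bootstrap_servers=servers)
    send_events(producer, "events", [{"id": 1, "action": "click"}])
    producer.close()

    consumer = KafkaConsumer(
        "events",
        bootstrap_servers=servers,
        group_id="events-demo",        # hypothetical consumer group name
        auto_offset_reset="earliest",  # start at the oldest message if the group has no committed offset
        consumer_timeout_ms=10000,     # stop iterating after 10 s without new messages
    )
    for message in consumer:
        print(message.partition, message.offset, decode_event(message.value))
    consumer.close()
```

Calling `run_demo()` from a machine that can reach the brokers should print one line per message in the topic. Keeping serialization in `encode_event` and `decode_event` means the same helpers can be exercised without a cluster.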

🔥 Spark Streaming

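# Read from Kafka.
# Assumes an existing SparkSession named `spark` (as in a notebook or the
# pyspark shell) with the spark-sql-kafka-0-10 connector on the classpath.
df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "broker1:9092") \
    .option("subscribe", "events") \
    .load()

# Kafka delivers values as bytes, so cast them to strings before printing
# each micro-batch to the console.
query = df.selectExpr("CAST(value AS STRING) AS value") \
    .writeStream \
    .format("console") \
    .outputMode("append") \
    .start()

# Block until the query is stopped; without this a standalone script exits immediately.
query.awaitTermination()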
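
The console sink above is meant for debugging and does not survive restarts. The sketch below shows one way to work toward the exactly-once objective: it adds a checkpoint location, so a restarted query resumes from the Kafka offsets it recorded, and it writes to a file sink, which the Spark documentation describes as exactly-once. Treat it as a sketch under assumptions: the function names and both paths are hypothetical, and whether the end-to-end result is exactly-once still depends on the sink you choose.

```python
def kafka_source_options(bootstrap_servers: str, topic: str,
                         starting_offsets: str = "latest") -> dict:
    """Build the options for Spark's Kafka source as a plain dict."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only applies when the query starts fresh; a resumed query
        # continues from the offsets stored in its checkpoint.
        "startingOffsets": starting_offsets,
    }


def start_checkpointed_query(spark, options: dict,
                             output_path: str, checkpoint_path: str):
    """Read the topic and append key/value strings to Parquet files.

    `spark` is an existing SparkSession with the Kafka connector available.
    """
    df = spark.readStream.format("kafka").options(**options).load()

    # Keep partition and offset so each output row can be traced back to Kafka.
    events = df.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "partition",
        "offset",
    )

    return (
        events.writeStream
        .format("parquet")
        .option("path", output_path)
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start()
    )
```

With a session in hand, a call might look like `start_checkpointed_query(spark, kafka_source_options("broker1:9092", "events", "earliest"), "/example/events/output", "/example/events/checkpoint")`. Reusing the same checkpoint path across restarts is what lets the query pick up where it left off.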

Last Updated: January 2025