Skip to content

Streaming Architecture Patterns for Cloud Scale Analytics

Status Patterns Complexity

Comprehensive guide to streaming architecture patterns for real-time data processing on Azure Cloud Scale Analytics platforms.


Overview

Streaming architectures enable organizations to process and analyze data in real-time, providing immediate insights and enabling event-driven decision making. This section covers four fundamental streaming patterns, each optimized for different use cases and organizational requirements.

Why Streaming Architectures?

Modern enterprises require real-time insights to:

  • Detect fraud and security threats immediately
  • Monitor IoT devices and respond to anomalies
  • Provide personalized customer experiences
  • Enable real-time operational dashboards
  • Process high-velocity event streams
  • Maintain audit trails with complete event history

Architecture Patterns Overview

graph TB
    subgraph "Streaming Pattern Selection"
        UC[Use Case Requirements]

        UC --> Q1{Need Batch<br/>Processing?}
        Q1 -->|Yes| Lambda[Lambda Architecture]
        Q1 -->|No| Q2{Need Event<br/>History?}

        Q2 -->|Complete History| ES[Event Sourcing]
        Q2 -->|Only Stream| Kappa[Kappa Architecture]

        Lambda --> Q3{Separate<br/>Read/Write?}
        ES --> Q3
        Kappa --> Q3

        Q3 -->|Yes| CQRS[+ CQRS Pattern]
        Q3 -->|No| Done[Implementation]
        CQRS --> Done
    end

    style Lambda fill:#e1f5fe
    style Kappa fill:#f3e5f5
    style ES fill:#fff3e0
    style CQRS fill:#e8f5e9

Pattern Catalog

Lambda Architecture

Real-time Complexity

Dual-layer architecture combining batch and stream processing for comprehensive analytics.

Key Components: - Batch Layer: Historical data processing with high accuracy - Speed Layer: Real-time stream processing for low latency - Serving Layer: Unified query interface merging both views

When to Use: - Need both real-time and historical analytics - Require different processing SLAs for different data - Complex transformations benefit from batch processing - Can tolerate some data duplication between layers

Azure Implementation: - Batch: Synapse Spark Pools + Delta Lake - Speed: Stream Analytics + Event Hubs - Serving: Synapse SQL Pools + Power BI

View Detailed Guide →


Kappa Architecture

Stream-First Complexity

Stream-first architecture that processes all data as continuous, infinite streams.

Key Components: - Stream Processing Layer: Single processing paradigm for all data - Storage Layer: Immutable event log with replay capability - Serving Layer: Materialized views derived from streams

When to Use: - All processing can be expressed as stream operations - Want to avoid maintaining separate batch/stream code - Need to reprocess data by replaying event log - Simpler architecture is preferred over Lambda

Azure Implementation: - Streaming: Event Hubs + Databricks Structured Streaming - Storage: Event Hubs Capture + Data Lake - Serving: Cosmos DB + Synapse Link

View Detailed Guide →


Event Sourcing

Event-Driven Complexity

Store all changes to application state as an immutable sequence of events.

Key Components: - Event Store: Append-only log of all state changes - Event Processors: Reconstruct current state from event history - Read Models: Materialized views for efficient querying - Event Replay: Ability to rebuild state from events

When to Use: - Need complete audit trail of all changes - Temporal queries (state at any point in time) - Event-driven microservices architecture - Regulatory compliance requirements - Complex business logic with state transitions

Azure Implementation: - Event Store: Event Hubs + Cosmos DB - Processors: Azure Functions + Durable Functions - Read Models: Cosmos DB + Synapse Analytics

View Detailed Guide →


CQRS Pattern

Command-Query Complexity

Command Query Responsibility Segregation - separate read and write data models.

Key Components: - Command Side: Optimized for write operations and business logic - Query Side: Optimized for read operations and reporting - Event Bus: Synchronizes changes between command and query sides - Read Models: Denormalized views for efficient queries

When to Use: - Read and write workloads have different patterns - Need to optimize queries independently from writes - Complex business logic on write side - High scalability requirements - Often combined with Event Sourcing

Azure Implementation: - Commands: Cosmos DB (transactional) + Azure Functions - Queries: Synapse SQL Pools + Cosmos DB (analytical) - Event Bus: Event Grid + Event Hubs - Sync: Change Feed + Stream Analytics

View Detailed Guide →


Pattern Comparison Matrix

Pattern Latency Complexity Consistency Scalability Use Case Fit
Lambda Mixed (ms-minutes) High Eventual Very High Real-time + Historical
Kappa Low (ms-seconds) Medium Eventual High Stream-first
Event Sourcing Low (ms) High Strong High Audit & Temporal
CQRS Low-Medium High Eventual/Strong Very High Read/Write Optimization

Detailed Comparison

Data Processing Model

Pattern Batch Processing Stream Processing Event History
Lambda ✅ Primary ✅ Primary ⚠️ Optional
Kappa ❌ None ✅ Only ✅ Via Replay
Event Sourcing ⚠️ Via Replay ✅ Primary ✅ Complete
CQRS ⚠️ Optional ✅ Sync ⚠️ Optional

Implementation Characteristics

Pattern Code Complexity Operational Overhead Learning Curve Tooling Maturity
Lambda High (2 systems) High Steep Mature
Kappa Medium Medium Moderate Mature
Event Sourcing High Medium-High Steep Growing
CQRS Very High High Very Steep Moderate

Azure Service Mapping

Core Streaming Services

Service Primary Use Pattern Fit Key Features
Event Hubs Event ingestion All patterns High throughput, partitioning, capture
Stream Analytics Real-time processing Lambda, Kappa SQL-based, windowing, built-in ML
Databricks Advanced streaming Kappa, Lambda Structured Streaming, Delta Lake
Event Grid Event routing CQRS, Event Sourcing Pub/sub, filtering, webhooks
Cosmos DB Event store/serving Event Sourcing, CQRS Multi-model, global distribution

Supporting Services

Service Role Pattern Support
Azure Functions Event processing Event Sourcing, CQRS
Synapse Spark Batch processing Lambda
Synapse SQL Serving layer Lambda, CQRS
Data Lake Gen2 Long-term storage All patterns
Service Bus Reliable messaging CQRS, Event Sourcing

Pattern Selection Guide

By Business Requirements

Real-time Analytics with Historical Context

Recommended: Lambda Architecture

graph LR
    Req[Requirement] --> Lambda
    Lambda --> Batch[Batch Layer<br/>Historical Analytics]
    Lambda --> Speed[Speed Layer<br/>Real-time Metrics]
    Batch --> Serve[Unified Serving]
    Speed --> Serve

Use When: - Dashboards need both real-time and historical data - Complex batch computations (ML models, aggregations) - Different SLAs for real-time vs batch queries


Pure Stream Processing

Recommended: Kappa Architecture

graph LR
    Req[Requirement] --> Kappa
    Kappa --> Stream[Stream Processing<br/>Only]
    Stream --> Views[Materialized Views]
    Views --> Query[Queryable State]

Use When: - All processing is stream-compatible - Simplicity is priority - Want to avoid maintaining dual systems - Replay capability needed


Complete Audit Trail

Recommended: Event Sourcing

graph LR
    Req[Requirement] --> ES[Event Sourcing]
    ES --> Events[Immutable Events]
    Events --> State[Current State]
    Events --> History[Historical State]
    Events --> Audit[Audit Trail]

Use When: - Regulatory compliance requires complete history - Need temporal queries (state at point in time) - Event-driven microservices - Complex state transitions


Performance Optimization

Recommended: CQRS

graph LR
    Req[Requirement] --> CQRS
    CQRS --> Write[Write Model<br/>Optimized]
    CQRS --> Read[Read Model<br/>Optimized]
    Write -->|Event Bus| Read

Use When: - Read/write patterns very different - High query performance critical - Complex business logic on writes - Independent scaling needed


Implementation Patterns

Combining Patterns

Patterns can be combined for enhanced capabilities:

Event Sourcing + CQRS

graph TB
    subgraph "Write Side"
        Cmd[Commands] --> Val[Validation]
        Val --> ES[Event Store]
    end

    subgraph "Event Bus"
        ES --> EB[Event Stream]
    end

    subgraph "Read Side"
        EB --> P1[Projector 1]
        EB --> P2[Projector 2]
        P1 --> RM1[Read Model 1]
        P2 --> RM2[Read Model 2]
    end

    RM1 --> Q[Queries]
    RM2 --> Q

Benefits: - Complete audit trail from Event Sourcing - Optimized read models from CQRS - Independent scaling of read/write - Multiple specialized read models

Challenges: - Highest complexity of any combination - Eventually consistent reads - Operational overhead


Lambda + CQRS

graph TB
    subgraph "Ingestion"
        Data[Data Sources] --> EH[Event Hubs]
    end

    subgraph "Lambda Processing"
        EH --> Batch[Batch Layer]
        EH --> Speed[Speed Layer]
    end

    subgraph "CQRS Serving"
        Batch --> Write[Write Store]
        Speed --> Write
        Write --> Sync[Synchronization]
        Sync --> Read[Read Store]
    end

Benefits: - Real-time and batch processing - Optimized query performance - Separation of concerns

Challenges: - Complex architecture - Multiple sync points - Operational complexity


Performance Considerations

Latency Requirements

Pattern Typical Latency Throughput Best For
Lambda Speed < 100ms Very High Real-time alerts
Lambda Batch Minutes-Hours Extremely High Historical analysis
Kappa < 1s High Near real-time
Event Sourcing < 10ms Medium-High Transaction processing
CQRS Writes < 10ms Medium-High Transactional
CQRS Reads < 100ms Very High Analytics queries

Scalability Patterns

Horizontal Scaling

graph LR
    subgraph "Partitioned Event Stream"
        P1[Partition 1] --> C1[Consumer 1]
        P2[Partition 2] --> C2[Consumer 2]
        P3[Partition 3] --> C3[Consumer 3]
        PN[Partition N] --> CN[Consumer N]
    end

Best Practices: - Use Event Hubs partitions for parallelism - Partition by key for related events - Scale consumers to match partitions - Monitor partition lag

Vertical Scaling

graph TB
    Small[Small Instance] -->|Upgrade| Medium[Medium Instance]
    Medium -->|Upgrade| Large[Large Instance]
    Large -->|Upgrade| XLarge[XLarge Instance]

Best Practices: - Start small, measure, then scale - Use auto-scaling for variable workloads - Consider cost vs performance tradeoffs


Cost Optimization

Cost Drivers by Pattern

Pattern Primary Costs Optimization Strategies
Lambda Compute (2 systems), Storage Auto-pause Spark, optimize batch schedules
Kappa Streaming compute, Event Hub Right-size throughput units, retention
Event Sourcing Storage (events), Cosmos DB Event archival, compact snapshots
CQRS Dual storage, Sync overhead Selective projections, eventual consistency

Cost Optimization Strategies

Event Hubs Optimization

# Configure appropriate retention and throughput
event_hub_config = {
    "partition_count": 4,  # Match expected parallelism
    "message_retention_days": 1,  # Minimize unless replay needed
    "throughput_units": 2,  # Auto-inflate for spikes
    "enable_auto_inflate": True,
    "maximum_throughput_units": 10
}

Spark Pool Optimization

# Configure auto-pause and right-size nodes
spark_pool_config = {
    "node_size": "Medium",  # Start small
    "min_nodes": 3,
    "max_nodes": 10,
    "auto_pause_minutes": 15,  # Pause when idle
    "auto_scale_enabled": True
}

Cosmos DB Optimization

# Use appropriate consistency and TTL
cosmos_config = {
    "consistency_level": "Session",  # Balance performance/cost
    "time_to_live": 86400 * 30,  # 30 days, not forever
    "indexing_mode": "Lazy",  # For write-heavy workloads
    "partition_key": "/deviceId"  # Distribute load
}

Security Patterns

Authentication & Authorization

graph TB
    Client[Client App] --> AAD[Azure AD]
    AAD --> Token[Access Token]

    Token --> EH[Event Hubs]
    Token --> SA[Stream Analytics]
    Token --> DB[Cosmos DB]

    subgraph "RBAC Policies"
        EH --> RBAC1[Event Sender]
        SA --> RBAC2[Contributor]
        DB --> RBAC3[Data Reader/Writer]
    end

Data Protection

Pattern Encryption at Rest Encryption in Transit Key Management
All Azure Storage Service Encryption TLS 1.2+ Azure Key Vault
Event Sourcing Cosmos DB encryption HTTPS only Customer-managed keys
CQRS Multi-store encryption Private endpoints Separate keys per store

Network Security

graph TB
    subgraph "Public Internet"
        Client[External Clients]
    end

    subgraph "Azure Virtual Network"
        subgraph "Ingestion Subnet"
            PE1[Private Endpoint] --> EH[Event Hubs]
        end

        subgraph "Processing Subnet"
            EH --> SA[Stream Analytics]
            SA --> Spark[Synapse Spark]
        end

        subgraph "Storage Subnet"
            Spark --> PE2[Private Endpoint]
            PE2 --> DL[Data Lake]
        end
    end

    Client -->|VPN/ExpressRoute| PE1

Monitoring & Observability

Key Metrics by Pattern

Lambda Architecture Metrics

lambda_metrics = {
    "batch_layer": [
        "batch_job_duration",
        "batch_job_success_rate",
        "data_freshness_hours"
    ],
    "speed_layer": [
        "stream_lag_seconds",
        "events_per_second",
        "processing_latency_ms"
    ],
    "serving_layer": [
        "query_latency_ms",
        "cache_hit_rate",
        "merge_conflicts"
    ]
}

Kappa Architecture Metrics

kappa_metrics = {
    "streaming": [
        "checkpoint_interval",
        "replay_lag",
        "backpressure_indicator"
    ],
    "state_store": [
        "state_size_bytes",
        "compaction_rate",
        "read_write_ratio"
    ]
}

Event Sourcing Metrics

event_sourcing_metrics = {
    "event_store": [
        "event_write_rate",
        "storage_growth_rate",
        "event_stream_lag"
    ],
    "projections": [
        "projection_lag_seconds",
        "rebuild_duration",
        "snapshot_frequency"
    ]
}

CQRS Metrics

cqrs_metrics = {
    "command_side": [
        "command_processing_time",
        "validation_failures",
        "event_publication_lag"
    ],
    "query_side": [
        "read_model_staleness",
        "query_performance",
        "synchronization_lag"
    ]
}

Monitoring Dashboard Example

graph TB
    subgraph "Monitoring Stack"
        App[Applications] --> Logs[Azure Monitor Logs]
        App --> Metrics[Azure Monitor Metrics]

        Logs --> LA[Log Analytics]
        Metrics --> LA

        LA --> Alerts[Alert Rules]
        LA --> Dash[Dashboards]

        Alerts --> AG[Action Groups]
        AG --> Teams[Teams/Email]
    end

Getting Started

Quick Start by Pattern

1. Lambda Architecture

# Deploy Lambda architecture starter template
az deployment group create \
  --resource-group rg-streaming \
  --template-file lambda-template.json \
  --parameters environment=dev

2. Kappa Architecture

# Deploy Kappa architecture starter template
az deployment group create \
  --resource-group rg-streaming \
  --template-file kappa-template.json \
  --parameters environment=dev

3. Event Sourcing

# Deploy Event Sourcing infrastructure
az deployment group create \
  --resource-group rg-streaming \
  --template-file event-sourcing-template.json \
  --parameters environment=dev

4. CQRS

# Deploy CQRS infrastructure
az deployment group create \
  --resource-group rg-streaming \
  --template-file cqrs-template.json \
  --parameters environment=dev

Learning Path

Beginner Path

  1. Start with Kappa - Simplest streaming pattern
  2. Read: Kappa Architecture Guide
  3. Implement: Simple event processing pipeline
  4. Monitor: Set up basic metrics and alerts

Intermediate Path

  1. Understand Lambda - Add batch processing capability
  2. Read: Lambda Architecture Guide
  3. Implement: Hybrid batch/stream pipeline
  4. Optimize: Performance tuning and cost optimization

Advanced Path

  1. Master Event Sourcing - Complete event history
  2. Read: Event Sourcing Guide
  3. Combine with CQRS - Optimized read/write models
  4. Read: CQRS Pattern Guide
  5. Implement: Full event-driven architecture

Additional Resources

Documentation

Azure Services

Best Practices

Code Examples


Last Updated: 2025-01-28 Pattern Count: 4 Maturity: Production Ready