Streaming Architecture Patterns for Cloud Scale Analytics¶
Comprehensive guide to streaming architecture patterns for real-time data processing on Azure Cloud Scale Analytics platforms.
Overview¶
Streaming architectures enable organizations to process and analyze data in real-time, providing immediate insights and enabling event-driven decision making. This section covers four fundamental streaming patterns, each optimized for different use cases and organizational requirements.
Why Streaming Architectures?¶
Modern enterprises require real-time insights to:
- Detect fraud and security threats immediately
- Monitor IoT devices and respond to anomalies
- Provide personalized customer experiences
- Enable real-time operational dashboards
- Process high-velocity event streams
- Maintain audit trails with complete event history
Architecture Patterns Overview¶
```mermaid
graph TB
    subgraph "Streaming Pattern Selection"
        UC[Use Case Requirements]
        UC --> Q1{Need Batch<br/>Processing?}
        Q1 -->|Yes| Lambda[Lambda Architecture]
        Q1 -->|No| Q2{Need Event<br/>History?}
        Q2 -->|Complete History| ES[Event Sourcing]
        Q2 -->|Only Stream| Kappa[Kappa Architecture]
        Lambda --> Q3{Separate<br/>Read/Write?}
        ES --> Q3
        Kappa --> Q3
        Q3 -->|Yes| CQRS[+ CQRS Pattern]
        Q3 -->|No| Done[Implementation]
        CQRS --> Done
    end
    style Lambda fill:#e1f5fe
    style Kappa fill:#f3e5f5
    style ES fill:#fff3e0
    style CQRS fill:#e8f5e9
```

Pattern Catalog¶
Lambda Architecture¶
Dual-layer architecture combining batch and stream processing for comprehensive analytics.
Key Components:
- Batch Layer: Historical data processing with high accuracy
- Speed Layer: Real-time stream processing for low latency
- Serving Layer: Unified query interface merging both views

When to Use:
- Need both real-time and historical analytics
- Require different processing SLAs for different data
- Complex transformations benefit from batch processing
- Can tolerate some data duplication between layers

Azure Implementation:
- Batch: Synapse Spark Pools + Delta Lake
- Speed: Stream Analytics + Event Hubs
- Serving: Synapse SQL Pools + Power BI
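The batch/speed/serving split can be illustrated with an in-memory Python sketch (a toy, not the Synapse/Stream Analytics implementation; all names are hypothetical):

```python
# Hypothetical Lambda serving-layer merge: the batch view holds accurate
# historical aggregates, the speed view accumulates recent real-time
# increments, and a query merges both into one answer.
from collections import defaultdict

batch_view = {"deviceA": 1000, "deviceB": 500}  # precomputed by batch layer
speed_view = defaultdict(int)                    # updated by speed layer

def ingest_realtime(device_id: str, count: int) -> None:
    """Speed layer: fold a new event into the real-time view."""
    speed_view[device_id] += count

def query(device_id: str) -> int:
    """Serving layer: merge batch and speed views."""
    return batch_view.get(device_id, 0) + speed_view.get(device_id, 0)

ingest_realtime("deviceA", 7)
print(query("deviceA"))  # 1007 = 1000 (batch) + 7 (speed)
```

When the batch layer recomputes, it overwrites `batch_view` and the speed view is reset for the covered window — which is why Lambda tolerates some duplication between layers.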
Kappa Architecture¶
Stream-first architecture that processes all data as continuous, infinite streams.
Key Components:
- Stream Processing Layer: Single processing paradigm for all data
- Storage Layer: Immutable event log with replay capability
- Serving Layer: Materialized views derived from streams

When to Use:
- All processing can be expressed as stream operations
- Want to avoid maintaining separate batch/stream code
- Need to reprocess data by replaying the event log
- A simpler architecture is preferred over Lambda

Azure Implementation:
- Streaming: Event Hubs + Databricks Structured Streaming
- Storage: Event Hubs Capture + Data Lake
- Serving: Cosmos DB + Synapse Link
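The replay idea at the heart of Kappa can be shown with an in-memory sketch (illustrative names, not an Azure API): every materialized view is just a fold over the append-only log, so any view can be rebuilt from offset 0.

```python
# Minimal Kappa sketch: all data lives in an append-only event log;
# materialized views are (re)built at any time by replaying the log.
event_log: list[dict] = []  # stands in for Event Hubs + Capture

def append(event: dict) -> None:
    event_log.append(event)

def replay(view_fn, from_offset: int = 0) -> dict:
    """Rebuild a materialized view by replaying the log through view_fn."""
    view: dict = {}
    for event in event_log[from_offset:]:
        view_fn(view, event)
    return view

def count_by_type(view: dict, event: dict) -> None:
    view[event["type"]] = view.get(event["type"], 0) + 1

append({"type": "click"})
append({"type": "click"})
append({"type": "purchase"})
print(replay(count_by_type))  # {'click': 2, 'purchase': 1}
```

Reprocessing after a logic change is the same call with a new `view_fn` — no separate batch pipeline to keep in sync.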
Event Sourcing¶
Store all changes to application state as an immutable sequence of events.
Key Components:
- Event Store: Append-only log of all state changes
- Event Processors: Reconstruct current state from event history
- Read Models: Materialized views for efficient querying
- Event Replay: Ability to rebuild state from events

When to Use:
- Need a complete audit trail of all changes
- Temporal queries (state at any point in time)
- Event-driven microservices architecture
- Regulatory compliance requirements
- Complex business logic with state transitions

Azure Implementation:
- Event Store: Event Hubs + Cosmos DB
- Processors: Azure Functions + Durable Functions
- Read Models: Cosmos DB + Synapse Analytics
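A minimal event-sourced aggregate, sketched in plain Python (hypothetical domain, not tied to the Event Hubs/Cosmos DB services above): state is never stored directly, only reconstructed by folding the immutable event sequence.

```python
# Event sourcing sketch: an account whose balance is derived, on demand,
# from its full event history rather than stored as a mutable field.
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str    # "Deposited" or "Withdrawn"
    amount: int

event_store: list[Event] = []  # append-only log of state changes

def apply(balance: int, event: Event) -> int:
    """Fold one event into the running state."""
    if event.kind == "Deposited":
        return balance + event.amount
    if event.kind == "Withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")

def current_balance() -> int:
    """Reconstruct current state from the complete event history."""
    balance = 0
    for event in event_store:
        balance = apply(balance, event)
    return balance

event_store.append(Event("Deposited", 100))
event_store.append(Event("Withdrawn", 30))
print(current_balance())  # 70
```

Because the log is append-only, the same fold doubles as the audit trail, and periodic snapshots can cap replay cost as history grows.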
CQRS Pattern¶
Command Query Responsibility Segregation - separate read and write data models.
Key Components:
- Command Side: Optimized for write operations and business logic
- Query Side: Optimized for read operations and reporting
- Event Bus: Synchronizes changes between command and query sides
- Read Models: Denormalized views for efficient queries

When to Use:
- Read and write workloads have different patterns
- Need to optimize queries independently from writes
- Complex business logic on the write side
- High scalability requirements
- Often combined with Event Sourcing

Azure Implementation:
- Commands: Cosmos DB (transactional) + Azure Functions
- Queries: Synapse SQL Pools + Cosmos DB (analytical)
- Event Bus: Event Grid + Event Hubs
- Sync: Change Feed + Stream Analytics
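The command/query split can be sketched in-process (the list standing in for the event bus is illustrative; in the Azure mapping above that role falls to Event Grid/Event Hubs):

```python
# CQRS sketch: commands mutate the write model and publish events;
# the query side consumes those events into a denormalized read model.
write_model: dict[str, dict] = {}  # command side, normalized
read_model: dict[str, str] = {}    # query side, denormalized for display
event_bus: list[dict] = []         # in-process stand-in for an event bus

def handle_create_order(order_id: str, customer: str, total: int) -> None:
    """Command side: validate, write, then publish an event."""
    if order_id in write_model:
        raise ValueError("order already exists")
    write_model[order_id] = {"customer": customer, "total": total}
    event_bus.append({"type": "OrderCreated", "id": order_id,
                      "customer": customer, "total": total})

def project_events() -> None:
    """Query side: drain the bus into a display-ready read model."""
    while event_bus:
        e = event_bus.pop(0)
        if e["type"] == "OrderCreated":
            read_model[e["id"]] = f'{e["customer"]}: ${e["total"]}'

handle_create_order("o-1", "alice", 42)
project_events()
print(read_model["o-1"])  # alice: $42
```

The gap between a command completing and `project_events` running is exactly the eventual consistency window the comparison matrix refers to.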
Pattern Comparison Matrix¶
| Pattern | Latency | Complexity | Consistency | Scalability | Use Case Fit |
|---|---|---|---|---|---|
| Lambda | Mixed (ms-minutes) | High | Eventual | Very High | Real-time + Historical |
| Kappa | Low (ms-seconds) | Medium | Eventual | High | Stream-first |
| Event Sourcing | Low (ms) | High | Strong | High | Audit & Temporal |
| CQRS | Low-Medium | High | Eventual/Strong | Very High | Read/Write Optimization |
Detailed Comparison¶
Data Processing Model¶
| Pattern | Batch Processing | Stream Processing | Event History |
|---|---|---|---|
| Lambda | ✅ Primary | ✅ Primary | ⚠️ Optional |
| Kappa | ❌ None | ✅ Only | ✅ Via Replay |
| Event Sourcing | ⚠️ Via Replay | ✅ Primary | ✅ Complete |
| CQRS | ⚠️ Optional | ✅ Sync | ⚠️ Optional |
Implementation Characteristics¶
| Pattern | Code Complexity | Operational Overhead | Learning Curve | Tooling Maturity |
|---|---|---|---|---|
| Lambda | High (2 systems) | High | Steep | Mature |
| Kappa | Medium | Medium | Moderate | Mature |
| Event Sourcing | High | Medium-High | Steep | Growing |
| CQRS | Very High | High | Very Steep | Moderate |
Azure Service Mapping¶
Core Streaming Services¶
| Service | Primary Use | Pattern Fit | Key Features |
|---|---|---|---|
| Event Hubs | Event ingestion | All patterns | High throughput, partitioning, capture |
| Stream Analytics | Real-time processing | Lambda, Kappa | SQL-based, windowing, built-in ML |
| Databricks | Advanced streaming | Kappa, Lambda | Structured Streaming, Delta Lake |
| Event Grid | Event routing | CQRS, Event Sourcing | Pub/sub, filtering, webhooks |
| Cosmos DB | Event store/serving | Event Sourcing, CQRS | Multi-model, global distribution |
Supporting Services¶
| Service | Role | Pattern Support |
|---|---|---|
| Azure Functions | Event processing | Event Sourcing, CQRS |
| Synapse Spark | Batch processing | Lambda |
| Synapse SQL | Serving layer | Lambda, CQRS |
| Data Lake Gen2 | Long-term storage | All patterns |
| Service Bus | Reliable messaging | CQRS, Event Sourcing |
Pattern Selection Guide¶
By Business Requirements¶
Real-time Analytics with Historical Context¶
Recommended: Lambda Architecture
```mermaid
graph LR
    Req[Requirement] --> Lambda
    Lambda --> Batch[Batch Layer<br/>Historical Analytics]
    Lambda --> Speed[Speed Layer<br/>Real-time Metrics]
    Batch --> Serve[Unified Serving]
    Speed --> Serve
```

Use When:
- Dashboards need both real-time and historical data
- Complex batch computations (ML models, aggregations)
- Different SLAs for real-time vs batch queries
Pure Stream Processing¶
Recommended: Kappa Architecture
```mermaid
graph LR
    Req[Requirement] --> Kappa
    Kappa --> Stream[Stream Processing<br/>Only]
    Stream --> Views[Materialized Views]
    Views --> Query[Queryable State]
```

Use When:
- All processing is stream-compatible
- Simplicity is a priority
- Want to avoid maintaining dual systems
- Replay capability is needed
Complete Audit Trail¶
Recommended: Event Sourcing
```mermaid
graph LR
    Req[Requirement] --> ES[Event Sourcing]
    ES --> Events[Immutable Events]
    Events --> State[Current State]
    Events --> History[Historical State]
    Events --> Audit[Audit Trail]
```

Use When:
- Regulatory compliance requires complete history
- Need temporal queries (state at a point in time)
- Event-driven microservices
- Complex state transitions
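Temporal queries fall out naturally once events carry timestamps — the state "as of" any moment is simply the fold of events up to that moment. A toy sketch:

```python
# Temporal query sketch: partial replay of a timestamped event log
# reconstructs state at an arbitrary point in time.
events = [
    {"ts": 1, "delta": +100},
    {"ts": 5, "delta": -30},
    {"ts": 9, "delta": +50},
]

def state_as_of(ts: int) -> int:
    """Rebuild state at time ts by replaying only events with e.ts <= ts."""
    return sum(e["delta"] for e in events if e["ts"] <= ts)

print(state_as_of(6))   # 70  (events at ts=1 and ts=5)
print(state_as_of(10))  # 120 (all events)
```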
Performance Optimization¶
Recommended: CQRS
```mermaid
graph LR
    Req[Requirement] --> CQRS
    CQRS --> Write[Write Model<br/>Optimized]
    CQRS --> Read[Read Model<br/>Optimized]
    Write -->|Event Bus| Read
```

Use When:
- Read and write patterns are very different
- High query performance is critical
- Complex business logic on writes
- Independent scaling is needed
Implementation Patterns¶
Combining Patterns¶
Patterns can be combined for enhanced capabilities:
Event Sourcing + CQRS¶
```mermaid
graph TB
    subgraph "Write Side"
        Cmd[Commands] --> Val[Validation]
        Val --> ES[Event Store]
    end
    subgraph "Event Bus"
        ES --> EB[Event Stream]
    end
    subgraph "Read Side"
        EB --> P1[Projector 1]
        EB --> P2[Projector 2]
        P1 --> RM1[Read Model 1]
        P2 --> RM2[Read Model 2]
    end
    RM1 --> Q[Queries]
    RM2 --> Q
```

Benefits:
- Complete audit trail from Event Sourcing
- Optimized read models from CQRS
- Independent scaling of read/write
- Multiple specialized read models

Challenges:
- Highest complexity of any combination
- Eventually consistent reads
- Operational overhead
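The fan-out from one event stream to several specialized read models can be sketched as follows (illustrative events and projectors):

```python
# Combined ES + CQRS sketch: a single immutable event stream feeds two
# independent projectors, each maintaining its own read model.
event_stream = [
    {"type": "ItemSold", "sku": "A", "qty": 2, "price": 10},
    {"type": "ItemSold", "sku": "B", "qty": 1, "price": 25},
    {"type": "ItemSold", "sku": "A", "qty": 3, "price": 10},
]

def project_revenue(events) -> int:
    """Read model 1: total revenue across all sales."""
    return sum(e["qty"] * e["price"] for e in events if e["type"] == "ItemSold")

def project_units_by_sku(events) -> dict:
    """Read model 2: units sold per SKU."""
    units: dict = {}
    for e in events:
        if e["type"] == "ItemSold":
            units[e["sku"]] = units.get(e["sku"], 0) + e["qty"]
    return units

print(project_revenue(event_stream))       # 75
print(project_units_by_sku(event_stream))  # {'A': 5, 'B': 1}
```

Because each projector reads the same stream independently, a new read model can be added later and backfilled by replay without touching the write side.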
Lambda + CQRS¶
```mermaid
graph TB
    subgraph "Ingestion"
        Data[Data Sources] --> EH[Event Hubs]
    end
    subgraph "Lambda Processing"
        EH --> Batch[Batch Layer]
        EH --> Speed[Speed Layer]
    end
    subgraph "CQRS Serving"
        Batch --> Write[Write Store]
        Speed --> Write
        Write --> Sync[Synchronization]
        Sync --> Read[Read Store]
    end
```

Benefits:
- Real-time and batch processing
- Optimized query performance
- Separation of concerns

Challenges:
- Complex architecture
- Multiple sync points
- Operational complexity
Performance Considerations¶
Latency Requirements¶
| Pattern | Typical Latency | Throughput | Best For |
|---|---|---|---|
| Lambda Speed | < 100ms | Very High | Real-time alerts |
| Lambda Batch | Minutes-Hours | Extremely High | Historical analysis |
| Kappa | < 1s | High | Near real-time |
| Event Sourcing | < 10ms | Medium-High | Transaction processing |
| CQRS Writes | < 10ms | Medium-High | Transactional |
| CQRS Reads | < 100ms | Very High | Analytics queries |
Scalability Patterns¶
Horizontal Scaling¶
```mermaid
graph LR
    subgraph "Partitioned Event Stream"
        P1[Partition 1] --> C1[Consumer 1]
        P2[Partition 2] --> C2[Consumer 2]
        P3[Partition 3] --> C3[Consumer 3]
        PN[Partition N] --> CN[Consumer N]
    end
```

Best Practices:
- Use Event Hubs partitions for parallelism
- Partition by key to keep related events together
- Scale consumers to match the partition count
- Monitor partition lag
Vertical Scaling¶
```mermaid
graph TB
    Small[Small Instance] -->|Upgrade| Medium[Medium Instance]
    Medium -->|Upgrade| Large[Large Instance]
    Large -->|Upgrade| XLarge[XLarge Instance]
```

Best Practices:
- Start small, measure, then scale
- Use auto-scaling for variable workloads
- Consider cost vs performance tradeoffs
Cost Optimization¶
Cost Drivers by Pattern¶
| Pattern | Primary Costs | Optimization Strategies |
|---|---|---|
| Lambda | Compute (2 systems), Storage | Auto-pause Spark, optimize batch schedules |
| Kappa | Streaming compute, Event Hub | Right-size throughput units, retention |
| Event Sourcing | Storage (events), Cosmos DB | Event archival, compact snapshots |
| CQRS | Dual storage, Sync overhead | Selective projections, eventual consistency |
Cost Optimization Strategies¶
Event Hubs Optimization¶
```python
# Configure appropriate retention and throughput
event_hub_config = {
    "partition_count": 4,          # Match expected parallelism
    "message_retention_days": 1,   # Minimize unless replay is needed
    "throughput_units": 2,         # Baseline; auto-inflate covers spikes
    "enable_auto_inflate": True,
    "maximum_throughput_units": 10,
}
```
Spark Pool Optimization¶
```python
# Configure auto-pause and right-size nodes
spark_pool_config = {
    "node_size": "Medium",      # Start small
    "min_nodes": 3,
    "max_nodes": 10,
    "auto_pause_minutes": 15,   # Pause when idle
    "auto_scale_enabled": True,
}
```
Cosmos DB Optimization¶
```python
# Use appropriate consistency and TTL
cosmos_config = {
    "consistency_level": "Session",  # Balance performance and cost
    "time_to_live": 86400 * 30,      # 30 days, not forever
    "indexing_mode": "Lazy",         # For write-heavy workloads
    "partition_key": "/deviceId",    # Distribute load
}
```
Security Patterns¶
Authentication & Authorization¶
```mermaid
graph TB
    Client[Client App] --> AAD[Azure AD]
    AAD --> Token[Access Token]
    Token --> EH[Event Hubs]
    Token --> SA[Stream Analytics]
    Token --> DB[Cosmos DB]
    subgraph "RBAC Policies"
        EH --> RBAC1[Event Sender]
        SA --> RBAC2[Contributor]
        DB --> RBAC3[Data Reader/Writer]
    end
```

Data Protection¶
| Pattern | Encryption at Rest | Encryption in Transit | Key Management |
|---|---|---|---|
| All | Azure Storage Service Encryption | TLS 1.2+ | Azure Key Vault |
| Event Sourcing | Cosmos DB encryption | HTTPS only | Customer-managed keys |
| CQRS | Multi-store encryption | Private endpoints | Separate keys per store |
Network Security¶
```mermaid
graph TB
    subgraph "Public Internet"
        Client[External Clients]
    end
    subgraph "Azure Virtual Network"
        subgraph "Ingestion Subnet"
            PE1[Private Endpoint] --> EH[Event Hubs]
        end
        subgraph "Processing Subnet"
            EH --> SA[Stream Analytics]
            SA --> Spark[Synapse Spark]
        end
        subgraph "Storage Subnet"
            Spark --> PE2[Private Endpoint]
            PE2 --> DL[Data Lake]
        end
    end
    Client -->|VPN/ExpressRoute| PE1
```

Monitoring & Observability¶
Key Metrics by Pattern¶
Lambda Architecture Metrics¶
```python
lambda_metrics = {
    "batch_layer": [
        "batch_job_duration",
        "batch_job_success_rate",
        "data_freshness_hours",
    ],
    "speed_layer": [
        "stream_lag_seconds",
        "events_per_second",
        "processing_latency_ms",
    ],
    "serving_layer": [
        "query_latency_ms",
        "cache_hit_rate",
        "merge_conflicts",
    ],
}
```
Kappa Architecture Metrics¶
```python
kappa_metrics = {
    "streaming": [
        "checkpoint_interval",
        "replay_lag",
        "backpressure_indicator",
    ],
    "state_store": [
        "state_size_bytes",
        "compaction_rate",
        "read_write_ratio",
    ],
}
```
Event Sourcing Metrics¶
```python
event_sourcing_metrics = {
    "event_store": [
        "event_write_rate",
        "storage_growth_rate",
        "event_stream_lag",
    ],
    "projections": [
        "projection_lag_seconds",
        "rebuild_duration",
        "snapshot_frequency",
    ],
}
```
CQRS Metrics¶
```python
cqrs_metrics = {
    "command_side": [
        "command_processing_time",
        "validation_failures",
        "event_publication_lag",
    ],
    "query_side": [
        "read_model_staleness",
        "query_performance",
        "synchronization_lag",
    ],
}
```
Monitoring Dashboard Example¶
```mermaid
graph TB
    subgraph "Monitoring Stack"
        App[Applications] --> Logs[Azure Monitor Logs]
        App --> Metrics[Azure Monitor Metrics]
        Logs --> LA[Log Analytics]
        Metrics --> LA
        LA --> Alerts[Alert Rules]
        LA --> Dash[Dashboards]
        Alerts --> AG[Action Groups]
        AG --> Teams[Teams/Email]
    end
```

Getting Started¶
Quick Start by Pattern¶
1. Lambda Architecture¶
```bash
# Deploy Lambda architecture starter template
az deployment group create \
  --resource-group rg-streaming \
  --template-file lambda-template.json \
  --parameters environment=dev
```
2. Kappa Architecture¶
```bash
# Deploy Kappa architecture starter template
az deployment group create \
  --resource-group rg-streaming \
  --template-file kappa-template.json \
  --parameters environment=dev
```
3. Event Sourcing¶
```bash
# Deploy Event Sourcing infrastructure
az deployment group create \
  --resource-group rg-streaming \
  --template-file event-sourcing-template.json \
  --parameters environment=dev
```
4. CQRS¶
```bash
# Deploy CQRS infrastructure
az deployment group create \
  --resource-group rg-streaming \
  --template-file cqrs-template.json \
  --parameters environment=dev
```
Learning Path¶
Beginner Path¶
- Start with Kappa - Simplest streaming pattern
- Read: Kappa Architecture Guide
- Implement: Simple event processing pipeline
- Monitor: Set up basic metrics and alerts
Intermediate Path¶
- Understand Lambda - Add batch processing capability
- Read: Lambda Architecture Guide
- Implement: Hybrid batch/stream pipeline
- Optimize: Performance tuning and cost optimization
Advanced Path¶
- Master Event Sourcing - Complete event history
- Read: Event Sourcing Guide
- Combine with CQRS - Optimized read/write models
- Read: CQRS Pattern Guide
- Implement: Full event-driven architecture
Additional Resources¶
Documentation¶
- Lambda Architecture Detailed Guide
- Kappa Architecture Detailed Guide
- Event Sourcing Detailed Guide
- CQRS Pattern Detailed Guide
- Time Series Analytics
Azure Services¶
- Azure Event Hubs Documentation
- Azure Stream Analytics Documentation
- Azure Databricks Structured Streaming
- Azure Cosmos DB Change Feed
Best Practices¶
Code Examples¶
- Implementation examples are provided throughout each architecture pattern guide
- Azure Samples Repository
Last Updated: 2025-01-28 · Pattern Count: 4 · Maturity: Production Ready