📦 Batch Architecture Patterns¶
Reference architectures and patterns for batch data processing workloads.
🎯 Overview¶
Batch processing handles large volumes of data in scheduled intervals, ideal for:
- ETL/ELT pipelines: Transforming and loading data into analytics systems
- Data warehousing: Building dimensional models for reporting
- Historical analysis: Processing accumulated data for insights
- Machine learning: Training models on large datasets
📊 Pattern Catalog¶
Data Warehouse Patterns¶
Classic dimensional modeling and modern lakehouse approaches.
| Pattern | Use Case | Azure Services |
|---|---|---|
| Star Schema | OLAP reporting | Synapse Dedicated SQL |
| Snowflake Schema | Complex hierarchies | Synapse Dedicated SQL |
| Data Vault | Auditable history | Databricks, Synapse |
| Medallion | Lakehouse layers | Databricks, Synapse Spark |
Lambda Architecture¶
Combining batch and real-time processing layers.
graph LR
subgraph "Data Sources"
S[Event Stream]
end
subgraph "Speed Layer"
SP[Stream Analytics]
end
subgraph "Batch Layer"
B1[Data Lake]
B2[Spark Processing]
end
subgraph "Serving Layer"
SV[Query Engine]
end
S --> SP
S --> B1
B1 --> B2
SP --> SV
B2 --> SV Kappa Architecture¶
Simplified architecture using stream processing for both real-time and batch.
graph LR
S[Event Stream] --> K[Kafka/Event Hubs]
K --> P[Stream Processor]
P --> ST[Data Lake]
ST --> Q[Query Layer] 🏗️ Reference Architecture¶
Modern Data Warehouse¶
graph TB
subgraph "Sources"
S1[Operational DBs]
S2[Files/APIs]
S3[SaaS Apps]
end
subgraph "Ingestion"
I1[Data Factory]
end
subgraph "Storage"
L1[Bronze Layer<br/>Raw Data]
L2[Silver Layer<br/>Cleansed]
L3[Gold Layer<br/>Curated]
end
subgraph "Processing"
P1[Synapse Spark]
P2[Databricks]
end
subgraph "Serving"
SV1[Synapse SQL]
SV2[Power BI]
end
S1 --> I1
S2 --> I1
S3 --> I1
I1 --> L1
L1 --> P1
L1 --> P2
P1 --> L2
P2 --> L2
L2 --> L3
L3 --> SV1
SV1 --> SV2 📚 Related Documentation¶
Last Updated: January 2025