🔗 Integration Scenarios¶
Step-by-step guides for integrating Azure streaming and analytics services in common Cloud Scale Analytics scenarios.
🎯 Overview¶
Integration scenarios provide detailed implementation guides for connecting multiple Azure services to build complete streaming and analytics solutions. Each scenario includes infrastructure templates, configuration examples, and best practices.
Scenario Categories¶
- Streaming to Storage: Archive and persist streaming data
- Streaming to Databases: Real-time operational data stores
- Advanced Processing: Complex streaming analytics with ML
- Event-Driven Workflows: Automated data pipeline triggers
📋 Available Scenarios¶
🏗️ Streaming to Storage¶
Streaming to Data Lake¶
Description: Configure Event Hubs Capture to automatically archive streaming data to Azure Data Lake Storage Gen2.
Services Used:
- Azure Event Hubs
- Azure Data Lake Storage Gen2

What You'll Build:
- Event Hubs namespace with Capture enabled
- Data Lake Storage with hierarchical namespace
- Automatic Avro file archival
- Time- and size-based partitioning
Estimated Time: 30 minutes
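To smoke-test this scenario you can push a handful of events with the azure-eventhub Python SDK and let Capture archive them to the lake. This is a minimal sketch; the connection string and event hub name are placeholders for values from your own deployment.

```python
# Send a small batch of test events; Event Hubs Capture then archives them
# to Data Lake Storage as Avro files without any consumer code.
import json
from azure.eventhub import EventHubProducerClient, EventData

CONNECTION_STR = "<event-hubs-namespace-connection-string>"  # placeholder
EVENTHUB_NAME = "<event-hub-name>"                           # placeholder

producer = EventHubProducerClient.from_connection_string(
    CONNECTION_STR, eventhub_name=EVENTHUB_NAME
)

with producer:
    batch = producer.create_batch()
    for i in range(100):
        batch.add(EventData(json.dumps({"deviceId": f"sensor-{i % 10}", "reading": i})))
    producer.send_batch(batch)
```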
Streaming to SQL¶
Description: Stream data from Event Hubs to Azure SQL Database or Synapse SQL using Stream Analytics.
Services Used:
- Azure Event Hubs
- Azure Stream Analytics
- Azure SQL Database or Synapse SQL

What You'll Build:
- Event Hubs for data ingestion
- Stream Analytics job with SQL queries
- SQL Database with optimized schema
- Real-time data transformation pipeline
Estimated Time: 45 minutes
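Once the Stream Analytics job is running, a quick way to confirm the pipeline end to end is to count the rows it has written to the target table. The sketch below uses pyodbc; the server, database, credentials, and the dbo.StreamedEvents table name are placeholders for your own deployment.

```python
# Verify that Stream Analytics output is landing in the Azure SQL target table.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<server-name>.database.windows.net,1433;"
    "Database=<database-name>;"
    "Uid=<sql-user>;Pwd=<password>;"
    "Encrypt=yes;TrustServerCertificate=no;"
)

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.StreamedEvents")  # hypothetical output table
print("Rows written so far:", cursor.fetchone()[0])
conn.close()
```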
⚡ Advanced Stream Processing¶
Event Hubs with Databricks¶
Description: Implement structured streaming from Event Hubs to Databricks for real-time analytics and machine learning.
Services Used:
- Azure Event Hubs
- Azure Databricks
- Azure Data Lake Storage Gen2 (Delta Lake)

What You'll Build:
- Event Hubs for high-volume ingestion
- Databricks workspace with Delta Lake
- Structured streaming pipelines with PySpark
- Real-time ML model inference
Estimated Time: 60 minutes
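A minimal Databricks notebook cell for this scenario might look like the sketch below. It assumes the azure-eventhubs-spark connector is installed on the cluster, that spark, sc, and dbutils are the notebook's built-in handles, and that the secret scope, event schema, and storage paths are placeholders for your own values.

```python
# Read a stream from Event Hubs, parse the JSON body, and append to a Delta table.
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

connection_string = dbutils.secrets.get("my-scope", "eventhubs-conn")  # hypothetical secret
eh_conf = {
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
}

schema = StructType([
    StructField("deviceId", StringType()),
    StructField("reading", DoubleType()),
])

raw = spark.readStream.format("eventhubs").options(**eh_conf).load()

events = (raw
    .select(from_json(col("body").cast("string"), schema).alias("e"))
    .select("e.*"))

(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/datalake/checkpoints/events")  # placeholder paths
    .outputMode("append")
    .start("/mnt/datalake/delta/events"))
```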
Stream Analytics to Cosmos DB¶
Description: Process streaming data with Stream Analytics and write to Cosmos DB for globally distributed operational data.
Services Used:
- Azure Event Hubs or IoT Hub
- Azure Stream Analytics
- Azure Cosmos DB

What You'll Build:
- Stream Analytics job with windowing functions
- Cosmos DB with optimized partitioning
- Real-time aggregation pipeline
- Global distribution configuration
Estimated Time: 40 minutes
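To spot-check what the streaming job writes, you can read the newest documents back with the azure-cosmos Python SDK. This is a sketch only; the account endpoint, key, and the database and container names are placeholders.

```python
# Query the most recent documents produced by the Stream Analytics aggregation job.
from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://<account-name>.documents.azure.com:443/",  # placeholder endpoint
    credential="<account-key>",                          # placeholder key
)
container = client.get_database_client("telemetry").get_container_client("aggregates")

for item in container.query_items(
    query="SELECT TOP 10 * FROM c ORDER BY c._ts DESC",
    enable_cross_partition_query=True,
):
    print(item)
```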
🏢 Enterprise Integration¶
Stream Analytics to Synapse¶
Description: Integrate Stream Analytics with Synapse Analytics to combine real-time and batch analytics workflows (Lambda Architecture).
Services Used:
- Azure Event Hubs
- Azure Stream Analytics
- Azure Synapse Analytics
- Azure Data Lake Storage Gen2

What You'll Build:
- Stream Analytics for real-time processing
- Synapse dedicated SQL pool
- Delta Lake for unified storage
- Lambda architecture implementation
Estimated Time: 60 minutes
Event-Driven Data Pipelines¶
Description: Build event-driven Data Factory pipelines triggered by Event Grid and custom events.
Services Used:
- Azure Event Grid
- Azure Data Factory
- Azure Storage (Blob/ADLS Gen2)

What You'll Build:
- Event Grid custom and system topics
- Data Factory with event triggers
- Storage blob event integration
- Automated pipeline orchestration
Estimated Time: 50 minutes
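Custom events can be published to the Event Grid topic with the azure-eventgrid Python SDK; a subscribed Data Factory trigger then starts the pipeline. In this sketch the topic endpoint, access key, subject, and event type are all placeholders.

```python
# Publish a custom event to an Event Grid topic to kick off a downstream pipeline.
from azure.core.credentials import AzureKeyCredential
from azure.eventgrid import EventGridPublisherClient, EventGridEvent

client = EventGridPublisherClient(
    "https://<topic-name>.<region>.eventgrid.azure.net/api/events",  # placeholder endpoint
    AzureKeyCredential("<topic-access-key>"),                        # placeholder key
)

client.send([
    EventGridEvent(
        subject="datasets/sales/2025-01-28",          # hypothetical subject
        event_type="Contoso.Data.FileReady",          # hypothetical event type
        data={"path": "raw/sales/2025-01-28.csv"},
        data_version="1.0",
    )
])
```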
📊 Scenario Comparison Matrix¶
| Scenario | Use Case | Latency | Complexity | Throughput | Best For |
|---|---|---|---|---|---|
| Streaming to Data Lake | Archival | Seconds | 🟢 Low | Very High | Long-term storage, backup |
| Streaming to SQL | Operational Analytics | Sub-second | 🟡 Medium | Medium | Real-time dashboards |
| Event Hubs + Databricks | Real-time ML | Seconds | 🔴 High | Very High | Fraud detection, anomalies |
| Stream Analytics + Cosmos | Global Operations | Sub-second | 🟡 Medium | High | Multi-region apps |
| Stream Analytics + Synapse | Enterprise DW | Seconds | 🔴 High | High | Hybrid batch/streaming |
| Event-Driven Pipelines | Automated ETL | Minutes | 🟡 Medium | Medium | File-triggered workflows |
🏗️ Common Architecture Patterns¶
Pattern 1: Lambda Architecture¶
```mermaid
graph TB
    Sources[Data Sources] --> EventHubs[Event Hubs]
    EventHubs --> Speed[Stream Analytics<br/>Speed Layer]
    EventHubs --> Batch[Data Factory<br/>Batch Layer]
    Speed --> RealTime[Cosmos DB<br/>Real-time Views]
    Batch --> Historical[Data Lake<br/>Historical Views]
    RealTime --> Serving[Serving Layer]
    Historical --> Serving
    Serving --> Apps[Applications]
```

When to Use:
- Need both real-time and batch processing
- Historical analysis with recent data queries
- Balance between latency and accuracy
Scenarios: Stream Analytics to Synapse, Streaming to Data Lake + SQL
Pattern 2: Kappa Architecture¶
```mermaid
graph LR
    Sources[Data Sources] --> EventHubs[Event Hubs]
    EventHubs --> Processing[Databricks<br/>Stream Processing]
    Processing --> Storage[Delta Lake<br/>Unified Storage]
    Storage --> Serving[Serving Layer]
    Serving --> Analytics[Analytics & ML]
```

When to Use:
- Streaming-first architecture
- Real-time analytics only
- Simplified data pipeline
Scenarios: Event Hubs with Databricks
Pattern 3: Event-Driven Architecture¶
```mermaid
graph TB
    subgraph "Event Sources"
        Storage[Blob Storage]
        Services[Azure Services]
        Custom[Custom Apps]
    end
    subgraph "Event Infrastructure"
        EventGrid[Event Grid<br/>Event Router]
    end
    subgraph "Event Handlers"
        Pipeline[Data Factory<br/>Pipeline]
        Functions[Azure Functions]
        LogicApps[Logic Apps]
    end
    Storage --> EventGrid
    Services --> EventGrid
    Custom --> EventGrid
    EventGrid --> Pipeline
    EventGrid --> Functions
    EventGrid --> LogicApps
```

When to Use:
- File arrival triggers
- Service integration and decoupling
- Workflow automation
Scenarios: Event-Driven Data Pipelines
🎯 Choosing the Right Scenario¶
Decision Tree¶
```mermaid
graph TD
    Start[What's your primary goal?] --> Archive{Archive streaming data?}
    Archive -->|Yes| DataLake[Streaming to Data Lake]
    Archive -->|No| RealTime{Real-time analytics?}
    RealTime -->|Yes| Global{Global distribution?}
    Global -->|Yes| Cosmos[Stream Analytics to Cosmos DB]
    Global -->|No| ML{Machine learning?}
    ML -->|Yes| Databricks[Event Hubs + Databricks]
    ML -->|No| SQL{SQL-based analytics?}
    SQL -->|Yes| SQLDB[Streaming to SQL]
    SQL -->|No| Batch{Batch + Streaming?}
    Batch -->|Yes| Synapse[Stream Analytics to Synapse]
    Batch -->|No| Trigger{Event-triggered?}
    Trigger -->|Yes| EventDriven[Event-Driven Pipelines]
```

By Use Case¶
IoT Telemetry Processing¶
Recommended: Event Hubs + Databricks or Streaming to Data Lake
- High-volume ingestion
- Real-time processing
- Historical analysis
Real-Time Dashboards¶
Recommended: Streaming to SQL or Stream Analytics to Cosmos DB
- Low-latency queries
- Operational analytics
- Live visualizations
Fraud Detection¶
Recommended: Event Hubs + Databricks
- Real-time ML inference
- Complex event processing
- Anomaly detection
Enterprise Data Warehouse¶
Recommended: Stream Analytics to Synapse
- Hybrid batch and streaming
- Large-scale analytics
- BI integration
File Processing Automation¶
Recommended: Event-Driven Data Pipelines
- Automated workflows
- File arrival triggers
- Orchestration
🛠️ Common Prerequisites¶
All scenarios require the following:
Azure Resources¶
- Azure Subscription: With appropriate permissions
- Resource Group: For organizing resources
- Azure CLI or PowerShell: For deployment
- Service Principal (optional): For automated deployments
Networking (Optional but Recommended)¶
- Virtual Network: For network isolation
- Private Endpoints: For secure connectivity
- DNS Configuration: For private endpoint resolution
Security¶
- Managed Identity: For service-to-service authentication
- Azure Key Vault: For secrets management
- RBAC Roles: Appropriate role assignments
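As an illustration of the Managed Identity and Key Vault items above, the sketch below retrieves a secret with DefaultAzureCredential so no credentials are embedded in code. The vault URL and secret name are placeholders.

```python
# Use a managed identity (or local az login) to read a connection string from Key Vault.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()  # managed identity in Azure, developer login locally
client = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net",  # placeholder vault
    credential=credential,
)

eventhubs_conn = client.get_secret("eventhubs-connection-string").value  # hypothetical secret name
```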
📦 Template Repository Structure¶
Each scenario includes:
```
scenario-name/
├── bicep/
│   ├── main.bicep              # Main infrastructure template
│   ├── parameters.json         # Parameter file
│   └── modules/
│       ├── eventhubs.bicep     # Event Hubs module
│       ├── storage.bicep       # Storage module
│       └── ...                 # Other service modules
├── scripts/
│   ├── deploy.sh               # Bash deployment script
│   ├── deploy.ps1              # PowerShell deployment script
│   └── configure.sh            # Post-deployment configuration
├── config/
│   ├── stream-analytics.json   # Stream Analytics query
│   └── databricks-notebook.py  # Databricks notebook
└── README.md                   # Scenario-specific guide
```
🚀 Getting Started¶
Step 1: Choose Your Scenario¶
Review the scenarios above and select one that matches your requirements.
Step 2: Review Prerequisites¶
Check the common prerequisites and scenario-specific requirements.
Step 3: Deploy Infrastructure¶
Use provided Bicep templates or follow manual deployment steps.
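If you prefer Python over the CLI scripts, a deployment can also be submitted with the azure-mgmt-resource SDK. This sketch assumes the Bicep template has already been compiled to ARM JSON (for example with `az bicep build --file main.bicep`) and uses placeholder subscription and resource group values.

```python
# Deploy a compiled scenario template to a resource group and wait for completion.
import json
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")  # placeholder

with open("bicep/main.json") as f:          # compiled from main.bicep
    template = json.load(f)
with open("bicep/parameters.json") as f:
    parameters = json.load(f)["parameters"]

deployment = client.deployments.begin_create_or_update(
    "<resource-group>",          # placeholder
    "streaming-scenario",        # deployment name
    {"properties": {"mode": "Incremental", "template": template, "parameters": parameters}},
).result()
print(deployment.properties.provisioning_state)
```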
Step 4: Configure Services¶
Apply configuration files and set up data flows.
Step 5: Test and Validate¶
Send sample data and verify end-to-end processing.
Step 6: Monitor and Optimize¶
Set up monitoring and apply optimization best practices.
💰 Cost Considerations¶
Cost Factors by Scenario¶
| Scenario | Primary Costs | Optimization Tips |
|---|---|---|
| Streaming to Data Lake | Storage, Event Hubs TUs | Use lifecycle policies, optimize capture |
| Streaming to SQL | SQL Database DTUs, Stream Analytics SUs | Right-size database, optimize queries |
| Event Hubs + Databricks | Databricks DBUs, Event Hubs | Use auto-scaling, spot instances |
| Stream Analytics + Cosmos | Cosmos RUs, Stream Analytics SUs | Optimize partition key, use TTL |
| Stream Analytics + Synapse | Synapse DWUs, Storage | Pause when idle, use result set caching |
| Event-Driven Pipelines | Data Factory activities, Storage | Optimize trigger frequency, batch operations |
🔒 Security Best Practices¶
Network Security¶
- Use Private Endpoints for all service connections
- Deploy services within Virtual Networks
- Configure Network Security Groups (NSGs)
- Enable Azure Firewall for centralized protection
Identity & Access¶
- Use Managed Identities for service authentication
- Apply least-privilege RBAC roles
- Store credentials in Azure Key Vault
- Enable Azure AD authentication where supported
Data Protection¶
- Enable encryption in transit (TLS 1.2+)
- Enable encryption at rest for all storage
- Implement data masking for sensitive fields
- Configure diagnostic logging and auditing
📊 Monitoring & Troubleshooting¶
Key Metrics to Monitor¶
- Throughput: Events/messages per second
- Latency: End-to-end processing time
- Error Rate: Failed operations percentage
- Resource Utilization: CPU, memory, storage
- Cost: Daily spending trends
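One way to pull these metrics programmatically is the azure-monitor-query SDK. The sketch below queries an hour of Event Hubs throughput and throttling data for a placeholder namespace resource ID.

```python
# Query platform metrics for an Event Hubs namespace over the last hour.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())

resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"   # placeholders
    "/providers/Microsoft.EventHub/namespaces/<namespace-name>"
)

response = client.query_resource(
    resource_id,
    metric_names=["IncomingMessages", "ThrottledRequests"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=["Total"],
)

for metric in response.metrics:
    for point in metric.timeseries[0].data:
        print(metric.name, point.timestamp, point.total)
```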
Troubleshooting Resources¶
- Connectivity Issues - Network, endpoint, and firewall troubleshooting
- Performance Problems - Throughput, latency, and scaling issues
- Configuration Errors - Service setup and integration validation
📖 Full Troubleshooting Guide →
📚 Additional Resources¶
Architecture Patterns¶
- Lambda Architecture - Combine a low-latency speed layer with a batch layer
- Kappa Architecture - Process all data as a stream through a single pipeline
- Event-Driven Architecture - Build reactive, event-based systems
Code Examples¶
- Code samples are provided within each integration scenario guide
- Infrastructure templates (Bicep/ARM) included in scenario directories
💬 Feedback¶
Help us improve these scenarios!
- ✅ Scenario worked perfectly - Share your success
- ⚠️ Encountered issues - Report a problem
- 💡 Have suggestions - Share your ideas
Last Updated: 2025-01-28 | Total Scenarios: 6 | Average Completion Time: ~48 minutes