📖 Azure Cloud Scale Analytics Service Catalog
Complete catalog of Azure analytics services with capabilities, use cases, and decision guidance.
📊 Service Overview Matrix
| Service | Category | Pricing Model | Primary Use Case |
|---|---|---|---|
| Azure Synapse Analytics | Analytics Compute | Pay-per-use + Reserved | Enterprise Data Warehousing |
| Azure Databricks | Analytics Compute | Compute + DBU | Data Science & ML |
| HDInsight | Analytics Compute | VM-based | Big Data Processing |
| Stream Analytics | Streaming | Streaming Units | Real-time Analytics |
| Event Hubs | Streaming | Throughput Units | Event Ingestion |
| Event Grid | Streaming | Per Operation | Event Routing |
| Data Lake Gen2 | Storage | Storage + Transactions | Big Data Storage |
| Cosmos DB | Storage | Request Units | NoSQL Database |
| Azure SQL | Storage | vCore or DTU | Relational Database |
| Data Factory | Orchestration | Pipeline Runs | Data Integration |
| Logic Apps | Orchestration | Action-based | Workflow Automation |
🎯 Analytics Compute Services
Azure Synapse Analytics
Purpose: Unified analytics service that combines data integration, enterprise data warehousing, and big data analytics in a single workspace.
Key Capabilities:
- Serverless SQL Pools: Query files in the data lake in place, with no infrastructure to provision
- Dedicated SQL Pools: Provisioned MPP engine for enterprise data warehousing
- Spark Pools: Big data processing and machine learning
- Data Integration: Built-in ETL/ELT pipelines
- Shared Metadata: Spark databases and tables are automatically queryable from serverless SQL pools
Best For:
- Enterprise data warehousing
- Unified analytics workspaces
- Large-scale data processing
- Self-service analytics
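Pricing: Serverless SQL pools bill per TB of data processed; dedicated SQL pools bill per hour by provisioned Data Warehouse Units (DWUs), with optional reserved-capacity discounts; Spark pools bill per vCore-hour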
Documentation: Azure Synapse Guide
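To make the serverless billing model concrete, the following Python sketch estimates the serverless SQL pool charge for a batch of queries. It assumes the billing rules published for serverless SQL pools at the time of writing (a 10 MB minimum per query, with data processed rounded up to the nearest MB) and binary units; `price_per_tb` is a placeholder, not a quoted rate, and the function names are illustrative.

```python
import math

MB = 1024 * 1024          # bytes per MB (binary units assumed)
MB_PER_TB = 1024 * 1024   # MB per TB (binary units assumed)
MIN_BILLED_MB = 10        # assumed minimum data processed billed per query


def billed_mb(bytes_processed: int) -> int:
    """Round one query's processed bytes up to whole MB and apply the per-query minimum."""
    return max(MIN_BILLED_MB, math.ceil(bytes_processed / MB))


def estimate_serverless_cost(query_bytes: list[int], price_per_tb: float) -> float:
    """Estimate the charge for a batch of queries at a placeholder per-TB price."""
    total_mb = sum(billed_mb(b) for b in query_bytes)
    return total_mb / MB_PER_TB * price_per_tb


# Hypothetical workload: a small lookup (200 KB scanned) and a 50 GB scan.
workload = [200 * 1024, 50 * 1024 * MB]
print(billed_mb(workload[0]))  # 10: the per-query minimum applies
print(f"{estimate_serverless_cost(workload, price_per_tb=5.0):.4f}")
```

Because cost tracks bytes scanned, storing data in a columnar format such as Parquet and partitioning it so queries read fewer files lowers the bill.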
Azure Databricks
Purpose: Apache Spark-based collaborative analytics platform optimized for data engineering, data science, and machine learning.
Key Capabilities:
- Collaborative Notebooks: Multi-language data science environment
- Delta Live Tables: Declarative ETL framework
- MLflow Integration: End-to-end ML lifecycle management
- Unity Catalog: Unified data governance
- Photon Engine: High-performance query engine
Best For:
- Data science and machine learning
- Collaborative analytics
- Advanced data engineering
- Real-time ML inference
Pricing: Compute costs + Databricks Unit (DBU) charges
Documentation: Azure Databricks Guide
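The two-part Databricks bill (Azure VM charges plus DBU charges) can be sketched as follows. All rates in this Python example are hypothetical placeholders: actual DBU emission depends on the VM size, and the DBU price depends on the workspace tier and workload type (for example, jobs versus all-purpose compute).

```python
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    nodes: int                 # driver plus workers
    vm_price_per_hour: float   # hypothetical Azure VM rate per node
    dbu_per_node_hour: float   # hypothetical DBUs emitted per node-hour for the VM size
    dbu_price: float           # hypothetical price per DBU for the tier and workload type


def estimate_cost(spec: ClusterSpec, hours: float) -> dict[str, float]:
    """Split an estimated cluster bill into its VM and DBU components."""
    vm_cost = spec.nodes * spec.vm_price_per_hour * hours
    dbu_cost = spec.nodes * spec.dbu_per_node_hour * hours * spec.dbu_price
    return {"vm": vm_cost, "dbu": dbu_cost, "total": vm_cost + dbu_cost}


# Hypothetical 4-node cluster running 10 hours.
estimate = estimate_cost(ClusterSpec(4, 0.50, 1.5, 0.40), hours=10)
print({k: round(v, 2) for k, v in estimate.items()})
```

Both components stop accruing when the cluster terminates, which is why auto-termination of idle clusters matters for cost.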
HDInsight
Purpose: Managed Apache Hadoop, Spark, and Kafka clusters in Azure.
Key Capabilities:
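- Multiple Cluster Types: Hadoop, Spark, HBase, Kafka, Interactive Query (Storm is not available in HDInsight 4.0 and later)
- Enterprise Security: Enterprise Security Package (ESP) with Active Directory-based authentication and Apache Ranger authorization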
- Custom Applications: Support for custom Hadoop ecosystem tools
- Hybrid Connectivity: Integration with on-premises systems
Best For:
- Hadoop migration to cloud
- Custom big data applications
- Cost-optimized big data processing
- Open-source ecosystem requirements
Pricing: Per-node VM pricing, billed per minute from cluster creation until deletion, whether or not jobs are running
Documentation: HDInsight Guide
🔄 Streaming Services
Azure Stream Analytics
Purpose: Real-time analytics service for streaming data processing.
Key Capabilities:
- SQL-based Queries: Familiar SQL syntax for stream processing
- Windowing Functions: Tumbling, hopping, sliding, session, and snapshot windows
- Anomaly Detection: Built-in ML-based anomaly detection
- Edge Deployment: Run analytics on IoT Edge devices
- Output Integration: Native outputs to Power BI, Azure SQL Database, Cosmos DB, and other Azure services
Best For:
- IoT device telemetry processing
- Real-time dashboards
- Fraud detection
- Operational monitoring
Pricing: Streaming Units (SU) hourly billing
Documentation: Streaming Services Guide
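The difference between tumbling and hopping windows is easiest to see on a handful of timestamps. This is a plain-Python sketch of the windowing semantics, not Stream Analytics code; integer-second timestamps and half-open [start, end) windows are simplifications for illustration, and the exact boundary inclusivity of Stream Analytics is not modeled.

```python
from collections import defaultdict


def tumbling_counts(timestamps: list[int], size: int) -> dict[int, int]:
    """Count events per non-overlapping window; keys are window start times.

    Each event falls into exactly one half-open window [start, start + size).
    """
    counts: dict[int, int] = defaultdict(int)
    for ts in timestamps:
        counts[ts - ts % size] += 1
    return dict(counts)


def hopping_counts(timestamps: list[int], size: int, hop: int) -> dict[int, int]:
    """Count events per overlapping window of length `size` that advances every `hop`.

    An event is counted in every half-open window [start, start + size) that contains it.
    """
    counts: dict[int, int] = defaultdict(int)
    for ts in timestamps:
        start = ts - ts % hop  # latest window start at or before the event
        while start > ts - size:
            counts[start] += 1
            start -= hop
    return dict(counts)


events = [11, 14, 17, 22]  # hypothetical event times in seconds
print(tumbling_counts(events, size=5))          # {10: 2, 15: 1, 20: 1}
print(hopping_counts(events, size=10, hop=5))   # {10: 3, 5: 2, 15: 2, 20: 1}
```

In a Stream Analytics query, the same groupings are expressed in the GROUP BY clause, for example `TumblingWindow(second, 5)` or `HoppingWindow(second, 10, 5)`.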
Azure Event Hubs
Purpose: Big data streaming platform and event ingestion service.
Key Capabilities:
- High Throughput: Millions of events per second
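- Kafka Compatibility: Kafka protocol endpoint, so existing Kafka clients can connect by changing configuration rather than code
- Capture Feature: Automatic archival of event streams to Blob Storage or Data Lake Storage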
- Schema Registry: Centralized schema management
- Dedicated Clusters: Isolated, high-performance clusters
Best For:
- High-volume event ingestion
- Kafka migration scenarios
- Event-driven architectures
- IoT data collection
Pricing: Throughput Units (Standard), Processing Units (Premium), or Capacity Units (Dedicated)
Documentation: Event Hubs Guide
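A Standard-tier throughput unit covers up to 1 MB/s or 1,000 events/s of ingress and up to 2 MB/s or 4,096 events/s of egress, so capacity planning reduces to finding the tightest of those four limits. The Python sketch below assumes those per-TU figures; it ignores Auto-inflate and per-namespace TU limits, and the workload numbers are hypothetical.

```python
import math

# Assumed Standard-tier limits per throughput unit (TU).
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1_000
EGRESS_MB_PER_TU = 2.0
EGRESS_EVENTS_PER_TU = 4_096


def required_tus(ingress_mb_s: float, ingress_events_s: float,
                 egress_mb_s: float, egress_events_s: float) -> int:
    """Smallest TU count that keeps every ingress and egress rate within its limit (minimum 1)."""
    utilization = max(
        ingress_mb_s / INGRESS_MB_PER_TU,
        ingress_events_s / INGRESS_EVENTS_PER_TU,
        egress_mb_s / EGRESS_MB_PER_TU,
        egress_events_s / EGRESS_EVENTS_PER_TU,
    )
    return max(1, math.ceil(utilization))


# Hypothetical load: 3.5 MB/s and 2,500 events/s in, read by two consumer groups.
print(required_tus(3.5, 2_500, egress_mb_s=7.0, egress_events_s=5_000))  # 4
```

Here both ingress bytes (3.5 TUs' worth) and egress bytes (7 MB/s over 2 MB/s per TU) round up to 4, which is the binding constraint.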
Azure Event Grid
Purpose: Event routing service for building event-driven applications.
Key Capabilities:
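- Event Routing: Push delivery of each event to every matching subscription (webhooks, Azure Functions, queues, Event Hubs, and more)
- Custom Topics: Publish your own application events
- System Topics: Built-in events from Azure services
- Dead-lettering: Events that cannot be delivered after retries are written to a Storage blob container
- Event Filtering: Route events by event type, subject prefix or suffix, or event data (advanced filters)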
Best For:
- Event-driven application architectures
- Serverless workflows
- System integration
- Reactive applications
Pricing: Pay-per-operation model
Documentation: Streaming Services Guide
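Basic Event Grid subscription filters match on event type and on subject prefix and suffix, with subject matching case-insensitive unless configured otherwise. The Python sketch below models only those basic filters (advanced filters on event data are not covered). The container name `landing` and the blob path are hypothetical; the `Microsoft.Storage.BlobCreated` event type and the `/blobServices/default/containers/...` subject format are those used by Blob Storage system topics.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriptionFilter:
    """Simplified model of an Event Grid subscription's basic filters."""
    included_event_types: list[str] = field(default_factory=list)  # empty list = all types
    subject_begins_with: str = ""
    subject_ends_with: str = ""
    case_sensitive: bool = False  # subject matching is case-insensitive by default


def matches(f: SubscriptionFilter, event: dict) -> bool:
    """Return True if the event would be delivered to a subscription with this filter."""
    if f.included_event_types and event["eventType"] not in f.included_event_types:
        return False
    subject, begins, ends = event["subject"], f.subject_begins_with, f.subject_ends_with
    if not f.case_sensitive:
        subject, begins, ends = subject.lower(), begins.lower(), ends.lower()
    return subject.startswith(begins) and subject.endswith(ends)


# Route only CSV files created in a hypothetical "landing" container.
csv_filter = SubscriptionFilter(
    included_event_types=["Microsoft.Storage.BlobCreated"],
    subject_begins_with="/blobServices/default/containers/landing/",
    subject_ends_with=".csv",
)
created = {"eventType": "Microsoft.Storage.BlobCreated",
           "subject": "/blobServices/default/containers/landing/blobs/2025/sales.CSV"}
deleted = {**created, "eventType": "Microsoft.Storage.BlobDeleted"}
print(matches(csv_filter, created))  # True
print(matches(csv_filter, deleted))  # False
```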
🗃️ Storage Services
Azure Data Lake Storage Gen2
Purpose: Azure Blob Storage with a hierarchical namespace, optimized for big data analytics.
Key Capabilities:
- Hierarchical Namespace: True directories with atomic rename and delete operations
- Fine-grained ACLs: POSIX-like access control on directories and files
- Multi-protocol Access: Blob and Data Lake APIs
- Lifecycle Management: Automated data tiering and archival
- Access Tiers: Hot, cool, and archive storage
Best For:
- Data lake implementations
- Big data analytics storage
- Data archival and backup
- Multi-format data storage
Pricing: Storage capacity + transaction costs
Documentation: Data Lake Gen2 Guide
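Lifecycle management rules are defined as a JSON policy on the storage account. The Python sketch below mirrors that policy shape as a dictionary and evaluates which action would apply to a blob of a given age. It assumes the documented precedence (when several actions qualify, the least expensive one wins: delete, then archive, then cool); the rule name, container (`datalake`), and thresholds are hypothetical, and the evaluator models only `prefixMatch` and `daysAfterModificationGreaterThan`.

```python
from typing import Optional

# Lifecycle policy in the storage-account JSON shape (rule name, prefix, and days are hypothetical).
policy = {
    "rules": [{
        "enabled": True,
        "name": "tier-raw-zone",
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["datalake/raw/"]},
            "actions": {"baseBlob": {
                "tierToCool": {"daysAfterModificationGreaterThan": 30},
                "tierToArchive": {"daysAfterModificationGreaterThan": 180},
                "delete": {"daysAfterModificationGreaterThan": 2555},
            }},
        },
    }]
}

# Least expensive action first; used to choose when several actions are due.
PRECEDENCE = ["delete", "tierToArchive", "tierToCool"]


def action_for(policy: dict, blob_path: str, days_since_modified: int) -> Optional[str]:
    """Return the action lifecycle management would take for this blob, or None."""
    due = set()
    for rule in policy["rules"]:
        if not rule["enabled"]:
            continue
        definition = rule["definition"]
        prefixes = definition["filters"].get("prefixMatch", [""])
        if not any(blob_path.startswith(p) for p in prefixes):
            continue
        for name, condition in definition["actions"]["baseBlob"].items():
            if days_since_modified > condition["daysAfterModificationGreaterThan"]:
                due.add(name)
    return next((name for name in PRECEDENCE if name in due), None)


for days in (10, 45, 200, 3000):
    print(days, action_for(policy, "datalake/raw/2024/events.parquet", days))
```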
Azure Cosmos DB
Purpose: Globally distributed, multi-model NoSQL database service.
Key Capabilities:
- Multiple APIs: NoSQL (formerly SQL), MongoDB, Cassandra, Gremlin, Table, PostgreSQL
- Global Distribution: Multi-region writes and reads
- Analytical Store: HTAP capabilities with Synapse Link
- Change Feed: Real-time change data capture
- Serverless Option: Pay-per-request pricing model
Best For:
- Globally distributed applications
- Real-time applications requiring low latency
- Multi-model data scenarios
- HTAP workloads with Synapse integration
Pricing: Request Units (RU/s) or serverless
Documentation: Storage Services Guide
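Provisioned throughput is sized in RU/s from the expected operation mix. A point read of a 1 KB item costs 1 RU; write cost varies with item size and indexing policy, so this Python sketch takes it as an input (in practice it can be measured from the `x-ms-request-charge` response header of a representative write). It also assumes autoscale throughput ranges from 10% of the configured maximum up to that maximum. The workload figures are hypothetical.

```python
import math

POINT_READ_RU = 1.0  # RU cost of a point read of a 1 KB item


def required_ru_per_sec(reads_per_sec: float, writes_per_sec: float,
                        write_ru: float, read_ru: float = POINT_READ_RU,
                        headroom: float = 0.0) -> int:
    """Estimate provisioned RU/s for a read/write mix plus a fractional safety margin."""
    raw = reads_per_sec * read_ru + writes_per_sec * write_ru
    return math.ceil(raw * (1 + headroom))


def autoscale_range(max_ru_per_sec: int) -> tuple[int, int]:
    """Autoscale scales between 10% of the configured maximum RU/s and the maximum."""
    return max_ru_per_sec // 10, max_ru_per_sec


# Hypothetical mix: 2,000 point reads/s and 300 writes/s measured at 5 RU each.
print(required_ru_per_sec(2000, 300, write_ru=5.0))                # 3500
print(required_ru_per_sec(2000, 300, write_ru=5.0, headroom=0.5))  # 5250
print(autoscale_range(4000))                                       # (400, 4000)
```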
Azure SQL Database
Purpose: Fully managed relational database service.
Key Capabilities:
- Hyperscale: Massively scalable database architecture
- Elastic Pools: Shared resources across multiple databases
- Built-in Intelligence: Automatic tuning and threat detection
- Always Encrypted: Client-side encryption of sensitive columns
- Temporal Tables: Built-in data history tracking
Best For:
- Relational data workloads
- Transactional applications
- Data marts and reporting
- Application modernization
Pricing: vCore-based or DTU-based models
Documentation: Storage Services Guide
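A system-versioned temporal table keeps superseded rows in a history table, each with a validity period. Querying with `FOR SYSTEM_TIME AS OF t` returns the row versions whose period start is at or before `t` and whose period end is after `t`. The Python sketch below reproduces that filter over an in-memory list; the `Version` type and the price data are hypothetical, and `datetime.max` stands in for the open-ended period end of current rows.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Version:
    product_id: int
    price: float
    valid_from: datetime  # period start column
    valid_to: datetime    # period end column (open-ended for the current row)


def as_of(versions: list[Version], t: datetime) -> list[Version]:
    """Mimic FOR SYSTEM_TIME AS OF t: keep rows with valid_from <= t < valid_to."""
    return [v for v in versions if v.valid_from <= t < v.valid_to]


# Hypothetical history: product 1 was repriced from 10.0 to 12.0 on 2024-03-01.
rows = [
    Version(1, 10.0, datetime(2024, 1, 1), datetime(2024, 3, 1)),  # history row
    Version(1, 12.0, datetime(2024, 3, 1), datetime.max),          # current row
]
print(as_of(rows, datetime(2024, 2, 1))[0].price)  # 10.0
print(as_of(rows, datetime(2024, 3, 1))[0].price)  # 12.0
```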
🔧 Orchestration Services
Azure Data Factory
Purpose: Cloud-based data integration service for creating ETL/ELT pipelines.
Key Capabilities:
- Code-free ETL: Visual pipeline designer
- Mapping Data Flows: Visually designed transformations executed on managed Spark clusters
- Hybrid Integration: On-premises and cloud data sources
- CI/CD Support: Azure DevOps and GitHub integration
- Monitoring: Built-in pipeline monitoring and alerting
Best For:
- Data integration pipelines
- ETL/ELT processes
- Data migration projects
- Scheduled data processing
Pricing: Pipeline orchestration (per activity run) + data movement (per DIU-hour) + data flow execution (per vCore-hour)
Documentation: Data Factory Guide
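Pipeline activities are chained with dependency conditions (Succeeded, Failed, Skipped, Completed), which is how success paths and error-handling branches are built. The Python sketch below simulates that control flow: an activity runs only if every upstream activity ended in the required state, and is otherwise marked Skipped. It is a simplification that allows one condition per dependency, and the activity names and callables are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def condition_met(condition: str, upstream_status: str) -> bool:
    """Completed means the upstream activity finished, whether it succeeded or failed."""
    if condition == "Completed":
        return upstream_status in ("Succeeded", "Failed")
    return condition == upstream_status


def run_pipeline(activities: dict[str, dict]) -> dict[str, str]:
    """Run activities in dependency order and return each activity's final status."""
    graph = {name: set(spec.get("depends_on", {})) for name, spec in activities.items()}
    status: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        spec = activities[name]
        deps = spec.get("depends_on", {})
        if not all(condition_met(cond, status[dep]) for dep, cond in deps.items()):
            status[name] = "Skipped"
            continue
        action: Callable[[], None] = spec["action"]
        try:
            action()
            status[name] = "Succeeded"
        except Exception:
            status[name] = "Failed"
    return status


def failing_copy() -> None:
    raise RuntimeError("source unavailable")  # hypothetical failure


def noop() -> None:
    pass


def build(copy_action: Callable[[], None]) -> dict[str, dict]:
    """Hypothetical pipeline: copy, then transform on success or alert on failure, then cleanup."""
    return {
        "copy": {"action": copy_action},
        "transform": {"action": noop, "depends_on": {"copy": "Succeeded"}},
        "alert": {"action": noop, "depends_on": {"copy": "Failed"}},
        "cleanup": {"action": noop, "depends_on": {"transform": "Completed"}},
    }


print(run_pipeline(build(noop)))
print(run_pipeline(build(failing_copy)))
```

Note that in this simulation a Completed dependency does not fire when the upstream activity was skipped, so `cleanup` is skipped on the failure path.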
Azure Logic Apps
Purpose: Serverless workflow automation service.
Key Capabilities:
- Visual Designer: Drag-and-drop workflow creation
- 300+ Connectors: Pre-built connectors for popular services
- B2B Integration: EDI and AS2 support
- Event-driven: Trigger-based workflow execution
- Enterprise Integration: Integration with on-premises systems
Best For:
- Business process automation
- System integrations
- Event-driven workflows
- B2B data exchange
Pricing: Pay per action execution (Consumption plan) or hosting-plan based (Standard plan)
Documentation: Orchestration Services Guide
🎯 Service Selection Guide
By Use Case
Real-time Analytics
- Primary: Stream Analytics, Event Hubs
- Storage: Cosmos DB, Data Lake Gen2
- Visualization: Power BI real-time dashboards
Data Warehousing
- Primary: Synapse Dedicated SQL Pools
- Storage: Data Lake Gen2, Azure SQL
- Orchestration: Data Factory
Data Science & ML
- Primary: Databricks, Synapse Spark Pools
- Storage: Data Lake Gen2, Cosmos DB
- Orchestration: Data Factory, Databricks Workflows
IoT Analytics
- Primary: Stream Analytics, Event Hubs
- Edge: Stream Analytics on IoT Edge
- Storage: Data Lake Gen2, Cosmos DB
By Data Volume
Small to Medium (< 1 TB)
- Azure SQL Database
- Cosmos DB
- Stream Analytics (< 100 SU)
Large (1-100 TB)
- Synapse Dedicated SQL Pools
- Databricks
- HDInsight
Very Large (> 100 TB)
- Synapse Serverless SQL Pools
- Data Lake Gen2 with Synapse
- Databricks with Delta Lake
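The volume tiers above can be encoded as a simple lookup when screening candidate services. In this Python sketch, treating exactly 1 TB and exactly 100 TB as part of the Large tier is an assumption, since the guide's ranges do not say which side owns each boundary; the function name is illustrative.

```python
def candidates_by_volume(volume_tb: float) -> list[str]:
    """Return the guide's candidate services for an estimated data volume in TB."""
    if volume_tb < 0:
        raise ValueError("volume_tb must be non-negative")
    if volume_tb < 1:
        return ["Azure SQL Database", "Cosmos DB", "Stream Analytics (< 100 SU)"]
    if volume_tb <= 100:  # boundary placement is an assumption
        return ["Synapse Dedicated SQL Pools", "Databricks", "HDInsight"]
    return ["Synapse Serverless SQL Pools", "Data Lake Gen2 with Synapse",
            "Databricks with Delta Lake"]


print(candidates_by_volume(0.5))
print(candidates_by_volume(250))
```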
By Budget Considerations
Cost-Optimized
- HDInsight
- Synapse Serverless SQL Pools
- Event Grid
Balanced Performance/Cost
- Stream Analytics
- Data Factory
- Cosmos DB (provisioned throughput)
Performance-Optimized
- Synapse Dedicated SQL Pools
- Databricks Premium
- Event Hubs Dedicated Clusters
📊 Service Comparison Matrix
Analytics Compute Comparison
| Feature | Synapse | Databricks | HDInsight |
|---|---|---|---|
| SQL Support | ✅ Native | ✅ Spark SQL | ✅ Hive/Spark SQL |
| Python/R | ✅ Spark | ✅ Native | ✅ Spark |
| Scala/Java | ✅ Spark | ✅ Native | ✅ Native |
| ML Integration | ✅ Built-in | ✅ MLflow | ⚠️ Custom |
| Serverless | ✅ Serverless SQL pools | ✅ Serverless SQL warehouses | ❌ No |
| Auto-scaling | ⚠️ Spark pools (dedicated SQL scales manually) | ✅ Yes | ✅ Yes |
| Enterprise Security | ✅ Microsoft Entra ID | ✅ Unity Catalog | ✅ ESP |
| Cost Model | Pay-per-use | DBU-based | VM-based |
Streaming Services Comparison
| Feature | Stream Analytics | Event Hubs | Event Grid |
|---|---|---|---|
| Processing | ✅ Built-in | ❌ Ingestion only | ❌ Routing only |
| Throughput | Medium (SU-based) | Very high | High |
| Latency | Sub-second | Milliseconds | Seconds |
| SQL Queries | ✅ Yes | ❌ No | ❌ No |
| Schema Registry | ❌ No | ✅ Yes | ❌ No |
| Event Filtering | ✅ Yes | ❌ No | ✅ Yes |
| Cost Model | SU hourly | TU/PU/CU | Per operation |
Last Updated: 2025-01-28
Next Review: 2025-04-28