🚀 Deployment Guide¶
📋 Overview¶
This guide provides step-by-step instructions for deploying the Azure Real-Time Analytics platform infrastructure using Infrastructure as Code (IaC) with Terraform or Azure Bicep.
📑 Table of Contents¶
- Prerequisites
- Environment Setup
- Infrastructure Deployment
- Configuration
- Validation
- Troubleshooting
- Post-Deployment Checklist
- Next Steps
✅ Prerequisites¶
Required Tools¶
# Check Azure CLI version (2.50+ required)
az --version
# Check Terraform version (1.5+ required)
terraform --version
# Check Databricks CLI
databricks --version
# Check Power BI CLI
pbicli --version
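The version checks above can be scripted. Below is a minimal sketch that fails fast when the Azure CLI or Terraform is older than the required release; it assumes jq is installed and uses sort -V for version comparison.
# Compare installed tool versions against the documented minimums
REQUIRED_AZ=2.50.0
REQUIRED_TF=1.5.0
AZ_VER=$(az version --query '"azure-cli"' -o tsv)
TF_VER=$(terraform version -json | jq -r .terraform_version)
# sort -V orders versions; if the required version is not the lowest, the tool is too old
if [ "$(printf '%s\n' "$REQUIRED_AZ" "$AZ_VER" | sort -V | head -n1)" != "$REQUIRED_AZ" ]; then
  echo "Azure CLI $AZ_VER is older than required $REQUIRED_AZ"; exit 1
fi
if [ "$(printf '%s\n' "$REQUIRED_TF" "$TF_VER" | sort -V | head -n1)" != "$REQUIRED_TF" ]; then
  echo "Terraform $TF_VER is older than required $REQUIRED_TF"; exit 1
fi
echo "Tool versions OK"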
Required Permissions¶
Azure Permissions:
- Subscription: Owner or Contributor + User Access Administrator
- Resource Groups: Create and manage
- Role Assignments: Create custom roles
- Policy Assignments: Apply governance policies
Service Principals:
- Terraform Service Principal with Contributor role
- Databricks Service Principal for automation
- Power BI Service Principal for Direct Lake
Azure Subscription Setup¶
# Login to Azure
az login
# Set default subscription
az account set --subscription "Your-Subscription-Name"
# Create service principal for Terraform
az ad sp create-for-rbac \
--name "sp-terraform-realtime-analytics" \
--role Contributor \
--scopes /subscriptions/$(az account show --query id -o tsv)
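Because the deployment creates role assignments (see Required Permissions above), the Terraform service principal typically also needs User Access Administrator. A sketch; replace <appId> with the appId returned by the command above.
# Grant User Access Administrator to the Terraform service principal
az role assignment create \
  --assignee <appId> \
  --role "User Access Administrator" \
  --scope /subscriptions/$(az account show --query id -o tsv)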
🛠️ Environment Setup¶
1. Clone Repository¶
# Clone the infrastructure repository
git clone https://github.com/your-org/azure-realtime-analytics-infra.git
cd azure-realtime-analytics-infra
# Initialize git submodules if any
git submodule update --init --recursive
2. Configure Environment Variables¶
# Create .env file from template
cp .env.template .env
# Or write .env directly with your values (this overwrites the copied template)
cat > .env << EOF
# Azure Configuration
AZURE_SUBSCRIPTION_ID=your-subscription-id
AZURE_TENANT_ID=your-tenant-id
AZURE_CLIENT_ID=your-service-principal-id
AZURE_CLIENT_SECRET=your-service-principal-secret
# Deployment Configuration
ENVIRONMENT=dev
LOCATION=eastus2
RESOURCE_GROUP_NAME=rg-realtime-analytics-dev
# Databricks Configuration
DATABRICKS_WORKSPACE_NAME=dbw-realtime-analytics-dev
DATABRICKS_PRICING_TIER=premium
# Storage Configuration
STORAGE_ACCOUNT_NAME=strtimeanalyticsdev
STORAGE_REPLICATION=ZRS
# Network Configuration
VNET_ADDRESS_SPACE=10.0.0.0/16
DATABRICKS_PUBLIC_SUBNET=10.0.1.0/24
DATABRICKS_PRIVATE_SUBNET=10.0.2.0/24
EOF
# Source environment variables
source .env
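Terraform's azurerm provider reads its credentials from ARM_-prefixed environment variables, so map the .env values onto them before running any Terraform commands. A minimal sketch:
# Map .env values onto the variables the azurerm provider expects
export ARM_SUBSCRIPTION_ID="$AZURE_SUBSCRIPTION_ID"
export ARM_TENANT_ID="$AZURE_TENANT_ID"
export ARM_CLIENT_ID="$AZURE_CLIENT_ID"
export ARM_CLIENT_SECRET="$AZURE_CLIENT_SECRET"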
3. Initialize Terraform¶
# Navigate to Terraform directory
cd infrastructure/terraform
# Initialize Terraform
terraform init
# Create workspace for environment
terraform workspace new dev
terraform workspace select dev
# Validate configuration
terraform validate
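For team deployments, state is usually kept in an Azure Storage backend rather than locally. The sketch below assumes the configuration declares an empty backend "azurerm" {} block and that the state resource group, storage account, and container already exist; all names are placeholders.
# Write a backend config file and re-initialize against remote state
cat > backend.hcl << EOF
resource_group_name  = "rg-terraform-state"
storage_account_name = "sttfstatedev"
container_name       = "tfstate"
key                  = "realtime-analytics/dev.terraform.tfstate"
EOF
terraform init -backend-config=backend.hcl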
🏗️ Infrastructure Deployment¶
Phase 1: Core Infrastructure¶
# Deploy core infrastructure
terraform apply -target=module.core -var-file=environments/dev.tfvars
# Resources created:
# - Resource Groups
# - Virtual Networks
# - Network Security Groups
# - Key Vault
# - Log Analytics Workspace
Phase 2: Storage Layer¶
# Deploy storage resources
terraform apply -target=module.storage -var-file=environments/dev.tfvars
# Resources created:
# - ADLS Gen2 Storage Account
# - Bronze, Silver, Gold containers
# - Private endpoints
# - Lifecycle policies
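A quick sanity check (a sketch) that the bronze, silver, and gold containers were created; it assumes your identity has data-plane access on the account.
# List containers on the ADLS Gen2 account
az storage container list \
  --account-name $STORAGE_ACCOUNT_NAME \
  --auth-mode login \
  --query "[].name" -o tsv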
Phase 3: Databricks Platform¶
# Deploy Databricks workspace
terraform apply -target=module.databricks -var-file=environments/dev.tfvars
# Resources created:
# - Databricks workspace
# - VNet injection
# - Unity Catalog metastore
# - Initial clusters
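After Phase 3 you can point the Databricks CLI at the new workspace. The sketch below assumes the Azure CLI databricks extension is installed and that the workspace URL is exposed as workspaceUrl in the show output.
# Resolve the workspace URL and configure the Databricks CLI
DATABRICKS_URL=$(az databricks workspace show \
  --name $DATABRICKS_WORKSPACE_NAME \
  --resource-group $RESOURCE_GROUP_NAME \
  --query workspaceUrl -o tsv)
export DATABRICKS_HOST="https://$DATABRICKS_URL"
databricks configure   # enter the workspace URL and a personal access token when prompted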
Phase 4: Streaming Infrastructure¶
# Deploy streaming components
terraform apply -target=module.streaming -var-file=environments/dev.tfvars
# Resources created:
# - Event Hubs namespace
# - Kafka connectors
# - Stream Analytics jobs
# - Function Apps
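To confirm the streaming layer is up, list the Event Hubs namespace and its event hubs. A sketch; it assumes the namespace name is exported as EVENT_HUBS_NAMESPACE (the same variable the health check script later in this guide uses).
# Verify the Event Hubs namespace and its hubs
az eventhubs namespace show \
  --name $EVENT_HUBS_NAMESPACE \
  --resource-group $RESOURCE_GROUP_NAME \
  -o table
az eventhubs eventhub list \
  --namespace-name $EVENT_HUBS_NAMESPACE \
  --resource-group $RESOURCE_GROUP_NAME \
  -o table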
Phase 5: Analytics Layer¶
# Deploy analytics components
terraform apply -target=module.analytics -var-file=environments/dev.tfvars
# Resources created:
# - Power BI Premium capacity
# - Azure OpenAI instance
# - API Management
# - Application Insights
Complete Deployment¶
# Review the full plan before applying
terraform plan -var-file=environments/dev.tfvars -out=tfplan
# Deploy all resources from the reviewed plan
terraform apply tfplan
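After a successful full apply, it can help to capture the Terraform outputs for the configuration steps that follow; the file name below is just an example, and the available outputs depend on what the modules expose.
# Save outputs for later configuration steps
terraform output -json > dev-outputs.json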
⚙️ Configuration¶
1. Databricks Configuration¶
# databricks_setup.py
import json
import os

from databricks.sdk import WorkspaceClient

# Initialize client
w = WorkspaceClient(
    host=os.environ['DATABRICKS_HOST'],
    token=os.environ['DATABRICKS_TOKEN']
)

# Create catalog
w.catalogs.create(
    name='realtime_analytics',
    comment='Real-time analytics catalog'
)

# Create schemas for the medallion layers
for schema in ['bronze', 'silver', 'gold']:
    w.schemas.create(
        name=schema,
        catalog_name='realtime_analytics',
        comment=f'{schema.capitalize()} layer schema'
    )

# Configure cluster policies (policy attributes use dotted paths,
# and the SDK expects the definition as a JSON string)
cluster_policy = {
    "spark_version": {"type": "fixed", "value": "13.3.x-scala2.12"},
    "node_type_id": {"type": "allowlist", "values": ["Standard_D16s_v3", "Standard_D32s_v3"]},
    "autoscale.min_workers": {"type": "fixed", "value": 2},
    "autoscale.max_workers": {"type": "range", "maxValue": 50},
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 120}
}

w.cluster_policies.create(
    name='streaming-cluster-policy',
    definition=json.dumps(cluster_policy)
)
2. Storage Configuration¶
# Configure storage lifecycle policies
az storage management-policy create \
--account-name $STORAGE_ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP_NAME \
--policy @storage-lifecycle-policy.json
# Set up private endpoints
az network private-endpoint create \
--name pe-storage-blob \
--resource-group $RESOURCE_GROUP_NAME \
--vnet-name vnet-realtime-analytics \
--subnet pe-subnet \
--private-connection-resource-id $(az storage account show -n $STORAGE_ACCOUNT_NAME -g $RESOURCE_GROUP_NAME --query id -o tsv) \
--group-id blob \
--connection-name storage-blob-connection
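The lifecycle command above expects a storage-lifecycle-policy.json file. The sketch below writes a minimal policy that tiers bronze-layer blobs to cool storage after 30 days; the rule name, prefix filter, and cutoff are assumptions to adapt to your retention requirements.
# Write a minimal lifecycle policy for the command above
cat > storage-lifecycle-policy.json << 'EOF'
{
  "rules": [
    {
      "enabled": true,
      "name": "tier-bronze-to-cool",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["bronze/"] },
        "actions": {
          "baseBlob": { "tierToCool": { "daysAfterModificationGreaterThan": 30 } }
        }
      }
    }
  ]
}
EOF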
3. Kafka Configuration¶
# kafka-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-config
data:
  bootstrap.servers: "pkc-xxxxx.eastus2.azure.confluent.cloud:9092"
  security.protocol: "SASL_SSL"
  sasl.mechanism: "PLAIN"
  schema.registry.url: "https://psrc-xxxxx.us-east-2.aws.confluent.cloud"
  topics.yaml: |
    - name: events
      partitions: 20
      replication: 3
      retention.ms: 604800000   # 7 days
    - name: metrics
      partitions: 10
      replication: 3
      retention.ms: 259200000   # 3 days
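If the topics live in Confluent Cloud (as the bootstrap server above suggests), they can be created from the command line. A sketch using the Confluent CLI, assuming you have already run confluent login and selected the target environment and cluster.
# Create the topics defined in kafka-config.yaml
confluent kafka topic create events --partitions 20 --config retention.ms=604800000
confluent kafka topic create metrics --partitions 10 --config retention.ms=259200000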
4. Power BI Configuration¶
# Configure Power BI Premium workspace
Install-Module -Name MicrosoftPowerBIMgmt

# Connect to Power BI
Connect-PowerBIServiceAccount

# Create the real-time analytics workspace and keep the returned object
$workspace = New-PowerBIWorkspace -Name "RealTimeAnalytics"

# Look up the Premium capacity and assign the workspace to it
$capacity = Get-PowerBICapacity | Where-Object { $_.DisplayName -eq "YourPremiumCapacity" }
Set-PowerBIWorkspace `
    -Id $workspace.Id `
    -CapacityId $capacity.Id

# Configure Direct Lake
$datasetConfig = @{
    "mode" = "DirectLake"
    "datasources" = @(
        @{
            "datasourceType" = "AnalysisServices"
            "connectionDetails" = @{
                "server"   = "powerbi://api.powerbi.com/v1.0/myorg/RealTimeAnalytics"
                "database" = "gold"
            }
        }
    )
}
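To verify the workspace from a shell, the Power BI REST API can be queried with a token minted by the Azure CLI. A sketch, assuming jq is installed and your identity has access to the workspace.
# Mint a Power BI token and look up the RealTimeAnalytics workspace
TOKEN=$(az account get-access-token \
  --resource https://analysis.windows.net/powerbi/api \
  --query accessToken -o tsv)
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://api.powerbi.com/v1.0/myorg/groups" \
  | jq '.value[] | select(.name == "RealTimeAnalytics")'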
✅ Validation¶
Infrastructure Validation Script¶
#!/bin/bash
# validate_deployment.sh
echo "🔍 Validating Azure Real-Time Analytics Deployment..."
# Check resource groups
echo "Checking resource groups..."
az group show --name $RESOURCE_GROUP_NAME > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "✅ Resource group exists"
else
echo "❌ Resource group not found"
exit 1
fi
# Check Databricks workspace
echo "Checking Databricks workspace..."
az databricks workspace show \
--name $DATABRICKS_WORKSPACE_NAME \
--resource-group $RESOURCE_GROUP_NAME > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "✅ Databricks workspace exists"
else
echo "❌ Databricks workspace not found"
exit 1
fi
# Check storage account
echo "Checking storage account..."
az storage account show \
--name $STORAGE_ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP_NAME > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "✅ Storage account exists"
else
echo "❌ Storage account not found"
exit 1
fi
# Test connectivity
echo "Testing Databricks connectivity..."
databricks workspace list > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "✅ Databricks CLI connected"
else
echo "❌ Databricks CLI connection failed"
exit 1
fi
echo "✨ Deployment validation completed successfully!"
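To run the script, export the variables it relies on (RESOURCE_GROUP_NAME, DATABRICKS_WORKSPACE_NAME, and STORAGE_ACCOUNT_NAME from .env), then:
# Make the script executable and run it
chmod +x validate_deployment.sh
./validate_deployment.sh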
Health Check Dashboard¶
# health_check.py
import json
import os
from datetime import datetime

import requests


def check_service_health(service_name, endpoint, expected_status=200):
    """Check if a service is healthy."""
    try:
        response = requests.get(endpoint, timeout=5)
        is_healthy = response.status_code == expected_status
        return {
            "service": service_name,
            "status": "healthy" if is_healthy else "unhealthy",
            "response_time": response.elapsed.total_seconds(),
            "status_code": response.status_code,
            "timestamp": datetime.utcnow().isoformat()
        }
    except Exception as e:
        return {
            "service": service_name,
            "status": "error",
            "error": str(e),
            "timestamp": datetime.utcnow().isoformat()
        }


# Check all services (DATABRICKS_HOST includes the https:// scheme,
# matching the SDK configuration used earlier in this guide)
services = [
    ("Databricks", f"{os.environ['DATABRICKS_HOST']}/api/2.0/clusters/list"),
    ("Storage", f"https://{os.environ['STORAGE_ACCOUNT_NAME']}.blob.core.windows.net/"),
    ("Event Hubs", f"https://{os.environ['EVENT_HUBS_NAMESPACE']}.servicebus.windows.net/"),
    ("Power BI", "https://api.powerbi.com/v1.0/myorg/groups")
]

health_results = []
for service_name, endpoint in services:
    result = check_service_health(service_name, endpoint)
    health_results.append(result)
    print(f"{result['service']}: {result['status']}")

# Save results
with open('health_check_results.json', 'w') as f:
    json.dump(health_results, f, indent=2)
🚨 Troubleshooting¶
Common Issues and Solutions¶
| Issue | Symptoms | Solution |
|---|---|---|
| Terraform state lock | "Error acquiring the state lock" | Run terraform force-unlock <lock-id> |
| Insufficient quota | "OperationNotAllowed" errors | Request quota increase in Azure portal |
| VNet peering failed | Databricks unreachable | Verify address spaces don't overlap |
| Storage access denied | 403 errors on containers | Check firewall rules and private endpoints |
| Cluster startup fails | "Cluster terminated" | Review driver logs in Databricks |
Rollback Procedure¶
# Create backup of current state
terraform state pull > terraform.tfstate.backup
# Roll back by destroying only the affected module
terraform destroy -target=module.affected_module -var-file=environments/dev.tfvars
# Restore from backup if needed
terraform state push terraform.tfstate.backup
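Before a targeted destroy, it is worth listing exactly which resources are in scope; module.affected_module below is the same placeholder used in the rollback commands above.
# Preview what a targeted destroy would touch
terraform state list | grep '^module\.affected_module'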
Support Escalation¶
- Level 1: Check deployment logs and health dashboard
- Level 2: Review Azure Monitor alerts and diagnostics
- Level 3: Contact platform team: platform@company.com
- Level 4: Open Azure support ticket (if critical)
📊 Post-Deployment Checklist¶
- All Terraform resources successfully deployed
- Network connectivity validated
- Security policies applied
- Databricks workspace accessible
- Storage containers created with correct permissions
- Kafka/Event Hubs topics configured
- Power BI workspace connected
- Monitoring and alerts configured
- Backup strategy implemented
- Documentation updated
📚 Next Steps¶
- Configure Databricks - Set up workspaces and clusters
- Implement Stream Processing - Deploy streaming pipelines
- Setup Monitoring - Configure observability
- Run Performance Tests - Validate system performance
Last Updated: January 29, 2025
Version: 1.0.0
Maintainer: Platform Engineering Team