Video Script: Monitoring Dashboards for Azure Synapse¶

Video Metadata¶

Title: Building Monitoring Dashboards for Azure Synapse Analytics
Duration: 25:00
Target Audience: Data engineers, platform administrators
Skill Level: Intermediate
Prerequisites:
Azure Synapse workspace
Log Analytics workspace
Power BI Desktop
Understanding of KQL (Kusto Query Language)

Learning Objectives¶

Configure diagnostic settings for Synapse workspace
Create custom KQL queries for monitoring
Build Power BI dashboards for Synapse metrics
Set up alerting rules
Monitor SQL pool, Spark pool, and pipeline performance
Implement cost monitoring

Video Script¶

Opening (0:00 - 1:00)¶

NARRATOR: "You can't improve what you don't measure. In this tutorial, you'll learn how to build comprehensive monitoring dashboards for Azure Synapse Analytics, giving you real-time visibility into performance, costs, and potential issues before they impact your users."

Section 1: Diagnostic Settings (1:00 - 5:00)¶

Configure Log Analytics (1:00 - 3:00)¶

# Create Log Analytics workspace
az monitor log-analytics workspace create \
  --resource-group synapse-monitoring-rg \
  --workspace-name synapse-logs \
  --location eastus2

# Enable diagnostic settings for Synapse workspace
az monitor diagnostic-settings create \
  --name synapse-diagnostics \
  --resource /subscriptions/{sub-id}/resourceGroups/synapse-rg/providers/Microsoft.Synapse/workspaces/mysynapse \
  --logs '[
    {"category": "SynapseRbacOperations", "enabled": true},
    {"category": "GatewayApiRequests", "enabled": true},
    {"category": "SQLSecurityAuditEvents", "enabled": true},
    {"category": "BuiltinSqlReqsEnded", "enabled": true}
  ]' \
  --metrics '[
    {"category": "AllMetrics", "enabled": true}
  ]' \
  --workspace /subscriptions/{sub-id}/resourceGroups/synapse-monitoring-rg/providers/Microsoft.OperationalInsights/workspaces/synapse-logs

Log Categories (3:00 - 5:00)¶

Key Log Categories: - SynapseRbacOperations: RBAC changes - SQLSecurityAuditEvents: SQL audit logs - BuiltinSqlReqsEnded: Serverless SQL queries - SynapseSqlPoolExecRequests: Dedicated SQL queries - SynapseSqlPoolDmsWorkers: Data movement - SynapseSqlPoolWaits: SQL pool wait statistics - SynapseSparkJobsEnded: Spark job completion

Section 2: KQL Queries (5:00 - 12:00)¶

SQL Pool Monitoring (5:00 - 7:30)¶

// Long-running queries
SynapseSqlPoolExecRequests
| where TimeGenerated > ago(24h)
| where DurationMs > 60000  // Over 1 minute
| project
    TimeGenerated,
    ResourceId,
    Command,
    DurationMs,
    RowCount,
    Status,
    SubmittedBy
| order by DurationMs desc
| take 50

// Query performance trends
SynapseSqlPoolExecRequests
| where TimeGenerated > ago(7d)
| summarize
    AvgDuration = avg(DurationMs),
    P50Duration = percentile(DurationMs, 50),
    P95Duration = percentile(DurationMs, 95),
    QueryCount = count()
    by bin(TimeGenerated, 1h)
| render timechart

// Failed queries
SynapseSqlPoolExecRequests
| where TimeGenerated > ago(24h)
| where Status == "Failed"
| project
    TimeGenerated,
    Command,
    ErrorMessage = Error,
    SubmittedBy
| order by TimeGenerated desc

// Resource consumption by user
SynapseSqlPoolExecRequests
| where TimeGenerated > ago(24h)
| summarize
    TotalQueries = count(),
    AvgDuration = avg(DurationMs),
    TotalDataProcessed = sum(DataProcessedMB)
    by SubmittedBy
| order by TotalDataProcessed desc

Spark Pool Monitoring (7:30 - 10:00)¶

// Spark job completion status
SynapseSparkJobsEnded
| where TimeGenerated > ago(24h)
| summarize
    TotalJobs = count(),
    Succeeded = countif(Status == "Succeeded"),
    Failed = countif(Status == "Failed")
    by bin(TimeGenerated, 1h), AppName
| extend SuccessRate = (Succeeded * 100.0) / TotalJobs
| render timechart

// Resource utilization
SynapseSparkJobsEnded
| where TimeGenerated > ago(24h)
| project
    TimeGenerated,
    AppName,
    DurationMs,
    MemoryUsedMB,
    CoresUsed,
    Status
| summarize
    AvgDuration = avg(DurationMs),
    AvgMemory = avg(MemoryUsedMB),
    AvgCores = avg(CoresUsed)
    by AppName
| order by AvgDuration desc

// Failed Spark jobs with details
SynapseSparkJobsEnded
| where TimeGenerated > ago(24h)
| where Status == "Failed"
| project
    TimeGenerated,
    AppName,
    ErrorMessage,
    DurationMs,
    MemoryUsedMB
| order by TimeGenerated desc

// Spark pool auto-scaling events
SynapseSparkPoolsEnded
| where TimeGenerated > ago(7d)
| project
    TimeGenerated,
    PoolName,
    MinNodes,
    MaxNodes,
    CurrentNodes = NodeCount
| render timechart

Pipeline Monitoring (10:00 - 12:00)¶

// Pipeline run status
SynapsePipelineRuns
| where TimeGenerated > ago(24h)
| summarize
    Total = count(),
    Succeeded = countif(Status == "Succeeded"),
    Failed = countif(Status == "Failed"),
    InProgress = countif(Status == "InProgress")
    by bin(TimeGenerated, 1h), PipelineName
| extend SuccessRate = (Succeeded * 100.0) / Total
| render timechart

// Failed pipeline activities
SynapseActivityRuns
| where TimeGenerated > ago(24h)
| where Status == "Failed"
| project
    TimeGenerated,
    PipelineName,
    ActivityName,
    ErrorMessage,
    DurationMs
| order by TimeGenerated desc

// Pipeline duration trends
SynapsePipelineRuns
| where TimeGenerated > ago(7d)
| where Status == "Succeeded"
| summarize
    AvgDuration = avg(DurationMs),
    P95Duration = percentile(DurationMs, 95)
    by bin(TimeGenerated, 1d), PipelineName
| render timechart

// Pipeline cost estimation
SynapseActivityRuns
| where TimeGenerated > ago(30d)
| extend CostEstimate = DurationMs * 0.0001  // Example calculation
| summarize
    TotalCost = sum(CostEstimate),
    ExecutionCount = count()
    by PipelineName
| order by TotalCost desc

Section 3: Power BI Dashboard (12:00 - 18:00)¶

Connect to Log Analytics (12:00 - 13:30)¶

// Power Query M formula
let
    Source = AzureMonitorDataExplorer.Contents(
        "https://api.loganalytics.io/v1",
        "workspace-id",
        "SynapseSqlPoolExecRequests
        | where TimeGenerated > ago(30d)
        | summarize
            QueryCount = count(),
            AvgDuration = avg(DurationMs),
            DataProcessed = sum(DataProcessedMB)
            by bin(TimeGenerated, 1d), SubmittedBy"
    ),
    #"Expanded Value" = Table.ExpandRecordColumn(Source, "Value",
        {"TimeGenerated", "SubmittedBy", "QueryCount", "AvgDuration", "DataProcessed"})
in
    #"Expanded Value"

Dashboard Visualizations (13:30 - 18:00)¶

Page 1: Overview - Total queries (Card) - Active SQL pools (Card) - Running pipelines (Card) - Cost this month (Card) - Query duration trend (Line chart) - Pipeline success rate (Gauge) - Top 10 longest queries (Table)

Page 2: SQL Pool Performance - Query duration distribution (Histogram) - Queries by user (Bar chart) - Data processed by query type (Pie chart) - Wait statistics (Stacked bar chart) - Failed queries (Table with drill-through)

Page 3: Spark Jobs - Job success rate (KPI) - Resource utilization (Multi-line chart) - Job duration by application (Bar chart) - Memory usage trends (Area chart)

Page 4: Pipelines - Pipeline runs by status (Donut chart) - Average duration by pipeline (Bar chart) - Failed activities (Matrix) - Cost breakdown (Treemap)

DAX Measures:

// Success Rate
SuccessRate =
DIVIDE(
    CALCULATE(COUNT(Queries[QueryId]), Queries[Status] = "Succeeded"),
    COUNT(Queries[QueryId]),
    0
) * 100

// Average Duration (minutes)
AvgDurationMinutes =
AVERAGE(Queries[DurationMs]) / 60000

// Cost This Month
CostThisMonth =
CALCULATE(
    SUM(Queries[EstimatedCost]),
    DATESMTD(Calendar[Date])
)

// YoY Growth
YoYGrowth =
VAR CurrentYear = CALCULATE(COUNT(Queries[QueryId]), YEAR(Calendar[Date]) = YEAR(TODAY()))
VAR PreviousYear = CALCULATE(COUNT(Queries[QueryId]), YEAR(Calendar[Date]) = YEAR(TODAY()) - 1)
RETURN
DIVIDE(CurrentYear - PreviousYear, PreviousYear, 0) * 100

Section 4: Alerting (18:00 - 22:00)¶

Alert Rules (18:00 - 20:00)¶

{
  "name": "High-SQL-Pool-DWU-Usage",
  "description": "Alert when SQL pool DWU usage exceeds 80%",
  "severity": 2,
  "enabled": true,
  "scopes": [
    "/subscriptions/{sub-id}/resourceGroups/synapse-rg/providers/Microsoft.Synapse/workspaces/mysynapse"
  ],
  "evaluationFrequency": "PT5M",
  "windowSize": "PT15M",
  "criteria": {
    "allOf": [
      {
        "query": "SynapseSqlPoolDmsWorkers | where TimeGenerated > ago(15m) | summarize AvgDWU = avg(DWUPercent) | where AvgDWU > 80",
        "timeAggregation": "Average",
        "operator": "GreaterThan",
        "threshold": 80
      }
    ]
  },
  "actions": [
    {
      "actionGroupId": "/subscriptions/{sub-id}/resourceGroups/synapse-rg/providers/microsoft.insights/actionGroups/SynapseAlerts",
      "webhookProperties": {}
    }
  ]
}

Action Groups (20:00 - 22:00)¶

# Create action group
az monitor action-group create \
  --name SynapseAlerts \
  --resource-group synapse-rg \
  --short-name SynAlerts \
  --email-receiver name=DevTeam email=devteam@company.com \
  --webhook-receiver name=Slack uri=https://hooks.slack.com/services/xxx

# Create metric alert
az monitor metrics alert create \
  --name pipeline-failure-alert \
  --resource-group synapse-rg \
  --scopes /subscriptions/{sub-id}/resourceGroups/synapse-rg/providers/Microsoft.Synapse/workspaces/mysynapse \
  --condition "count PipelineFailedRuns > 3" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action SynapseAlerts

Section 5: Cost Monitoring (22:00 - 24:00)¶

Cost Analysis Queries (22:00 - 23:00)¶

// SQL pool cost estimation
SynapseSqlPoolExecRequests
| where TimeGenerated > ago(30d)
| extend CostEstimate = (DurationMs / 3600000.0) * 1.20  // $1.20/hour example
| summarize
    TotalCost = sum(CostEstimate),
    QueryCount = count()
    by bin(TimeGenerated, 1d)
| render timechart

// Spark job cost
SynapseSparkJobsEnded
| where TimeGenerated > ago(30d)
| extend CostEstimate = (DurationMs / 3600000.0) * CoresUsed * 0.10  // Example rate
| summarize
    TotalCost = sum(CostEstimate),
    JobCount = count()
    by bin(TimeGenerated, 1d), AppName
| order by TotalCost desc

Cost Optimization Recommendations (23:00 - 24:00)¶

// Idle SQL pools
AzureActivity
| where TimeGenerated > ago(7d)
| where OperationNameValue == "MICROSOFT.SYNAPSE/WORKSPACES/SQLPOOLS/RESUME/ACTION"
| summarize ResumeCount = count() by Resource
| where ResumeCount < 5
| project Resource, ResumeCount, Recommendation = "Consider pausing or deleting"

// Under-utilized Spark pools
SynapseSparkJobsEnded
| where TimeGenerated > ago(7d)
| summarize
    AvgCoresUsed = avg(CoresUsed),
    MaxCoresAvailable = max(MaxCores)
    by PoolName
| extend Utilization = (AvgCoresUsed / MaxCoresAvailable) * 100
| where Utilization < 30
| project PoolName, Utilization, Recommendation = "Consider reducing pool size"

Conclusion (24:00 - 25:00)¶

Best Practices: 1. Set up comprehensive diagnostic logging 2. Create role-based dashboards 3. Implement proactive alerting 4. Review metrics weekly 5. Optimize based on insights

Next Steps: - Customize queries for your workload - Set up automated reports - Integrate with ITSM tools - Train team on dashboard usage

Last Updated: January 2025