Monitoring & observability¶
CSA Loom uses Application Insights + Log Analytics + Microsoft Sentinel (Gov) as the telemetry backbone. The Loom Console "Monitoring Hub" pane is the unified UX.
Telemetry sources¶
| Source | Sink |
|---|---|
| Loom Console (browser) | App Insights via javascripts/app-insights.js |
| Loom Console BFF | App Insights server-side |
| Loom Setup Wizard | App Insights + Activity Log |
| Loom Copilot | App Insights (per-turn telemetry per azure-functions/copilot-chat/telemetry.py) |
| MCP server tool calls | Activity Log + App Insights with correlation IDs |
| Loom Activator Engine | App Insights + Sentinel |
| Loom Mirroring Engine | App Insights + Databricks Spark UI |
| Loom Direct-Lake-Shim | App Insights + TOM refresh logs |
| Databricks workspaces | System tables → exported to LAW |
| Synapse Serverless | Synapse diagnostics → LAW |
| ADX | Native KQL on the cluster + diagnostic logs → LAW |
| Power BI Premium | Power BI activity log + capacity metrics |
| Purview | Activity log → LAW |
| ADLS Gen2 | Storage diagnostics → LAW |
| Container Apps / AKS | Container insights → LAW |
Monitoring Hub UI¶
The Console "Monitoring" pane aggregates the above:
- Capacity utilization — CU-equivalent dashboard (see Capacity management)
- Query history — unified across Databricks SQL, Synapse, ADX, Power BI XMLA
- Deploy history — every MCP-mediated deploy + Bicep diff
- Activator firing log
- Mirroring lag
- Cost dashboard — Azure Cost Management API integration
- Audit log search — Sentinel-backed in Gov
Pre-built KQL queries¶
Ships in docs/fiab/operations/queries.kql (referenced from the Console Monitoring Hub).
Capacity utilization (CU-equivalent) over last 24h¶
let dbx = DatabricksClusterEvents
| where TimeGenerated > ago(24h)
| summarize dbu = sum(dbuConsumed) by bin(TimeGenerated, 5m);
let ad = ADXIngestionEvents
| summarize vcs = sum(vcoreSeconds) by bin(TimeGenerated, 5m);
let pbi = PowerBICapacityEvents
| summarize memMb = max(memoryMb) by bin(TimeGenerated, 5m);
let aoai = ContainerLogsForCopilotChat
| where Message contains "openai-tokens-out"
| summarize tpm = sum(tokens) by bin(TimeGenerated, 5m);
union dbx, ad, pbi, aoai
| summarize cu_estimate = (dbu * 16) + (vcs / 60) + (memMb / 1024) + (tpm / 50000)
by bin(TimeGenerated, 5m)
Direct-Lake-Shim refresh latency¶
TraceLogs
| where Category == "DirectLakeShim"
| where Message contains "RefreshComplete"
| extend latency_seconds = todouble(extract(@"latencySeconds=(\d+\.?\d*)", 1, Message))
| summarize p50 = percentile(latency_seconds, 50),
p95 = percentile(latency_seconds, 95)
by bin(TimeGenerated, 1h)
Activator rule firing count¶
ActivatorEngineLogs
| where Category == "RuleFiring"
| summarize firings = count() by RuleId, bin(TimeGenerated, 1h)
Mirroring CDC lag per source¶
MirroringEngineLogs
| where Category == "CDCLag"
| extend lag_seconds = todouble(extract(@"lagSeconds=(\d+)", 1, Message))
| summarize p95_lag = percentile(lag_seconds, 95)
by sourceType, bin(TimeGenerated, 5m)
Loom Copilot error rate¶
CopilotChatLogs
| where TimeGenerated > ago(24h)
| summarize errors = countif(severity == "Error"),
total = count()
by bin(TimeGenerated, 1h)
| extend error_rate = errors * 100.0 / total
Sentinel (Gov)¶
In Gov boundaries (GCC-H / IL5), Loom Copilot telemetry is also routed to Microsoft Sentinel via a custom DCR (Data Collection Rule) per Defender AI workaround.
Sentinel analytics rules: - Excessive PII redactions per user - Off-topic refusals spike (possible prompt injection) - Unusually long outputs (likely jailbreak) - High-rate same-prompt repetition (likely bot) - Cross-workspace exfiltration patterns
Custom dashboards¶
The Monitoring Hub also accepts custom KQL queries + workbooks. Save your org-specific queries to Cosmos DB via the Console "Save query" action.