CSA Loom — the Microsoft Fabric experience for Azure tenants where Fabric isn't yet available: lakehouses, warehouses, notebooks, semantic models, Activator rules, Data Agents, across Commercial, GCC, GCC-High, and DoD IL5

CSA Loom — Operations

Day-2 operations for CSA Loom: capacity management, monitoring, cost, disaster recovery (DR), upgrades, and forward migration to Microsoft Fabric. With this section and the runbooks, customer operations teams can run Loom in production.

Topics

  • Capacity management

    The CU-equivalent model + per-service scaling. Pause/resume patterns. Monitoring capacity-overrun risks.

  • Monitoring & observability

    Monitoring Hub deep-dive. Pre-built KQL queries. Per-engine telemetry sources.

  • Cost management

    Cost-optimization patterns. Pause/resume. ADX hot/cold tiering. Power BI smoothing. Azure OpenAI (AOAI) provisioned throughput vs. pay-as-you-go (PAYG).

  • Disaster recovery

    Per-component DR + RPO/RTO targets. Region pairs. Failover drills.

  • Upgrade & migration

    Upgrade lifecycle (azd up re-run + Console "Updates" pane); single-subscription → multi-subscription conversion; boundary promotion.

  • Forward migration to Microsoft Fabric

    The strategic anchor: when Fabric reaches your boundary, migrate forward 1:1 via OneLake shortcut + per-artifact mapping.
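Monitoring capacity-overrun risk comes down to comparing CU-equivalent consumption against provisioned capacity. The sketch below assumes you have already exported per-interval CU-equivalent samples (for example from the Monitoring Hub). The helper `assess_capacity`, its sample format, and the 80% warning threshold are hypothetical choices for illustration, not Loom APIs or defaults.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CapacityReport:
    peak_utilization: float  # highest single-interval usage / capacity
    mean_utilization: float  # average usage / capacity across the window
    intervals_over: int      # intervals above the warning threshold
    status: str              # "ok", "warn", or "overrun"


def assess_capacity(samples_cu: Sequence[float], capacity_cu: float,
                    warn_threshold: float = 0.8) -> CapacityReport:
    """Classify overrun risk from CU-equivalent samples (hypothetical helper)."""
    if capacity_cu <= 0:
        raise ValueError("capacity_cu must be positive")
    if not samples_cu:
        raise ValueError("need at least one sample")
    utilization = [s / capacity_cu for s in samples_cu]
    peak = max(utilization)
    mean = sum(utilization) / len(utilization)
    over = sum(1 for u in utilization if u > warn_threshold)
    if peak > 1.0:
        status = "overrun"  # at least one interval exceeded provisioned capacity
    elif over:
        status = "warn"     # near the limit but never above it
    else:
        status = "ok"
    return CapacityReport(peak, mean, over, status)


if __name__ == "__main__":
    report = assess_capacity([40, 55, 70, 66], capacity_cu=64)
    print(report.status, report.intervals_over)  # prints "overrun 3"
```

Peak utilization drives the overrun flag, while the count of intervals above the warning threshold hints at whether the pressure is a brief spike (a pause/resume or scheduling question) or sustained (a scaling decision in Console "Admin → Capacity").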

Day-2 responsibilities (split between Loom and the customer)

Responsibility | Who
Container image patching | Loom (push to ACR; customer pulls via Console "Updates")
Bicep module updates | Loom (via release tags; customer azd up re-runs)
Azure resource patching (Databricks runtime, ADX engine, etc.) | Microsoft (managed services)
Capacity scaling decisions | Customer (Console "Admin → Capacity")
Workspace creation + lifecycle | Customer (Console "Workspaces" pane)
Per-workspace member management | Customer (Entra groups)
Incident response for customer-data issues | Customer (with Loom runbooks as reference)
Loom Console / parity service incidents | Customer first; escalate via GitHub Issues if blocked

Runbook index

The full runbook library is in the Runbooks section. Common patterns:

Runbook | When to use
Deploy failure | Initial azd up or DLZ-add fails
Direct-Lake-Shim stuck | Power BI semantic model not refreshing
Activator rules not firing | Expected Activator action didn't dispatch
Mirroring CDC lag | Mirror is more than N minutes behind source
Copilot throttling | AOAI 429s in Console
Capacity overrun | CU-equivalent exceeds threshold
DLZ onboard new domain | Adding agency / mission domain
Forward migrate to Fabric | Fabric GA in your boundary
Boundary promotion | GCC-H → IL5 promotion
Defender AI equivalent SOC | Sentinel pipeline health check
MCP troubleshooting | MCP server / wizard issues
Purview scan stuck | Catalog scan stalls
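For the Copilot throttling case (AOAI 429s), a common client-side mitigation is to retry with exponential backoff, honoring the server's Retry-After hint when one is present. The sketch below is a generic pattern under stated assumptions, not the Loom runbook's procedure: `send` is a hypothetical callable returning a hypothetical `(status, retry_after_seconds, body)` tuple, and the attempt and delay limits are illustrative.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical response shape: (HTTP status, Retry-After seconds or None, body).
Response = Tuple[int, Optional[float], str]


def call_with_backoff(send: Callable[[], Response],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry `send` while it returns HTTP 429, up to max_attempts calls in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    response = send()
    for attempt in range(1, max_attempts):
        status, retry_after, _ = response
        if status != 429:
            return response
        if retry_after is not None:
            delay = retry_after  # honor the server's Retry-After hint
        else:
            # Exponential backoff with full jitter: uniform in [0, cap].
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, cap)
        sleep(delay)
        response = send()
    return response  # may still be a 429 if every attempt was throttled
```

Injecting `sleep` keeps the function testable; in production the default `time.sleep` applies. If 429s persist even with backoff, the cause is often sustained demand exceeding quota rather than short bursts, which points back to the provisioned-vs-PAYG decision under Cost management.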

SLAs (operational targets)

Metric | Target
Loom Console availability | 99.5% / month
Loom Setup Wizard deploy success rate | > 95%
Direct-Lake-Shim refresh latency (partition) | p50 < 30 s; p95 < 60 s
Activator end-to-end latency | 5-30 s
Mirroring CDC steady-state lag | < 60 s
Loom Copilot response time | p50 < 3 s; p95 < 10 s
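These targets can be checked mechanically against exported measurements. The sketch below is a hypothetical helper, not part of Loom: it computes nearest-rank p50/p95 values and the monthly downtime budget implied by an availability target. For example, 99.5% over a 30-day month (43,200 minutes) leaves 0.5% × 43,200 = 216 minutes, about 3.6 hours. The dictionary keys and the sample latencies are illustrative assumptions.

```python
import math
from typing import Dict, Sequence, Tuple

# Latency targets from the SLA table: (p50 max, p95 max) in seconds.
# The dictionary keys are hypothetical labels chosen for this sketch.
LATENCY_TARGETS: Dict[str, Tuple[float, float]] = {
    "direct_lake_shim_refresh": (30.0, 60.0),
    "copilot_response": (3.0, 10.0),
}


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile; pct must be in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(samples: Sequence[float], p50_max: float, p95_max: float) -> bool:
    """True when both p50 and p95 fall strictly below their targets."""
    return percentile(samples, 50) < p50_max and percentile(samples, 95) < p95_max


def downtime_budget_minutes(availability: float, days_in_month: int = 30) -> float:
    """Monthly downtime allowed at an availability fraction such as 0.995."""
    return (1 - availability) * days_in_month * 24 * 60


if __name__ == "__main__":
    refresh_s = [12, 18, 22, 25, 27, 33, 41, 48, 52, 57]
    print(meets_latency_target(refresh_s, *LATENCY_TARGETS["direct_lake_shim_refresh"]))  # True
    print(round(downtime_budget_minutes(0.995), 1))  # 216.0 for a 30-day month
```

Nearest-rank is only one of several percentile definitions; if your telemetry backend (for example a KQL `percentile()` aggregation) computes percentiles differently, expect small discrepancies near the thresholds.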