Runbook — Loom Copilot throttling¶

Symptom¶

Loom Copilot returns 429 (rate-limited) errors. Console chat sidebar shows "I'm temporarily rate-limited; please retry in a moment."

Diagnosis¶

# 1. Check AOAI deployment TPM usage
az cognitiveservices account deployment list \
  --resource-group <rg> \
  --name <aoai-account> \
  --query "[].{name:name,tpm:properties.scaleSettings.capacity}"

# 2. Check actual TPM consumption (App Insights)
# Query:
CopilotChatLogs
| where TimeGenerated > ago(1h)
| where ResponseCode == "429"
| summarize count() by bin(TimeGenerated, 5m), deploymentName

# 3. Check per-user request rate
CopilotChatLogs
| where TimeGenerated > ago(1h)
| summarize requests = count() by userPrincipalName, bin(TimeGenerated, 5m)
| order by requests desc

# 4. Check the rate-limit configuration in azure-functions/copilot-chat
# See _rate_limits in function_app.py

Common causes + fixes¶

Cause	Fix
AOAI deployment TPM at quota	Scale TPM up via Azure portal; or split across multiple deployments
Single user / bot driving most traffic	Apply per-user rate limit (already in function_app.py); identify + contact user
Burst from multiple users (e.g., demo)	Pre-scale TPM before known demo windows
Cross-region TPM not provisioned (Data Zone Standard)	Add Data Zone Standard deployment in paired region
Long-context conversations (large input/output tokens)	Apply token budget per request (already in function_app.py)
AOAI deployment soft-deleted / wrong endpoint	Verify Console "Admin → AI Settings" points at active deployment

Remediation¶

Identify the throttling cause (which user / which deployment)
Apply fix per table — usually scale TPM or split deployments

Verify new requests succeed:

curl https://<console-url>/api/loom-chat -X POST \
  -H "Authorization: Bearer <token>" \
  -d '{"message": "test"}'

Backlog any requests that 429'd during outage (Console "Copilot → Pending" pane shows queued)

Prevention¶

Monitor copilot-429-rate > 1% → alert
Use Provisioned Managed Throughput (PMT) for stable production workloads instead of PAYG Standard
Pre-scale TPM before known demos / training events
Per-user rate limits already enforced in function_app.py (see [[copilot-chat-two-backends]] memory)

Workload: Copilot parity
Memory: [[copilot-chat-two-backends]]
Parent runbook: Azure OpenAI throttling

Runbook — Loom Copilot throttling¶

Symptom¶

Diagnosis¶

Common causes + fixes¶

Remediation¶

Prevention¶

Related¶