# Runbook — Certificate Expiration & Rotation
Scope: Certificates and secrets for managed identities, Key Vault, Application Gateway, API Management, custom domains, and Entra ID App Registrations across the CSA-in-a-Box platform. Covers proactive monitoring, manual and automated rotation, and preventive controls.
## Before First Use — Customization Checklist
- Populate the Contact Information table.
- Confirm Key Vault names per environment (dev / staging / prod).
- Confirm App Gateway resource names and listener bindings.
- Confirm APIM instance names and custom domain mappings.
- Wire Event Grid subscriptions for `SecretNearExpiry` and `CertificateNearExpiry` events to your alerting pipeline.
## 📑 Table of Contents
- 📋 1. Symptoms
- 🔍 2. Triage
- 📦 3. Certificate Inventory
- 🔒 4. Rotation Procedures
- ⚙️ 5. Automation
- 🛡️ 6. Preventive Controls
- 📎 7. Contact Information
- 🗓️ 8. Drill Log
- 🔗 9. Related Documentation
## 📋 1. Symptoms
| Symptom | Typical Source | Severity |
|---|---|---|
| TLS handshake errors (`ERR_CERT_DATE_INVALID`) | App Gateway listener cert or APIM custom domain cert expired | P1 |
| Service auth failures (401 / 403 on internal API calls) | Entra ID App Registration secret/certificate expired | P1 |
| Key Vault `GET` secret returns `Forbidden` / `SecretDisabled` | Key Vault secret expired or access policy revoked | P2 |
| App Gateway returning 502 Bad Gateway | Backend TLS cert expired; gateway cannot complete handshake | P1 |
| Event Hub / Service Bus 401 Unauthorized on send/receive | SAS key expired or rotated without consumer update | P2 |
| Certificate warning emails from CA (DigiCert / GlobalSign) | Automated CA notification — cert nearing expiry | P3 |
## 🔍 2. Triage
### Step 1: Identify which certificate expired
- Key Vault: Portal → Key Vault → Certificates. Check the **Expiry Date** column.
- App Gateway: Portal → Application Gateway → Listeners. Each HTTPS listener shows cert status.
- APIM: Portal → API Management → Custom domains. Check thumbprint and expiry.
- Entra ID: Portal → App registrations → select app → Certificates & secrets.
### Step 2: Check Key Vault certificate expiry dashboard
```kusto
AzureDiagnostics
| where ResourceType == "VAULTS"
| where OperationName in ("CertificateNearExpiry", "SecretNearExpiry")
| where TimeGenerated > ago(7d)
| extend props = parse_json(properties_s)
| project TimeGenerated,
          vaultName  = Resource,
          objectName = tostring(props.objectName),
          expiryDate = todatetime(props.exp)
| order by expiryDate asc
```
### Step 3: Verify managed identity credential status
Managed identities do not have user-rotatable credentials. If a managed identity is failing, check role assignments and firewall rules — not certs.
### Step 4: Check App Registration secret/certificate expiry
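The first command lists client secrets (password credentials). The second adds `--cert` to list certificates (key credentials).

```shell
az ad app credential list --id <app-id> \
  --query '[].{keyId:keyId,displayName:displayName,endDateTime:endDateTime}' -o table
az ad app credential list --id <app-id> --cert \
  --query '[].{keyId:keyId,displayName:displayName,endDateTime:endDateTime}' -o table
```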
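The listings above still have to be read by eye. Below is a minimal sketch that flags secrets and certificates expiring within a threshold. It assumes bash with GNU `date`, an authenticated Azure CLI session, and that `-o tsv` renders a missing `displayName` as an empty field. `days_until` and `check_app_credentials` are hypothetical helper names. The 30-day default mirrors the first alert tier in §6.2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for §2 Step 4. Assumes GNU date and an authenticated `az` session.

# days_until TIMESTAMP -> whole days from now until TIMESTAMP (truncated toward zero;
# negative once the timestamp has passed by at least a full day).
days_until() {
  local target_epoch now_epoch
  target_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch=$(date -u +%s)
  echo $(( (target_epoch - now_epoch) / 86400 ))
}

# check_app_credentials APP_ID [THRESHOLD_DAYS]
# Prints each client secret and certificate whose endDateTime is within THRESHOLD_DAYS.
check_app_credentials() {
  local app_id=$1 threshold=${2:-30} kind key_id end_date name days
  local -a flag
  for kind in secret cert; do
    flag=()
    [ "$kind" = cert ] && flag=(--cert)   # --cert switches the listing to key credentials
    # displayName goes last so an empty value cannot shift the other tab-separated fields.
    while IFS=$'\t' read -r key_id end_date name; do
      [ -z "$key_id" ] && continue
      days=$(days_until "$end_date") || continue
      if [ "$days" -le "$threshold" ]; then
        echo "EXPIRING ($kind): keyId=$key_id name=${name:-<none>} endDateTime=$end_date days=$days"
      fi
    done < <(az ad app credential list --id "$app_id" "${flag[@]}" \
               --query '[].[keyId, endDateTime, displayName]' -o tsv)
  done
}
```

For example, `check_app_credentials <app-id> 30` prints one `EXPIRING` line per credential that needs attention. An empty result means nothing falls inside the window.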
!!! danger
    If a credential is already expired and causing production failures, skip to §4 Rotation Procedures immediately.
## 📦 3. Certificate Inventory
| Certificate Type | Location | Rotation Method | Cadence |
|---|---|---|---|
| Key Vault TLS certs | Key Vault → Certificates | Auto-renew via DigiCert / GlobalSign | Auto (30d before expiry) |
| App Gateway listener certs | App Gateway → Listeners (from KV) | Update KV cert → App Gw picks up via MI | Follows KV lifecycle |
| APIM custom domain certs | APIM → Custom domains (from KV) | Update KV cert → re-bind in APIM | Follows KV lifecycle |
| Entra ID App Reg secrets | Entra ID → App Reg → Certs & secrets | Manual: create new → update consumers → delete old | 90 days max |
| Entra ID App Reg certificates | Entra ID → App Reg → Certs & secrets | Manual: upload new → update consumers → remove old | 12 months |
| Managed identity credentials | Azure-managed | No manual rotation needed | N/A |
| Service Bus / Event Hub SAS keys | Namespace → Shared access policies | Regenerate primary → update → regenerate secondary | 90 days |
## 🔒 4. Rotation Procedures
### 4.1 Key Vault auto-rotation setup
- Navigate to Key Vault → Certificates → select cert → Issuance Policy.
- Set **Lifetime Action Type** to `AutoRenew` and **Days Before Expiry** to `30`.
- Confirm CA integration is healthy.
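The CA-integration check in the last step can be scripted roughly as follows. This is a sketch, not a verified procedure. It assumes an authenticated `az` session and that `az keyvault certificate show` exposes the policy under `policy.issuerParameters` and `policy.lifetimeActions`, as current Azure CLI versions do. `check_autorenew` is a hypothetical helper name. Issuer names `Self` (self-signed) and `Unknown` (manual CSR flow) mean no CA will renew the certificate automatically.

```shell
#!/usr/bin/env bash
# Hypothetical §4.1 check. Assumes an authenticated `az` session with certificate read access.

# check_autorenew VAULT CERT [DAYS]
# Verifies the cert has an integrated CA issuer and an AutoRenew action DAYS before expiry.
check_autorenew() {
  local vault=$1 cert=$2 want=${3:-30} issuer days
  issuer=$(az keyvault certificate show --vault-name "$vault" --name "$cert" \
             --query "policy.issuerParameters.name" -o tsv)
  case $issuer in
    ""|Self|Unknown)
      echo "NO-CA: $cert issuer is '${issuer:-<none>}'; auto-renew needs an integrated CA issuer"
      return 1 ;;
  esac
  # The issuer referenced by the policy must also be configured on the vault.
  az keyvault certificate issuer show --vault-name "$vault" --issuer-name "$issuer" \
    --output none || { echo "ISSUER-MISSING: $issuer"; return 1; }
  days=$(az keyvault certificate show --vault-name "$vault" --name "$cert" \
           --query "policy.lifetimeActions[?action.actionType=='AutoRenew'].trigger.daysBeforeExpiry | [0]" -o tsv)
  if [ -z "$days" ]; then
    echo "MISSING: $cert has no AutoRenew action with daysBeforeExpiry set"
    return 1
  elif [ "$days" != "$want" ]; then
    echo "DRIFT: $cert renews $days days before expiry (expected $want)"
    return 1
  fi
  echo "OK: $cert issued by $issuer, auto-renews $days days before expiry"
}
```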
!!! tip
    For certs not issued by an integrated CA, use the Event Grid `CertificateNearExpiry` event to trigger an Azure Function for renewal.
### 4.2 Manual certificate rotation (Key Vault → App Gateway)
- Import the new certificate (PFX with private key) into Key Vault:
- App Gateway picks up the new version automatically within 4 hours. To force an immediate refresh:
- Validate the listener is serving the new certificate:
- Monitor for 502 errors in the 30 minutes post-rotation:
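Steps 1 and 3 above can be scripted roughly as follows. This sketch assumes bash 4+, `openssl`, and an authenticated `az` session. It also assumes that Key Vault's `x509ThumbprintHex` (a SHA-1 thumbprint) identifies the current version. The helper names are hypothetical. Step 2 (forcing the refresh) and step 4 (monitoring) are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for §4.2 steps 1 and 3. Assumes bash 4+, openssl and an authenticated `az` session.

# Step 1: import_new_version VAULT CERT PFX PASSWORD
# Importing under an existing name creates a new version of that Key Vault certificate.
import_new_version() {
  local vault=$1 cert=$2 pfx=$3 pfx_password=$4
  az keyvault certificate import --vault-name "$vault" --name "$cert" \
    --file "$pfx" --password "$pfx_password"
}

# normalize_thumbprint "SHA1 Fingerprint=ab:CD:..." -> "ABCD..." (also accepts bare hex).
normalize_thumbprint() {
  local fp=${1#*=}   # drop an openssl "SHA1 Fingerprint=" prefix if present
  fp=${fp//:/}       # drop colon separators
  echo "${fp^^}"     # uppercase for comparison
}

# Step 3: verify_served_cert VAULT CERT HOST
# Compares the SHA-1 thumbprint served on HOST:443 with the current Key Vault version.
verify_served_cert() {
  local vault=$1 cert=$2 host=$3 served expected
  served=$(openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
             | openssl x509 -noout -fingerprint -sha1)
  expected=$(az keyvault certificate show --vault-name "$vault" --name "$cert" \
               --query x509ThumbprintHex -o tsv)
  served=$(normalize_thumbprint "$served")
  expected=$(normalize_thumbprint "$expected")
  if [ -n "$served" ] && [ "$served" = "$expected" ]; then
    echo "OK: $host serves the current Key Vault version of $cert"
  else
    echo "MISMATCH: $host serves '${served:-<none>}', Key Vault has '$expected'"
    return 1
  fi
}
```

A `MISMATCH` shortly after import usually means the gateway has not refreshed yet. Re-run the check after the refresh window.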
### 4.3 App Registration secret rotation
!!! warning
    Never delete the old secret before all consumers are updated. The overlap window prevents downtime.
- Create a new client secret (do not touch the old one yet):
- Store the new value in Key Vault:
- Update all consumers (restart pods / Function apps that cache the value).
- Verify authentication succeeds with the new secret:
- After 24 hours of confirmed success, delete the old secret:
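Below is a minimal sketch of the steps above as bash functions. It assumes GNU `date`, `curl`, and an authenticated `az` session. The 90-day lifetime comes from §6.1 and the 24-hour overlap from the last step. The verification uses the OAuth 2.0 client-credentials token endpoint with the Microsoft Graph `.default` scope, on the assumption that a successful token issuance is enough to prove the secret works. A brand-new secret may take a short while before Entra ID accepts it. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for §4.3. Assumes GNU date, curl and an authenticated `az` session.

MAX_SECRET_DAYS=90   # §6.1 lifetime policy
OVERLAP_HOURS=24     # keep the old secret until the new one has worked this long

# Step 1: append a new secret (existing secrets are left in place) and print its value.
create_new_secret() {
  local app_id=$1 display_name=$2 end_date
  end_date=$(date -u -d "+${MAX_SECRET_DAYS} days" +%Y-%m-%dT%H:%M:%SZ)
  az ad app credential reset --id "$app_id" --append \
    --display-name "$display_name" --end-date "$end_date" \
    --query password -o tsv
}

# Step 2: store the value in Key Vault with a matching expiry attribute.
store_secret() {
  local vault=$1 name=$2 value=$3 end_date
  end_date=$(date -u -d "+${MAX_SECRET_DAYS} days" +%Y-%m-%dT%H:%M:%SZ)
  az keyvault secret set --vault-name "$vault" --name "$name" \
    --value "$value" --expires "$end_date" --output none
}

# Step 4: request a client-credentials token; succeeds only if the secret is accepted.
verify_secret() {
  local tenant=$1 app_id=$2 secret=$3
  curl -sf -X POST "https://login.microsoftonline.com/$tenant/oauth2/v2.0/token" \
    --data-urlencode "client_id=$app_id" \
    --data-urlencode "client_secret=$secret" \
    --data-urlencode "scope=https://graph.microsoft.com/.default" \
    --data-urlencode "grant_type=client_credentials" | grep -q '"access_token"'
}

# hours_since TIMESTAMP -> whole hours elapsed since TIMESTAMP.
hours_since() {
  echo $(( ( $(date -u +%s) - $(date -u -d "$1" +%s) ) / 3600 ))
}

# Step 5: delete_old_secret APP_ID OLD_KEY_ID NEW_SECRET_VERIFIED_AT
# Refuses to delete until the overlap window has elapsed.
delete_old_secret() {
  local app_id=$1 old_key_id=$2 verified_at=$3 elapsed
  elapsed=$(hours_since "$verified_at")
  if [ "$elapsed" -lt "$OVERLAP_HOURS" ]; then
    echo "REFUSED: new secret verified only ${elapsed}h ago (< ${OVERLAP_HOURS}h)"
    return 1
  fi
  az ad app credential delete --id "$app_id" --key-id "$old_key_id" || return 1
  echo "DELETED: $old_key_id"
}
```

Passing the secret through `--value` and curl arguments exposes it briefly in the process list. On shared hosts, prefer a file-based input.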
### 4.4 APIM custom domain certificate rotation
- Import the new certificate into Key Vault (see §4.2).
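- Re-bind in APIM if the custom domain references a versioned Key Vault secret identifier. A versioned binding keeps serving that exact version until it is updated. With a versionless identifier, APIM picks up new versions automatically, typically within four hours.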
- Validate the new cert is served:
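The validation step can be sketched as a check on how long the served certificate remains valid. It assumes bash, GNU `date`, and `openssl`. The helper names and the 60-day minimum are hypothetical. A freshly rotated certificate should sit well outside the 30-day renewal window, so a short remaining lifetime suggests the old version is still bound.

```shell
#!/usr/bin/env bash
# Hypothetical check for §4.4 step 3. Assumes bash, GNU date and openssl.

# served_cert_days_left HOST -> whole days until the certificate served on HOST:443 expires.
served_cert_days_left() {
  local host=$1 not_after
  # -servername sends SNI, so the certificate bound to this custom hostname is returned.
  not_after=$(openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
                | openssl x509 -noout -enddate)        # e.g. "notAfter=Jun  1 12:00:00 2026 GMT"
  not_after=${not_after#notAfter=}
  [ -n "$not_after" ] || return 1
  echo $(( ( $(date -u -d "$not_after" +%s) - $(date -u +%s) ) / 86400 ))
}

# validate_min_days HOST [MIN_DAYS] -> fails if the served certificate expires sooner than MIN_DAYS.
validate_min_days() {
  local host=$1 min=${2:-60} days
  days=$(served_cert_days_left "$host") || { echo "ERROR: no certificate read from $host"; return 1; }
  if [ "$days" -ge "$min" ]; then
    echo "OK: $host serves a certificate valid for $days more days"
  else
    echo "STALE: $host certificate expires in $days days (< $min); the old version may still be bound"
    return 1
  fi
}
```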
### 4.5 SAS key rotation (Service Bus / Event Hub)
- Regenerate the secondary key:
- Store the new secondary key in Key Vault, switch consumers to it, and wait 1 hour.
- Regenerate the primary key:
- In the next rotation cycle, swap direction: move consumers to the primary key, then regenerate the secondary.
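Below is a minimal sketch of one rotation cycle for a namespace-level shared access policy. It assumes an authenticated `az` session. `SERVICE` is either `servicebus` or `eventhubs`, and the resource-group, namespace, rule, vault, and secret names are placeholders. `renew_key` and `other_key` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for §4.5. Assumes an authenticated `az` session.

# other_key KEY -> the opposite key (the one regenerated once consumers have moved onto KEY).
other_key() {
  case $1 in
    PrimaryKey)   echo SecondaryKey ;;
    SecondaryKey) echo PrimaryKey ;;
    *) echo "unknown key: $1" >&2; return 1 ;;
  esac
}

# renew_key SERVICE RG NAMESPACE RULE KEY -> regenerates KEY and prints its connection string.
renew_key() {
  local service=$1 rg=$2 ns=$3 rule=$4 key=$5 field
  case $service in
    servicebus|eventhubs) ;;
    *) echo "unsupported service: $service" >&2; return 1 ;;
  esac
  case $key in
    PrimaryKey)   field=primaryConnectionString ;;
    SecondaryKey) field=secondaryConnectionString ;;
    *) echo "unknown key: $key" >&2; return 1 ;;
  esac
  az "$service" namespace authorization-rule keys renew \
    --resource-group "$rg" --namespace-name "$ns" --name "$rule" \
    --key "$key" --query "$field" -o tsv
}

# One cycle, with consumers currently on the primary key (placeholder names):
#   conn=$(renew_key servicebus <rg> <namespace> <rule> SecondaryKey)
#   az keyvault secret set --vault-name <vault> --name <secret> --value "$conn" --output none
#   ...switch consumers to the secondary connection string, wait 1 hour...
#   renew_key servicebus <rg> <namespace> <rule> "$(other_key SecondaryKey)" >/dev/null
```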
## ⚙️ 5. Automation
### 5.1 Event Grid notifications for near-expiry
```shell
az eventgrid event-subscription create \
  --name cert-expiry-alert \
  --source-resource-id "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault>" \
  --included-event-types Microsoft.KeyVault.CertificateNearExpiry Microsoft.KeyVault.SecretNearExpiry \
  --endpoint-type azurefunction \
  --endpoint "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Web/sites/<func-app>/functions/<func-name>"
```
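After creating the subscription, you can confirm it provisioned by reading back its `provisioningState`. Below is a minimal sketch, assuming an authenticated `az` session. `check_event_subscription` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical check for §5.1. Assumes an authenticated `az` session.

# check_event_subscription SOURCE_RESOURCE_ID NAME -> OK if provisioningState is Succeeded.
check_event_subscription() {
  local source_id=$1 name=$2 state
  state=$(az eventgrid event-subscription show --name "$name" \
            --source-resource-id "$source_id" --query provisioningState -o tsv)
  if [ "$state" = "Succeeded" ]; then
    echo "OK: $name is provisioned"
  else
    echo "NOT READY: $name provisioningState=${state:-<none>}"
    return 1
  fi
}
```

For example: `check_event_subscription "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault>" cert-expiry-alert`.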
### 5.2 Automation runbook for non-integrated CA certs
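```powershell
# Runs in Azure Automation; assumes the account's managed identity can read certificates in the vault.
Connect-AzAccount -Identity | Out-Null

$vaultName     = "<vault>"
$thresholdDays = 30

# List certificates, then fetch each one to read the X.509 expiry (NotAfter).
$certs = Get-AzKeyVaultCertificate -VaultName $vaultName
foreach ($cert in $certs) {
    $detail   = Get-AzKeyVaultCertificate -VaultName $vaultName -Name $cert.Name
    $daysLeft = ($detail.Certificate.NotAfter - (Get-Date)).Days
    if ($daysLeft -le $thresholdDays) {
        Write-Output "EXPIRING: $($cert.Name) expires in $daysLeft days"
        # Trigger renewal logic here (e.g. obtain the new PFX, then Import-AzKeyVaultCertificate)
    }
}
```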
### 5.3 Azure Policy for certificate lifetime enforcement
```shell
az policy assignment create \
  --name "cert-max-validity" \
  --policy "0a075868-4c26-42ef-914c-5bc007359560" \
  --params '{"maximumValidityInMonths":{"value":12}}' \
  --scope "/subscriptions/<sub>"
```
## 🛡️ 6. Preventive Controls
### 6.1 Certificate lifecycle policy
| Control | Setting | Rationale |
|---|---|---|
| App Registration secrets max lifetime | 90 days | NIST 800-53 SC-12 compliance |
| Key Vault TLS certificates auto-renew | 30 days before expiry | Prevents manual renewal gaps |
| Key Vault certificate max validity | 12 months | Policy-enforced via Azure Policy |
| SAS keys rotation cadence | 90 days | Aligns with secret rotation schedule |
### 6.2 Alert rules for certificates expiring within 30 / 14 / 7 days
| Alert | Threshold | Severity | Action |
|---|---|---|---|
| Certificate expiring — 30 days | 30 days to expiry | Sev 3 | Email platform team |
| Certificate expiring — 14 days | 14 days to expiry | Sev 2 | Email + Teams channel |
| Certificate expiring — 7 days | 7 days to expiry | Sev 1 | PagerDuty / OpsGenie page |
| Certificate expired | 0 days | Sev 0 | Page on-call + auto-incident |
```kusto
let threshold = 14d;
AzureDiagnostics
| where ResourceType == "VAULTS"
| where OperationName == "CertificateNearExpiry"
| where TimeGenerated > ago(1d)
| extend certName = tostring(parse_json(properties_s).objectName)
| extend expiryTime = todatetime(parse_json(properties_s).exp)
| where expiryTime - now() < threshold
| project certName, expiryTime, daysRemaining = datetime_diff("day", expiryTime, now()), vaultName = Resource
```
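The tier table above can also be applied outside Log Analytics, for example as an ad-hoc sweep during triage. Below is a minimal sketch, assuming GNU `date`, an authenticated `az` session, and that `az keyvault certificate list` returns `name` and `attributes.expires` for each certificate. `expiry_severity` and `scan_vault` are hypothetical helper names. Days are truncated toward zero, so a certificate in its final partial day is reported in the "expired" tier.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the §6.2 tiers. Assumes GNU date and an authenticated `az` session.

# expiry_severity DAYS -> "Severity|Action" for the tier DAYS falls into, or "none|" above 30 days.
expiry_severity() {
  local days=$1
  if   [ "$days" -le 0 ];  then echo "Sev 0|Page on-call + auto-incident"
  elif [ "$days" -le 7 ];  then echo "Sev 1|PagerDuty / OpsGenie page"
  elif [ "$days" -le 14 ]; then echo "Sev 2|Email + Teams channel"
  elif [ "$days" -le 30 ]; then echo "Sev 3|Email platform team"
  else echo "none|"
  fi
}

# scan_vault VAULT -> one line per certificate that falls inside an alert tier.
scan_vault() {
  local vault=$1 name expires days sev
  while IFS=$'\t' read -r name expires; do
    [ -z "$expires" ] && continue                    # no expiry attribute recorded
    days=$(( ( $(date -u -d "$expires" +%s) - $(date -u +%s) ) / 86400 ))
    sev=$(expiry_severity "$days")
    [ "${sev%%|*}" = none ] && continue
    echo "$name: $days days left -> ${sev%%|*} (${sev#*|})"
  done < <(az keyvault certificate list --vault-name "$vault" \
             --query "[].[name, attributes.expires]" -o tsv)
}
```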
### 6.3 Monitoring setup
Enable Key Vault diagnostics and route them to Log Analytics. Retention for a Log Analytics destination is configured on the workspace; the per-setting `retentionPolicy` option is deprecated.

```shell
az monitor diagnostic-settings create \
  --name kv-diagnostics \
  --resource "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault>" \
  --workspace "<log-analytics-workspace-resource-id>" \
  --logs '[{"category":"AuditEvent","enabled":true}]'
```
## 📎 7. Contact Information
!!! warning
    **Action Required:** Populate these before first production use.
| Role | Contact | Phone | Escalation |
|---|---|---|---|
| Platform On-Call | (set via your org's on-call roster) | (see PagerDuty / OpsGenie) | First responder |
| Platform Team Lead | (set via your org's platform team) | (see PagerDuty / OpsGenie) | P1/P2 escalation |
| Security On-Call | (set via your org's security team) | (see PagerDuty / OpsGenie) | Compromised certs |
| App Reg Owner | (per-app registration — see governance RBAC) | (DL) | Entra ID credential rotation |
| Azure Support | Case via Portal | N/A | Platform issues |
## 🗓️ 8. Drill Log
Run this runbook in tabletop form quarterly. Add one row per drill.
| Quarter | Date | Type (tabletop / live) | Scenario exercised | Lead | Gaps identified | Fixes tracked |
|---|---|---|---|---|---|---|
| Q1 — Jan | TBD | TBD | TBD | TBD | TBD | TBD |
| Q2 — Apr | TBD | TBD | TBD | TBD | TBD | TBD |
| Q3 — Jul | TBD | TBD | TBD | TBD | TBD | TBD |
| Q4 — Oct | TBD | TBD | TBD | TBD | TBD | TBD |
## 🔗 9. Related Documentation
- Key Rotation — Secret and access key rotation procedures
- Security Incident — Compromise response
- Break-Glass Access — Emergency admin flow
- DR Drill — Key Vault restore scenario
- Dead Letter — Rotation failure dead-letter recovery