Home > Docs > Runbooks > Security Incident
Incident Response Runbook — Security Events¶
Note
Quick Summary: Step-by-step incident response procedures for CSA-in-a-Box security events, including severity classification (P1-P4), containment steps, investigation KQL queries, common scenarios (exposed keys, token leaks, policy violations, pipeline tampering), evidence preservation, and communication templates.
✅ Before First Use — Customization Checklist (CSA-0070)¶
This runbook ships with placeholder contacts. It is not safe to invoke in a live incident until your organisation has completed the items below. Check each off in a PR against this file so the runbook history reflects who customised which fields and when.
- Populate the Contact Information table with your Platform Team Lead, Security Officer, Data Protection Officer, and Legal Counsel. Remove the
*(set via ...)*stubs. - Replace generic Azure Support link with your organisation's Azure TAM / Premier Support channel if applicable.
- Wire up an on-call rotation in PagerDuty / OpsGenie / Teams Shifts — paste the on-call URL into the Contact table.
- Confirm your SOC queue address (DL) for the internal notification template under Communication Templates.
- Add any region-specific legal notification windows (e.g. GDPR 72-hour DPO notification, HIPAA 60-day breach notification).
- Update the Last Drilled banner above and the Drill Log after each tabletop / live drill.
Warning
Do not remove this section after first use. New operators need the same onboarding pass on every fork / airgapped deployment.
📑 Table of Contents¶
- 📋 Scope
- 🔒 Severity Classification
- 🚀 Initial Response (All Severities)
- 💡 Common Scenarios
- 📋 Evidence Preservation Checklist
- 📝 Communication Templates
- 📎 Contact Information
📋 Scope¶
This runbook covers security incidents detected on the CSA-in-a-Box data platform, including unauthorized access, data exfiltration, and configuration tampering.
🔒 Severity Classification¶
| Severity | Description | Response Time | Escalation |
|---|---|---|---|
| P1 — Critical | Active data breach, credentials exposed | 1 hour | CISO, Legal |
| P2 — High | Unauthorized access attempt, policy violation | 4 hours | Platform Team Lead |
| P3 — Medium | Configuration drift, suspicious activity | 24 hours | On-call engineer |
| P4 — Low | Informational alert, audit finding | 72 hours | Team queue |
🚀 Initial Response (All Severities)¶
Step 1: Assess¶
// Check recent security alerts
SecurityAlert
| where TimeGenerated > ago(1h)
| project TimeGenerated, AlertName, AlertSeverity, Description, RemediationSteps
| order by TimeGenerated desc
Step 2: Contain¶
Danger
DO NOT delete evidence or modify logs.
- If active breach: Disable compromised identities immediately
- If data exfiltration: Block outbound traffic via firewall rule
- Preserve current state: Take storage account snapshots
Step 3: Investigate¶
// Track activity of compromised identity
AzureActivity
| where Caller == "<compromised-identity>"
| where TimeGenerated > ago(24h)
| project TimeGenerated, OperationNameValue, ResourceGroup, _ResourceId
| order by TimeGenerated desc
Step 4: Eradicate¶
- Rotate all credentials associated with the compromised identity
- Revoke Key Vault access
- Remove unauthorized role assignments
- Update NSG / Firewall rules if network-based attack
Step 5: Recover¶
- Verify all unauthorized access is revoked
- Re-enable services with new credentials
- Monitor for 48 hours for recurrence
Step 6: Post-Incident¶
- Create incident report (within 72 hours)
- Update RBAC matrix if access was overly broad
- Add detection rules for the attack vector
- Schedule review with stakeholders
💡 Common Scenarios¶
Scenario A: Exposed Storage Account Key¶
- Rotate storage account keys immediately
- Update all Key Vault references
- Audit access logs for the exposure window
- Check for data exfiltration in firewall logs
Scenario B: Databricks Token Leaked¶
- Revoke the token via Databricks admin console
- Audit Unity Catalog access logs
- Check for unauthorized data access
- Re-issue token with tighter scope
Scenario C: Azure Policy Non-Compliance¶
- Run compliance scan:
Get-AzPolicyState -SubscriptionId <id> - Identify non-compliant resources
- Remediate or create exemptions with justification
- Update policy assignments if false positive
Scenario D: Cosmos DB Unauthorized Access¶
- Check Cosmos DB diagnostic logs for unusual query patterns
- Rotate Cosmos DB primary and secondary keys
- Update all Key Vault secrets referencing Cosmos DB keys
- Review firewall rules — restrict to VNet-only access
- If data was read: assess PII exposure and activate data breach protocol
Scenario E: ADF Pipeline Tampering¶
- Check ADF activity runs for unauthorized modifications
- Compare current pipeline definitions to Git (source of truth)
- Redeploy pipelines from Git:
./scripts/deploy/deploy-adf.sh - Review ADF managed identity permissions
- Check for unauthorized linked services or datasets
Scenario F: Key Vault Secret Expiry or Compromise¶
- List expired or expiring secrets
- Rotate affected secrets using the secret rotation function
- Verify all dependent services restart with new secrets
- Check audit logs for unauthorized secret reads
📋 Evidence Preservation Checklist¶
Important
Before any remediation, preserve evidence:
- Screenshot or export of the security alert
- Export relevant Log Analytics queries to CSV
- Take ADLS storage account snapshots (if data breach suspected)
- Export AAD sign-in logs for the affected identities
- Save NSG flow logs for the relevant time window
- Document the timeline of events in the incident ticket
📝 Communication Templates¶
Internal notification (P1/P2)¶
Subject: [P1/P2] Security Incident — CSA Data Platform
Summary: [Brief description of the incident] Detected: [Timestamp UTC] Impact: [What data/services are affected] Status: [Investigating / Contained / Remediated] Next update: [Time]
Actions taken:
- [Action 1]
- [Action 2]
Stakeholder update¶
Subject: Security Incident Update #[N]
Current status: [Contained / Under investigation] Root cause: [Known / Under investigation] Data impact: [No PII exposed / Assessing / Confirmed exposure] Remediation ETA: [Time]
📎 Contact Information¶
Warning
Action Required: Update these contacts with your organization's actual personnel before using this runbook in production. File a PR against this table whenever roles change.
| Role | Contact | Phone | Escalation |
|---|---|---|---|
| Platform Team Lead | (set via your org's on-call roster) | (see PagerDuty / OpsGenie) | First responder |
| Security Officer | (set via your org's security team DL) | (see PagerDuty / OpsGenie) | P1/P2 escalation |
| Data Protection Officer | (set via your org's DPO) | (office hours) | PII breach only |
| Legal Counsel | (set via your org's legal team) | (office hours) | P1 with data exposure |
| Azure Support | Case via Portal | N/A | Platform issues |
🗓️ Drill Log (CSA-0085)¶
Runbook currency is measured by drill cadence. Add one row per tabletop or live drill. Blocks should run quarterly at a minimum (Jan / Apr / Jul / Oct). File a PR updating this table and the Last Drilled: banner at the top of the document after every exercise.
| Quarter | Date | Type (tabletop / live) | Scenario exercised | Lead | Gaps identified | Fixes tracked |
|---|---|---|---|---|---|---|
| Q1 — Jan | TBD | TBD | TBD | TBD | TBD | TBD |
| Q2 — Apr | TBD | TBD | TBD | TBD | TBD | TBD |
| Q3 — Jul | TBD | TBD | TBD | TBD | TBD | TBD |
| Q4 — Oct | TBD | TBD | TBD | TBD | TBD | TBD |
Tip
Archive historical drill log tables under a collapsed <details> block once a calendar year completes; keep the current year's rows visible.
🔗 Related Documentation¶
- Troubleshooting — Common issues and fixes
- Log Schema — Structured logging schema reference
- Gov Service Matrix — Azure Government service availability