Source: `examples/cybersecurity/README.md`. This page is rendered live from that file.
# Cybersecurity Threat Detection & MITRE ATT&CK Analytics
> [!TIP]
> TL;DR: Azure Sentinel-based threat detection and MITRE ATT&CK correlation platform for federal agencies. Ingests security alerts, network flows, and vulnerability data through a medallion architecture to produce actionable threat intelligence, compliance posture reporting, and automated threat hunting dashboards.
## 📋 Table of Contents
- Overview
- Key Features
- Data Sources
- Architecture Overview
- Business Drivers
- Data Mesh Integration
- Prerequisites
- Quick Start
- Data Pipeline
  - Bronze Layer
  - Silver Layer
  - Gold Layer
- KQL Query Examples
- Notebooks
- Data Contract
- Deployment
- Related Resources
- Contributing
- License
## 📋 Overview

This vertical provides a production-ready cybersecurity analytics platform built on Azure Cloud Scale Analytics (CSA). It demonstrates how federal agencies can operationalize Microsoft Sentinel (formerly Azure Sentinel) alert data, correlate events with the MITRE ATT&CK framework, and produce continuous monitoring dashboards that help satisfy CISA Binding Operational Directive (BOD), CMMC, and FedRAMP requirements.
## ✨ Key Features
- Real-Time Threat Detection: Sentinel analytics rules for brute force, lateral movement, exfiltration, and ransomware indicators
- MITRE ATT&CK Mapping: Every alert is normalized and correlated to ATT&CK tactics and techniques
- Compliance Posture Reporting: Automated CMMC and NIST 800-53 control gap analysis
- Threat Hunting Notebooks: Interactive Databricks notebooks for anomaly detection and ML-based alert scoring
- CISA KEV Integration: Known Exploited Vulnerabilities catalog cross-referenced with environment telemetry
- Zero Trust Analytics: Continuous verification metrics across identity, device, and network pillars
## 🗄️ Data Sources
| Source | Description | Ingestion |
|---|---|---|
| Azure Sentinel Alerts | Security alerts from all connected providers | Near real-time via Log Analytics |
| Windows Security Events | Event IDs 4624-4634, 4688, 4720, 7045 | Data Collection Rules |
| NSG Flow Logs | Network traffic metadata from Azure NSGs | Storage Account → ADLS |
| Azure Activity Log | Control plane operations and audit trail | Diagnostic Settings |
| Microsoft Defender for Cloud | Cloud security posture and recommendations | Continuous Export |
| CISA KEV Catalog | Known exploited vulnerabilities | Scheduled API pull |
## 🏗️ Architecture Overview

```mermaid
flowchart TB
    subgraph Sources["Data Sources"]
        S1[Azure Sentinel]
        S2[Windows Event Logs]
        S3[NSG Flow Logs]
        S4[Azure Activity Log]
        S5[Defender for Cloud]
        S6[CISA KEV Catalog]
    end
    subgraph Bronze["Bronze Layer: Raw Ingestion"]
        B1[Raw Sentinel Alerts]
        B2[Raw Event Logs]
        B3[Raw Network Flows]
        B4[Raw Activity Logs]
    end
    subgraph Silver["Silver Layer: Normalized & Enriched"]
        SV1[Fact: Security Alerts]
        SV2[Dim: MITRE Techniques]
        SV3[Dim: Assets & Entities]
        SV4[Fact: Network Sessions]
    end
    subgraph Gold["Gold Layer: Threat Intelligence"]
        G1[Threat Landscape Report]
        G2[Compliance Posture Report]
        G3[Incident Metrics Dashboard]
        G4[Threat Hunting Findings]
    end
    subgraph Consumers["Consumers"]
        C1[KQL Dashboards]
        C2[Power BI]
        C3[SOC Workbooks]
        C4[Executive Briefings]
    end
    S1 --> B1
    S2 --> B2
    S3 --> B3
    S4 --> B4
    S5 --> B1
    S6 --> SV2
    B1 --> SV1
    B2 --> SV1
    B3 --> SV4
    B4 --> SV1
    SV1 --> G1
    SV1 --> G2
    SV2 --> G1
    SV2 --> G2
    SV3 --> G1
    SV4 --> G1
    SV1 --> G3
    SV1 --> G4
    G1 --> C1
    G1 --> C2
    G2 --> C3
    G3 --> C2
    G4 --> C1
    G2 --> C4
    G3 --> C4
```

## 🎯 Business Drivers
### Regulatory Compliance
| Requirement | Description | How This Vertical Addresses It |
|---|---|---|
| CISA BOD 22-01 | Reduce risk from known exploited vulnerabilities | CISA KEV integration with automated remediation tracking |
| CISA BOD 23-01 | Asset visibility and vulnerability detection | Asset inventory via entity extraction, continuous scanning metrics |
| CMMC Level 2+ | Cybersecurity Maturity Model Certification | Control mapping to NIST 800-171, gap analysis reporting |
| FedRAMP | Continuous monitoring for cloud authorization | Automated control evidence collection, monthly POA&M generation |
| EO 14028 | Zero Trust Architecture adoption | Identity, device, and network trust scoring |
| FISMA | Federal Information Security Management Act | Annual assessment automation, continuous diagnostics |
### Operational Value

- Mean Time to Detect (MTTD): Aim to cut detection time from hours to minutes through automated correlation
- Mean Time to Respond (MTTR): Shorten response by prioritizing alerts with ML-based scoring
- Alert Fatigue Reduction: Correlate and deduplicate alerts, targeting a 60-80% reduction in alert noise (actual results depend on the environment and rule tuning)
- Proactive Threat Hunting: Shift from reactive to proactive detection with behavioral analytics
## 🔗 Data Mesh Integration

### Security Domain Ownership
The cybersecurity vertical operates as a Security Domain within the CSA Data Mesh:
| Aspect | Details |
|---|---|
| Domain Owner | Chief Information Security Officer (CISO) |
| Domain Team | SOC Analysts, Security Engineers, Threat Hunters |
| Self-Service Platform | Sentinel workspace + Databricks notebooks |
| Governance | NIST 800-53, CMMC, agency-specific policies |
### Data Products
| Data Product | Description | Consumers |
|---|---|---|
| Threat Landscape | Daily/weekly threat summary with tactic trends and technique frequency | CISO, SOC Lead, Risk Management |
| Compliance Posture | Control implementation status mapped to CMMC/NIST frameworks | Authorizing Officials, ISSO, Auditors |
| Incident Metrics | MTTD, MTTR, alert volume, resolution rates, SLA compliance | SOC Manager, CIO Dashboard |
| Vulnerability Exposure | CISA KEV overlap, patching SLA compliance, risk scoring | Vulnerability Management Team |
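The sketch below illustrates the "CISA KEV overlap" idea behind the Vulnerability Exposure product. It is a minimal example under stated assumptions, not the vertical's implementation. Only the `cveID` and `dueDate` keys come from the public KEV JSON feed. The `findings` input, the `kev_exposure` function, the host names, and the CVE IDs are hypothetical toy values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class KevExposure:
    host: str
    cve_id: str
    due_date: date
    overdue: bool


def kev_exposure(kev_entries: list[dict], findings: list[tuple[str, str]], today: date) -> list[KevExposure]:
    """Return scanner findings whose CVE appears in the KEV catalog.

    kev_entries: KEV feed records; only the 'cveID' and 'dueDate' (ISO date) keys are used.
    findings: hypothetical (host, cve_id) pairs from a vulnerability scanner.
    """
    # Index the catalog by CVE ID so each finding is a single dictionary lookup.
    due_by_cve = {e["cveID"]: date.fromisoformat(e["dueDate"]) for e in kev_entries}
    exposures = []
    for host, cve_id in findings:
        due = due_by_cve.get(cve_id)
        if due is None:
            continue  # Not in the KEV catalog, so out of scope for this product.
        exposures.append(KevExposure(host, cve_id, due, overdue=today > due))
    # Overdue items first, then by earliest remediation due date.
    return sorted(exposures, key=lambda x: (not x.overdue, x.due_date))


if __name__ == "__main__":
    # Toy values for illustration only; these are not real KEV catalog entries.
    kev = [
        {"cveID": "CVE-0000-0001", "dueDate": "2023-06-01"},
        {"cveID": "CVE-0000-0002", "dueDate": "2099-01-01"},
    ]
    findings = [("web01", "CVE-0000-0001"), ("db01", "CVE-0000-0003"), ("app02", "CVE-0000-0002")]
    for item in kev_exposure(kev, findings, today=date(2024, 1, 1)):
        print(item.host, item.cve_id, "OVERDUE" if item.overdue else "within due date")
```

Sorting overdue items first mirrors how a patching-SLA view is usually read. A production version would also carry remediation status from the scanner.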
## 📦 Prerequisites

### Azure Resources
- Azure Subscription with Microsoft Sentinel enabled
- Log Analytics Workspace (Sentinel-connected)
- Azure Data Lake Storage Gen2 (for medallion layers)
- Azure Databricks Workspace
- Microsoft Defender for Cloud (Standard tier recommended)
### Tools Required
- Azure CLI ≥ 2.50
- Bicep CLI ≥ 0.20
- dbt-core ≥ 1.7 with dbt-databricks adapter
- Python ≥ 3.10
- Databricks CLI
### Permissions

- `Microsoft Sentinel Contributor` on the resource group
- `Log Analytics Contributor` for workspace configuration
- `Storage Blob Data Contributor` on ADLS Gen2
## 🚀 Quick Start

### 1. Clone and Navigate
### 2. Deploy Sentinel Workspace

```bash
az deployment group create \
  --resource-group rg-cybersecurity-dev \
  --template-file deploy/sentinel-workspace.bicep \
  --parameters namePrefix=csa environment=dev location=usgovvirginia
```
### 3. Deploy Analytics Rules

```bash
az deployment group create \
  --resource-group rg-cybersecurity-dev \
  --template-file deploy/analytics-rules.bicep \
  --parameters workspaceName=csa-law-dev
```
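### 4. Load Sample Data

```bash
# Upload sample data to the ADLS "bronze" container under the sentinel-alerts/ prefix.
# --destination takes the container name; --destination-path adds the folder prefix.
# --auth-mode login uses your Entra ID identity (Storage Blob Data Contributor).
az storage blob upload-batch \
  --destination bronze \
  --destination-path sentinel-alerts \
  --source data/ \
  --account-name csadatalakedev \
  --auth-mode login
```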
### 5. Run dbt Models

### 6. Open Notebooks

Import the `notebooks/` folder into your Databricks workspace and run the notebooks in order (`01` through `03`).
## 🔄 Data Pipeline

### Bronze Layer: Raw Ingestion
Raw data lands in ADLS Gen2 with no transformations. Each source maintains its original schema.
| Table | Source | Format | Refresh |
|---|---|---|---|
| `raw_sentinel_alerts` | Sentinel API / Diagnostic Export | JSON | Near real-time |
| `raw_windows_events` | Data Collection Rules | JSON | Near real-time |
| `raw_nsg_flows` | NSG Flow Logs v2 | JSON | 5-minute batches |
| `raw_activity_logs` | Azure Monitor Diagnostic Settings | JSON | Near real-time |
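A common way to keep Bronze "raw" while still recording lineage is to wrap each record in an envelope that stores the original payload verbatim. The sketch below assumes that pattern. It is not taken from this vertical's code. The `_source`, `_source_file`, and `_ingested_at` fields and the `ingest_date=` folder layout are hypothetical. Only the `bronze/sentinel-alerts` prefix matches the Quick Start upload.

```python
import json
from datetime import datetime, timezone


def bronze_envelope(record: dict, source: str, source_file: str, ingested_at: datetime) -> dict:
    """Wrap a raw record with lineage metadata while leaving the payload untouched."""
    return {
        "_source": source,
        "_source_file": source_file,
        "_ingested_at": ingested_at.isoformat(),
        # Keep the original record as serialized JSON so its schema is never altered.
        "payload": json.dumps(record),
    }


def bronze_path(source: str, ingested_at: datetime) -> str:
    """Hypothetical date-partitioned landing prefix inside the 'bronze' container."""
    return f"bronze/{source}/ingest_date={ingested_at:%Y-%m-%d}/"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    raw = {"AlertName": "Suspicious sign-in", "AlertSeverity": "High"}
    print(bronze_path("sentinel-alerts", now))
    print(bronze_envelope(raw, "sentinel-alerts", "sample-sentinel-alerts.json", now))
```

Storing the payload as a JSON string means schema drift in a source never breaks Bronze writes. Parsing and typing are deferred to Silver.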
### Silver Layer: Normalized & Enriched
Cleansed, deduplicated, and enriched with MITRE ATT&CK context.
| Table | Description | Key Joins |
|---|---|---|
| `fct_security_alerts` | Normalized alerts with severity scores, entity extraction | `dim_mitre_techniques` |
| `dim_mitre_techniques` | MITRE ATT&CK technique reference with detection guidance | — |
| `dim_assets` | Host, user, and IP entity dimension | Entity extraction from alerts |
| `fct_network_sessions` | Sessionized network flows with anomaly flags | `dim_assets` |
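The following sketch shows the deduplicate-then-enrich step that turns raw alerts into `fct_security_alerts`. The input keys follow Sentinel's `SecurityAlert` columns (`SystemAlertId`, `TimeGenerated`, `AlertName`, `AlertSeverity`, `Techniques`). It assumes `Techniques` arrives as a JSON array string such as `'["T1110"]'`. The numeric `SEVERITY_SCORE` scale and the shape of the technique dimension are hypothetical. The actual dbt models may score and join differently.

```python
import json
from typing import Iterable

# Hypothetical numeric scale; the real fct_security_alerts scoring may differ.
SEVERITY_SCORE = {"Informational": 0, "Low": 1, "Medium": 2, "High": 3}


def normalize_alerts(raw_alerts: Iterable[dict], techniques: dict[str, dict]) -> list[dict]:
    """Deduplicate raw alerts and enrich them with MITRE ATT&CK context.

    raw_alerts: SecurityAlert-style records; TimeGenerated is an ISO-8601 UTC string.
    techniques: dim_mitre_techniques rows keyed by technique ID,
        e.g. {"T1110": {"name": "Brute Force", "tactic": "Credential Access"}}.
    """
    latest: dict[str, dict] = {}
    for alert in raw_alerts:
        key = alert["SystemAlertId"]
        # Keep only the most recent version of each alert (providers may resend updates).
        # ISO-8601 UTC strings in the same format sort chronologically.
        if key not in latest or alert["TimeGenerated"] > latest[key]["TimeGenerated"]:
            latest[key] = alert

    facts = []
    for alert in latest.values():
        technique_ids = json.loads(alert.get("Techniques") or "[]")
        facts.append({
            "alert_id": alert["SystemAlertId"],
            "alert_name": alert["AlertName"],
            "severity_score": SEVERITY_SCORE.get(alert["AlertSeverity"], 0),
            # Left-join to the technique dimension; unknown IDs are kept but flagged.
            "techniques": [
                {"id": t, **techniques.get(t, {"name": "Unknown", "tactic": "Unknown"})}
                for t in technique_ids
            ],
        })
    return facts
```

Keeping unmapped technique IDs (rather than dropping them) makes gaps in `dim_mitre_techniques` visible downstream.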
### Gold Layer: Threat Intelligence
Aggregated, business-ready views for dashboards and reporting.
| Table | Description | Refresh |
|---|---|---|
| `rpt_threat_landscape` | Tactic/technique trends, severity distribution, 30-day rolling | Hourly |
| `rpt_compliance_posture` | CMMC/NIST control status, gap analysis, remediation priority | Daily |
| `rpt_incident_metrics` | MTTD, MTTR, volume trends, SLA tracking | Hourly |
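MTTD and MTTR in `rpt_incident_metrics` are averages of time intervals. The definitions of those intervals matter. The sketch below uses one common convention. Its `first_activity`, `created`, and `closed` keys are hypothetical, loosely modeled on the `FirstActivityTime`, `CreatedTime`, and `ClosedTime` columns of Sentinel's `SecurityIncident` table. Align the definitions with your SOC's SLA policy before relying on them.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


def incident_metrics(incidents: list[dict[str, Optional[datetime]]]) -> dict[str, Optional[timedelta]]:
    """Compute mean time to detect (MTTD) and mean time to respond (MTTR).

    Assumed definitions:
      MTTD = created - first_activity  (first malicious activity to detection)
      MTTR = closed - created          (detection to closure; open incidents are skipped)
    """
    detect = [(i["created"] - i["first_activity"]).total_seconds() for i in incidents]
    respond = [
        (i["closed"] - i["created"]).total_seconds()
        for i in incidents
        if i.get("closed") is not None
    ]
    return {
        "mttd": timedelta(seconds=mean(detect)) if detect else None,
        "mttr": timedelta(seconds=mean(respond)) if respond else None,
    }
```

Excluding open incidents from MTTR avoids understating response time, but it also hides long-running cases. Many SOCs report the open backlog alongside MTTR for that reason.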
## 🔍 KQL Query Examples

### 1. High-Severity Alerts in the Last 24 Hours

```kql
SecurityAlert
| where TimeGenerated > ago(24h)
| where AlertSeverity == "High" or AlertSeverity == "Critical"
| summarize AlertCount = count() by AlertName, ProviderName
| order by AlertCount desc
| take 20
```
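### 2. MITRE ATT&CK Tactic Heatmap

```kql
SecurityAlert
| where TimeGenerated > ago(30d)
// Tactics is a comma-separated string (e.g. "InitialAccess, Persistence"), so split it before expanding
| mv-expand Tactic = split(Tactics, ",") to typeof(string)
| extend Tactic = trim(" ", Tactic)
| where isnotempty(Tactic)
// Put the datetime bin first so render timechart uses it as the x-axis
| summarize Count = count() by bin(TimeGenerated, 1d), Tactic
| render timechart
```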
### 3. Brute Force Detection: Failed Sign-Ins

```kql
SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType != "0"
| summarize FailedAttempts = count() by UserPrincipalName, IPAddress, bin(TimeGenerated, 5m)
| where FailedAttempts > 10
| project UserPrincipalName, IPAddress, FailedAttempts, TimeGenerated
| order by FailedAttempts desc
```
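Thresholds like "more than 10 failures per 5-minute bin" are easier to review when they can be exercised against fixture data outside Log Analytics. The sketch below is a hypothetical in-memory mirror of the brute-force query, useful for unit-testing sample sign-in records. It is not part of the vertical's notebooks. The record keys match the `SigninLogs` columns used in the query. The 1-hour lookback is left to the caller.

```python
from collections import Counter
from datetime import datetime, timezone


def bin_5m(ts: datetime) -> datetime:
    """Floor a timestamp to its 5-minute bucket, like KQL bin(TimeGenerated, 5m)."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def brute_force_candidates(signins: list[dict], threshold: int = 10) -> list[tuple]:
    """Apply the query's failed-sign-in threshold to in-memory records.

    signins: dicts with UserPrincipalName, IPAddress, ResultType (a string; "0" = success)
    and a UTC TimeGenerated datetime.
    """
    counts = Counter(
        (s["UserPrincipalName"], s["IPAddress"], bin_5m(s["TimeGenerated"]))
        for s in signins
        if s["ResultType"] != "0"
    )
    hits = [(upn, ip, n, bucket) for (upn, ip, bucket), n in counts.items() if n > threshold]
    # Same ordering as the query: most failed attempts first.
    return sorted(hits, key=lambda h: h[2], reverse=True)


if __name__ == "__main__":
    start = datetime(2024, 5, 1, 12, 1, tzinfo=timezone.utc)
    sample = [
        {"UserPrincipalName": "user@contoso.com", "IPAddress": "203.0.113.7",
         "ResultType": "50126", "TimeGenerated": start.replace(second=i)}
        for i in range(12)
    ]
    print(brute_force_candidates(sample))
```

Like the KQL `bin()`, this uses fixed (tumbling) buckets. A burst that straddles a 5-minute boundary can therefore stay under the threshold in both buckets.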
### 4. Lateral Movement: Unusual RDP Connections

```kql
SecurityEvent
| where TimeGenerated > ago(7d)
| where EventID == 4624 and LogonType == 10
| summarize RDPSources = dcount(IpAddress) by TargetAccount, Computer
| where RDPSources > 3
| project TargetAccount, Computer, RDPSources
| order by RDPSources desc
```
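The lateral-movement query counts distinct source IPs per account and host for successful RemoteInteractive logons (event 4624, logon type 10). A hypothetical in-memory equivalent for testing the threshold against sample events could look like this. The event keys mirror the `SecurityEvent` columns in the query.

```python
from collections import defaultdict


def unusual_rdp_targets(events: list[dict], min_sources: int = 3) -> list[tuple]:
    """Flag (account, computer) pairs reached over RDP from more than min_sources IPs.

    events: SecurityEvent-style dicts with EventID, LogonType, TargetAccount, Computer, IpAddress.
    """
    sources: dict[tuple, set] = defaultdict(set)
    for e in events:
        # 4624 = successful logon; LogonType 10 = RemoteInteractive (RDP).
        if e["EventID"] == 4624 and e["LogonType"] == 10:
            sources[(e["TargetAccount"], e["Computer"])].add(e["IpAddress"])
    flagged = [(acct, host, len(ips)) for (acct, host), ips in sources.items() if len(ips) > min_sources]
    # Most distinct sources first, matching the query's ordering.
    return sorted(flagged, key=lambda r: r[2], reverse=True)
```

One design difference to keep in mind: KQL `dcount()` returns an estimate, while a Python `set` counts exactly. The two agree at the small cardinalities this threshold targets.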
### 5. Data Exfiltration Indicator: Large Outbound Transfers

```kql
AzureNetworkAnalytics_CL
| where TimeGenerated > ago(24h)
| where FlowDirection_s == "O" and FlowStatus_s == "A"
// Use tolong(): toint() is 32-bit and overflows on byte counts above ~2.1 GB
| summarize TotalBytesSent = sum(tolong(BytesSent_d)) by SrcIP_s, DestIP_s
| where TotalBytesSent > 500000000   // ~500 MB
| project SrcIP_s, DestIP_s, TotalGB = round(TotalBytesSent / 1073741824.0, 2)
| order by TotalGB desc
```
### 6. DNS Tunneling Suspicion: High Query Volume

```kql
DnsEvents
| where TimeGenerated > ago(24h)
| summarize QueryCount = count(), UniqueSubdomains = dcount(Name) by ClientIP
| where QueryCount > 5000 and UniqueSubdomains > 500
| project ClientIP, QueryCount, UniqueSubdomains
| order by QueryCount desc
```
## 📓 Notebooks
| Notebook | Description |
|---|---|
| `01-alert-exploration.py` | Load Bronze alerts, analyze distribution by severity/tactic/source, timeline visualization, entity network graph |
| `02-threat-detection-ml.py` | Feature engineering on Silver data, Isolation Forest anomaly detection, alert scoring and prioritization |
| `03-kql-threat-hunting.py` | KQL query examples for Log Analytics, threat hunting for unusual processes, lateral movement, C2 patterns |
## 📄 Data Contract
The contracts/sentinel-alerts.yaml data contract defines the schema, quality thresholds, and SLAs for the core Sentinel alerts data product. See contracts/ for details.
Key guarantees:

- Freshness: Alerts available in Silver within 15 minutes of generation
- Completeness: ≥ 99.5% of alerts ingested without data loss
- Accuracy: MITRE mapping validated against ATT&CK v14+
- Availability: 99.9% uptime for Gold layer query access
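The freshness and completeness guarantees can be checked mechanically per batch. The sketch below hard-codes the 15-minute and 99.5% thresholds stated in this section. It does not read `contracts/sentinel-alerts.yaml`, whose field names are not shown here. The `check_contract` function and its inputs are hypothetical.

```python
from datetime import datetime, timedelta

# Thresholds taken from the key guarantees stated for this data product.
FRESHNESS_SLA = timedelta(minutes=15)
COMPLETENESS_MIN = 0.995


def check_contract(
    alert_times: list[tuple[datetime, datetime]], source_count: int, silver_count: int
) -> dict:
    """Evaluate freshness and completeness for one ingestion window.

    alert_times: (generated_at, landed_in_silver_at) pairs for alerts in the window.
    source_count / silver_count: alert counts at the source and in Silver for the same window.
    """
    late = [generated for generated, landed in alert_times if landed - generated > FRESHNESS_SLA]
    # An empty source window is treated as complete rather than dividing by zero.
    completeness = silver_count / source_count if source_count else 1.0
    return {
        "freshness_ok": not late,
        "late_alerts": len(late),
        "completeness": completeness,
        "completeness_ok": completeness >= COMPLETENESS_MIN,
    }
```

Reporting the number of late alerts, not just a pass/fail flag, makes it easier to tell a single straggler from a systemic pipeline delay.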
## 🚢 Deployment

### Infrastructure as Code
| File | Purpose |
|---|---|
| `deploy/sentinel-workspace.bicep` | Log Analytics Workspace + Sentinel solution, data connectors, collection rules |
| `deploy/analytics-rules.bicep` | 5 Sentinel scheduled analytics rules with MITRE mapping |
### Environment Matrix
| Environment | Resource Group | Location | Retention |
|---|---|---|---|
| Development | rg-cybersecurity-dev | usgovvirginia | 30 days |
| Staging | rg-cybersecurity-stg | usgovvirginia | 90 days |
| Production | rg-cybersecurity-prd | usgovvirginia | 365 days |
## 📚 Related Resources
- MITRE ATT&CK Framework — Adversary tactics, techniques, and procedures
- CISA Known Exploited Vulnerabilities — KEV catalog
- Azure Sentinel Documentation — Microsoft Sentinel docs
- NIST 800-53 Rev 5 — Security and privacy controls
- CMMC Model — Cybersecurity Maturity Model Certification
- CISA Binding Operational Directives — Federal cybersecurity directives
- Zero Trust Architecture — NIST SP 800-207 — Zero Trust reference
## 🤝 Contributing
- Fork the repository
- Create a feature branch: `git checkout -b feature/cybersecurity-enhancement`
- Follow existing coding conventions and data contract patterns
- Submit a pull request with test evidence
## 📜 License
This project is part of CSA-in-a-Box and follows the repository-level license.
## Directory Structure

```text
cybersecurity/
├── contracts/          # Data product contracts (schemas, SLOs, owners)
│   └── sentinel-alerts.yaml
├── data/               # Sample data + synthetic generators
│   ├── cisa-kev-sample.json
│   ├── mitre-attack-mapping.json
│   └── sample-sentinel-alerts.json
├── deploy/             # Deployment parameters / Bicep templates
│   ├── analytics-rules.bicep
│   └── sentinel-workspace.bicep
├── domains/            # dbt models (bronze / silver / gold) and seeds
│   ├── bronze/
│   ├── gold/
│   └── silver/
├── notebooks/          # Synapse / Fabric / Databricks notebooks
│   ├── 01-alert-exploration.py
│   ├── 02-threat-detection-ml.py
│   └── 03-kql-threat-hunting.py
└── README.md           # This file
```
## Expected Results
After running the medallion pipeline against the bundled seed data, the Gold layer should populate the following tables. Row counts vary with the seed-data generator parameters; the figures below are the approximate scale you should see on a default run.
| Gold Table | Approximate Rows | Notes |
|---|---|---|
| `rpt_compliance_posture` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
| `rpt_threat_landscape` | TODO: capture after first run | Populated from Silver via `dbt run --select tag:gold` |
TODO: capture exact counts after the next end-to-end seed run. These are bounded by the seed-data generator parameters in `data/generators/`.