Federal Cybersecurity & Threat Analytics

Federal Cybersecurity & Threat Analytics on Azure¶

Federal Security Operations Centers generate millions of events daily across endpoints, network perimeters, identity systems, and cloud workloads. Individually, each telemetry source captures a narrow slice of attacker behavior. Combined in a unified analytical platform, they enable MITRE ATT&CK technique correlation, mean-time-to-detect/respond (MTTD/MTTR) tracking, anomaly-based threat hunting, and continuous compliance posture assessment against CMMC, NIST 800-53, and FedRAMP controls.

This use case applies the CSA-in-a-Box medallion architecture to federal cybersecurity telemetry, demonstrating how near-real-time alert ingestion and batch log processing converge in a single lakehouse to support both operational SOC workflows and executive compliance reporting.

Data Sources¶

Source	Description	Volume / Coverage	Update Frequency	Access Method
Azure Sentinel Alerts	Security alerts from analytics rules, ML models, and fusion detection	Varies by environment, typically 1K–100K alerts/day	Near real-time	Log Analytics API / Event Hub export
Windows Security Events	Logon events (4624/4625), process creation (4688), privilege use (4672), service installs (7045)	10–500 GB/day depending on endpoint count	Real-time via AMA	Data Collection Rules → Log Analytics
NSG Flow Logs	Network Security Group flow records — source/dest IP, port, protocol, bytes, allow/deny	1–50 GB/day per subscription	1-minute aggregation	ADLS Gen2 (JSON)
Azure Activity Log	Control-plane operations — resource creation, RBAC changes, policy assignments	Low volume, high signal	Near real-time	Diagnostic Settings → Event Hub
Microsoft Defender for Cloud	Security recommendations, secure score, vulnerability assessments, regulatory compliance	Per-subscription	Continuous	REST API / Log Analytics
CISA KEV Catalog	Known Exploited Vulnerabilities — CVEs with mandated remediation deadlines	~1,100 entries, growing	Updated as needed	JSON download / REST API

Data Collection

Windows Security Events use the Azure Monitor Agent (AMA) with Data Collection Rules for selective event forwarding. NSG Flow Logs v2 write directly to ADLS Gen2. Sentinel alerts can be exported to Event Hub for real-time downstream processing. The CISA KEV catalog is publicly available at cisa.gov/known-exploited-vulnerabilities-catalog.

Architecture¶

The architecture separates two ingestion paths: a hot path for real-time alert triage through Event Hub and Azure Data Explorer, and a batch path for historical log normalization through ADLS and dbt. Both paths converge in Gold-layer analytical models consumed by Power BI dashboards, Sentinel workbooks, and ADX near-real-time queries.

graph LR
    subgraph Security Sources
        A[Azure Sentinel<br/>Alerts & Incidents]
        B[Windows Security<br/>Events via AMA]
        C[NSG Flow Logs v2]
        D[Azure Activity Log]
        E[Defender for Cloud<br/>Recommendations]
        F[CISA KEV Catalog]
    end

    subgraph Ingestion
        G[Event Hub<br/>Real-time alerts]
        H[Diagnostic Settings<br/>+ Data Collection Rules]
    end

    subgraph "Bronze — Raw"
        I[ADLS Gen2<br/>Raw alerts JSON]
        J[ADLS Gen2<br/>Raw events Parquet]
        K[ADLS Gen2<br/>Raw flow logs]
    end

    subgraph "Silver — Normalized"
        L[fct_security_alerts<br/>Enriched alerts]
        M[dim_mitre_techniques<br/>ATT&CK reference]
        N[fct_network_flows<br/>Normalized flows]
    end

    subgraph "Gold — Analytics"
        O[rpt_threat_landscape<br/>ATT&CK heatmaps]
        P[rpt_compliance_posture<br/>NIST/CMMC scorecards]
        Q[rpt_mttd_mttr<br/>SOC metrics]
    end

    subgraph Consumers
        R[Power BI<br/>Executive dashboards]
        S[Azure Data Explorer<br/>Threat hunting]
        T[Sentinel Workbooks<br/>SOC operations]
    end

    A --> G
    B --> H
    C --> K
    D --> G
    E --> H
    F --> I

    G --> I
    H --> J

    I --> L
    J --> L
    K --> N
    I --> M

    L --> O
    L --> P
    L --> Q
    M --> O
    N --> O

    O --> R
    O --> S
    P --> R
    Q --> R
    Q --> T
    O --> T

Step-by-Step Implementation¶

1. Deploy Sentinel Workspace¶

The Bicep template provisions a Log Analytics workspace, enables the Microsoft Sentinel solution, configures data connectors for Azure Activity, Microsoft Entra ID, and Microsoft 365 Defender, and sets up a Data Collection Rule for selective Windows Security Event forwarding.

// sentinel-workspace.bicep — key resource definitions

@description('Log Analytics data retention in days')
@minValue(30)
@maxValue(730)
param retentionDays int = 90

resource workspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: '${namePrefix}-law-${environment}'
  location: location
  properties: {
    sku: { name: 'PerGB2018' }
    retentionInDays: retentionDays
    features: {
      enableLogAccessUsingOnlyResourcePermissions: true
    }
    workspaceCapping: {
      dailyQuotaGb: environment == 'prd' ? -1 : 5
    }
  }
}

resource sentinel 'Microsoft.OperationsManagement/solutions@2015-11-01-preview' = {
  name: 'SecurityInsights(${workspaceName})'
  location: location
  plan: {
    name: 'SecurityInsights(${workspaceName})'
    publisher: 'Microsoft'
    product: 'OMSGallery/SecurityInsights'
  }
  properties: {
    workspaceResourceId: workspace.id
  }
}

// Data Collection Rule — selective Windows Security Events
resource dataCollectionRule 'Microsoft.Insights/dataCollectionRules@2022-06-01' = {
  name: '${namePrefix}-dcr-winsec-${environment}'
  location: location
  properties: {
    dataSources: {
      windowsEventLogs: [{
        name: 'securityEvents'
        streams: ['Microsoft-SecurityEvent']
        xPathQueries: [
          'Security!*[System[(EventID=4624 or EventID=4625 or EventID=4634 or EventID=4648 or EventID=4672 or EventID=4688 or EventID=4720 or EventID=4726 or EventID=7045)]]'
        ]
      }]
    }
    destinations: {
      logAnalytics: [{
        name: 'sentinelWorkspace'
        workspaceResourceId: workspace.id
      }]
    }
    dataFlows: [{
      streams: ['Microsoft-SecurityEvent']
      destinations: ['sentinelWorkspace']
    }]
  }
}

Deploy with:

az deployment group create \
  --resource-group rg-cyber-dev \
  --template-file examples/cybersecurity/deploy/sentinel-workspace.bicep \
  --parameters namePrefix=csa environment=dev retentionDays=90

Full Templates

Complete Bicep templates are in examples/cybersecurity/deploy/. The workspace template also provisions a managed identity and diagnostic settings for workspace audit logging.

2. Configure Sentinel Analytics Rules¶

Analytics rules define the detection logic that generates alerts. The project includes five pre-built rules covering common attack patterns mapped to MITRE ATT&CK tactics.

// analytics-rules.bicep — Brute Force Detection example

resource bruteForceRule 'Microsoft.SecurityInsights/alertRules@2023-02-01-preview' = {
  scope: workspace
  name: guid(workspace.id, 'brute-force-detection')
  kind: 'Scheduled'
  properties: {
    displayName: 'Brute Force Attack - Multiple Failed Sign-Ins'
    severity: 'Medium'
    query: '''
      SigninLogs
      | where ResultType != "0"
      | summarize FailedAttempts = count(),
          TargetAccounts = dcount(UserPrincipalName),
          Accounts = make_set(UserPrincipalName, 10)
        by IPAddress, bin(TimeGenerated, 5m)
      | where FailedAttempts > 10
    '''
    queryFrequency: 'PT5M'
    queryPeriod: 'PT10M'
    tactics: ['CredentialAccess']
    techniques: ['T1110']
  }
}

Rule	Tactic	Technique	Severity
Brute Force — Failed Sign-Ins	Credential Access	T1110	Medium
Suspicious PowerShell Execution	Execution, Defense Evasion	T1059.001	High
Lateral Movement — RDP from Unusual Source	Lateral Movement	T1021.001	High
Data Exfiltration — Large Outbound Transfer	Exfiltration	T1048	High
Communication with Known Malicious IP	Command and Control	T1071	High

3. MITRE ATT&CK Correlation (dbt Silver)¶

The dim_mitre_techniques model normalizes the ATT&CK framework into a queryable dimension, enabling enrichment of every alert with tactic context, severity weighting, and parent-technique rollups.

-- domains/silver/dim_mitre_techniques.sql
WITH raw_techniques AS (
    SELECT
        id              AS technique_id,
        name            AS technique_name,
        tactic          AS tactic_name,
        severity_weight AS severity_weight,
        data_sources    AS data_sources
    FROM {{ source('mitre_reference', 'mitre_attack_mapping') }}
),

enriched AS (
    SELECT
        technique_id,
        technique_name,
        tactic_name,
        CASE
            WHEN CONTAINS(technique_id, '.')
            THEN SUBSTRING(technique_id, 1, INSTR(technique_id, '.') - 1)
            ELSE technique_id
        END AS parent_technique_id,
        CONTAINS(technique_id, '.') AS is_sub_technique,
        severity_weight,
        CASE
            WHEN severity_weight >= 0.9 THEN 'Critical'
            WHEN severity_weight >= 0.7 THEN 'High'
            WHEN severity_weight >= 0.5 THEN 'Medium'
            ELSE 'Low'
        END AS severity_tier,
        CURRENT_TIMESTAMP() AS updated_at
    FROM raw_techniques
)

SELECT * FROM enriched

The fct_security_alerts model joins raw alerts to MITRE dimensions, producing enriched rows with composite risk scores used by all Gold-layer reports.

4. Anomaly Detection with Isolation Forest¶

The ML pipeline extracts behavioral features from Silver-layer alerts and applies an Isolation Forest model to flag structurally unusual activity. This surfaces alerts that rule-based detection might miss.

# Feature engineering from security alert patterns
feature_cols = [
    "hour_of_day",      # Temporal — off-hours activity
    "day_of_week",      # Temporal — weekend activity
    "is_business_hours", # Binary flag
    "severity_level",    # Numeric severity (1-4)
    "provider_frequency", # How common is this alert source
    "technique_frequency", # How common is this MITRE technique
]

X = pdf[feature_cols].fillna(0).values
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Isolation Forest — ~15% expected anomaly rate
iso_forest = IsolationForest(
    n_estimators=100,
    contamination=0.15,
    random_state=42,
    n_jobs=-1,
)
pdf["anomaly_label"] = iso_forest.fit_predict(X_scaled)
pdf["anomaly_score"] = iso_forest.decision_function(X_scaled)

# Composite priority score
pdf["priority_score"] = (
    0.40 * (pdf["severity_level"] / 4.0) +
    0.35 * pdf["anomaly_normalized"] +
    0.25 * (pdf["technique_frequency"] / pdf["technique_frequency"].max()).fillna(0)
)

Model Tuning

The contamination=0.15 parameter should be calibrated to your environment's baseline alert volume. High-alert environments may need lower contamination rates to avoid alert fatigue. Evaluate precision/recall trade-offs on labeled historical data before production deployment.

5. Compliance Posture Scoring (dbt Gold)¶

The rpt_compliance_posture model maps MITRE ATT&CK tactics to NIST 800-53 control families, producing a gap analysis with remediation priority scoring. This supports continuous monitoring requirements under CMMC, FedRAMP, and FISMA.

-- domains/gold/rpt_compliance_posture.sql (abbreviated)
WITH control_mapping AS (
    SELECT * FROM (VALUES
        ('Initial Access',       'AC', 'Access Control',        'AC-17', 'Remote Access'),
        ('Execution',            'SI', 'System Integrity',      'SI-3',  'Malicious Code Protection'),
        ('Credential Access',    'IA', 'Identification & Auth', 'IA-5',  'Authenticator Management'),
        ('Lateral Movement',     'AC', 'Access Control',        'AC-4',  'Information Flow Enforcement'),
        ('Exfiltration',         'SC', 'System & Comms',        'SC-7',  'Boundary Protection'),
        ('Command and Control',  'SC', 'System & Comms',        'SC-7',  'Boundary Protection')
    ) AS t(tactic_name, control_family_id, control_family_name, control_id, control_name)
),

posture AS (
    SELECT
        cm.control_id,
        cm.control_name,
        cm.tactic_name AS associated_tactic,
        COALESCE(am.alert_count, 0) AS alert_count_30d,
        CASE
            WHEN am.max_severity >= 3 THEN 'At Risk'
            WHEN am.alert_count > 5   THEN 'Needs Review'
            ELSE 'Monitored'
        END AS compliance_status,
        ROUND(
            COALESCE(am.avg_risk_score, 0) * LOG2(COALESCE(am.alert_count, 0) + 1), 2
        ) AS remediation_priority
    FROM control_mapping cm
    LEFT JOIN alert_metrics am ON cm.tactic_name = am.tactic_name
)

SELECT * FROM posture ORDER BY remediation_priority DESC

Framework	Controls Mapped	Assessment Frequency
NIST 800-53 Rev 5	AC, CM, IA, SI, SC, CP, MP	Continuous (30-day rolling)
CMMC Level 2	Maps via NIST 800-53 crosswalk	Continuous
FedRAMP High	Inherits NIST 800-53 High baseline	Continuous

6. KQL Threat Hunting¶

Pre-built KQL queries execute against the Sentinel workspace for proactive threat hunting. These can run interactively in Sentinel or programmatically via the Azure Monitor Query API.

// Hunt: C2 Beaconing Detection
// Identifies periodic outbound connections indicating command-and-control
AzureNetworkAnalytics_CL
| where TimeGenerated > ago(24h)
| where FlowDirection_s == "O" and FlowStatus_s == "A"
| where not(ipv4_is_private(DestIP_s))
| summarize
    ConnectionCount = count(),
    AvgInterval = avg(datetime_diff('second', TimeGenerated,
        prev(TimeGenerated, 1)))
  by SrcIP_s, DestIP_s, DestPort_d, bin(TimeGenerated, 1h)
| where ConnectionCount > 20
| extend BeaconScore = iff(AvgInterval between (50 .. 70), "High", "Low")
| where BeaconScore == "High"

// Hunt: Pass-the-Hash Indicators
SecurityEvent
| where TimeGenerated > ago(7d)
| where EventID == 4624
| where LogonType == 9
    or (LogonType == 3 and AuthenticationPackageName == "NTLM")
| where AccountType == "User"
| summarize
    LogonCount = count(),
    UniqueTargets = dcount(Computer),
    Targets = make_set(Computer, 10)
  by TargetAccount, IpAddress
| where UniqueTargets > 3
| order by UniqueTargets desc

Additional hunts included in the project: unusual process execution from non-standard paths, privilege escalation via admin group membership changes, and suspicious PowerShell with obfuscation indicators.

7. CISA BOD Automation¶

The pipeline ingests the CISA Known Exploited Vulnerabilities (KEV) catalog and correlates entries against Defender for Cloud vulnerability assessments to identify assets with mandated remediation deadlines under Binding Operational Directive 22-01.

import requests
import pandas as pd

# Fetch current CISA KEV catalog
kev_url = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
response = requests.get(kev_url)
kev_data = response.json()

df_kev = pd.json_normalize(kev_data["vulnerabilities"])
df_kev["dueDate"] = pd.to_datetime(df_kev["dueDate"])

# Flag overdue vulnerabilities
df_kev["is_overdue"] = df_kev["dueDate"] < pd.Timestamp.now()
overdue_count = df_kev["is_overdue"].sum()
print(f"CISA KEV: {len(df_kev)} total, {overdue_count} past remediation deadline")

# Join against Defender for Cloud findings to identify affected assets
# df_findings = spark.table("cybersecurity_silver.fct_vulnerability_findings")
# df_matched = df_findings.join(df_kev_spark, on="cveId", how="inner")

Zero Trust Analytics Integration¶

The analytics pipeline supports Zero Trust Architecture principles by providing continuous verification signals across identity, device, network, and workload pillars.

Zero Trust Pillar	Data Source	Analytics Output
Identity	Microsoft Entra ID Sign-In Logs, Security Events (4624/4625)	Impossible travel detection, brute force alerts, MFA gap analysis
Device	Defender for Endpoint, Security Events (7045)	Endpoint compliance scoring, unauthorized software detection
Network	NSG Flow Logs, DNS logs	Lateral movement detection, C2 beaconing, data exfiltration tracking
Workload	Azure Activity Log, Defender for Cloud	Resource misconfiguration alerts, privilege escalation detection
Data	DLP alerts, Azure Information Protection	Sensitive data access anomalies, unauthorized sharing patterns

Conditional Access Integration

Anomaly scores from the Isolation Forest model can feed Microsoft Entra ID Conditional Access policies via custom risk signals, enabling automated session revocation when user behavior deviates from baseline.

MTTD/MTTR Reporting¶

The Gold-layer rpt_mttd_mttr model computes SOC performance metrics from alert and incident lifecycle timestamps.

Metric	Definition	Target (Federal SOC)
MTTD	Time from threat activity to first alert generation	< 15 minutes
MTTR	Time from alert creation to incident closure	< 4 hours (Critical), < 24 hours (High)
Alert-to-Triage	Time from alert creation to analyst assignment	< 10 minutes
False Positive Rate	Percentage of alerts closed as benign	< 30%
Coverage Ratio	MITRE ATT&CK techniques with active detection rules	> 60% of applicable techniques

graph TD
    A[Threat Activity Occurs] -->|MTTD| B[Alert Generated]
    B -->|Alert-to-Triage| C[Analyst Assigned]
    C -->|Investigation| D[Incident Created]
    D -->|MTTR| E[Incident Resolved]
    E --> F[Post-Incident Review]

Azure Government Deployment¶

For FedRAMP High workloads, deploy all resources to Azure Government regions. Key differences from commercial Azure:

Component	Commercial	Azure Government
Sentinel	All regions	USGov Virginia, USGov Arizona
Log Analytics	All regions	USGov Virginia, USGov Arizona, USDoD Central, USDoD East
Event Hub	All regions	USGov Virginia, USGov Arizona
ADLS Gen2	All regions	USGov Virginia, USGov Arizona
Defender for Cloud	All regions	USGov Virginia, USGov Arizona
ARM endpoint	`management.azure.com`	`management.usgovcloudapi.net`

!!! warning "Azure Government Considerations" - Use az cloud set --name AzureUSGovernment before deploying - Sentinel content hub solutions may have delayed availability in government regions - Log Analytics workspace IDs differ between clouds — update WORKSPACE_ID references - Some Defender for Cloud features (e.g., CSPM) may have feature parity gaps — check Azure Government services availability

# Deploy to Azure Government
az cloud set --name AzureUSGovernment
az login

az deployment group create \
  --resource-group rg-cyber-prd \
  --template-file examples/cybersecurity/deploy/sentinel-workspace.bicep \
  --parameters namePrefix=csa environment=prd retentionDays=365

Project Structure¶

examples/cybersecurity/
├── contracts/
│   └── sentinel-alerts.yaml          # Data contract for alert schema
├── data/
│   ├── cisa-kev-sample.json          # Sample CISA KEV catalog
│   ├── mitre-attack-mapping.json     # ATT&CK technique reference
│   ├── sample-network-flows.csv      # Sample NSG flow data
│   └── sample-sentinel-alerts.json   # Sample Sentinel alerts
├── deploy/
│   ├── analytics-rules.bicep         # Sentinel detection rules
│   └── sentinel-workspace.bicep      # Workspace + connectors
├── domains/
│   ├── bronze/
│   │   └── stg_sentinel_alerts.sql   # Raw alert staging
│   ├── silver/
│   │   ├── dim_mitre_techniques.sql  # ATT&CK dimension
│   │   └── fct_security_alerts.sql   # Enriched alert facts
│   └── gold/
│       ├── rpt_compliance_posture.sql # NIST/CMMC gap analysis
│       └── rpt_threat_landscape.sql   # Threat activity summary
└── notebooks/
    ├── 01-alert-exploration.py       # Data profiling
    ├── 02-threat-detection-ml.py     # Isolation Forest anomaly detection
    └── 03-kql-threat-hunting.py      # KQL hunt library

Sources¶

MITRE ATT&CK Framework — Technique and tactic taxonomy
MITRE ATT&CK Enterprise Matrix — Full technique mapping
CISA Known Exploited Vulnerabilities Catalog — Mandated remediation tracking
CISA Binding Operational Directive 22-01 — Federal vulnerability remediation requirements
NIST SP 800-53 Rev 5 — Security and privacy controls
CMMC Model Overview — DoD cybersecurity maturity model
Microsoft Sentinel Documentation — SIEM deployment and configuration
Azure Government Documentation — FedRAMP High deployment guidance
FedRAMP Authorization — Federal cloud security authorization program