Federal Cybersecurity & Threat Analytics on Azure

Federal Security Operations Centers generate millions of events daily across endpoints, network perimeters, identity systems, and cloud workloads. Individually, each telemetry source captures a narrow slice of attacker behavior. Combined in a unified analytical platform, they enable MITRE ATT&CK technique correlation, mean-time-to-detect/respond (MTTD/MTTR) tracking, anomaly-based threat hunting, and continuous compliance posture assessment against CMMC, NIST 800-53, and FedRAMP controls.

This use case applies the CSA-in-a-Box medallion architecture to federal cybersecurity telemetry, demonstrating how near-real-time alert ingestion and batch log processing converge in a single lakehouse to support both operational SOC workflows and executive compliance reporting.


Data Sources

| Source | Description | Volume / Coverage | Update Frequency | Access Method |
|---|---|---|---|---|
| Azure Sentinel Alerts | Security alerts from analytics rules, ML models, and fusion detection | Varies by environment, typically 1K–100K alerts/day | Near real-time | Log Analytics API / Event Hub export |
| Windows Security Events | Logon events (4624/4625), process creation (4688), privilege use (4672), service installs (7045) | 10–500 GB/day depending on endpoint count | Real-time via AMA | Data Collection Rules → Log Analytics |
| NSG Flow Logs | Network Security Group flow records: source/dest IP, port, protocol, bytes, allow/deny | 1–50 GB/day per subscription | 1-minute aggregation | ADLS Gen2 (JSON) |
| Azure Activity Log | Control-plane operations: resource creation, RBAC changes, policy assignments | Low volume, high signal | Near real-time | Diagnostic Settings → Event Hub |
| Microsoft Defender for Cloud | Security recommendations, secure score, vulnerability assessments, regulatory compliance | Per-subscription | Continuous | REST API / Log Analytics |
| CISA KEV Catalog | Known Exploited Vulnerabilities: CVEs with mandated remediation deadlines | ~1,100 entries, growing | Updated as needed | JSON download / REST API |

Data Collection

Windows Security Events use the Azure Monitor Agent (AMA) with Data Collection Rules for selective event forwarding. NSG Flow Logs v2 write directly to ADLS Gen2. Sentinel alerts can be exported to Event Hub for real-time downstream processing. The CISA KEV catalog is publicly available at cisa.gov/known-exploited-vulnerabilities-catalog.


Architecture

The architecture separates two ingestion paths: a hot path for real-time alert triage through Event Hub and Azure Data Explorer, and a batch path for historical log normalization through ADLS and dbt. Both paths converge in Gold-layer analytical models consumed by Power BI dashboards, Sentinel workbooks, and ADX near-real-time queries.

graph LR
    subgraph Security Sources
        A[Azure Sentinel<br/>Alerts & Incidents]
        B[Windows Security<br/>Events via AMA]
        C[NSG Flow Logs v2]
        D[Azure Activity Log]
        E[Defender for Cloud<br/>Recommendations]
        F[CISA KEV Catalog]
    end

    subgraph Ingestion
        G[Event Hub<br/>Real-time alerts]
        H[Diagnostic Settings<br/>+ Data Collection Rules]
    end

    subgraph "Bronze — Raw"
        I[ADLS Gen2<br/>Raw alerts JSON]
        J[ADLS Gen2<br/>Raw events Parquet]
        K[ADLS Gen2<br/>Raw flow logs]
    end

    subgraph "Silver — Normalized"
        L[fct_security_alerts<br/>Enriched alerts]
        M[dim_mitre_techniques<br/>ATT&CK reference]
        N[fct_network_flows<br/>Normalized flows]
    end

    subgraph "Gold — Analytics"
        O[rpt_threat_landscape<br/>ATT&CK heatmaps]
        P[rpt_compliance_posture<br/>NIST/CMMC scorecards]
        Q[rpt_mttd_mttr<br/>SOC metrics]
    end

    subgraph Consumers
        R[Power BI<br/>Executive dashboards]
        S[Azure Data Explorer<br/>Threat hunting]
        T[Sentinel Workbooks<br/>SOC operations]
    end

    A --> G
    B --> H
    C --> K
    D --> G
    E --> H
    F --> I

    G --> I
    H --> J

    I --> L
    J --> L
    K --> N
    I --> M

    L --> O
    L --> P
    L --> Q
    M --> O
    N --> O

    O --> R
    O --> S
    P --> R
    Q --> R
    Q --> T
    O --> T

Step-by-Step Implementation

1. Deploy Sentinel Workspace

The Bicep template provisions a Log Analytics workspace, enables the Microsoft Sentinel solution, configures data connectors for Azure Activity, Microsoft Entra ID, and Microsoft 365 Defender, and sets up a Data Collection Rule for selective Windows Security Event forwarding.

// sentinel-workspace.bicep — key resource definitions

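// Parameters supplied by the deploy command; the location default is an assumption
param namePrefix string
param environment string
param location string = resourceGroup().location

@description('Log Analytics data retention in days')
@minValue(30)
@maxValue(730)
param retentionDays int = 90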

resource workspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: '${namePrefix}-law-${environment}'
  location: location
  properties: {
    sku: { name: 'PerGB2018' }
    retentionInDays: retentionDays
    features: {
      enableLogAccessUsingOnlyResourcePermissions: true
    }
    workspaceCapping: {
      dailyQuotaGb: environment == 'prd' ? -1 : 5
    }
  }
}

resource sentinel 'Microsoft.OperationsManagement/solutions@2015-11-01-preview' = {
  // The solution name must follow the pattern 'SecurityInsights(<workspace name>)'
  name: 'SecurityInsights(${workspace.name})'
  location: location
  plan: {
    name: 'SecurityInsights(${workspace.name})'
    publisher: 'Microsoft'
    product: 'OMSGallery/SecurityInsights'
  }
  properties: {
    workspaceResourceId: workspace.id
  }
}

// Data Collection Rule — selective Windows Security Events
resource dataCollectionRule 'Microsoft.Insights/dataCollectionRules@2022-06-01' = {
  name: '${namePrefix}-dcr-winsec-${environment}'
  location: location
  properties: {
    dataSources: {
      windowsEventLogs: [{
        name: 'securityEvents'
        streams: ['Microsoft-SecurityEvent']
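        xPathQueries: [
          // Note: 7045 (service install) is written to the System channel, so this
          // Security-channel filter will not match it; 4697 is the Security-channel equivalent
          'Security!*[System[(EventID=4624 or EventID=4625 or EventID=4634 or EventID=4648 or EventID=4672 or EventID=4688 or EventID=4720 or EventID=4726 or EventID=7045)]]'
        ]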
      }]
    }
    destinations: {
      logAnalytics: [{
        name: 'sentinelWorkspace'
        workspaceResourceId: workspace.id
      }]
    }
    dataFlows: [{
      streams: ['Microsoft-SecurityEvent']
      destinations: ['sentinelWorkspace']
    }]
  }
}

Deploy with:

az deployment group create \
  --resource-group rg-cyber-dev \
  --template-file examples/cybersecurity/deploy/sentinel-workspace.bicep \
  --parameters namePrefix=csa environment=dev retentionDays=90

Full Templates

Complete Bicep templates are in examples/cybersecurity/deploy/. The workspace template also provisions a managed identity and diagnostic settings for workspace audit logging.


2. Configure Sentinel Analytics Rules

Analytics rules define the detection logic that generates alerts. The project includes five pre-built rules covering common attack patterns mapped to MITRE ATT&CK tactics.

// analytics-rules.bicep — Brute Force Detection example

resource bruteForceRule 'Microsoft.SecurityInsights/alertRules@2023-02-01-preview' = {
  scope: workspace
  name: guid(workspace.id, 'brute-force-detection')
  kind: 'Scheduled'
  properties: {
    displayName: 'Brute Force Attack - Multiple Failed Sign-Ins'
    severity: 'Medium'
    query: '''
      SigninLogs
      | where ResultType != "0"
      | summarize FailedAttempts = count(),
          TargetAccounts = dcount(UserPrincipalName),
          Accounts = make_set(UserPrincipalName, 10)
        by IPAddress, bin(TimeGenerated, 5m)
      | where FailedAttempts > 10
    '''
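    queryFrequency: 'PT5M'
    queryPeriod: 'PT10M'
    // Scheduled rules also require the properties below; values are illustrative
    enabled: true
    triggerOperator: 'GreaterThan'
    triggerThreshold: 0
    suppressionEnabled: false
    suppressionDuration: 'PT1H'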
    tactics: ['CredentialAccess']
    techniques: ['T1110']
  }
}

| Rule | Tactic | Technique | Severity |
|---|---|---|---|
| Brute Force — Failed Sign-Ins | Credential Access | T1110 | Medium |
| Suspicious PowerShell Execution | Execution, Defense Evasion | T1059.001 | High |
| Lateral Movement — RDP from Unusual Source | Lateral Movement | T1021.001 | High |
| Data Exfiltration — Large Outbound Transfer | Exfiltration | T1048 | High |
| Communication with Known Malicious IP | Command and Control | T1071 | High |
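
The windowing logic of the brute-force rule can be sketched outside KQL. The snippet below is a minimal Python illustration, not part of the project: the `detect_brute_force` helper, the tuple layout, and the sample events are hypothetical. It mirrors the rule's 5-minute bins per source IP and its `FailedAttempts > 10` threshold.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BIN = timedelta(minutes=5)   # mirrors bin(TimeGenerated, 5m)
THRESHOLD = 10               # mirrors `where FailedAttempts > 10`
EPOCH = datetime(1970, 1, 1)


def floor_to_bin(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute bin."""
    return EPOCH + ((ts - EPOCH) // BIN) * BIN


def detect_brute_force(events):
    """events: iterable of (timestamp, ip, user, result_type) tuples (hypothetical layout).

    Returns {(ip, bin_start): {"failed": n, "accounts": set_of_users}} for every
    bin whose failed sign-in count exceeds THRESHOLD.
    """
    buckets = defaultdict(lambda: {"failed": 0, "accounts": set()})
    for ts, ip, user, result_type in events:
        if result_type == "0":  # "0" is a successful sign-in; the rule keeps failures only
            continue
        bucket = buckets[(ip, floor_to_bin(ts))]
        bucket["failed"] += 1
        bucket["accounts"].add(user)
    return {key: b for key, b in buckets.items() if b["failed"] > THRESHOLD}


# Illustrative data: 12 failures from one IP inside a single bin, plus one success elsewhere
start = datetime(2024, 1, 1, 12, 0)
events = [
    (start + timedelta(seconds=10 * i), "203.0.113.7", f"user{i % 3}", "50126")
    for i in range(12)
]
events.append((start, "198.51.100.4", "alice", "0"))
hits = detect_brute_force(events)
```

Only the `203.0.113.7` bin is returned here, because it is the only one with more than ten failures.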

3. MITRE ATT&CK Correlation (dbt Silver)

The dim_mitre_techniques model normalizes the ATT&CK framework into a queryable dimension, enabling enrichment of every alert with tactic context, severity weighting, and parent-technique rollups.

-- domains/silver/dim_mitre_techniques.sql
WITH raw_techniques AS (
    SELECT
        id              AS technique_id,
        name            AS technique_name,
        tactic          AS tactic_name,
        severity_weight AS severity_weight,
        data_sources    AS data_sources
    FROM {{ source('mitre_reference', 'mitre_attack_mapping') }}
),

enriched AS (
    SELECT
        technique_id,
        technique_name,
        tactic_name,
        CASE
            WHEN CONTAINS(technique_id, '.')
            THEN SUBSTRING(technique_id, 1, INSTR(technique_id, '.') - 1)
            ELSE technique_id
        END AS parent_technique_id,
        CONTAINS(technique_id, '.') AS is_sub_technique,
        severity_weight,
        CASE
            WHEN severity_weight >= 0.9 THEN 'Critical'
            WHEN severity_weight >= 0.7 THEN 'High'
            WHEN severity_weight >= 0.5 THEN 'Medium'
            ELSE 'Low'
        END AS severity_tier,
        CURRENT_TIMESTAMP() AS updated_at
    FROM raw_techniques
)

SELECT * FROM enriched

The fct_security_alerts model joins raw alerts to MITRE dimensions, producing enriched rows with composite risk scores used by all Gold-layer reports.
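
As a rough illustration of that enrichment, the sketch below reproduces the parent-technique rollup and severity tiers from dim_mitre_techniques in plain Python and left-joins them onto alerts. The severity weights, alert records, and function name are hypothetical, and the composite risk score is left out because its formula is not shown here.

```python
def enrich_technique(technique_id: str, severity_weight: float) -> dict:
    """Mirror the CASE logic in dim_mitre_techniques for one technique row."""
    parent, dot, _ = technique_id.partition(".")  # "T1059.001" -> "T1059"
    if severity_weight >= 0.9:
        tier = "Critical"
    elif severity_weight >= 0.7:
        tier = "High"
    elif severity_weight >= 0.5:
        tier = "Medium"
    else:
        tier = "Low"
    return {
        "parent_technique_id": parent,
        "is_sub_technique": dot == ".",
        "severity_weight": severity_weight,
        "severity_tier": tier,
    }


# Hypothetical weights, for illustration only
weights = {"T1110": 0.7, "T1059.001": 0.9, "T1071": 0.4}
dim = {tid: enrich_technique(tid, w) for tid, w in weights.items()}

# Left join: alerts with an unmapped technique keep their own columns only
alerts = [
    {"alert_id": "a1", "technique_id": "T1059.001"},
    {"alert_id": "a2", "technique_id": "T9999"},
]
enriched = [{**alert, **dim.get(alert["technique_id"], {})} for alert in alerts]
```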


4. Anomaly Detection with Isolation Forest

The ML pipeline extracts behavioral features from Silver-layer alerts and applies an Isolation Forest model to flag structurally unusual activity. This surfaces alerts that rule-based detection might miss.

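from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Feature engineering from security alert patterns
# (pdf is a pandas DataFrame of Silver-layer alerts)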
feature_cols = [
    "hour_of_day",      # Temporal — off-hours activity
    "day_of_week",      # Temporal — weekend activity
    "is_business_hours", # Binary flag
    "severity_level",    # Numeric severity (1-4)
    "provider_frequency", # How common is this alert source
    "technique_frequency", # How common is this MITRE technique
]

X = pdf[feature_cols].fillna(0).values
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Isolation Forest — ~15% expected anomaly rate
iso_forest = IsolationForest(
    n_estimators=100,
    contamination=0.15,
    random_state=42,
    n_jobs=-1,
)
pdf["anomaly_label"] = iso_forest.fit_predict(X_scaled)
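pdf["anomaly_score"] = iso_forest.decision_function(X_scaled)

# decision_function is lower for more anomalous rows, so invert it and
# min-max scale to [0, 1]. Assumed definition; the excerpt omits this step.
inverted = -pdf["anomaly_score"]
score_range = inverted.max() - inverted.min()
pdf["anomaly_normalized"] = (inverted - inverted.min()) / score_range if score_range > 0 else 0.0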

# Composite priority score
pdf["priority_score"] = (
    0.40 * (pdf["severity_level"] / 4.0) +
    0.35 * pdf["anomaly_normalized"] +
    0.25 * (pdf["technique_frequency"] / pdf["technique_frequency"].max()).fillna(0)
)

Model Tuning

The contamination=0.15 parameter should be calibrated to your environment's baseline alert volume. High-alert environments may need lower contamination rates to avoid alert fatigue. Evaluate precision/recall trade-offs on labeled historical data before production deployment.
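
One way to run that evaluation is to sweep candidate contamination values against analyst-labeled history. The sketch below is a simplified stand-in, not the project's code: it approximates the effect of `contamination` by flagging the lowest-scoring fraction of rows (lower score means more anomalous, as with `decision_function`), and the scores and labels are invented for illustration.

```python
def precision_recall_at(scores, labels, contamination):
    """Flag the `contamination` fraction of rows with the lowest scores,
    then compare the flagged set against ground-truth labels (1 = true incident)."""
    n_flagged = max(1, round(len(scores) * contamination))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    flagged = set(order[:n_flagged])
    true_positives = sum(1 for i in flagged if labels[i])
    total_positives = sum(labels)
    precision = true_positives / n_flagged
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall


# Invented anomaly scores and labels for ten historical alerts
scores = [-0.30, -0.20, -0.10, -0.05, 0.02, 0.05, 0.08, 0.10, 0.12, 0.15]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for rate in (0.1, 0.2, 0.4):
    precision, recall = precision_recall_at(scores, labels, rate)
    print(f"contamination={rate:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, raising the rate from 0.1 to 0.4 lifts recall to 1.0 but drops precision to 0.75, which is the trade-off to weigh against analyst capacity.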


5. Compliance Posture Scoring (dbt Gold)

The rpt_compliance_posture model maps MITRE ATT&CK tactics to NIST 800-53 control families, producing a gap analysis with remediation priority scoring. This supports continuous monitoring requirements under CMMC, FedRAMP, and FISMA.

-- domains/gold/rpt_compliance_posture.sql (abbreviated)
WITH control_mapping AS (
    SELECT * FROM (VALUES
        ('Initial Access',       'AC', 'Access Control',        'AC-17', 'Remote Access'),
        ('Execution',            'SI', 'System Integrity',      'SI-3',  'Malicious Code Protection'),
        ('Credential Access',    'IA', 'Identification & Auth', 'IA-5',  'Authenticator Management'),
        ('Lateral Movement',     'AC', 'Access Control',        'AC-4',  'Information Flow Enforcement'),
        ('Exfiltration',         'SC', 'System & Comms',        'SC-7',  'Boundary Protection'),
        ('Command and Control',  'SC', 'System & Comms',        'SC-7',  'Boundary Protection')
    ) AS t(tactic_name, control_family_id, control_family_name, control_id, control_name)
),

posture AS (
    SELECT
        cm.control_id,
        cm.control_name,
        cm.tactic_name AS associated_tactic,
        COALESCE(am.alert_count, 0) AS alert_count_30d,
        CASE
            WHEN am.max_severity >= 3 THEN 'At Risk'
            WHEN am.alert_count > 5   THEN 'Needs Review'
            ELSE 'Monitored'
        END AS compliance_status,
        ROUND(
            COALESCE(am.avg_risk_score, 0) * LOG2(COALESCE(am.alert_count, 0) + 1), 2
        ) AS remediation_priority
    FROM control_mapping cm
    LEFT JOIN alert_metrics am ON cm.tactic_name = am.tactic_name
)

SELECT * FROM posture ORDER BY remediation_priority DESC

| Framework | Controls Mapped | Assessment Frequency |
|---|---|---|
| NIST 800-53 Rev 5 | AC, CM, IA, SI, SC, CP, MP | Continuous (30-day rolling) |
| CMMC Level 2 | Maps via NIST 800-53 crosswalk | Continuous |
| FedRAMP High | Inherits NIST 800-53 High baseline | Continuous |
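
To make the scoring in rpt_compliance_posture concrete, the following sketch re-expresses its `compliance_status` and `remediation_priority` expressions in Python. The function names and inputs are illustrative; `None` stands in for the SQL NULL produced when a control has no matching alerts.

```python
import math


def compliance_status(max_severity, alert_count):
    """Same branch order as the SQL CASE; NULL comparisons fall through to 'Monitored'."""
    if max_severity is not None and max_severity >= 3:
        return "At Risk"
    if alert_count is not None and alert_count > 5:
        return "Needs Review"
    return "Monitored"


def remediation_priority(avg_risk_score, alert_count):
    """avg_risk_score * log2(alert_count + 1), with NULLs coalesced to 0."""
    score = (avg_risk_score or 0) * math.log2((alert_count or 0) + 1)
    return round(score, 2)


# A control with 7 alerts averaging 0.8 risk: 0.8 * log2(8) = 0.8 * 3
busy_control = remediation_priority(0.8, 7)
# A control with no alerts in the window: 0 * log2(1)
quiet_control = remediation_priority(None, None)
```

The logarithm damps alert volume, so a control with 7 alerts scores three times a control with a single alert at the same average risk, not seven times.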

6. KQL Threat Hunting

Pre-built KQL queries execute against the Sentinel workspace for proactive threat hunting. These can run interactively in Sentinel or programmatically via the Azure Monitor Query API.

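// Hunt: C2 Beaconing Detection
// Identifies periodic outbound connections indicating command-and-control
AzureNetworkAnalytics_CL
| where TimeGenerated > ago(24h)
| where FlowDirection_s == "O" and FlowStatus_s == "A"
| where not(ipv4_is_private(DestIP_s))
// prev() needs a serialized rowset, so sort first and compute intervals before summarizing
| sort by SrcIP_s asc, DestIP_s asc, DestPort_d asc, TimeGenerated asc
| extend PrevTime = iff(prev(SrcIP_s) == SrcIP_s and prev(DestIP_s) == DestIP_s
    and prev(DestPort_d) == DestPort_d, prev(TimeGenerated), datetime(null))
| extend IntervalSec = datetime_diff('second', TimeGenerated, PrevTime)
| summarize
    ConnectionCount = count(),
    AvgInterval = avg(IntervalSec)
  by SrcIP_s, DestIP_s, DestPort_d, bin(TimeGenerated, 1h)
| where ConnectionCount > 20
| extend BeaconScore = iff(AvgInterval between (50 .. 70), "High", "Low")
| where BeaconScore == "High"

// Hunt: Pass-the-Hash Indicators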
SecurityEvent
| where TimeGenerated > ago(7d)
| where EventID == 4624
| where LogonType == 9
    or (LogonType == 3 and AuthenticationPackageName == "NTLM")
| where AccountType == "User"
| summarize
    LogonCount = count(),
    UniqueTargets = dcount(Computer),
    Targets = make_set(Computer, 10)
  by TargetAccount, IpAddress
| where UniqueTargets > 3
| order by UniqueTargets desc

Additional hunts included in the project: unusual process execution from non-standard paths, privilege escalation via admin group membership changes, and suspicious PowerShell with obfuscation indicators.
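
The beaconing heuristic above reduces to simple interval arithmetic. The sketch below is a hypothetical Python rendering for one source/destination/port tuple within one hourly bin; the function name and sample timestamps are not part of the project, and the thresholds copy the KQL (more than 20 connections, average interval between 50 and 70 seconds inclusive).

```python
from statistics import mean


def beacon_score(timestamps, min_connections=20, low=50, high=70):
    """timestamps: connection times in epoch seconds for one flow tuple and hour."""
    ordered = sorted(timestamps)
    if len(ordered) <= min_connections:  # mirrors `where ConnectionCount > 20`
        return "Low"
    # Gap between each connection and the one before it
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return "High" if low <= mean(intervals) <= high else "Low"


steady = [i * 60 for i in range(30)]  # one connection per minute
bursty = [i * 2 for i in range(30)]   # rapid burst, 2 seconds apart
sparse = [i * 60 for i in range(10)]  # regular, but too few connections

results = {
    "steady": beacon_score(steady),
    "bursty": beacon_score(bursty),
    "sparse": beacon_score(sparse),
}
```

An average alone can match irregular traffic that happens to average near 60 seconds, so treat a "High" result as a lead for investigation, not a verdict.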


7. CISA BOD Automation

The pipeline ingests the CISA Known Exploited Vulnerabilities (KEV) catalog and correlates entries against Defender for Cloud vulnerability assessments to identify assets with mandated remediation deadlines under Binding Operational Directive 22-01.

import requests
import pandas as pd

# Fetch current CISA KEV catalog
kev_url = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
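response = requests.get(kev_url, timeout=30)
response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error page
kev_data = response.json()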

df_kev = pd.json_normalize(kev_data["vulnerabilities"])
df_kev["dueDate"] = pd.to_datetime(df_kev["dueDate"])

# Flag overdue vulnerabilities (compare against midnight so an entry due today is not yet overdue)
df_kev["is_overdue"] = df_kev["dueDate"] < pd.Timestamp.now().normalize()
overdue_count = df_kev["is_overdue"].sum()
print(f"CISA KEV: {len(df_kev)} total, {overdue_count} past remediation deadline")

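# Join against Defender for Cloud findings to identify affected assets.
# The KEV feed names its CVE column "cveID", so align the column name before joining.
# df_kev_spark = spark.createDataFrame(df_kev).withColumnRenamed("cveID", "cveId")
# df_findings = spark.table("cybersecurity_silver.fct_vulnerability_findings")
# df_matched = df_findings.join(df_kev_spark, on="cveId", how="inner")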
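
The correlation step can be sketched without Spark. The example below is a minimal standard-library illustration under stated assumptions: the finding fields `asset` and `cve_id` are hypothetical, the CVE IDs and dates are placeholders (not real catalog data), and only the `cveID` and `dueDate` keys follow the KEV feed.

```python
from datetime import date


def match_kev(findings, kev_entries, today):
    """Inner-join findings to KEV entries on CVE ID and flag missed deadlines."""
    kev_by_cve = {entry["cveID"]: entry for entry in kev_entries}
    matched = []
    for finding in findings:
        entry = kev_by_cve.get(finding["cve_id"])
        if entry is None:  # the CVE is not in the KEV catalog
            continue
        due = date.fromisoformat(entry["dueDate"])
        matched.append({
            "asset": finding["asset"],
            "cve_id": finding["cve_id"],
            "due_date": due,
            "is_overdue": due < today,
        })
    return sorted(matched, key=lambda m: m["due_date"])  # earliest deadline first


# Placeholder data for illustration
kev_entries = [
    {"cveID": "CVE-2099-0001", "dueDate": "2024-05-01"},
    {"cveID": "CVE-2099-0002", "dueDate": "2024-07-15"},
]
findings = [
    {"asset": "vm-db-02", "cve_id": "CVE-2099-0002"},
    {"asset": "vm-web-01", "cve_id": "CVE-2099-0001"},
    {"asset": "vm-app-03", "cve_id": "CVE-2099-9999"},
]
matched = match_kev(findings, kev_entries, today=date(2024, 6, 1))
```

Passing `today` explicitly keeps the result reproducible in tests and backfills.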

Zero Trust Analytics Integration

The analytics pipeline supports Zero Trust Architecture principles by providing continuous verification signals across the identity, device, network, workload, and data pillars.

| Zero Trust Pillar | Data Source | Analytics Output |
|---|---|---|
| Identity | Microsoft Entra ID Sign-In Logs, Security Events (4624/4625) | Impossible travel detection, brute force alerts, MFA gap analysis |
| Device | Defender for Endpoint, Security Events (7045) | Endpoint compliance scoring, unauthorized software detection |
| Network | NSG Flow Logs, DNS logs | Lateral movement detection, C2 beaconing, data exfiltration tracking |
| Workload | Azure Activity Log, Defender for Cloud | Resource misconfiguration alerts, privilege escalation detection |
| Data | DLP alerts, Azure Information Protection | Sensitive data access anomalies, unauthorized sharing patterns |

Conditional Access Integration

Anomaly scores from the Isolation Forest model can feed Microsoft Entra ID Conditional Access policies via custom risk signals, enabling automated session revocation when user behavior deviates from baseline.


MTTD/MTTR Reporting

The Gold-layer rpt_mttd_mttr model computes SOC performance metrics from alert and incident lifecycle timestamps.

| Metric | Definition | Target (Federal SOC) |
|---|---|---|
| MTTD | Time from threat activity to first alert generation | < 15 minutes |
| MTTR | Time from alert creation to incident closure | < 4 hours (Critical), < 24 hours (High) |
| Alert-to-Triage | Time from alert creation to analyst assignment | < 10 minutes |
| False Positive Rate | Percentage of alerts closed as benign | < 30% |
| Coverage Ratio | MITRE ATT&CK techniques with active detection rules | > 60% of applicable techniques |

graph TD
    A[Threat Activity Occurs] -->|MTTD| B[Alert Generated]
    B -->|Alert-to-Triage| C[Analyst Assigned]
    C -->|Investigation| D[Incident Created]
    D -->|MTTR| E[Incident Resolved]
    E --> F[Post-Incident Review]
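
The metric definitions in the table can be computed directly from lifecycle timestamps. The sketch below is a hypothetical Python illustration, not the rpt_mttd_mttr model itself: the field names (`activity_at`, `alert_at`, `assigned_at`, `closed_at`) and the two sample incidents are assumptions, and MTTR follows the table's definition (alert creation to incident closure).

```python
from datetime import datetime
from statistics import mean


def minutes_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 60


def soc_metrics(incidents):
    """Average detection, triage, and response times across incidents."""
    mttd = mean([minutes_between(i["activity_at"], i["alert_at"]) for i in incidents])
    triage = mean([minutes_between(i["alert_at"], i["assigned_at"]) for i in incidents])
    mttr = mean([minutes_between(i["alert_at"], i["closed_at"]) for i in incidents])
    return {
        "mttd_minutes": mttd,
        "alert_to_triage_minutes": triage,
        "mttr_hours": mttr / 60,
    }


# Two invented incidents
incidents = [
    {
        "activity_at": datetime(2024, 3, 1, 10, 0),
        "alert_at": datetime(2024, 3, 1, 10, 10),
        "assigned_at": datetime(2024, 3, 1, 10, 15),
        "closed_at": datetime(2024, 3, 1, 12, 10),
    },
    {
        "activity_at": datetime(2024, 3, 2, 9, 0),
        "alert_at": datetime(2024, 3, 2, 9, 20),
        "assigned_at": datetime(2024, 3, 2, 9, 35),
        "closed_at": datetime(2024, 3, 2, 13, 20),
    },
]
metrics = soc_metrics(incidents)
```

With these samples the averages sit exactly on the MTTD and Alert-to-Triage targets (15 and 10 minutes), so neither strict "less than" target is met.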

Azure Government Deployment

For FedRAMP High workloads, deploy all resources to Azure Government regions. Key differences from commercial Azure:

| Component | Commercial | Azure Government |
|---|---|---|
| Sentinel | All regions | USGov Virginia, USGov Arizona |
| Log Analytics | All regions | USGov Virginia, USGov Arizona, USDoD Central, USDoD East |
| Event Hub | All regions | USGov Virginia, USGov Arizona |
| ADLS Gen2 | All regions | USGov Virginia, USGov Arizona |
| Defender for Cloud | All regions | USGov Virginia, USGov Arizona |
| ARM endpoint | management.azure.com | management.usgovcloudapi.net |

!!! warning "Azure Government Considerations"
    - Use az cloud set --name AzureUSGovernment before deploying
    - Sentinel content hub solutions may have delayed availability in government regions
    - Log Analytics workspace IDs differ between clouds, so update WORKSPACE_ID references
    - Some Defender for Cloud features (e.g., CSPM) may have feature parity gaps; check Azure Government services availability

# Deploy to Azure Government
az cloud set --name AzureUSGovernment
az login

az deployment group create \
  --resource-group rg-cyber-prd \
  --template-file examples/cybersecurity/deploy/sentinel-workspace.bicep \
  --parameters namePrefix=csa environment=prd retentionDays=365

Project Structure

examples/cybersecurity/
├── contracts/
│   └── sentinel-alerts.yaml          # Data contract for alert schema
├── data/
│   ├── cisa-kev-sample.json          # Sample CISA KEV catalog
│   ├── mitre-attack-mapping.json     # ATT&CK technique reference
│   ├── sample-network-flows.csv      # Sample NSG flow data
│   └── sample-sentinel-alerts.json   # Sample Sentinel alerts
├── deploy/
│   ├── analytics-rules.bicep         # Sentinel detection rules
│   └── sentinel-workspace.bicep      # Workspace + connectors
├── domains/
│   ├── bronze/
│   │   └── stg_sentinel_alerts.sql   # Raw alert staging
│   ├── silver/
│   │   ├── dim_mitre_techniques.sql  # ATT&CK dimension
│   │   └── fct_security_alerts.sql   # Enriched alert facts
│   └── gold/
│       ├── rpt_compliance_posture.sql # NIST/CMMC gap analysis
│       └── rpt_threat_landscape.sql   # Threat activity summary
└── notebooks/
    ├── 01-alert-exploration.py       # Data profiling
    ├── 02-threat-detection-ml.py     # Isolation Forest anomaly detection
    └── 03-kql-threat-hunting.py      # KQL hunt library

Sources