Monitoring & Observability Migration¶
Establish comprehensive monitoring for the SAS-to-Entra authentication migration, including Entra sign-in logs, certificate expiration alerts, and authentication dashboards.
Finding: CSA-0025 (HIGH, BREAKING) | Ballot: AQ-0014 (approved)
Overview¶
Monitoring is critical during and after a security migration. Before migration, you need a baseline of current SAS authentication patterns. During migration, you need real-time visibility into both SAS and Entra authentication to ensure devices and services are transitioning correctly. After migration, you need ongoing monitoring of certificate lifetimes, managed identity usage, and authentication failures.
Pre-migration baseline¶
Enable IoT Hub diagnostic settings¶
Before starting the migration, ensure diagnostic settings are configured to capture authentication events.
# Enable diagnostic settings on IoT Hub
az monitor diagnostic-settings create \
--name "iot-hub-auth-diagnostics" \
--resource "$IOT_HUB_ID" \
--workspace "$LOG_ANALYTICS_WORKSPACE_ID" \
--logs '[
{"category": "Connections", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "DeviceTelemetry", "enabled": true, "retentionPolicy": {"days": 30, "enabled": true}},
{"category": "C2DCommands", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "DeviceIdentityOperations", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "Routes", "enabled": true, "retentionPolicy": {"days": 30, "enabled": true}},
{"category": "D2CTwinOperations", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "C2DTwinOperations", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "TwinQueries", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "DirectMethods", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "Configurations", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}}
]' \
--metrics '[{"category": "AllMetrics", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}}]'
Baseline KQL query: Current SAS authentication patterns¶
// Count connections by auth type over the last 7 days
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES"
| where Category == "Connections"
| where TimeGenerated > ago(7d)
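// Assumes authType and deviceId are carried as JSON inside properties_s;
// authType is itself a JSON string nested in that payload
| extend props = parse_json(properties_s)
| extend authType = tostring(parse_json(tostring(props.authType)).type)
| summarize
TotalConnections = count(),
UniqueDevices = dcount(tostring(props.deviceId))
by authType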
| order by TotalConnections desc
// Identify devices still using SAS authentication
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES"
| where Category == "Connections"
| where TimeGenerated > ago(24h)
| where authType_s == "sas"
| summarize
LastConnection = max(TimeGenerated),
ConnectionCount = count()
by deviceId_s
| order by LastConnection desc
Entra sign-in logs for IoT Hub access¶
Enable Entra diagnostic settings¶
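# Enable Entra ID diagnostic settings to Log Analytics
# Note: Entra diagnostic settings are tenant-scoped. If this command is rejected,
# configure them in the Entra admin center (Monitoring & health > Diagnostic settings).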
az monitor diagnostic-settings create \
--name "entra-iot-diagnostics" \
--resource "/providers/Microsoft.aadiam/diagnosticSettings/entra-iot" \
--workspace "$LOG_ANALYTICS_WORKSPACE_ID" \
--logs '[
{"category": "SignInLogs", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "NonInteractiveUserSignInLogs", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "ServicePrincipalSignInLogs", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "ManagedIdentitySignInLogs", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}},
{"category": "AuditLogs", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}}
]'
KQL: Managed identity sign-ins to IoT Hub¶
// Managed identity authentications to IoT Hub
AADManagedIdentitySignInLogs
| where TimeGenerated > ago(24h)
| where ResourceDisplayName contains "IoT Hub"
or ResourceId contains "Microsoft.Devices/IotHubs"
| project
TimeGenerated,
ServicePrincipalName,
ServicePrincipalId,
ResourceDisplayName,
IPAddress,
Status = ResultType,
ConditionalAccessStatus
| order by TimeGenerated desc
// Failed managed identity authentications (potential RBAC issues)
AADManagedIdentitySignInLogs
| where TimeGenerated > ago(24h)
| where ResultType != "0" // Non-success
| where ResourceDisplayName contains "IoT Hub"
| project
TimeGenerated,
ServicePrincipalName,
ResultType,
ResultDescription,
IPAddress
| order by TimeGenerated desc
Certificate expiration monitoring¶
Azure Monitor alert for certificate expiration¶
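# Create alert rule for certificates expiring within 30 days
# Assumption: properties_s is JSON with a certExpiry field. Standard IoT Hub
# connection logs may not carry one, so verify the field exists in your
# workspace (or join a certificate inventory) before relying on this rule.
az monitor scheduled-query create \
--name "iot-cert-expiry-warning" \
--resource-group "$RG" \
--scopes "$LOG_ANALYTICS_WORKSPACE_ID" \
--condition "count 'CertExpiring' > 0" \
--condition-query CertExpiring="
AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.DEVICES'
| where Category == 'Connections'
| where TimeGenerated > ago(1h)
| where authType_s == 'x509'
| extend certExpiry = todatetime(parse_json(properties_s).certExpiry)
| where certExpiry < now() + 30d
| summarize count() by deviceId_s, certExpiry
" \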
--evaluation-frequency "1h" \
--window-size "1h" \
--severity 2 \
--action-groups "$ACTION_GROUP_ID" \
--description "IoT device certificates expiring within 30 days"
KQL: Certificate expiration dashboard¶
// Certificates expiring in the next 90 days
let CertInventory = datatable(deviceId:string, certThumbprint:string, certExpiry:datetime) [
// This would be populated from your certificate management system
// or extracted from IoT Hub device registry
];
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES"
| where Category == "Connections"
| where authType_s == "x509"
| where TimeGenerated > ago(24h)
| summarize LastSeen = max(TimeGenerated) by deviceId_s
| join kind=inner (
// Join with certificate inventory
CertInventory
) on $left.deviceId_s == $right.deviceId
| extend DaysUntilExpiry = datetime_diff('day', certExpiry, now())
| extend ExpiryBucket = case(
DaysUntilExpiry <= 0, "EXPIRED",
DaysUntilExpiry <= 7, "Critical (< 7 days)",
DaysUntilExpiry <= 30, "Warning (< 30 days)",
DaysUntilExpiry <= 90, "Upcoming (< 90 days)",
"OK (90+ days)"
)
| summarize DeviceCount = count() by ExpiryBucket
| order by DeviceCount desc
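The bucketing rules in the dashboard query can also be applied outside Log Analytics, for example in a script that walks a certificate inventory export. The following is a minimal sketch, not part of the original tooling: the function names `days_until_expiry` and `expiry_bucket` are hypothetical, timestamps are assumed to be Unix epoch seconds, and the thresholds simply mirror the `case()` expression above.

```shell
# Sketch: classify a certificate by days remaining, using the same
# thresholds as the ExpiryBucket case() expression in the KQL query.
# Function names are illustrative, not part of any Azure tooling.

# days_until_expiry EXPIRY_EPOCH NOW_EPOCH
# Whole days between two Unix timestamps, truncated toward zero.
days_until_expiry() {
  echo $(( ($1 - $2) / 86400 ))
}

# expiry_bucket DAYS
# Maps a day count to the dashboard bucket label.
expiry_bucket() {
  if   [ "$1" -le 0 ];  then echo "EXPIRED"
  elif [ "$1" -le 7 ];  then echo "Critical (< 7 days)"
  elif [ "$1" -le 30 ]; then echo "Warning (< 30 days)"
  elif [ "$1" -le 90 ]; then echo "Upcoming (< 90 days)"
  else                       echo "OK (90+ days)"
  fi
}
```

A caller would typically pass the certificate's notAfter time and `$(date +%s)` to `days_until_expiry`, then feed the result to `expiry_bucket`. Day counts computed this way can differ by one from `datetime_diff('day', ...)`, which counts calendar-day boundaries rather than elapsed 24-hour periods.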
Key Vault certificate expiration monitoring¶
If certificates are managed through Azure Key Vault:
# Enable Key Vault diagnostic settings
az monitor diagnostic-settings create \
--name "kv-cert-diagnostics" \
--resource "$KEY_VAULT_ID" \
--workspace "$LOG_ANALYTICS_WORKSPACE_ID" \
--logs '[
{"category": "AuditEvent", "enabled": true, "retentionPolicy": {"days": 90, "enabled": true}}
]'
# Create alert for Key Vault certificate near expiry events
az monitor scheduled-query create \
--name "kv-cert-expiry-alert" \
--resource-group "$RG" \
--scopes "$LOG_ANALYTICS_WORKSPACE_ID" \
--condition "count 'KvCertNearExpiry' > 0" \
--condition-query KvCertNearExpiry="
AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.KEYVAULT'
| where OperationName startswith 'CertificateNearExpiry'
| project TimeGenerated, id_s, requestUri_s
" \
--evaluation-frequency "6h" \
--window-size "6h" \
--severity 2 \
--action-groups "$ACTION_GROUP_ID" \
--description "Key Vault certificates approaching expiration"
Managed identity usage auditing¶
KQL: Managed identity usage patterns¶
// Which managed identities are accessing IoT Hub and how often
AADManagedIdentitySignInLogs
| where TimeGenerated > ago(7d)
| where ResourceDisplayName contains "IoT Hub"
| summarize
AuthCount = count(),
SuccessCount = countif(ResultType == "0"),
FailureCount = countif(ResultType != "0"),
LastAccess = max(TimeGenerated),
DistinctIPs = dcount(IPAddress)
by ServicePrincipalName, ServicePrincipalId
| extend FailureRate = round(100.0 * FailureCount / AuthCount, 2)
| order by AuthCount desc
// Unused managed identities (assigned RBAC but no sign-ins in 30 days)
let ActiveIdentities = AADManagedIdentitySignInLogs
| where TimeGenerated > ago(30d)
| where ResourceDisplayName contains "IoT Hub"
| distinct ServicePrincipalId;
// Cross-reference with RBAC assignments via Azure Resource Graph
// (requires Resource Graph query integration)
AzureActivity
| where OperationNameValue == "Microsoft.Authorization/roleAssignments/write"
| where TimeGenerated > ago(90d)
| where Properties_d contains "Microsoft.Devices/IotHubs"
| extend AssignedPrincipalId = tostring(parse_json(Properties_d).principalId)
| where AssignedPrincipalId !in (ActiveIdentities)
| project AssignedPrincipalId, OperationName, TimeGenerated
Dashboard template¶
Migration progress dashboard (KQL queries)¶
// === Panel 1: Migration Progress ===
// Devices by authentication type over time
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES"
| where Category == "Connections"
| where TimeGenerated > ago(30d)
| summarize DeviceCount = dcount(deviceId_s) by bin(TimeGenerated, 1d), authType_s
| render timechart with (title="Device Auth Type Over Time")
// === Panel 2: Current Auth Type Distribution ===
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES"
| where Category == "Connections"
| where TimeGenerated > ago(24h)
| summarize DeviceCount = dcount(deviceId_s) by authType_s
| render piechart with (title="Current Auth Type Distribution")
// === Panel 3: Authentication Failures ===
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES"
| where Category == "Connections"
| where TimeGenerated > ago(24h)
| where level_s == "Error" or statusCode_s startswith "4"
| summarize FailureCount = count() by bin(TimeGenerated, 1h), authType_s
| render timechart with (title="Auth Failures by Type")
// === Panel 4: Service Identity Usage ===
AADManagedIdentitySignInLogs
| where TimeGenerated > ago(24h)
| where ResourceDisplayName contains "IoT Hub"
| summarize
Total = count(),
Failures = countif(ResultType != "0")
by bin(TimeGenerated, 1h)
| render timechart with (title="Service Identity Auth Events")
// === Panel 5: Certificate Health ===
// Requires certificate inventory table (custom)
// Simulated with connection log data
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES"
| where Category == "Connections"
| where authType_s == "x509"
| where TimeGenerated > ago(1h)
| summarize
ConnectedDevices = dcount(deviceId_s),
TotalConnections = count()
| extend Status = iff(ConnectedDevices > 0, "Healthy", "No X.509 connections")
// === Panel 6: Remaining SAS Devices ===
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES"
| where Category == "Connections"
| where TimeGenerated > ago(24h)
| where authType_s == "sas"
| distinct deviceId_s
| summarize RemainingDevices = count()
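The remaining-device count from Panel 6 becomes easier to report when expressed as a percentage of the fleet. The sketch below assumes the total device count is known from elsewhere (for example the device registry); `migration_progress` is a hypothetical helper name, and both inputs are plain integers.

```shell
# Sketch: turn the Panel 6 "remaining SAS devices" count into a
# migration progress percentage. The helper name is illustrative.

# migration_progress TOTAL_DEVICES REMAINING_SAS_DEVICES
# Prints the percentage of devices no longer using SAS, to one decimal.
migration_progress() {
  if [ "$1" -le 0 ]; then
    echo "total device count must be positive" >&2
    return 1
  fi
  awk -v total="$1" -v remaining="$2" \
    'BEGIN { printf "%.1f\n", 100 * (total - remaining) / total }'
}
```

For example, `migration_progress 200 50` prints `75.0`. Devices that did not connect during the 24-hour query window are not counted as remaining, so the figure is an upper bound on real progress.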
Alert rules for authentication failures¶
Alert 1: High rate of authentication failures¶
az monitor scheduled-query create \
--name "iot-auth-failure-spike" \
--resource-group "$RG" \
--scopes "$LOG_ANALYTICS_WORKSPACE_ID" \
--condition "count 'AuthFailures' > 50" \
--condition-query AuthFailures="
AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.DEVICES'
| where Category == 'Connections'
| where statusCode_s startswith '4'
| where TimeGenerated > ago(15m)
" \
--evaluation-frequency "5m" \
--window-size "15m" \
--severity 1 \
--action-groups "$ACTION_GROUP_ID" \
--description "More than 50 IoT Hub auth failures in 15 minutes"
Alert 2: SAS authentication detected after migration¶
az monitor scheduled-query create \
--name "iot-sas-after-migration" \
--resource-group "$RG" \
--scopes "$LOG_ANALYTICS_WORKSPACE_ID" \
--condition "count 'SasConnections' > 0" \
--condition-query SasConnections="
AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.DEVICES'
| where Category == 'Connections'
| where authType_s == 'sas'
| where TimeGenerated > ago(1h)
" \
--evaluation-frequency "1h" \
--window-size "1h" \
--severity 2 \
--action-groups "$ACTION_GROUP_ID" \
--description "SAS authentication detected after migration cutover"
Alert 3: Managed identity authentication failure¶
az monitor scheduled-query create \
--name "iot-mi-auth-failure" \
--resource-group "$RG" \
--scopes "$LOG_ANALYTICS_WORKSPACE_ID" \
--condition "count 'MiAuthFailures' > 0" \
--condition-query MiAuthFailures="
AADManagedIdentitySignInLogs
| where ResultType != '0'
| where ResourceDisplayName contains 'IoT Hub'
| where TimeGenerated > ago(15m)
" \
--evaluation-frequency "5m" \
--window-size "15m" \
--severity 2 \
--action-groups "$ACTION_GROUP_ID" \
--description "Managed identity failed to authenticate to IoT Hub"
Alert 4: Device certificate expired and attempting connection¶
az monitor scheduled-query create \
--name "iot-expired-cert-connection" \
--resource-group "$RG" \
--scopes "$LOG_ANALYTICS_WORKSPACE_ID" \
--condition "count 'ExpiredCertDevices' > 0" \
--condition-query ExpiredCertDevices="
AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.DEVICES'
| where Category == 'Connections'
| where authType_s == 'x509'
| where statusCode_s == '401'
| where TimeGenerated > ago(1h)
| summarize by deviceId_s
" \
--evaluation-frequency "1h" \
--window-size "1h" \
--severity 2 \
--action-groups "$ACTION_GROUP_ID" \
--description "Devices with expired certificates attempting to connect"
Post-migration monitoring checklist¶
- Diagnostic settings enabled on IoT Hub (all categories)
- Entra diagnostic settings enabled (ManagedIdentitySignInLogs)
- Key Vault diagnostic settings enabled (AuditEvent)
- Alert: Authentication failure spike (> 50 in 15 min)
- Alert: SAS authentication detected post-migration
- Alert: Managed identity auth failure
- Alert: Expired certificate connection attempts
- Alert: Certificates expiring within 30 days
- Dashboard: Migration progress (auth type over time)
- Dashboard: Current auth distribution
- Dashboard: Service identity usage
- Dashboard: Certificate health
- Weekly review of unused RBAC assignments
- Monthly review of certificate renewal compliance
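The alert items in this checklist can be spot-checked from the command line. The following is a rough sketch under stated assumptions: `missing_alerts` and `EXPECTED_ALERTS` are hypothetical names, the rule names are the ones used in the `az monitor scheduled-query create` commands in this guide, and the list of deployed rules is supplied on standard input.

```shell
# Sketch: report which of the alert rules from this guide are not deployed.
# EXPECTED_ALERTS and missing_alerts are illustrative names.
EXPECTED_ALERTS="
iot-cert-expiry-warning
kv-cert-expiry-alert
iot-auth-failure-spike
iot-sas-after-migration
iot-mi-auth-failure
iot-expired-cert-connection
"

# missing_alerts: reads deployed rule names (one per line) on stdin and
# prints each expected name that is absent. Returns 1 if any are missing.
missing_alerts() {
  deployed=$(cat)
  status=0
  for name in $EXPECTED_ALERTS; do
    if ! printf '%s\n' "$deployed" | grep -qxF -- "$name"; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```

One way to feed it is `az monitor scheduled-query list --resource-group "$RG" --query "[].name" --output tsv | missing_alerts`. An empty result with exit status 0 means every rule name was found; it does not confirm that the rules are enabled or correctly scoped.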
Last updated: 2026-04-30
Maintainers: CSA-in-a-Box core team
Related: Best Practices | Managed Identity Migration | X.509 Migration