Security Architecture¶
Overview¶
This document describes the security architecture for the Azure Real-Time Analytics solution, implementing defense-in-depth with zero-trust principles, identity-based access control, and comprehensive security monitoring.
Table of Contents¶
- Security Principles
- Identity and Access Management
- Data Protection
- Network Security
- Threat Protection
- Compliance and Governance
- Security Monitoring
- Incident Response
Security Principles¶
Zero Trust Architecture¶
graph TB
subgraph Identity["Identity Layer"]
AAD[Azure AD]
MFA[Multi-Factor Auth]
CA[Conditional Access]
end
subgraph Network["Network Layer"]
PE[Private Endpoints]
NSG[Network Security Groups]
FW[Azure Firewall]
end
subgraph Data["Data Layer"]
Encryption[Encryption at Rest]
TLS[TLS in Transit]
CMK[Customer Managed Keys]
end
subgraph Application["Application Layer"]
RBAC[Role-Based Access]
PIM[Privileged Identity]
Secrets[Key Vault]
end
subgraph Monitoring["Monitoring Layer"]
Defender[Defender for Cloud]
Sentinel[Azure Sentinel]
Logs[Audit Logs]
end
Identity --> Network
Network --> Data
Data --> Application
Application --> Monitoring
Defense-in-Depth Layers¶
| Layer | Controls | Purpose |
|---|---|---|
| Identity | Azure AD, MFA, Conditional Access | Verify user identity |
| Network | Private Link, NSG, Firewall | Isolate resources |
| Data | Encryption, Masking, CMK | Protect data |
| Application | RBAC, Managed Identity, Secrets | Control access |
| Monitoring | Defender, Sentinel, Logs | Detect threats |
Identity and Access Management¶
Azure Active Directory Integration¶
# Prepare the Databricks workspace for customer-managed-key encryption;
# Azure AD authentication is native to Azure Databricks and needs no extra flag
az databricks workspace update \
  --resource-group analytics-rg \
  --name databricks-workspace \
  --prepare-encryption
# Enable SCIM provisioning
# Via Azure Portal: Azure AD > Enterprise Applications > Databricks > Provisioning
Conditional Access Policies¶
{
"displayName": "Require MFA for Databricks Access",
"state": "enabled",
"conditions": {
"applications": {
"includeApplications": ["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"]
},
"users": {
"includeGroups": ["DataEngineers", "DataScientists"]
},
"locations": {
"includeLocations": ["All"],
"excludeLocations": ["TrustedNetwork"]
}
},
"grantControls": {
"operator": "AND",
"builtInControls": ["mfa", "compliantDevice"]
}
}
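Before submitting a policy like this to Microsoft Graph (`POST /identity/conditionalAccessPolicies`), a quick client-side check can confirm it actually enforces MFA. `policy_enforces_mfa` is a hypothetical helper, sketched for illustration:

```python
def policy_enforces_mfa(policy: dict) -> bool:
    """Return True if the policy is enabled and its grant controls require MFA."""
    if policy.get("state") != "enabled":
        return False
    controls = policy.get("grantControls", {}).get("builtInControls", [])
    return "mfa" in controls

policy = {
    "displayName": "Require MFA for Databricks Access",
    "state": "enabled",
    "grantControls": {"operator": "AND", "builtInControls": ["mfa", "compliantDevice"]},
}
print(policy_enforces_mfa(policy))
```

A check like this is cheap insurance against shipping a policy in `disabled` or report-only state by mistake.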
Role-Based Access Control¶
Databricks Workspace Roles¶
# Unity Catalog role assignments
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

w = WorkspaceClient()

# Grant catalog access (one PermissionsChange per principal)
w.grants.update(
    securable_type=catalog.SecurableType.CATALOG,
    full_name="realtime_analytics",
    changes=[
        catalog.PermissionsChange(
            principal="data-engineers",
            add=[catalog.Privilege.USE_CATALOG, catalog.Privilege.USE_SCHEMA,
                 catalog.Privilege.SELECT, catalog.Privilege.MODIFY],
        ),
        catalog.PermissionsChange(
            principal="analysts",
            add=[catalog.Privilege.USE_CATALOG, catalog.Privilege.USE_SCHEMA,
                 catalog.Privilege.SELECT],
        ),
        catalog.PermissionsChange(
            principal="data-scientists",
            add=[catalog.Privilege.USE_CATALOG, catalog.Privilege.USE_SCHEMA,
                 catalog.Privilege.SELECT, catalog.Privilege.EXECUTE],
        ),
    ],
)
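The same role matrix can also be applied as plain SQL GRANT statements; note the SQL privilege names use spaces (`USE CATALOG`) where the API uses underscores (`USE_CATALOG`). `generate_grants` is a hypothetical helper that renders one statement per principal:

```python
# Role matrix mirroring the grants above (principal -> SQL privilege names)
ROLE_GRANTS = {
    "data-engineers": ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY"],
    "analysts": ["USE CATALOG", "USE SCHEMA", "SELECT"],
    "data-scientists": ["USE CATALOG", "USE SCHEMA", "SELECT", "EXECUTE"],
}

def generate_grants(catalog_name: str, role_grants: dict) -> list:
    """Render one GRANT statement per principal for the given catalog."""
    return [
        f"GRANT {', '.join(privs)} ON CATALOG {catalog_name} TO `{principal}`;"
        for principal, privs in role_grants.items()
    ]

for stmt in generate_grants("realtime_analytics", ROLE_GRANTS):
    print(stmt)
```

Keeping the matrix as data makes it easy to review in pull requests and apply through either the SDK or a SQL warehouse.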
Azure RBAC Assignments¶
# Assign Storage Blob Data Contributor to Databricks MSI
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee-object-id <databricks-msi-object-id> \
--scope /subscriptions/{subscription-id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{storage-account}
# Assign Event Hubs Data Receiver
az role assignment create \
--role "Azure Event Hubs Data Receiver" \
--assignee-object-id <databricks-msi-object-id> \
--scope /subscriptions/{subscription-id}/resourceGroups/{rg}/providers/Microsoft.EventHub/namespaces/{namespace}
Managed Identities¶
# Use managed identity in Databricks
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
# Authenticate with managed identity
credential = DefaultAzureCredential()
# Access storage
blob_service_client = BlobServiceClient(
    account_url="https://storageaccount.blob.core.windows.net",
    credential=credential
)
# List containers
containers = blob_service_client.list_containers()
for container in containers:
    print(container.name)
Privileged Identity Management¶
# Configure PIM for Azure resources
# Via Azure Portal: Azure AD > Privileged Identity Management
# Sample PIM settings:
{
"roleName": "Owner",
"resourceScope": "/subscriptions/{sub-id}/resourceGroups/analytics-rg",
"settings": {
"activationDuration": "PT8H",
"requireMFA": true,
"requireJustification": true,
"requireApproval": true,
"approvers": ["security-team@company.com"]
}
}
Data Protection¶
Encryption at Rest¶
# Enable customer-managed keys for Storage Account
az storage account update \
--resource-group analytics-rg \
--name realtimeanalyticsstorage \
--encryption-key-source Microsoft.Keyvault \
--encryption-key-vault https://analytics-kv.vault.azure.net \
--encryption-key-name storage-encryption-key
# Enable encryption for Databricks managed services
az databricks workspace update \
--resource-group analytics-rg \
--name databricks-workspace \
--key-source Microsoft.Keyvault \
--key-vault-uri https://analytics-kv.vault.azure.net \
--key-name databricks-encryption-key \
--key-version <version>
Encryption in Transit¶
# Enforce a minimum of TLS 1.2 at the service side (Azure Storage shown here;
# most Azure data services expose an equivalent minimum-TLS setting)
az storage account update \
  --resource-group analytics-rg \
  --name realtimeanalyticsstorage \
  --min-tls-version TLS1_2
# Event Hubs with TLS
from azure.eventhub import EventHubProducerClient, TransportType

producer = EventHubProducerClient.from_connection_string(
    conn_str=connection_string,
    eventhub_name=eventhub_name,
    transport_type=TransportType.AmqpOverWebsocket,  # tunnels AMQP over WebSocket on port 443
)
Data Masking and Classification¶
-- Apply column masks (each mask is a SQL UDF that must be defined beforehand;
-- the function names below are illustrative)
CREATE TABLE gold.customer_sensitive (
  customer_id STRING,
  customer_name STRING,
  email STRING MASK mask_email,        -- UDF that redacts email addresses
  ssn STRING MASK mask_hash,           -- UDF that hashes SSNs
  credit_score INT MASK mask_default   -- UDF that returns a default value
)
USING DELTA
TBLPROPERTIES (
'delta.feature.columnMapping.enabled' = 'true',
'delta.columnMapping.mode' = 'name'
);
-- Tag sensitive columns
ALTER TABLE gold.customer_sensitive
ALTER COLUMN ssn
SET TAGS ('classification' = 'PII', 'sensitivity' = 'HIGH');
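For illustration, the three mask styles behave roughly like the Python functions below (in Unity Catalog the masks are SQL UDFs attached to columns, not application code; these function names are illustrative):

```python
import hashlib

def mask_email(email: str) -> str:
    """Keep the first character and the domain: 'alice@x.com' -> 'a***@x.com'."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"

def mask_hash(value: str) -> str:
    """One-way SHA-256 hash: usable as a join key, but not reversible."""
    return hashlib.sha256(value.encode()).hexdigest()

def mask_default(_: int) -> int:
    """Replace the real value with a constant default."""
    return 0

print(mask_email("alice@example.com"))
```

Hashing (rather than redacting) the SSN preserves the ability to join records on the masked column.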
Key Vault Integration¶
# Create Key Vault
az keyvault create \
--resource-group analytics-rg \
--name analytics-kv-prod \
--location eastus \
--enable-purge-protection true \
--enable-soft-delete true \
--retention-days 90 \
--sku premium
# Store secrets
az keyvault secret set \
--vault-name analytics-kv-prod \
--name "storage-account-key" \
--value "<storage-key>"
az keyvault secret set \
--vault-name analytics-kv-prod \
--name "eventhub-connection-string" \
--value "<connection-string>"
# Access secrets in Databricks
# Create secret scope backed by Key Vault
databricks secrets create-scope \
--scope kv-secrets \
--scope-backend-type AZURE_KEYVAULT \
--resource-id /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.KeyVault/vaults/analytics-kv-prod \
--dns-name https://analytics-kv-prod.vault.azure.net/
# Use secrets in code
storage_account_key = dbutils.secrets.get(scope="kv-secrets", key="storage-account-key")
spark.conf.set(
    "fs.azure.account.key.storageaccount.dfs.core.windows.net",
    storage_account_key
)
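In application code, a thin wrapper around `dbutils.secrets.get` keeps notebooks portable. `get_secret` below is a hypothetical helper that falls back to an environment variable when running outside Databricks (e.g. in local tests):

```python
import os

def get_secret(scope: str, key: str) -> str:
    """Fetch a secret from a Databricks secret scope, falling back to an
    environment variable (SCOPE_KEY form) for local development."""
    try:
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821 (Databricks-only)
    except NameError:  # dbutils only exists inside a Databricks runtime
        env_var = f"{scope}_{key}".upper().replace("-", "_")
        value = os.environ.get(env_var)
        if value is None:
            raise KeyError(f"secret {scope}/{key} not found; set {env_var} locally")
        return value

# Local demo of the fallback path (placeholder value, not a real key)
os.environ.setdefault("KV_SECRETS_STORAGE_ACCOUNT_KEY", "dummy-key")
print(get_secret("kv-secrets", "storage-account-key"))
```

The fallback keeps real secret material out of source control while letting the same code run in both environments.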
Network Security¶
Private Link Configuration¶
# Disable public access to Storage Account
az storage account update \
--resource-group analytics-rg \
--name realtimeanalyticsstorage \
--public-network-access Disabled
# Create private endpoint
az network private-endpoint create \
--resource-group analytics-rg \
--name storage-pe \
--vnet-name analytics-vnet \
--subnet private-endpoints \
--private-connection-resource-id /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/realtimeanalyticsstorage \
--group-id blob \
--connection-name storage-blob-pe-connection
Network Security Groups¶
# Create NSG rule to deny internet outbound
az network nsg rule create \
--resource-group analytics-rg \
--nsg-name databricks-nsg \
--name DenyInternetOutbound \
--priority 4000 \
--direction Outbound \
--access Deny \
--protocol '*' \
--source-address-prefixes '*' \
--source-port-ranges '*' \
--destination-address-prefixes Internet \
--destination-port-ranges '*'
# Allow only required Azure services
az network nsg rule create \
--resource-group analytics-rg \
--nsg-name databricks-nsg \
--name AllowAzureServices \
--priority 100 \
--direction Outbound \
--access Allow \
--protocol Tcp \
--source-address-prefixes VirtualNetwork \
--destination-address-prefixes AzureCloud \
--destination-port-ranges 443
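NSG rules are evaluated in ascending priority order and the first match wins, which is why the Allow rule above (priority 100) takes precedence over the Deny rule (priority 4000). A toy evaluator, with matching heavily simplified (real NSGs match address prefixes, port ranges, and protocols):

```python
# Simplified versions of the two outbound rules created above
RULES = [
    {"name": "AllowAzureServices", "priority": 100, "access": "Allow",
     "destination": "AzureCloud", "port": 443},
    {"name": "DenyInternetOutbound", "priority": 4000, "access": "Deny",
     "destination": "Internet", "port": "*"},
]

def evaluate(destination: str, port: int) -> str:
    """Return the decision of the first matching rule, lowest priority number first."""
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        if rule["destination"] == destination and rule["port"] in ("*", port):
            return rule["access"]
    return "Deny"  # simplification; real NSGs end with built-in default rules

print(evaluate("AzureCloud", 443))
print(evaluate("Internet", 80))
```

The sketch shows why the Deny rule's high priority number (4000) makes it a catch-all rather than an override.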
Service Endpoints and Private Endpoints¶
| Service | Connectivity | Security |
|---|---|---|
| Storage Account | Private Endpoint | No public access |
| Event Hubs | Private Endpoint | VNet only |
| Key Vault | Private Endpoint | IP whitelist |
| Databricks | VNet Injection | Private Link |
| SQL Endpoint | Private Endpoint | Azure AD auth |
Threat Protection¶
Microsoft Defender for Cloud¶
# Enable Defender for Cloud plans
az security pricing create \
--name VirtualMachines \
--tier Standard
az security pricing create \
--name StorageAccounts \
--tier Standard
az security pricing create \
--name SqlServers \
--tier Standard
az security pricing create \
--name KeyVaults \
--tier Standard
Azure Sentinel Integration¶
// Create analytics rule for suspicious Databricks activity
let SuspiciousCommands = dynamic(["rm -rf", "DROP DATABASE", "DROP TABLE", "TRUNCATE"]);
AzureDiagnostics
| where Category == "clusters" or Category == "jobs"
| where TimeGenerated > ago(1h)
| extend Command = tostring(parse_json(properties_s).commandText)
| where Command has_any (SuspiciousCommands)
| project TimeGenerated, User = identity_s, Command, ClusterName = ResourceId
| summarize Count = count() by User, Command
| where Count > 5
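The detection logic can be prototyped offline before deploying the Sentinel rule. `flag_suspicious` below is a hypothetical helper operating on simplified records with `user` and `command` keys:

```python
from collections import Counter

# Same command list as the KQL rule above
SUSPICIOUS = ("rm -rf", "DROP DATABASE", "DROP TABLE", "TRUNCATE")

def flag_suspicious(records, threshold=5):
    """Count suspicious commands per (user, command) pair; return pairs over threshold."""
    counts = Counter(
        (r["user"], r["command"])
        for r in records
        if any(s in r["command"] for s in SUSPICIOUS)
    )
    return {pair: n for pair, n in counts.items() if n > threshold}

records = [{"user": "eve", "command": "DROP TABLE gold.orders"}] * 6
records.append({"user": "bob", "command": "SELECT 1"})
print(flag_suspicious(records))
```

Replaying historical logs through a sketch like this helps tune the threshold before the rule goes live.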
Advanced Threat Protection¶
# Enable Microsoft Defender for Storage (formerly Advanced Threat Protection)
az security atp storage update \
  --resource-group analytics-rg \
  --storage-account realtimeanalyticsstorage \
  --is-enabled true
Compliance and Governance¶
Azure Policy¶
{
"properties": {
"displayName": "Require encryption for all storage accounts",
"policyType": "Custom",
"mode": "All",
"parameters": {},
"policyRule": {
"if": {
"allOf": [
{
"field": "type",
"equals": "Microsoft.Storage/storageAccounts"
},
{
"field": "Microsoft.Storage/storageAccounts/encryption.services.blob.enabled",
"notEquals": "true"
}
]
},
"then": {
"effect": "deny"
}
}
}
}
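The rule's deny logic can be sanity-checked with a toy evaluator (Azure Policy's real engine resolves field aliases and supports many effects; this sketch hard-codes the one rule above):

```python
def evaluate_policy(resource: dict) -> str:
    """Return 'deny' when the resource is a storage account without blob encryption."""
    is_storage = resource.get("type") == "Microsoft.Storage/storageAccounts"
    blob_encrypted = (
        resource.get("properties", {})
        .get("encryption", {})
        .get("services", {})
        .get("blob", {})
        .get("enabled") is True
    )
    if is_storage and not blob_encrypted:
        return "deny"
    return "allow"

print(evaluate_policy({"type": "Microsoft.Storage/storageAccounts", "properties": {}}))
```

Walking sample resources through the evaluator clarifies that the `allOf` block denies only unencrypted storage accounts and leaves every other resource type untouched.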
Compliance Frameworks¶
| Framework | Status | Controls Implemented |
|---|---|---|
| SOC 2 Type II | ✅ Compliant | 150+ controls |
| ISO 27001 | ✅ Certified | Information security |
| GDPR | ✅ Ready | Data privacy |
| HIPAA | ✅ Compatible | Health data protection |
| PCI DSS | 🔄 In Progress | Payment data security |
Data Governance with Purview¶
# Register data sources in Purview
az purview account create \
--resource-group analytics-rg \
--name analytics-purview \
--location eastus
# Scan ADLS Gen2
# Via Purview Studio: Register and scan
{
"name": "ADLS-Gen2-Scan",
"kind": "AdlsGen2Msi",
"properties": {
"scanRulesetName": "AdlsGen2",
"scanRulesetType": "System"
}
}
Security Monitoring¶
Diagnostic Settings¶
# Enable diagnostics for Storage Account
az monitor diagnostic-settings create \
--resource /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/realtimeanalyticsstorage \
--name storage-diagnostics \
--logs '[{"category":"StorageRead","enabled":true},{"category":"StorageWrite","enabled":true},{"category":"StorageDelete","enabled":true}]' \
--metrics '[{"category":"Transaction","enabled":true}]' \
--workspace /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.OperationalInsights/workspaces/analytics-workspace
Security Alerts¶
// Alert on failed authentication attempts
SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType != "0" // Failed sign-in
| where AppDisplayName == "Azure Databricks"
| summarize FailedAttempts = count() by UserPrincipalName, IPAddress
| where FailedAttempts > 5
| project TimeGenerated = now(), User = UserPrincipalName, IPAddress, FailedAttempts
Audit Logging¶
-- Unity Catalog audit events are recorded automatically at the account level;
-- expose them by enabling the system.access schema for the metastore
-- (via Catalog Explorer or the system schemas API). No per-catalog property is needed.
-- Query audit logs
SELECT *
FROM system.access.audit
WHERE action_name IN ('CREATE', 'DROP', 'ALTER', 'SELECT', 'INSERT', 'UPDATE', 'DELETE')
AND event_time > current_timestamp() - INTERVAL 24 HOURS
ORDER BY event_time DESC;
Incident Response¶
Security Incident Playbook¶
1. Detection and Analysis¶
// Investigate security incident
SecurityAlert
| where TimeGenerated > ago(24h)
| where AlertSeverity in ("High", "Medium")
| summarize AlertCount = count() by AlertName, CompromisedEntity
| order by AlertCount desc
2. Containment¶
# Revoke access for compromised account
az ad user update \
--id compromised.user@company.com \
--account-enabled false
# Rotate storage account keys
az storage account keys renew \
--resource-group analytics-rg \
--account-name realtimeanalyticsstorage \
--key primary
# Revoke Databricks personal access tokens
databricks tokens revoke --token-id <token-id>
3. Eradication¶
# Remove malicious resources (clusters are managed via the Databricks CLI, not az)
databricks clusters permanent-delete --cluster-id <cluster-id>
# Update firewall rules to block the attacker's address
az network firewall network-rule create \
  --resource-group analytics-rg \
  --firewall-name <firewall-name> \
  --collection-name block-malicious-ips \
  --name block-malicious-ip \
  --priority 100 \
  --action Deny \
  --protocols Any \
  --source-addresses '*' \
  --destination-addresses <malicious-ip> \
  --destination-ports '*'
4. Recovery¶
# Restore from backup
az storage blob restore \
--account-name realtimeanalyticsstorage \
--time-to-restore "2024-01-01T00:00:00Z" \
--blob-range container1
# Re-enable account
az ad user update \
--id user@company.com \
--account-enabled true
5. Post-Incident Activities¶
- Document incident timeline
- Update security controls
- Conduct lessons learned session
- Update incident response plan
Security Checklist¶
Pre-Production¶
- Enable Azure AD authentication
- Configure MFA for all users
- Implement RBAC with least privilege
- Enable encryption at rest and in transit
- Configure private endpoints
- Enable NSG flow logs
- Configure Azure Firewall
- Enable Defender for Cloud
- Set up Azure Sentinel
- Configure audit logging
- Implement backup and DR
- Document security architecture
- Conduct security assessment
- Perform penetration testing
- Train security team
Ongoing¶
- Review access permissions monthly
- Rotate secrets quarterly
- Update security policies
- Review security alerts daily
- Conduct security drills quarterly
- Update incident response plan
- Review compliance status
- Patch and update systems
- Monitor threat intelligence
- Conduct security audits
Related Documentation¶
- Network Architecture
- Architecture Components
- Security Implementation
- Identity Setup
- Security Monitoring
Last Updated: January 2025
Version: 1.0.0
Status: Production Ready