# Azure Data Explorer (ADX) / KQL Guide
Azure Data Explorer delivers sub-second queries over streaming and time-series data — the hot-path analytics engine in CSA-in-a-Box for real-time dashboards, anomaly detection, and operational intelligence.
## Why ADX
Azure Data Explorer (Kusto) is purpose-built for exploring large volumes of semi-structured telemetry, logs, and time-series data. Where the Delta Lakehouse excels at batch analytics across the medallion layers, ADX excels at interactive, ad-hoc queries over streaming data with sub-second response times. CSA-in-a-Box uses ADX alongside the Lakehouse — not instead of it — for scenarios where detection latency is measured in seconds, not hours.
## Architecture Overview

```mermaid
graph LR
    subgraph Ingestion
        EH[Event Hubs<br/>Streaming Data]
        IOT[IoT Hub<br/>Device Telemetry]
        ADLS[(ADLS Gen2<br/>Historical Batch)]
    end
    subgraph "ADX Cluster"
        DB1[Database: telemetry]
        DB2[Database: security_logs]
        MV[Materialized Views<br/>Pre-aggregated]
        HOT[Hot Cache<br/>SSD — fast queries]
        COLD[Cold Storage<br/>Managed disk — low cost]
    end
    subgraph Consumption
        DASH[ADX Dashboards]
        GRAF[Grafana Plugin]
        PBI[Power BI<br/>DirectQuery]
        API[KQL REST API]
        ALERT[Azure Monitor Alerts]
    end
    subgraph "Long-term Archival"
        EXPORT[Continuous Export]
        ADLS2[(ADLS Gen2<br/>Delta / Parquet)]
    end
    EH -->|Data connection| DB1
    IOT -->|Data connection| DB1
    ADLS -->|One-time ingest| DB2
    EH -->|Data connection| DB2
    DB1 --> MV
    DB1 --> HOT
    DB1 --> COLD
    DB2 --> HOT
    DB2 --> COLD
    HOT --> DASH
    HOT --> GRAF
    MV --> PBI
    HOT --> API
    HOT --> ALERT
    DB1 --> EXPORT
    DB2 --> EXPORT
    EXPORT --> ADLS2
```

## Setup
### Cluster Creation (Bicep)

```bicep
resource adxCluster 'Microsoft.Kusto/clusters@2023-08-15' = {
  name: 'adxcsa${environment}${location}'
  location: location
  sku: {
    name: 'Standard_E8ads_v5' // Dev: Dev(No SLA)_Standard_E2a_v4
    tier: 'Standard'
    capacity: 2 // Node count (min 2 for production)
  }
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    enableStreamingIngest: true
    enableAutoStop: true // Auto-stop when idle (dev/test)
    enableDiskEncryption: true
    enableDoubleEncryption: false // Set true for IL5/CMMC workloads
    publicNetworkAccess: 'Disabled'
    trustedExternalTenants: []
  }
}
```
### Database Creation

```bicep
resource adxDatabase 'Microsoft.Kusto/clusters/databases@2023-08-15' = {
  parent: adxCluster
  name: 'telemetry'
  location: location
  kind: 'ReadWrite'
  properties: {
    softDeletePeriod: 'P365D' // Keep data for 1 year
    hotCachePeriod: 'P30D' // Keep 30 days in hot cache (SSD)
  }
}
```
### Table Schema Definition (KQL)

```kql
// Create the raw telemetry table
.create table TelemetryRaw (
    DeviceId: string,
    Timestamp: datetime,
    Temperature: real,
    Humidity: real,
    Pressure: real,
    Location: dynamic, // JSON: {"lat": 38.9, "lon": -77.0}
    Tags: dynamic      // JSON: {"site": "dc-east", "floor": 3}
) with (folder = "raw")

// Create ingestion mapping for Event Hub JSON payload
.create table TelemetryRaw ingestion json mapping 'TelemetryMapping'
    '[{"column":"DeviceId","path":"$.device_id","datatype":"string"},'
    ' {"column":"Timestamp","path":"$.timestamp","datatype":"datetime"},'
    ' {"column":"Temperature","path":"$.temperature","datatype":"real"},'
    ' {"column":"Humidity","path":"$.humidity","datatype":"real"},'
    ' {"column":"Pressure","path":"$.pressure","datatype":"real"},'
    ' {"column":"Location","path":"$.location","datatype":"dynamic"},'
    ' {"column":"Tags","path":"$.tags","datatype":"dynamic"}]'

// Set retention and caching policy
.alter table TelemetryRaw policy retention softdelete = 365d
.alter table TelemetryRaw policy caching hot = 30d
```
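Before wiring up a data connection, it is worth confirming that both the table and the mapping landed as intended; two quick checks:

```kql
// Show the table schema in KQL form
.show table TelemetryRaw cslschema

// Show the registered JSON mappings and their column paths
.show table TelemetryRaw ingestion json mappings
```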
## Ingestion
### Ingestion Methods
| Method | Latency | Best for | Configuration |
|---|---|---|---|
| Streaming ingestion | < 10 seconds | Low-latency, low-volume | Enable on cluster + table |
| Queued (batched) ingestion | 1-5 minutes | High throughput, cost-efficient | Default; batching policy |
| Event Hub data connection | 1-5 minutes (queued) or < 10s (streaming) | CSA streaming pipeline | Managed data connection |
| IoT Hub data connection | 1-5 minutes | Device telemetry | Managed data connection |
| One-time ADLS ingest | Minutes to hours | Historical backfill | .ingest into command |
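Two rows in this table map directly to one-line management commands. A minimal sketch, assuming the `TelemetryRaw` table and `TelemetryMapping` from the Setup section; the ADLS path is a placeholder:

```kql
// Streaming ingestion needs the cluster flag (enableStreamingIngest) plus a table-level policy
.alter table TelemetryRaw policy streamingingestion enable

// One-time historical backfill from ADLS (path illustrative)
.ingest into table TelemetryRaw (
    h@'https://csaadls.blob.core.windows.net/history/telemetry-2024.json;managed_identity=system'
) with (format = 'multijson', ingestionMappingReference = 'TelemetryMapping')
```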
### Event Hub Data Connection (Bicep)

```bicep
resource ehDataConnection 'Microsoft.Kusto/clusters/databases/dataConnections@2023-08-15' = {
  parent: adxDatabase
  name: 'dc-telemetry-eh'
  location: location
  kind: 'EventHub'
  properties: {
    eventHubResourceId: eventHub.id
    consumerGroup: 'cg-adx'
    tableName: 'TelemetryRaw'
    mappingRuleName: 'TelemetryMapping'
    dataFormat: 'MULTIJSON'
    compression: 'None'
    managedIdentityResourceId: adxCluster.id
  }
}
```
### Batching Policy
Control the trade-off between ingestion latency and efficiency.
```kql
// Default batching: 5 minutes or 1 GB or 1000 files
// For lower latency (higher cost):
.alter table TelemetryRaw policy ingestionbatching
    @'{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 256}'
```
## KQL Essentials
### Core Operators

```kql
// Filter, project, and sort
TelemetryRaw
| where Timestamp > ago(1h)
| where Temperature > 35.0
| project DeviceId, Timestamp, Temperature, Humidity
| order by Temperature desc
| take 100

// Aggregation with time bins
TelemetryRaw
| where Timestamp > ago(24h)
| summarize
    AvgTemp = avg(Temperature),
    MaxTemp = max(Temperature),
    EventCount = count()
    by bin(Timestamp, 1h), DeviceId
| order by Timestamp asc
```
### Time Series Analysis

```kql
// Create a regular time series and detect anomalies
let min_t = ago(7d);
let max_t = now();
let dt = 1h;
TelemetryRaw
| where Timestamp between (min_t .. max_t)
| make-series AvgTemp = avg(Temperature) on Timestamp from min_t to max_t step dt by DeviceId
| extend (anomalies, score, baseline) = series_decompose_anomalies(AvgTemp, 1.5)
| mv-expand Timestamp to typeof(datetime),
    AvgTemp to typeof(real),
    anomalies to typeof(int),
    score to typeof(real),
    baseline to typeof(real)
| where anomalies != 0
| project Timestamp, DeviceId, AvgTemp, baseline, score, AnomalyType = iff(anomalies > 0, "Spike", "Dip")
```
### Anomaly Detection

```kql
// Decompose time series into trend, seasonality, and residual
TelemetryRaw
| where Timestamp > ago(30d)
| make-series AvgTemp = avg(Temperature) on Timestamp step 1h by DeviceId
| extend (flag, score, baseline) = series_decompose_anomalies(AvgTemp)
| render anomalychart with (anomalycolumns=flag)
```
### Geospatial Queries

```kql
// Find devices within a geographic polygon (e.g., a facility boundary)
let facility = dynamic({
    "type": "Polygon",
    "coordinates": [[[-77.05, 38.88], [-77.03, 38.88], [-77.03, 38.90], [-77.05, 38.90], [-77.05, 38.88]]]
});
TelemetryRaw
| where Timestamp > ago(1h)
| extend lat = toreal(Location.lat), lon = toreal(Location.lon)
| where geo_point_in_polygon(lon, lat, facility)
| summarize DeviceCount = dcount(DeviceId), AvgTemp = avg(Temperature)
```
### Render Visualizations

```kql
// Time chart
TelemetryRaw
| where Timestamp > ago(24h)
| summarize AvgTemp = avg(Temperature) by bin(Timestamp, 15m), DeviceId
| render timechart

// Pie chart of events by device
TelemetryRaw
| where Timestamp > ago(1h)
| summarize EventCount = count() by DeviceId
| top 10 by EventCount
| render piechart
```
## Materialized Views
Materialized views pre-aggregate data for dashboard queries, reducing compute at query time.
```kql
// Hourly aggregation — dashboards query this instead of raw table
// (backfill over existing data requires async creation)
.create async materialized-view with (backfill=true) TelemetryHourly on table TelemetryRaw {
    TelemetryRaw
    | summarize
        AvgTemp = avg(Temperature),
        MaxTemp = max(Temperature),
        MinTemp = min(Temperature),
        AvgHumidity = avg(Humidity),
        EventCount = count()
        by bin(Timestamp, 1h), DeviceId
}

// Daily rollup for trend dashboards
.create async materialized-view with (backfill=true) TelemetryDaily on table TelemetryRaw {
    TelemetryRaw
    | summarize
        AvgTemp = avg(Temperature),
        P95Temp = percentile(Temperature, 95),
        EventCount = count()
        by bin(Timestamp, 1d), DeviceId
}
```
**Dashboard Performance**

Point Power BI DirectQuery and Grafana dashboards at materialized views, not raw tables. A dashboard hitting `TelemetryHourly` runs 10-100x faster than one scanning `TelemetryRaw` with a `summarize`.
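Queries can also target only the already-materialized part of a view via the `materialized_view()` function, trading a small amount of freshness for consistently fast scans; a sketch against the view above:

```kql
// Reads only the materialized part: fast and predictable, but excludes
// records newer than the view's materialization lag
materialized_view('TelemetryHourly')
| where Timestamp > ago(24h)
| summarize avg(AvgTemp) by DeviceId
```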
## Data Retention
### Hot Cache vs Cold Storage
| Tier | Storage | Query speed | Cost | Use case |
|---|---|---|---|---|
| Hot cache | SSD on cluster nodes | Sub-second | Higher (compute-bound) | Recent data, active dashboards |
| Cold storage | Managed disk (remote) | Seconds | Lower | Historical queries, compliance |
```kql
// Keep 30 days hot, 365 days total
.alter table TelemetryRaw policy caching hot = 30d
.alter table TelemetryRaw policy retention softdelete = 365d recoverability = enabled
```
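To guarantee a latency-sensitive query never silently falls back to cold-storage reads, restrict its data scope to the hot cache with a client request property; the query fails fast instead of degrading:

```kql
set query_datascope = "hotcache";
TelemetryRaw
| where Timestamp > ago(7d)
| summarize EventCount = count() by DeviceId
```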
### Continuous Export to ADLS
For data older than the ADX retention period, export to ADLS for long-term archival in the Delta Lakehouse.
```kql
// Create external table for export target, partitioned by month
// (external table partitions use functions like startofmonth/bin, not getYear/getMonth)
.create external table TelemetryArchive (
    DeviceId: string,
    Timestamp: datetime,
    Temperature: real,
    Humidity: real,
    Pressure: real
)
kind=storage
partition by (Month: datetime = startofmonth(Timestamp))
pathformat = (datetime_pattern("'year='yyyy'/month='MM", Month))
dataformat=parquet
(
    h@'https://csaadls.blob.core.windows.net/archive/telemetry;managed_identity=system'
)
```
```kql
// Set up continuous export
.create-or-alter continuous-export TelemetryExport
to table TelemetryArchive
with (intervalBetweenRuns=1h, forcedLatency=10m)
<| TelemetryRaw
| project DeviceId, Timestamp, Temperature, Humidity, Pressure
```
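Continuous export runs as a background job, so monitor it rather than assume it is healthy; assuming the export defined above:

```kql
// Last run time and result
.show continuous-export TelemetryExport

// Recent export failures, if any
.show continuous-export TelemetryExport failures
```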
## Fabric Eventhouse
Fabric Eventhouse is ADX running inside Fabric — same KQL engine, same query language, integrated with OneLake and Real-Time Dashboards.
| Feature | Standalone ADX | Fabric Eventhouse |
|---|---|---|
| KQL engine | Same | Same |
| Deployment | Azure resource (Bicep) | Fabric workspace item |
| Storage | Managed by ADX | OneLake (Delta-compatible) |
| Dashboards | ADX Dashboards | Real-Time Dashboards |
| Alerting | Azure Monitor | Fabric Activator (Reflex) |
| Gov availability | GA | Not yet Gov GA (ADR-0018) |
```mermaid
graph LR
    ES[Fabric Eventstream] --> EH2[Eventhouse<br/>KQL Database]
    EH2 --> RTD[Real-Time Dashboard]
    EH2 --> ACT[Activator<br/>Automated Alerts]
    EH2 --> OL[(OneLake<br/>Delta shortcut)]
    OL --> PBI[Power BI<br/>Direct Lake]
```

**Migration Path**

When Fabric reaches Gov GA, ADX workloads migrate to Eventhouse with no KQL changes. The query language, table schemas, and materialized views are identical. See ADR-0018 for the env-gated adapter pattern.
## Dashboards and Visualization
### ADX Dashboards
Native dashboards built into the ADX web UI. Best for ops teams who write KQL directly.
- KQL-native — each tile is a KQL query
- Parameters — dropdown filters backed by KQL queries (see the sketch after this list)
- Auto-refresh — configurable refresh interval (30s minimum)
- Sharing — share via URL or embed in portal
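A parameter dropdown is backed by a plain KQL query; a minimal sketch for a `DeviceId` filter:

```kql
// Backing query for a DeviceId dashboard parameter
TelemetryRaw
| where Timestamp > ago(7d)
| distinct DeviceId
| order by DeviceId asc
```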
### Grafana Plugin
For teams already using Grafana, the Azure Data Explorer plugin provides native KQL support.
```yaml
# Grafana data source provisioning sketch (provisioning/datasources/adx.yaml);
# exact jsonData keys can vary by plugin version, so check the plugin docs
apiVersion: 1
datasources:
  - name: Azure Data Explorer
    type: grafana-azure-data-explorer-datasource
    jsonData:
      clusterUrl: https://adxcsadeveastus.eastus.kusto.windows.net
      defaultDatabase: telemetry
      azureCredentials:
        authType: msi # managed identity; requires Grafana hosted in Azure
```
### Power BI Connector
Use DirectQuery from Power BI to ADX for real-time dashboards that combine streaming data with batch Gold layer data.
```kql
// Optimized view for Power BI (materialized view recommended)
TelemetryHourly
| where Timestamp > ago(30d)
| project Timestamp, DeviceId, AvgTemp, MaxTemp, EventCount
```
**Power BI + ADX Performance**

Always point Power BI DirectQuery at materialized views or functions — never at raw tables. Power BI generates KQL queries for every visual interaction; raw table scans will be slow and expensive.
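A stored function keeps the time window server-side and gives Power BI a stable interface over the view; a sketch, assuming the `TelemetryHourly` view (the function name and parameter are illustrative):

```kql
.create-or-alter function with (
    folder = "powerbi",
    docstring = "Hourly telemetry for Power BI DirectQuery"
) TelemetryForPbi(lookback: timespan = 30d) {
    TelemetryHourly
    | where Timestamp > ago(lookback)
    | project Timestamp, DeviceId, AvgTemp, MaxTemp, EventCount
}
```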
## Security
### Authentication
| Method | Use case | Configuration |
|---|---|---|
| Managed Identity | Azure services (Functions, ADF, Databricks) | Assign Database Viewer or Database User role |
| Entra ID user | Analysts, dashboard users | Add to database principals |
| Service principal | External apps, CI/CD | App registration + database principal |
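The role assignments in this table come down to one management command per principal; a sketch with placeholder IDs:

```kql
// Managed identity or service principal: use its application (client) ID
.add database telemetry viewers ('aadapp=00000000-0000-0000-0000-000000000000;contoso.com') 'CSA Functions managed identity'

// Entra ID group for analysts (group name illustrative)
.add database telemetry users ('aadgroup=csa-analysts@contoso.com') 'CSA analysts'
```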
### Network Security
- Private Endpoints — connect via Private Link; disable public access
- VNet injection — deploy ADX cluster into a VNet (legacy; prefer Private Endpoints)
- Managed Private Endpoints — ADX-initiated connections to sources (Event Hub, ADLS)
### Row-Level Security
```kql
// Create a function that filters rows by the caller's identity.
// Gate on Entra ID group membership via current_principal_is_member_of();
// the group name is illustrative.
.create-or-alter function with (
    folder = "security",
    docstring = "RLS filter for regional access"
) RegionalFilter() {
    let isAdmin = current_principal_is_member_of('aadgroup=csa-adx-admins;contoso.com');
    TelemetryRaw
    | where isAdmin or tostring(Tags.region) == "us-east" // swap in your region-mapping logic
}

// Apply the function as the table's row-level security policy
.alter table TelemetryRaw policy row_level_security enable "RegionalFilter"
```
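Confirm the policy is attached, and exercise the filter directly before relying on it:

```kql
// Show the active row-level security policy
.show table TelemetryRaw policy row_level_security

// Run the RLS function as yourself to sanity-check the filter
RegionalFilter()
| take 10
```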
## Monitoring
### Key Metrics
| Metric | Alert threshold | What it means |
|---|---|---|
| Ingestion latency | > 5 minutes (queued) | Batching delay or source issue |
| Ingestion volume | Drop > 50% | Source failure or data connection issue |
| Cache utilization | > 80% | Consider increasing hot cache or cluster size |
| CPU utilization | > 70% sustained | Scale out or optimize queries |
| Query duration (P95) | > 30 seconds | Missing materialized views or unoptimized KQL |
| Failed ingestions | Any | Schema mismatch or mapping error |
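Failed ingestions (the last row above) are also queryable directly, which is the fastest way to find the offending mapping or payload:

```kql
// Recent ingestion failures with the detail needed to fix mappings
.show ingestion failures
| where FailedOn > ago(1d)
| project FailedOn, Table, FailureKind, Details
```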
### Diagnostic Settings

```bicep
resource adxDiagnostics 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  scope: adxCluster
  name: 'adx-diagnostics'
  properties: {
    workspaceId: logAnalyticsWorkspace.id
    logs: [
      { category: 'SucceededIngestion', enabled: true }
      { category: 'FailedIngestion', enabled: true }
      { category: 'Command', enabled: true }
      { category: 'Query', enabled: true }
    ]
    metrics: [
      { category: 'AllMetrics', enabled: true }
    ]
  }
}
```
## Cost Optimization
### Cluster SKU Sizing
| Workload | Recommended SKU | Nodes | Use case |
|---|---|---|---|
| Dev/Test | Dev(No SLA)_Standard_E2a_v4 | 1 | Single-node, no SLA, auto-stop |
| Small production | Standard_E8ads_v5 | 2 | < 1 TB hot data, moderate queries |
| Medium production | Standard_E16ads_v5 | 3-5 | 1-10 TB hot data, concurrent dashboards |
| Large production | Standard_E16ads_v5 | 8+ | > 10 TB hot data, heavy anomaly detection |
**Auto-Stop for Dev/Test**

Enable `enableAutoStop: true` on dev/test clusters. The cluster stops after 24 hours of inactivity and restarts on the next query — saving significant compute cost.
### Cost Levers
| Lever | Action | Impact |
|---|---|---|
| Hot cache period | Reduce from 30d to 7d | Less SSD storage per node |
| Retention period | Export to ADLS, reduce ADX retention | Less managed storage |
| Materialized views | Pre-aggregate; reduce query compute | Lower per-query cost |
| Auto-stop | Enable for non-production | Zero cost when idle |
| Reserved instances | 1-year or 3-year commitment | 30-60% discount |
## Anti-Patterns
**Don't: Use ADX as a data warehouse**

ADX is optimized for time-series and log analytics, not relational star-schema queries. Use the Delta Lakehouse (Databricks/Synapse) for warehouse workloads and ADX for streaming analytics.

**Don't: Query raw tables from dashboards**

Without materialized views, every dashboard refresh scans the raw table. Create materialized views for any query pattern used by dashboards.

**Don't: Set hot cache to maximum retention**

Keeping all data in hot cache is expensive and unnecessary. Only recent data (7-30 days) needs sub-second query speed; older data queries can tolerate seconds of latency from cold storage.

**Don't: Ignore ingestion failures**

Failed ingestions usually indicate schema mismatches between the source payload and the table mapping. Monitor the `FailedIngestion` log and fix mappings immediately — failed events are dropped, not retried.

**Do: Use streaming ingestion for low-latency use cases**

When detection latency must be under 10 seconds (fraud, security), enable streaming ingestion on the table. For everything else, queued ingestion is more cost-efficient.

**Do: Export to ADLS for long-term retention**

Continuous export moves aged data to Parquet in ADLS, where it joins the Delta Lakehouse and is queryable by dbt, Databricks, and Synapse at a fraction of the ADX storage cost.
## Checklist
- ADX cluster deployed via Bicep with streaming ingestion enabled
- Database(s) created with appropriate retention and cache policies
- Table schemas defined with ingestion mappings
- Event Hub data connections configured with dedicated consumer groups
- Materialized views created for dashboard queries
- Continuous export to ADLS configured for long-term archival
- Private Endpoints enabled; public network access disabled
- Managed identity granted `Database Viewer` for downstream services
- Diagnostic settings forwarding to Log Analytics
- Alerting configured for failed ingestions and high CPU
- Dev/test clusters set to auto-stop
- Power BI connector tested against materialized views (not raw tables)