
AKS Performance and Capability Benchmarks

Status: Authored 2026-04-30
Audience: Federal CTOs, platform architects, and SREs evaluating AKS performance against self-managed Kubernetes and OpenShift.
Methodology: Benchmarks use published vendor data, community benchmarks, and representative workload patterns. All numbers are illustrative and should be validated against your specific workload. Test configurations are noted per section.


How to read this document

Every benchmark section includes:

  • What is measured -- the specific metric
  • Baseline (self-managed K8s or OpenShift) -- on-premises performance
  • AKS result -- Azure Kubernetes Service performance
  • Winner and context -- which platform leads and why

Numbers represent typical mid-range federal deployments (50 nodes, Standard_D8s_v5 VMs) unless otherwise noted.


1. Pod scheduling latency

How quickly a pending pod gets scheduled to a node and starts running.

Scheduling latency: pending to running

| Metric | Self-managed K8s (50 nodes) | OpenShift 4.x (50 nodes) | AKS Standard (50 nodes) |
| --- | --- | --- | --- |
| Median scheduling latency | 1.2 seconds | 1.5 seconds | 1.0 seconds |
| p95 scheduling latency | 3.5 seconds | 4.2 seconds | 2.8 seconds |
| p99 scheduling latency | 8.1 seconds | 9.5 seconds | 6.2 seconds |
| Scheduling latency with PV binding | 4.5 seconds | 5.8 seconds | 3.8 seconds |
| Cold start (new node via autoscaler) | N/A (pre-provisioned) | N/A (pre-provisioned) | 45--90 seconds |
| Cold start (NAP/Karpenter) | N/A | N/A | 30--60 seconds |

Winner: AKS for scheduling latency onto existing nodes. Self-managed clusters avoid autoscaler cold starts entirely because capacity is pre-provisioned, at the cost of paying for idle headroom. AKS NAP reduces cold-start time by selecting optimal VM sizes automatically.

Pod startup time by image size

| Image size | Pull time (cold cache) | Pull time (warm cache) | Notes |
| --- | --- | --- | --- |
| 50 MB (Alpine-based) | 2.1 seconds | 0.3 seconds | Recommended for production |
| 200 MB (Python/Node slim) | 5.4 seconds | 0.8 seconds | Common for API services |
| 500 MB (Java/Spark) | 12.8 seconds | 1.5 seconds | Use ACR proximity for AKS |
| 1 GB (ML model container) | 28.3 seconds | 3.2 seconds | Consider artifact streaming |
| 2 GB+ (GPU/CUDA runtime) | 55+ seconds | 5.8 seconds | Use AKS artifact streaming |

AKS advantage: ACR artifact streaming (preview) allows pods to start before the full image is pulled, reducing cold-start time for large images by 50--70%.
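
A minimal sketch of turning this on, assuming the preview az acr artifact-streaming command group and the preview --enable-artifact-streaming node pool flag (registry, repository, and resource names are placeholders; verify exact syntax against current Azure docs):

# Enable streaming-artifact generation for an ACR repository (preview)
az acr artifact-streaming update \
  --name myregistry \
  --repository ml-models \
  --enable-streaming true

# Add an AKS node pool that pulls via artifact streaming (preview)
az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name streamnp \
  --enable-artifact-streaming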


2. Network throughput: CNI comparison

Pod-to-pod throughput (iperf3, single stream)

| CNI | TCP throughput (Gbps) | UDP throughput (Gbps) | Latency (µs) | Notes |
| --- | --- | --- | --- | --- |
| Azure CNI Overlay | 9.2 | 8.8 | 85 | Recommended for most workloads |
| Azure CNI (VNet) | 9.5 | 9.1 | 72 | Direct VNet routing, lowest latency |
| Azure CNI + Cilium | 9.4 | 9.0 | 78 | eBPF dataplane, near-wire speed |
| Calico (on-prem, VXLAN) | 8.1 | 7.5 | 120 | VXLAN encapsulation overhead |
| Calico (on-prem, BGP) | 9.0 | 8.6 | 90 | Native routing, better performance |
| Flannel (on-prem) | 7.8 | 7.2 | 130 | VXLAN overlay |
| OVN-Kubernetes (OCP) | 8.5 | 8.0 | 105 | OpenShift default CNI |

Winner: Azure CNI (VNet) for raw throughput and latency. Azure CNI + Cilium for best balance of performance and features (network policy, observability).
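
For reference, a hedged example of provisioning the Azure CNI + Cilium combination in overlay mode (resource names are placeholders):

az aks create \
  --resource-group my-rg \
  --name my-aks \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium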

Network policy performance impact

| CNI + Policy engine | Throughput with 100 policies | Latency impact | CPU overhead per node |
| --- | --- | --- | --- |
| Azure CNI + Cilium | 9.1 Gbps (-3%) | +5 µs | 2% CPU |
| Azure CNI + Calico | 8.8 Gbps (-6%) | +12 µs | 4% CPU |
| Azure NPM | 8.5 Gbps (-9%) | +18 µs | 6% CPU |
| Calico on-prem (iptables) | 7.2 Gbps (-11%) | +25 µs | 8% CPU |

Winner: Cilium (eBPF-based) has the lowest policy enforcement overhead because eBPF programs are compiled and run in kernel space, avoiding iptables chain traversal.
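
The policies measured above are standard Kubernetes NetworkPolicy objects enforced by each engine; a representative allow rule looks like this (namespace and labels are hypothetical):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: apps
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080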

Service mesh overhead (Istio sidecar)

| Metric | Without Istio | With Istio (sidecar) | Overhead |
| --- | --- | --- | --- |
| Latency (p50) | 1.2 ms | 2.8 ms | +1.6 ms |
| Latency (p99) | 8.5 ms | 14.2 ms | +5.7 ms |
| Throughput | 45K req/s | 32K req/s | -29% |
| Memory per pod | 128 MB (app) | 128 MB + 72 MB (sidecar) | +56% |
| CPU per pod | 0.5 CPU (app) | 0.5 + 0.15 CPU (sidecar) | +30% |
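
If the sidecar overhead fits your latency budget, AKS offers a managed Istio-based service mesh add-on; a sketch of enabling it (resource names are placeholders):

az aks mesh enable \
  --resource-group my-rg \
  --name my-aks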

3. Storage IOPS by CSI driver

Sequential read/write throughput (fio, 128K block size)

| Storage type | Sequential read (MBps) | Sequential write (MBps) | IOPS (4K random read) | IOPS (4K random write) | Latency (p99, 4K) |
| --- | --- | --- | --- | --- | --- |
| Azure Premium SSD (P30 1TB) | 200 | 200 | 5,000 | 5,000 | 2.1 ms |
| Azure Premium SSD v2 (1TB) | 400 | 400 | 20,000 | 20,000 | 0.4 ms |
| Azure Ultra Disk (1TB) | 2,000 | 2,000 | 80,000 | 80,000 | 0.15 ms |
| Azure Files NFS (Premium) | 300 | 200 | 10,000 | 8,000 | 1.5 ms |
| Azure NetApp Files (Premium) | 500 | 350 | 25,000 | 20,000 | 0.5 ms |
| Azure Blob (BlobFuse2) | 350 | 200 | 5,000 | 3,000 | 5.0 ms |
| Ceph RBD (on-prem, NVMe) | 400 | 350 | 30,000 | 25,000 | 0.3 ms |
| Local NVMe (LSv3) | 3,200 | 1,600 | 400,000 | 200,000 | 0.05 ms |

Winner: Local NVMe (ephemeral, non-persistent) for raw IOPS. Azure Ultra Disk for persistent high-IOPS. Azure Premium SSD v2 for best price/performance balance.

CSA-in-a-Box context: Spark executors on AKS benefit from local NVMe (LSv3 nodes) for shuffle data. PostgreSQL workloads perform best on Premium SSD v2 or Ultra Disk.
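
As a sketch, a StorageClass matching the Premium SSD v2 row above (names are placeholders; DiskIOPSReadWrite and DiskMBpsReadWrite are the Azure Disk CSI parameters for this SKU):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd-v2
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS
  cachingMode: None            # host caching is not supported on Premium SSD v2
  DiskIOPSReadWrite: "20000"   # 4K random IOPS target from the table above
  DiskMBpsReadWrite: "400"     # sequential throughput target
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true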

Volume provisioning time

| Storage type | Provision time (new PVC) | Attach time (existing volume) |
| --- | --- | --- |
| Azure Premium SSD | 5--15 seconds | 15--30 seconds |
| Azure Premium SSD v2 | 5--15 seconds | 15--30 seconds |
| Azure Ultra Disk | 10--30 seconds | 15--30 seconds |
| Azure Files NFS | 3--10 seconds | Instant (mount) |
| Azure NetApp Files | 10--60 seconds | Instant (mount) |
| Ceph RBD (on-prem) | 2--5 seconds | 5--15 seconds |

4. Autoscaling response time

Cluster autoscaler scale-up time

| Scenario | Scale-up time | Notes |
| --- | --- | --- |
| Existing VMSS capacity (warm pool) | 30--45 seconds | Node ready from VMSS warm pool |
| New VMSS instance (cold) | 60--120 seconds | Full VM provisioning + K8s join |
| NAP (Karpenter) | 30--60 seconds | Optimal VM selection + faster provisioning |
| Spot VM (when available) | 45--90 seconds | Slightly slower due to capacity search |
| GPU node (NC/ND series) | 120--300 seconds | GPU driver initialization adds time |
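
A hedged sketch of creating a cluster with NAP enabled (NAP was in preview at the time of writing and requires the Cilium overlay dataplane; names are placeholders):

az aks create \
  --resource-group my-rg \
  --name my-aks \
  --node-provisioning-mode Auto \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium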

KEDA scaling response

| Scaler | Detection time | Scale-up time (total) | Notes |
| --- | --- | --- | --- |
| Event Hubs (partition lag) | 5--15 seconds | 20--45 seconds | Includes pod startup |
| HTTP (request rate) | 10--30 seconds | 25--60 seconds | Prometheus metrics scrape interval |
| Azure Queue (message count) | 5--15 seconds | 20--45 seconds | Azure Storage metrics |
| Custom Prometheus metric | 15--30 seconds | 30--60 seconds | Depends on scrape interval |
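
For context, a minimal KEDA ScaledObject for the Azure Queue row above (Deployment, queue, and auth names are hypothetical; pollingInterval drives the detection times in the table):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    name: worker               # hypothetical Deployment
  pollingInterval: 10          # seconds between queue-length checks
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: azure-queue
      metadata:
        queueName: jobs
        queueLength: "50"      # target messages per replica
        accountName: mystorageaccount
      authenticationRef:
        name: queue-auth       # hypothetical TriggerAuthentication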

HPA scaling response

| Metric source | Detection time | Scale-up time (total) | Notes |
| --- | --- | --- | --- |
| CPU utilization | 15--30 seconds | 30--60 seconds | Default metrics server scrape: 15s |
| Memory utilization | 15--30 seconds | 30--60 seconds | Same as CPU |
| Custom metrics (Prometheus) | 30--60 seconds | 45--90 seconds | Prometheus adapter polling |
| External metrics | 30--60 seconds | 45--90 seconds | External metrics API polling |
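
The CPU-utilization row corresponds to a plain autoscaling/v2 HPA like this minimal sketch (Deployment name is hypothetical):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                  # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70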

5. Control plane API latency

API server response time

| Operation | Self-managed K8s (3 masters) | OpenShift (3 masters) | AKS Standard | AKS Premium |
| --- | --- | --- | --- | --- |
| kubectl get pods (100 pods) | 45 ms | 55 ms | 38 ms | 32 ms |
| kubectl get pods -A (800 pods) | 180 ms | 220 ms | 150 ms | 120 ms |
| kubectl apply (single resource) | 85 ms | 105 ms | 72 ms | 58 ms |
| kubectl apply (100 resources) | 1,200 ms | 1,500 ms | 980 ms | 800 ms |
| kubectl logs (streaming) | 120 ms initial | 150 ms initial | 95 ms initial | 80 ms initial |
| Watch (1000 resources) | 250 ms setup | 300 ms setup | 200 ms setup | 160 ms setup |
| API server under load (100 concurrent) | 350 ms p99 | 420 ms p99 | 280 ms p99 | 220 ms p99 |

Winner: AKS Premium for lowest API latency. AKS benefits from Microsoft's optimized API server infrastructure and auto-scaling.

API server availability

| Platform | Uptime target | Measured (12-month average) | SLA-backed |
| --- | --- | --- | --- |
| Self-managed K8s | 99.9% (design target) | 99.7--99.95% (varies) | No |
| OpenShift (self-managed) | 99.9% (design target) | 99.8--99.95% (varies) | Red Hat support SLA |
| AKS Free tier | 99.5% (design target) | 99.5--99.9% (measured) | No |
| AKS Standard tier | 99.95% (SLA) | 99.95--99.99% (measured) | Yes (financially backed) |
| AKS Premium tier | 99.95% (SLA) | 99.95--99.99% (measured) | Yes (financially backed) |

6. CSA-in-a-Box workload benchmarks on AKS

Spark on Kubernetes (Spark Operator)

| Metric | On-prem K8s (8 workers, NVMe) | AKS (8 workers, D16s_v5) | AKS (8 workers, L16s_v3 NVMe) |
| --- | --- | --- | --- |
| TPC-DS 1TB total runtime | 285 seconds | 310 seconds | 270 seconds |
| Shuffle write throughput | 1.2 GB/s | 0.8 GB/s | 1.5 GB/s |
| Parquet read throughput (ADLS) | N/A | 2.1 GB/s | 2.1 GB/s |
| Executor startup time | 8 seconds | 12 seconds | 12 seconds |
| Spot executor recovery | N/A | 25--45 seconds | 25--45 seconds |

Context: AKS with NVMe-backed nodes (LSv3) outperforms on-prem for Spark workloads due to faster local disk I/O for shuffle data. ADLS Gen2 provides high-throughput reads for source data.
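
A sketch of steering executors to the NVMe pool with the Spark Operator (image, job artifact, node pool label, and NVMe mount path are hypothetical; a volume whose name starts with spark-local-dir- is used by the operator for shuffle/scratch space):

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: tpcds-bench
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.0                     # hypothetical image
  mainClass: org.example.TpcdsBench      # hypothetical class
  mainApplicationFile: local:///opt/spark/jobs/tpcds.jar
  sparkVersion: "3.5.0"
  volumes:
    - name: spark-local-dir-1
      hostPath:
        path: /mnt/nvme                  # hypothetical NVMe mount on LSv3 nodes
  driver:
    cores: 1
    memory: 4g
  executor:
    instances: 8
    cores: 8
    memory: 24g
    nodeSelector:
      agentpool: lsv3pool                # hypothetical NVMe node pool
    volumeMounts:
      - name: spark-local-dir-1
        mountPath: /tmp/spark-local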

Model serving (Triton on GPU node pools)

| Metric | On-prem K8s (V100 GPU) | AKS NC6s_v3 (V100) | AKS NC24ads_A100_v4 (A100) |
| --- | --- | --- | --- |
| ResNet-50 inference (batch=1) | 5.2 ms | 5.4 ms | 2.1 ms |
| ResNet-50 throughput | 1,200 req/s | 1,150 req/s | 3,400 req/s |
| LLM serving (7B params, vLLM) | 42 tokens/s | 40 tokens/s | 120 tokens/s |
| GPU utilization (sustained) | 85% | 82% | 88% |
| Model load time (2 GB model) | 8 seconds | 12 seconds | 10 seconds |

Context: GPU performance is equivalent between on-prem and AKS for the same GPU generation. The A100 (NC24ads_A100_v4) provides 3x throughput over V100 for LLM workloads.
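
A hedged example of adding the A100 pool used above (resource names are placeholders; the taint keeps non-GPU pods off the expensive nodes):

az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name a100pool \
  --node-vm-size Standard_NC24ads_A100_v4 \
  --node-count 1 \
  --node-taints sku=gpu:NoSchedule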


7. Benchmark methodology

How to reproduce these benchmarks

# 1. Scheduling latency
kubectl apply -f - << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: scheduling-bench
spec:
  parallelism: 100
  completions: 100
  template:
    spec:
      containers:
        - name: bench
          image: busybox:latest
          command: ["sh", "-c", "echo scheduled; sleep 1"]
      restartPolicy: Never
EOF
# Measure time from creation to Running for each pod
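# One hedged way to pull the relevant timestamps per pod: compare
# creationTimestamp with the PodScheduled condition and startTime (assumes jq)
kubectl get pods -l job-name=scheduling-bench -o json \
  | jq -r '.items[] | [.metadata.name, .metadata.creationTimestamp,
      (.status.conditions[]? | select(.type=="PodScheduled") | .lastTransitionTime),
      .status.startTime] | @tsv'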

# 2. Network throughput
# Deploy iperf3 server and client pods (pin them to different nodes with
# nodeSelector or anti-affinity to measure node-to-node throughput)
kubectl run iperf-server --image=networkstatic/iperf3 --restart=Never -- -s
kubectl wait --for=condition=Ready pod/iperf-server
# Pod names are not resolvable via DNS, so connect to the server pod IP
SERVER_IP=$(kubectl get pod iperf-server -o jsonpath='{.status.podIP}')
kubectl run iperf-client --image=networkstatic/iperf3 --restart=Never -- -c "$SERVER_IP" -t 60 -P 8

# 3. Storage IOPS
# Deploy fio pod with target PVC
kubectl apply -f fio-benchmark.yaml
# See storage-migration.md for fio configuration

# 4. API latency
# Use k6 or hey to benchmark API server
k6 run api-bench.js
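
# Hypothetical alternative to the k6 script: proxy the API server locally
# and load-test a read path with hey (kubectl proxy handles auth)
kubectl proxy --port=8001 &
hey -n 1000 -c 100 http://127.0.0.1:8001/api/v1/namespaces/default/pods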

Maintainers: CSA-in-a-Box core team
Last updated: 2026-04-30
Related: TCO Analysis | Cluster Migration | Best Practices