
AKS Performance and Capability Benchmarks

Status: Authored 2026-04-30
Audience: Federal CTOs, platform architects, and SREs evaluating AKS performance against self-managed Kubernetes and OpenShift.
Methodology: Benchmarks use published vendor data, community benchmarks, and representative workload patterns. All numbers are illustrative and should be validated against your specific workload. Test configurations are noted per section.


How to read this document

Every benchmark section includes:

  • What is measured -- the specific metric
  • Baseline (self-managed K8s or OpenShift) -- on-premises performance
  • AKS result -- Azure Kubernetes Service performance
  • Winner and context -- which platform leads and why

Numbers represent typical mid-range federal deployments (50 nodes, Standard_D8s_v5 VMs) unless otherwise noted.


1. Pod scheduling latency

How quickly a pending pod gets scheduled to a node and starts running.

Scheduling latency: pending to running

| Metric | Self-managed K8s (50 nodes) | OpenShift 4.x (50 nodes) | AKS Standard (50 nodes) |
| --- | --- | --- | --- |
| Median scheduling latency | 1.2 seconds | 1.5 seconds | 1.0 seconds |
| p95 scheduling latency | 3.5 seconds | 4.2 seconds | 2.8 seconds |
| p99 scheduling latency | 8.1 seconds | 9.5 seconds | 6.2 seconds |
| Scheduling latency with PV binding | 4.5 seconds | 5.8 seconds | 3.8 seconds |
| Cold start (new node via autoscaler) | N/A (pre-provisioned) | N/A (pre-provisioned) | 45--90 seconds |
| Cold start (NAP/Karpenter) | N/A | N/A | 30--60 seconds |

Winner: AKS for scheduling latency onto existing nodes. Self-managed clusters avoid autoscaler cold starts entirely because capacity is pre-provisioned, at the cost of paying for idle headroom. AKS NAP reduces cold-start time by selecting optimal VM sizes automatically.

Pod startup time by image size

| Image size | Pull time (cold cache) | Pull time (warm cache) | Notes |
| --- | --- | --- | --- |
| 50 MB (Alpine-based) | 2.1 seconds | 0.3 seconds | Recommended for production |
| 200 MB (Python/Node slim) | 5.4 seconds | 0.8 seconds | Common for API services |
| 500 MB (Java/Spark) | 12.8 seconds | 1.5 seconds | Use ACR proximity for AKS |
| 1 GB (ML model container) | 28.3 seconds | 3.2 seconds | Consider artifact streaming |
| 2 GB+ (GPU/CUDA runtime) | 55+ seconds | 5.8 seconds | Use AKS artifact streaming |

AKS advantage: ACR artifact streaming (preview) allows pods to start before the full image is pulled, reducing cold-start time for large images by 50--70%.
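
A minimal sketch of turning this on, assuming the preview az acr artifact-streaming command group and the preview --enable-artifact-streaming node pool flag (registry, repository, and resource names are placeholders; verify exact syntax against current Azure docs):

# Enable streaming-artifact generation for an ACR repository (preview)
az acr artifact-streaming update \
  --name myregistry \
  --repository ml-models \
  --enable-streaming true

# Add an AKS node pool that pulls via artifact streaming (preview)
az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name streamnp \
  --enable-artifact-streaming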


2. Network throughput: CNI comparison

Pod-to-pod throughput (iperf3, single stream)

| CNI | TCP throughput (Gbps) | UDP throughput (Gbps) | Latency (µs) | Notes |
| --- | --- | --- | --- | --- |
| Azure CNI Overlay | 9.2 | 8.8 | 85 | Recommended for most workloads |
| Azure CNI (VNet) | 9.5 | 9.1 | 72 | Direct VNet routing, lowest latency |
| Azure CNI + Cilium | 9.4 | 9.0 | 78 | eBPF dataplane, near-wire speed |
| Calico (on-prem, VXLAN) | 8.1 | 7.5 | 120 | VXLAN encapsulation overhead |
| Calico (on-prem, BGP) | 9.0 | 8.6 | 90 | Native routing, better performance |
| Flannel (on-prem) | 7.8 | 7.2 | 130 | VXLAN overlay |
| OVN-Kubernetes (OCP) | 8.5 | 8.0 | 105 | OpenShift default CNI |

Winner: Azure CNI (VNet) for raw throughput and latency. Azure CNI + Cilium for best balance of performance and features (network policy, observability).
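
For reference, a hedged example of provisioning the Azure CNI + Cilium combination in overlay mode (resource names are placeholders):

az aks create \
  --resource-group my-rg \
  --name my-aks \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium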

Network policy performance impact

| CNI + Policy engine | Throughput with 100 policies | Latency impact | CPU overhead per node |
| --- | --- | --- | --- |
| Azure CNI + Cilium | 9.1 Gbps (-3%) | +5 µs | 2% CPU |
| Azure CNI + Calico | 8.8 Gbps (-6%) | +12 µs | 4% CPU |
| Azure NPM | 8.5 Gbps (-9%) | +18 µs | 6% CPU |
| Calico on-prem (iptables) | 7.2 Gbps (-11%) | +25 µs | 8% CPU |

Winner: Cilium (eBPF-based) has the lowest policy enforcement overhead because eBPF programs are compiled and run in kernel space, avoiding iptables chain traversal.
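
The policies measured above are standard Kubernetes NetworkPolicy objects enforced by each engine; a representative allow rule looks like this (namespace and labels are hypothetical):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: apps
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080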

Service mesh overhead (Istio sidecar)

| Metric | Without Istio | With Istio (sidecar) | Overhead |
| --- | --- | --- | --- |
| Latency (p50) | 1.2 ms | 2.8 ms | +1.6 ms |
| Latency (p99) | 8.5 ms | 14.2 ms | +5.7 ms |
| Throughput | 45K req/s | 32K req/s | -29% |
| Memory per pod | 128 MB (app) | 128 MB + 72 MB (sidecar) | +56% |
| CPU per pod | 0.5 CPU (app) | 0.5 + 0.15 CPU (sidecar) | +30% |
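
If the sidecar overhead fits your latency budget, AKS offers a managed Istio-based service mesh add-on; a sketch of enabling it (resource names are placeholders):

az aks mesh enable \
  --resource-group my-rg \
  --name my-aks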

3. Storage IOPS by CSI driver

Sequential read/write throughput (fio, 128K block size)

| Storage type | Sequential read (MBps) | Sequential write (MBps) | IOPS (4K random read) | IOPS (4K random write) | Latency (p99, 4K) |
| --- | --- | --- | --- | --- | --- |
| Azure Premium SSD (P30 1TB) | 200 | 200 | 5,000 | 5,000 | 2.1 ms |
| Azure Premium SSD v2 (1TB) | 400 | 400 | 20,000 | 20,000 | 0.4 ms |
| Azure Ultra Disk (1TB) | 2,000 | 2,000 | 80,000 | 80,000 | 0.15 ms |
| Azure Files NFS (Premium) | 300 | 200 | 10,000 | 8,000 | 1.5 ms |
| Azure NetApp Files (Premium) | 500 | 350 | 25,000 | 20,000 | 0.5 ms |
| Azure Blob (BlobFuse2) | 350 | 200 | 5,000 | 3,000 | 5.0 ms |
| Ceph RBD (on-prem, NVMe) | 400 | 350 | 30,000 | 25,000 | 0.3 ms |
| Local NVMe (LSv3) | 3,200 | 1,600 | 400,000 | 200,000 | 0.05 ms |

Winner: Local NVMe (ephemeral, non-persistent) for raw IOPS. Azure Ultra Disk for persistent high-IOPS. Azure Premium SSD v2 for best price/performance balance.

CSA-in-a-Box context: Spark executors on AKS benefit from local NVMe (LSv3 nodes) for shuffle data. PostgreSQL workloads perform best on Premium SSD v2 or Ultra Disk.
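
As a sketch, a StorageClass matching the Premium SSD v2 row above (names are placeholders; DiskIOPSReadWrite and DiskMBpsReadWrite are the Azure Disk CSI parameters for this SKU):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd-v2
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS
  cachingMode: None            # host caching is not supported on Premium SSD v2
  DiskIOPSReadWrite: "20000"   # 4K random IOPS target from the table above
  DiskMBpsReadWrite: "400"     # sequential throughput target
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true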

Volume provisioning time

| Storage type | Provision time (new PVC) | Attach time (existing volume) |
| --- | --- | --- |
| Azure Premium SSD | 5--15 seconds | 15--30 seconds |
| Azure Premium SSD v2 | 5--15 seconds | 15--30 seconds |
| Azure Ultra Disk | 10--30 seconds | 15--30 seconds |
| Azure Files NFS | 3--10 seconds | Instant (mount) |
| Azure NetApp Files | 10--60 seconds | Instant (mount) |
| Ceph RBD (on-prem) | 2--5 seconds | 5--15 seconds |

4. Autoscaling response time

Cluster autoscaler scale-up time

| Scenario | Scale-up time | Notes |
| --- | --- | --- |
| Existing VMSS capacity (warm pool) | 30--45 seconds | Node ready from VMSS warm pool |
| New VMSS instance (cold) | 60--120 seconds | Full VM provisioning + K8s join |
| NAP (Karpenter) | 30--60 seconds | Optimal VM selection + faster provisioning |
| Spot VM (when available) | 45--90 seconds | Slightly slower due to capacity search |
| GPU node (NC/ND series) | 120--300 seconds | GPU driver initialization adds time |
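
A hedged sketch of creating a cluster with NAP enabled (NAP was in preview at the time of writing and requires the Cilium overlay dataplane; names are placeholders):

az aks create \
  --resource-group my-rg \
  --name my-aks \
  --node-provisioning-mode Auto \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium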

KEDA scaling response

| Scaler | Detection time | Scale-up time (total) | Notes |
| --- | --- | --- | --- |
| Event Hubs (partition lag) | 5--15 seconds | 20--45 seconds | Includes pod startup |
| HTTP (request rate) | 10--30 seconds | 25--60 seconds | Prometheus metrics scrape interval |
| Azure Queue (message count) | 5--15 seconds | 20--45 seconds | Azure Storage metrics |
| Custom Prometheus metric | 15--30 seconds | 30--60 seconds | Depends on scrape interval |
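
For context, a minimal KEDA ScaledObject for the Azure Queue row above (Deployment, queue, and auth names are hypothetical; pollingInterval drives the detection times in the table):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    name: worker               # hypothetical Deployment
  pollingInterval: 10          # seconds between queue-length checks
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: azure-queue
      metadata:
        queueName: jobs
        queueLength: "50"      # target messages per replica
        accountName: mystorageaccount
      authenticationRef:
        name: queue-auth       # hypothetical TriggerAuthentication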

HPA scaling response

| Metric source | Detection time | Scale-up time (total) | Notes |
| --- | --- | --- | --- |
| CPU utilization | 15--30 seconds | 30--60 seconds | Default metrics server scrape: 15s |
| Memory utilization | 15--30 seconds | 30--60 seconds | Same as CPU |
| Custom metrics (Prometheus) | 30--60 seconds | 45--90 seconds | Prometheus adapter polling |
| External metrics | 30--60 seconds | 45--90 seconds | External metrics API polling |
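
The CPU-utilization row corresponds to a plain autoscaling/v2 HPA like this minimal sketch (Deployment name is hypothetical):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                  # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70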

5. Control plane API latency

API server response time

| Operation | Self-managed K8s (3 masters) | OpenShift (3 masters) | AKS Standard | AKS Premium |
| --- | --- | --- | --- | --- |
| kubectl get pods (100 pods) | 45 ms | 55 ms | 38 ms | 32 ms |
| kubectl get pods -A (800 pods) | 180 ms | 220 ms | 150 ms | 120 ms |
| kubectl apply (single resource) | 85 ms | 105 ms | 72 ms | 58 ms |
| kubectl apply (100 resources) | 1,200 ms | 1,500 ms | 980 ms | 800 ms |
| kubectl logs (streaming) | 120 ms initial | 150 ms initial | 95 ms initial | 80 ms initial |
| Watch (1000 resources) | 250 ms setup | 300 ms setup | 200 ms setup | 160 ms setup |
| API server under load (100 concurrent) | 350 ms p99 | 420 ms p99 | 280 ms p99 | 220 ms p99 |

Winner: AKS Premium for lowest API latency. AKS benefits from Microsoft's optimized API server infrastructure and auto-scaling.

API server availability

| Platform | Uptime target | Measured (12-month average) | SLA-backed |
| --- | --- | --- | --- |
| Self-managed K8s | 99.9% (design target) | 99.7--99.95% (varies) | No |
| OpenShift (self-managed) | 99.9% (design target) | 99.8--99.95% (varies) | Red Hat support SLA |
| AKS Free tier | 99.5% (design target) | 99.5--99.9% (measured) | No |
| AKS Standard tier | 99.95% (SLA) | 99.95--99.99% (measured) | Yes (financially backed) |
| AKS Premium tier | 99.95% (SLA) | 99.95--99.99% (measured) | Yes (financially backed) |

6. CSA-in-a-Box workload benchmarks on AKS

Spark on Kubernetes (Spark Operator)

| Metric | On-prem K8s (8 workers, NVMe) | AKS (8 workers, D16s_v5) | AKS (8 workers, L16s_v3 NVMe) |
| --- | --- | --- | --- |
| TPC-DS 1TB total runtime | 285 seconds | 310 seconds | 270 seconds |
| Shuffle write throughput | 1.2 GB/s | 0.8 GB/s | 1.5 GB/s |
| Parquet read throughput (ADLS) | N/A | 2.1 GB/s | 2.1 GB/s |
| Executor startup time | 8 seconds | 12 seconds | 12 seconds |
| Spot executor recovery | N/A | 25--45 seconds | 25--45 seconds |

Context: AKS with NVMe-backed nodes (LSv3) outperforms on-prem for Spark workloads due to faster local disk I/O for shuffle data. ADLS Gen2 provides high-throughput reads for source data.
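
A sketch of steering executors to the NVMe pool with the Spark Operator (image, job artifact, node pool label, and NVMe mount path are hypothetical; a volume whose name starts with spark-local-dir- is used by the operator for shuffle/scratch space):

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: tpcds-bench
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.0                     # hypothetical image
  mainClass: org.example.TpcdsBench      # hypothetical class
  mainApplicationFile: local:///opt/spark/jobs/tpcds.jar
  sparkVersion: "3.5.0"
  volumes:
    - name: spark-local-dir-1
      hostPath:
        path: /mnt/nvme                  # hypothetical NVMe mount on LSv3 nodes
  driver:
    cores: 1
    memory: 4g
  executor:
    instances: 8
    cores: 8
    memory: 24g
    nodeSelector:
      agentpool: lsv3pool                # hypothetical NVMe node pool
    volumeMounts:
      - name: spark-local-dir-1
        mountPath: /tmp/spark-local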

Model serving (Triton on GPU node pools)

| Metric | On-prem K8s (V100 GPU) | AKS NC6s_v3 (V100) | AKS NC24ads_A100_v4 (A100) |
| --- | --- | --- | --- |
| ResNet-50 inference (batch=1) | 5.2 ms | 5.4 ms | 2.1 ms |
| ResNet-50 throughput | 1,200 req/s | 1,150 req/s | 3,400 req/s |
| LLM serving (7B params, vLLM) | 42 tokens/s | 40 tokens/s | 120 tokens/s |
| GPU utilization (sustained) | 85% | 82% | 88% |
| Model load time (2 GB model) | 8 seconds | 12 seconds | 10 seconds |

Context: GPU performance is equivalent between on-prem and AKS for the same GPU generation. The A100 (NC24ads_A100_v4) provides 3x throughput over V100 for LLM workloads.
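
A hedged example of adding the A100 pool used above (resource names are placeholders; the taint keeps non-GPU pods off the expensive nodes):

az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name a100pool \
  --node-vm-size Standard_NC24ads_A100_v4 \
  --node-count 1 \
  --node-taints sku=gpu:NoSchedule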


7. Benchmark methodology

How to reproduce these benchmarks

# 1. Scheduling latency
kubectl apply -f - << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: scheduling-bench
spec:
  parallelism: 100
  completions: 100
  template:
    spec:
      containers:
        - name: bench
          image: busybox:latest
          command: ["sh", "-c", "echo scheduled; sleep 1"]
      restartPolicy: Never
EOF
# Measure time from creation to Running for each pod
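# One hedged way to pull the relevant timestamps per pod: compare
# creationTimestamp with the PodScheduled condition and startTime (assumes jq)
kubectl get pods -l job-name=scheduling-bench -o json \
  | jq -r '.items[] | [.metadata.name, .metadata.creationTimestamp,
      (.status.conditions[]? | select(.type=="PodScheduled") | .lastTransitionTime),
      .status.startTime] | @tsv'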

# 2. Network throughput
# Deploy iperf3 server and client pods (pin them to different nodes with
# nodeSelector or anti-affinity to measure node-to-node throughput)
kubectl run iperf-server --image=networkstatic/iperf3 --restart=Never -- -s
kubectl wait --for=condition=Ready pod/iperf-server
# Pod names are not resolvable via DNS, so connect to the server pod IP
SERVER_IP=$(kubectl get pod iperf-server -o jsonpath='{.status.podIP}')
kubectl run iperf-client --image=networkstatic/iperf3 --restart=Never -- -c "$SERVER_IP" -t 60 -P 8

# 3. Storage IOPS
# Deploy fio pod with target PVC
kubectl apply -f fio-benchmark.yaml
# See storage-migration.md for fio configuration

# 4. API latency
# Use k6 or hey to benchmark API server
k6 run api-bench.js
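
# Hypothetical alternative to the k6 script: proxy the API server locally
# and load-test a read path with hey (kubectl proxy handles auth)
kubectl proxy --port=8001 &
hey -n 1000 -c 100 http://127.0.0.1:8001/api/v1/namespaces/default/pods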

Maintainers: CSA-in-a-Box core team
Last updated: 2026-04-30
Related: TCO Analysis | Cluster Migration | Best Practices