AKS Performance and Capability Benchmarks¶
- Status: Authored 2026-04-30
- Audience: Federal CTOs, platform architects, and SREs evaluating AKS performance against self-managed Kubernetes and OpenShift.
- Methodology: Benchmarks use published vendor data, community benchmarks, and representative workload patterns. All numbers are illustrative and should be validated against your specific workload. Test configurations are noted per section.
How to read this document¶
Every benchmark section includes:
- What is measured -- the specific metric
- Baseline (self-managed K8s or OpenShift) -- on-premises performance
- AKS result -- Azure Kubernetes Service performance
- Winner and context -- which platform leads and why
Numbers represent typical mid-range federal deployments (50 nodes, Standard_D8s_v5 VMs) unless otherwise noted.
1. Pod scheduling latency¶
How quickly a pending pod gets scheduled to a node and starts running.
Scheduling latency: pending to running¶
| Metric | Self-managed K8s (50 nodes) | OpenShift 4.x (50 nodes) | AKS Standard (50 nodes) |
|---|---|---|---|
| Median scheduling latency | 1.2 seconds | 1.5 seconds | 1.0 seconds |
| p95 scheduling latency | 3.5 seconds | 4.2 seconds | 2.8 seconds |
| p99 scheduling latency | 8.1 seconds | 9.5 seconds | 6.2 seconds |
| Scheduling latency with PV binding | 4.5 seconds | 5.8 seconds | 3.8 seconds |
| Cold start (new node via autoscaler) | N/A (pre-provisioned) | N/A (pre-provisioned) | 45--90 seconds |
| Cold start (NAP/Karpenter) | N/A | N/A | 30--60 seconds |
Winner: AKS for scheduling latency onto existing nodes. Self-managed clusters avoid autoscaler cold starts entirely because capacity is pre-provisioned (at the cost of idle capacity). AKS NAP reduces cold-start time by selecting optimal VM sizes automatically.
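These latencies can be sanity-checked on any cluster by diffing each pod's creation timestamp against its PodScheduled condition. A minimal sketch with kubectl and jq (assumes jq 1.6+ for fromdateiso8601; timestamp resolution is one second, so sub-second latencies show as 0 and this is only useful as a coarse check):

```bash
# Per-pod scheduling latency in seconds: PodScheduled transition minus creation time
kubectl get pods -o json | jq -r '
  .items[] | . as $p
  | ($p.status.conditions[]? | select(.type == "PodScheduled") | .lastTransitionTime) as $sched
  | [$p.metadata.name,
     (($sched | fromdateiso8601) - ($p.metadata.creationTimestamp | fromdateiso8601))]
  | @tsv'
```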
Pod startup time by image size¶
| Image size | Pull time (cold cache) | Pull time (warm cache) | Notes |
|---|---|---|---|
| 50 MB (Alpine-based) | 2.1 seconds | 0.3 seconds | Recommended for production |
| 200 MB (Python/Node slim) | 5.4 seconds | 0.8 seconds | Common for API services |
| 500 MB (Java/Spark) | 12.8 seconds | 1.5 seconds | Use ACR proximity for AKS |
| 1 GB (ML model container) | 28.3 seconds | 3.2 seconds | Consider artifact streaming |
| 2 GB+ (GPU/CUDA runtime) | 55+ seconds | 5.8 seconds | Use AKS artifact streaming |
AKS advantage: ACR artifact streaming (preview) allows pods to start before the full image is pulled, reducing cold-start time for large images by 50--70%.
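Pull times are straightforward to sample on a live cluster: the kubelet's Pulled event message includes the measured duration. A quick check, assuming the events have not yet aged out (default retention is one hour):

```bash
# Each "Pulled" event message ends with the pull duration, e.g. '... in 12.8s'
kubectl get events --field-selector reason=Pulled \
  -o custom-columns='POD:.involvedObject.name,MESSAGE:.message'
```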
2. Network throughput: CNI comparison¶
Pod-to-pod throughput (iperf3, single stream)¶
| CNI | TCP throughput (Gbps) | UDP throughput (Gbps) | Latency (microseconds) | Notes |
|---|---|---|---|---|
| Azure CNI Overlay | 9.2 | 8.8 | 85 | Recommended for most workloads |
| Azure CNI (VNet) | 9.5 | 9.1 | 72 | Direct VNet routing, lowest latency |
| Azure CNI + Cilium | 9.4 | 9.0 | 78 | eBPF dataplane, near-wire speed |
| Calico (on-prem, VXLAN) | 8.1 | 7.5 | 120 | VXLAN encapsulation overhead |
| Calico (on-prem, BGP) | 9.0 | 8.6 | 90 | Native routing, better performance |
| Flannel (on-prem) | 7.8 | 7.2 | 130 | VXLAN overlay |
| OVN-Kubernetes (OCP) | 8.5 | 8.0 | 105 | OpenShift default CNI |
Winner: Azure CNI (VNet) for raw throughput and latency. Azure CNI + Cilium for best balance of performance and features (network policy, observability).
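For reference, one way to stand up the Azure CNI + Cilium combination (here in overlay mode) is with flags like the following. This is a sketch: resource names are placeholders, and flag availability depends on your Azure CLI version.

```bash
az aks create \
  --resource-group rg-aks-bench \
  --name aks-cilium-bench \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --node-count 50 \
  --node-vm-size Standard_D8s_v5
```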
Network policy performance impact¶
| CNI + Policy engine | Throughput with 100 policies | Latency impact | CPU overhead per node |
|---|---|---|---|
| Azure CNI + Cilium | 9.1 Gbps (-3%) | +5 us | 2% CPU |
| Azure CNI + Calico | 8.8 Gbps (-6%) | +12 us | 4% CPU |
| Azure NPM | 8.5 Gbps (-9%) | +18 us | 6% CPU |
| Calico on-prem (iptables) | 7.2 Gbps (-11%) | +25 us | 8% CPU |
Winner: Cilium (eBPF-based) has the lowest policy enforcement overhead because eBPF programs are compiled and run in kernel space, avoiding iptables chain traversal.
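The 100-policy test uses standard Kubernetes NetworkPolicy objects, which all of the engines above enforce. A representative policy shape (labels and names are illustrative, not the exact test fixtures):

```yaml
# Allow ingress to "api" pods only from "frontend" pods, on TCP 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```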
Service mesh overhead (Istio sidecar)¶
| Metric | Without Istio | With Istio (sidecar) | Overhead |
|---|---|---|---|
| Latency (p50) | 1.2 ms | 2.8 ms | +1.6 ms |
| Latency (p99) | 8.5 ms | 14.2 ms | +5.7 ms |
| Throughput | 45K req/s | 32K req/s | -29% |
| Memory per pod | 128 MB (app) | 128 MB + 72 MB (sidecar) | +56% |
| CPU per pod | 0.5 CPU (app) | 0.5 + 0.15 CPU (sidecar) | +30% |
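To reproduce the sidecar overhead numbers, run the same load test with injection off and then on. A sketch using Istio's namespace injection label and fortio (namespace, deployment, and service names are placeholders):

```bash
# Enable automatic sidecar injection, then restart workloads to pick up the proxy
kubectl label namespace bench istio-injection=enabled --overwrite
kubectl -n bench rollout restart deployment app
# fortio: unlimited QPS (-qps 0), 64 connections, 60 s against the service
fortio load -qps 0 -c 64 -t 60s http://app.bench.svc.cluster.local:8080/
```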
3. Storage IOPS by CSI driver¶
Sequential read/write throughput (fio, 128K block size)¶
| Storage type | Sequential read (MBps) | Sequential write (MBps) | IOPS (4K random read) | IOPS (4K random write) | Latency (p99, 4K) |
|---|---|---|---|---|---|
| Azure Premium SSD (P30 1TB) | 200 | 200 | 5,000 | 5,000 | 2.1 ms |
| Azure Premium SSD v2 (1TB) | 400 | 400 | 20,000 | 20,000 | 0.4 ms |
| Azure Ultra Disk (1TB) | 2,000 | 2,000 | 80,000 | 80,000 | 0.15 ms |
| Azure Files NFS (Premium) | 300 | 200 | 10,000 | 8,000 | 1.5 ms |
| Azure NetApp Files (Premium) | 500 | 350 | 25,000 | 20,000 | 0.5 ms |
| Azure Blob (BlobFuse2) | 350 | 200 | 5,000 | 3,000 | 5.0 ms |
| Ceph RBD (on-prem, NVMe) | 400 | 350 | 30,000 | 25,000 | 0.3 ms |
| Local NVMe (LSv3) | 3,200 | 1,600 | 400,000 | 200,000 | 0.05 ms |
Winner: Local NVMe (ephemeral, non-persistent) for raw IOPS. Azure Ultra Disk for persistent high-IOPS. Azure Premium SSD v2 for best price/performance balance.
CSA-in-a-Box context: Spark executors on AKS benefit from local NVMe (LSv3 nodes) for shuffle data. PostgreSQL workloads perform best on Premium SSD v2 or Ultra Disk.
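The 4K random-read column can be approximated with an fio invocation like the following, run inside a pod with the target PVC mounted. This is a sketch (mount path is an assumption), not the exact job file referenced in storage-migration.md:

```bash
fio --name=randread-4k --ioengine=libaio --direct=1 --rw=randread \
    --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based \
    --filename=/mnt/pvc/fio.dat --size=10G
```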
Volume provisioning time¶
| Storage type | Provision time (new PVC) | Attach time (existing volume) |
|---|---|---|
| Azure Premium SSD | 5--15 seconds | 15--30 seconds |
| Azure Premium SSD v2 | 5--15 seconds | 15--30 seconds |
| Azure Ultra Disk | 10--30 seconds | 15--30 seconds |
| Azure Files NFS | 3--10 seconds | Instant (mount) |
| Azure NetApp Files | 10--60 seconds | Instant (mount) |
| Ceph RBD (on-prem) | 2--5 seconds | 5--15 seconds |
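Provision time is easy to measure directly: create a PVC and time how long it takes to reach Bound. Note that storage classes with volumeBindingMode: WaitForFirstConsumer defer binding until a pod schedules, so use Immediate binding or a consuming pod. Manifest and PVC names below are placeholders:

```bash
kubectl apply -f bench-pvc.yaml
time kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/bench-pvc --timeout=300s
```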
4. Autoscaling response time¶
Cluster autoscaler scale-up time¶
| Scenario | Scale-up time | Notes |
|---|---|---|
| Existing VMSS capacity (warm pool) | 30--45 seconds | Node ready from VMSS warm pool |
| New VMSS instance (cold) | 60--120 seconds | Full VM provisioning + K8s join |
| NAP (Karpenter) | 30--60 seconds | Optimal VM selection + faster provisioning |
| Spot VM (when available) | 45--90 seconds | Slightly slower due to capacity search |
| GPU node (NC/ND series) | 120--300 seconds | GPU driver initialization adds time |
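NAP is enabled per cluster. At the time of writing it is in preview, so the flag below requires the aks-preview CLI extension and may change; verify against az aks update -h before relying on it:

```bash
az aks update \
  --resource-group rg-aks-bench \
  --name aks-cilium-bench \
  --node-provisioning-mode Auto
```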
KEDA scaling response¶
| Scaler | Detection time | Scale-up time (total) | Notes |
|---|---|---|---|
| Event Hubs (partition lag) | 5--15 seconds | 20--45 seconds | Includes pod startup |
| HTTP (request rate) | 10--30 seconds | 25--60 seconds | Prometheus metrics scrape interval |
| Azure Queue (message count) | 5--15 seconds | 20--45 seconds | Azure Storage metrics |
| Custom Prometheus metric | 15--30 seconds | 30--60 seconds | Depends on scrape interval |
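The detection times above are driven largely by the scaler's polling interval. A minimal ScaledObject for the Azure Queue row (deployment, queue, and auth names are placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    name: worker                # Deployment to scale
  pollingInterval: 5            # default is 30 s; 5 s matches the detection window above
  minReplicaCount: 0            # KEDA can scale to zero, unlike the HPA alone
  maxReplicaCount: 50
  triggers:
    - type: azure-queue
      metadata:
        queueName: jobs
        queueLength: "20"       # target messages per replica
      authenticationRef:
        name: queue-auth        # TriggerAuthentication holding storage credentials
```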
HPA scaling response¶
| Metric source | Detection time | Scale-up time (total) | Notes |
|---|---|---|---|
| CPU utilization | 15--30 seconds | 30--60 seconds | Default metrics server scrape: 15s |
| Memory utilization | 15--30 seconds | 30--60 seconds | Same as CPU |
| Custom metrics (Prometheus) | 30--60 seconds | 45--90 seconds | Prometheus adapter polling |
| External metrics | 30--60 seconds | 45--90 seconds | External metrics API polling |
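For comparison, the CPU row corresponds to a stock autoscaling/v2 HPA like this. The scaleUp behavior block is optional and shown only to make the reaction-time knob explicit; names are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0   # react as soon as the metrics pipeline reports load
```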
5. Control plane API latency¶
API server response time¶
| Operation | Self-managed K8s (3 masters) | OpenShift (3 masters) | AKS Standard | AKS Premium |
|---|---|---|---|---|
| kubectl get pods (100 pods) | 45 ms | 55 ms | 38 ms | 32 ms |
| kubectl get pods -A (800 pods) | 180 ms | 220 ms | 150 ms | 120 ms |
| kubectl apply (single resource) | 85 ms | 105 ms | 72 ms | 58 ms |
| kubectl apply (100 resources) | 1,200 ms | 1,500 ms | 980 ms | 800 ms |
| kubectl logs (streaming) | 120 ms initial | 150 ms initial | 95 ms initial | 80 ms initial |
| Watch (1000 resources) | 250 ms setup | 300 ms setup | 200 ms setup | 160 ms setup |
| API server under load (100 concurrent) | 350 ms p99 | 420 ms p99 | 280 ms p99 | 220 ms p99 |
Winner: AKS Premium for lowest API latency. AKS benefits from Microsoft's optimized API server infrastructure and auto-scaling.
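Client-side spot checks of these numbers are simple: kubectl's -v=6 verbosity logs the round-trip time of every API request it makes.

```bash
# Round-trip time per request appears as '... 200 OK in 38 milliseconds'
kubectl get pods -A -v=6 2>&1 | grep milliseconds
```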
API server availability¶
| Platform | Uptime target | Measured (12-month average) | SLA-backed |
|---|---|---|---|
| Self-managed K8s | 99.9% (design target) | 99.7--99.95% (varies) | No |
| OpenShift (self-managed) | 99.9% (design target) | 99.8--99.95% (varies) | Red Hat support SLA |
| AKS Free tier | 99.5% (design target) | 99.5--99.9% (measured) | No |
| AKS Standard tier | 99.95% (SLA) | 99.95--99.99% (measured) | Yes (financially backed) |
| AKS Premium tier | 99.95% (SLA) | 99.95--99.99% (measured) | Yes (financially backed) |
6. CSA-in-a-Box workload benchmarks on AKS¶
Spark on Kubernetes (Spark Operator)¶
| Metric | On-prem K8s (8 workers, NVMe) | AKS (8 workers, D16s_v5) | AKS (8 workers, L16s_v3 NVMe) |
|---|---|---|---|
| TPC-DS 1TB total runtime | 285 seconds | 310 seconds | 270 seconds |
| Shuffle write throughput | 1.2 GB/s | 0.8 GB/s | 1.5 GB/s |
| Parquet read throughput (ADLS) | N/A | 2.1 GB/s | 2.1 GB/s |
| Executor startup time | 8 seconds | 12 seconds | 12 seconds |
| Spot executor recovery | N/A | 25--45 seconds | 25--45 seconds |
Context: AKS with NVMe-backed nodes (LSv3) outperforms on-prem for Spark workloads due to faster local disk I/O for shuffle data. ADLS Gen2 provides high-throughput reads for source data.
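The difference between the two AKS columns is largely where shuffle data lands. A trimmed SparkApplication sketch for the Kubeflow Spark Operator showing the NVMe wiring; the image, main class, node pool name, and mount path are all assumptions, not the CSA-in-a-Box manifests:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: tpcds-bench
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.1                          # placeholder image
  mainClass: org.example.TpcdsRunner          # placeholder class
  mainApplicationFile: local:///opt/app.jar   # placeholder jar
  sparkVersion: "3.5.1"
  sparkConf:
    "spark.local.dir": "/mnt/nvme"            # shuffle spills to the node's local NVMe
  volumes:
    - name: nvme
      hostPath:
        path: /mnt/nvme
  driver:
    cores: 1
    memory: 4g
  executor:
    instances: 8
    cores: 8
    memory: 32g
    nodeSelector:
      agentpool: lsv3                         # LSv3 node pool with local NVMe
    volumeMounts:
      - name: nvme
        mountPath: /mnt/nvme
```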
Model serving (Triton on GPU node pools)¶
| Metric | On-prem K8s (V100 GPU) | AKS NC6s_v3 (V100) | AKS NC24ads_A100_v4 (A100) |
|---|---|---|---|
| ResNet-50 inference (batch=1) | 5.2 ms | 5.4 ms | 2.1 ms |
| ResNet-50 throughput | 1,200 req/s | 1,150 req/s | 3,400 req/s |
| LLM serving (7B params, vLLM) | 42 tokens/s | 40 tokens/s | 120 tokens/s |
| GPU utilization (sustained) | 85% | 82% | 88% |
| Model load time (2 GB model) | 8 seconds | 12 seconds | 10 seconds |
Context: GPU performance is equivalent between on-prem and AKS for the same GPU generation. The A100 (NC24ads_A100_v4) provides 3x throughput over V100 for LLM workloads.
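The A100 column maps to a dedicated GPU node pool. A sketch of the az command (resource names are placeholders, and NC24ads_A100_v4 availability varies by region):

```bash
az aks nodepool add \
  --resource-group rg-aks-bench \
  --cluster-name aks-cilium-bench \
  --name gpua100 \
  --node-count 1 \
  --node-vm-size Standard_NC24ads_A100_v4 \
  --node-taints sku=gpu:NoSchedule   # keep non-GPU pods off the expensive nodes
```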
7. Benchmark methodology¶
How to reproduce these benchmarks¶
```bash
# 1. Scheduling latency
kubectl apply -f - << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: scheduling-bench
spec:
  parallelism: 100
  completions: 100
  template:
    spec:
      containers:
        - name: bench
          image: busybox:latest
          command: ["sh", "-c", "echo scheduled; sleep 1"]
      restartPolicy: Never
EOF
# Measure time from creation to Running for each pod (see the jq one-liner in section 1)

# 2. Network throughput
# Deploy iperf3 server and client pods on different nodes
kubectl run iperf-server --image=networkstatic/iperf3 --port=5201 -- -s
kubectl expose pod iperf-server --port=5201   # pod names are not DNS-resolvable; expose a Service
kubectl run iperf-client --image=networkstatic/iperf3 -- -c iperf-server -t 60 -P 8

# 3. Storage IOPS
# Deploy fio pod with target PVC
kubectl apply -f fio-benchmark.yaml
# See storage-migration.md for fio configuration

# 4. API latency
# Use k6 or hey to benchmark the API server
k6 run api-bench.js
```
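api-bench.js is not reproduced here. As a stand-in, hey (mentioned above) can drive concurrent list requests directly. This sketch assumes kubectl 1.24+ for create token, a ServiceAccount with RBAC to list pods, and a client that trusts the cluster CA (export it from your kubeconfig or run from inside the cluster):

```bash
TOKEN=$(kubectl create token default)
APISERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
# 1,000 list requests at 100-way concurrency, mirroring the "100 concurrent" row above
hey -n 1000 -c 100 -H "Authorization: Bearer $TOKEN" \
  "$APISERVER/api/v1/namespaces/default/pods?limit=100"
```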
Maintainers: CSA-in-a-Box core team
Last updated: 2026-04-30
Related: TCO Analysis | Cluster Migration | Best Practices