Cluster Configuration Migration: On-Premises to AKS¶
Status: Authored 2026-04-30
Audience: Platform engineers and infrastructure architects migrating cluster-level configuration from self-managed Kubernetes or OpenShift to AKS.
Scope: Node pools, VM sizing, availability zones, CNI selection, kubelet configuration, cluster autoscaler, node auto-provisioning, and maintenance windows.
1. Cluster design decisions¶
Before creating your first AKS cluster, make these design decisions. Each maps to a configuration that is difficult or impossible to change after cluster creation.
Cluster identity¶
| Decision | Options | Recommendation |
|---|---|---|
| Cluster identity type | System-assigned managed identity, User-assigned managed identity | User-assigned managed identity for production (portable, pre-configurable RBAC) |
| Entra ID integration | Enabled, Disabled | Always enable for production. Maps Entra ID groups to K8s RBAC |
| Azure RBAC for K8s | Enabled (Azure RBAC), Disabled (K8s-native RBAC) | Azure RBAC for centralized management; K8s-native RBAC if migrating existing RBAC policies |
| Local accounts | Enabled, Disabled | Disable local accounts for production (force Entra ID auth) |
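If you follow the user-assigned identity recommendation, the identity must exist before cluster creation so its resource ID can be passed to --assign-identity. A minimal sketch, assuming the resource group and identity name used in the examples below (umi-aks-prod):
# Create the user-assigned managed identity referenced later by --assign-identity (names are illustrative)
az identity create \
  --resource-group rg-aks-prod \
  --name umi-aks-prod \
  --location eastus2
# Capture its resource ID for the cluster create command
az identity show \
  --resource-group rg-aks-prod \
  --name umi-aks-prod \
  --query id --output tsv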
Networking (immutable after creation)¶
| Decision | Options | Recommendation |
|---|---|---|
| Network plugin | Azure CNI Overlay, Azure CNI (VNet), Azure CNI + Cilium, kubenet | Azure CNI Overlay for most workloads; Azure CNI + Cilium for advanced network policy and observability |
| Network policy | Azure (Azure NPM), Calico, Cilium | Cilium (if using Azure CNI + Cilium); Calico for existing Calico policy migration |
| Pod CIDR | Custom (default: 10.244.0.0/16) | Size for growth. /16 provides 65K pod IPs per cluster. Overlay mode does not consume VNet IPs |
| Service CIDR | Custom (default: 10.0.0.0/16) | Non-overlapping with VNet and pod CIDR |
| DNS service IP | Within service CIDR | Default is typically fine |
| Private cluster | Enabled (no public API endpoint), Disabled | Enable for production / federal workloads. API server accessible only via Private Link |
| Outbound type | Load balancer, User-defined routing, NAT Gateway, None | User-defined routing with Azure Firewall for federal (egress control) |
SKU and SLA¶
| Decision | Options | Recommendation |
|---|---|---|
| AKS SKU | Free, Standard, Premium | Standard for production (99.95% SLA). Premium for LTS + advanced features |
| Uptime SLA | Included in Standard/Premium | Standard tier includes financially backed SLA |
| AKS Automatic | Enabled, Disabled | Consider for new clusters where opinionated defaults are acceptable |
2. Node pool configuration¶
System node pool¶
Every AKS cluster requires at least one system node pool running critical system pods (CoreDNS, metrics-server, kube-proxy, Azure CNI, CSI drivers).
# Create AKS cluster with system node pool
az aks create \
--resource-group rg-aks-prod \
--name aks-prod-eastus2 \
--location eastus2 \
--kubernetes-version 1.30 \
--network-plugin azure \
--network-plugin-mode overlay \
--network-dataplane cilium \
--enable-managed-identity \
--assign-identity /subscriptions/.../resourceGroups/.../providers/Microsoft.ManagedIdentity/userAssignedIdentities/umi-aks-prod \
--enable-aad \
--enable-azure-rbac \
--disable-local-accounts \
--enable-private-cluster \
--private-dns-zone system \
--outbound-type userDefinedRouting \
--node-count 3 \
--node-vm-size Standard_D4s_v5 \
--nodepool-name system \
--nodepool-labels nodepool=system \
--nodepool-taints CriticalAddonsOnly=true:NoSchedule \
--zones 1 2 3 \
--enable-cluster-autoscaler \
--min-count 3 \
--max-count 5 \
--tier standard \
--enable-defender \
--enable-workload-identity \
--enable-oidc-issuer \
--attach-acr /subscriptions/.../resourceGroups/.../providers/Microsoft.ContainerRegistry/registries/csainaboxacr \
--tags environment=production team=platform
User node pools¶
Create separate node pools for different workload types. This replaces the "one big cluster" pattern common in self-managed Kubernetes with targeted node pools.
# General-purpose workload pool
az aks nodepool add \
--resource-group rg-aks-prod \
--cluster-name aks-prod-eastus2 \
--name workload \
--node-vm-size Standard_D8s_v5 \
--node-count 5 \
--zones 1 2 3 \
--enable-cluster-autoscaler \
--min-count 3 \
--max-count 20 \
--labels workload-type=general \
--max-pods 110 \
--mode User
# Memory-optimized pool (for data-intensive workloads)
az aks nodepool add \
--resource-group rg-aks-prod \
--cluster-name aks-prod-eastus2 \
--name highmem \
--node-vm-size Standard_E16s_v5 \
--node-count 2 \
--zones 1 2 3 \
--enable-cluster-autoscaler \
--min-count 2 \
--max-count 10 \
--labels workload-type=memory-intensive \
--node-taints workload=memory-intensive:NoSchedule \
--mode User
# GPU pool (for ML inference / model serving)
az aks nodepool add \
--resource-group rg-aks-prod \
--cluster-name aks-prod-eastus2 \
--name gpu \
--node-vm-size Standard_NC24ads_A100_v4 \
--node-count 0 \
--zones 1 \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 4 \
--labels workload-type=gpu accelerator=nvidia-a100 \
--node-taints nvidia.com/gpu=present:NoSchedule \
--mode User
# Spot pool (for batch / fault-tolerant workloads)
az aks nodepool add \
--resource-group rg-aks-prod \
--cluster-name aks-prod-eastus2 \
--name spot \
--node-vm-size Standard_D8s_v5 \
--node-count 0 \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 30 \
--priority Spot \
--eviction-policy Delete \
--spot-max-price -1 \
--labels workload-type=batch kubernetes.azure.com/scalesetpriority=spot \
--node-taints kubernetes.azure.com/scalesetpriority=spot:NoSchedule \
--mode User
# FIPS-enabled pool (for federal compliance)
az aks nodepool add \
--resource-group rg-aks-prod \
--cluster-name aks-prod-eastus2 \
--name fips \
--node-vm-size Standard_D8s_v5 \
--node-count 3 \
--zones 1 2 3 \
--enable-cluster-autoscaler \
--min-count 3 \
--max-count 15 \
--enable-fips-image \
--labels workload-type=fips-required \
--mode User
VM size mapping: on-prem to Azure¶
| On-prem server profile | Azure VM size | vCPU | Memory | Notes |
|---|---|---|---|---|
| General worker (4C/16GB) | Standard_D4s_v5 | 4 | 16 GB | System pool, light workloads |
| General worker (8C/32GB) | Standard_D8s_v5 | 8 | 32 GB | Most application workloads |
| General worker (16C/64GB) | Standard_D16s_v5 | 16 | 64 GB | Higher-density workloads |
| Memory-optimized (8C/64GB) | Standard_E8s_v5 | 8 | 64 GB | Caching, in-memory processing |
| Memory-optimized (16C/128GB) | Standard_E16s_v5 | 16 | 128 GB | Spark executors, large caches |
| Compute-optimized (8C/16GB) | Standard_F8s_v2 | 8 | 16 GB | CPU-intensive batch jobs |
| Storage-optimized (local NVMe) | Standard_L8s_v3 | 8 | 64 GB | Local SSD for databases, etcd |
| GPU (single GPU) | Standard_NC6s_v3 | 6 | 112 GB | ML inference (V100) |
| GPU (A100) | Standard_NC24ads_A100_v4 | 24 | 220 GB | ML training and inference |
| GPU (H100) | Standard_ND96isr_H100_v5 | 96 | 1900 GB | Large-scale ML training |
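Before committing to a mapping, confirm that the target VM sizes are available and zone-capable in your region; quotas and zonal availability vary by subscription. A quick check with the Azure CLI (size filter is illustrative):
# List matching SKUs in the target region, including zone availability and restrictions
az vm list-skus \
  --location eastus2 \
  --size Standard_D8s_v5 \
  --zone \
  --output table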
3. Availability zones¶
AKS supports spreading node pools across Azure availability zones for high availability. This replaces the rack-aware scheduling and failure-domain configuration in self-managed clusters.
Zone-redundant deployment¶
# Node pool spread across all 3 zones (2 nodes per zone)
az aks nodepool add \
  --resource-group rg-aks-prod \
  --cluster-name aks-prod-eastus2 \
  --name workload \
  --zones 1 2 3 \
  --node-count 6
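After the pool scales out, you can verify the spread by listing nodes with their zone label:
# Show each node's availability zone
kubectl get nodes --label-columns topology.kubernetes.io/zone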
Zone topology constraints¶
Use pod topology spread constraints to ensure even pod distribution across zones:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 6
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server
      containers:
        - name: api-server
          image: myregistry.azurecr.io/api-server:latest   # placeholder image
Zone-aware storage¶
Locally redundant (LRS) Azure managed disks are zonal resources: for StatefulSets using Azure Disk, the pod and its disk must land in the same zone. Setting volumeBindingMode: WaitForFirstConsumer on the storage class delays disk provisioning until the pod is scheduled, so the disk is created in the pod's zone. Zone-redundant (ZRS) disk SKUs, as in the example below, can additionally attach from any zone and survive a zone outage.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-premium-zrs
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS   # Zone-redundant storage
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
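As an illustration of the WaitForFirstConsumer behavior, a PVC against this class stays Pending until a pod that uses it is scheduled; only then is the disk created for that pod's zone. A minimal sketch (claim name is illustrative):
# PVC bound lazily by WaitForFirstConsumer
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-zrs
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: managed-premium-zrs
  resources:
    requests:
      storage: 256Gi
EOF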
4. CNI selection guide¶
Azure CNI Overlay (recommended for most)¶
- Pods get IPs from a private CIDR (not VNet IPs)
- Scales to thousands of pods without VNet exhaustion
- Lower IP planning overhead
- Compatible with Calico and Cilium network policies
Azure CNI (VNet)¶
- Every pod gets a VNet IP address
- Pods are directly reachable from VNet-peered networks
- Higher IP planning overhead (need large subnets)
- Best when: pods must be directly addressable from VNet or on-prem
Azure CNI powered by Cilium¶
- eBPF-based dataplane (replaces iptables/ipvs)
- Advanced network policy (L7 policy, DNS policy, FQDN policy; see the sketch after this list)
- Network observability (Hubble)
- Better performance than iptables-based CNI
- Best when: advanced network policy, observability, or high-performance networking required
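For instance, an L7 policy that restricts which HTTP methods and paths a client may call is only expressible with the Cilium dataplane (on AKS, L7 and FQDN policies may additionally require the Advanced Container Networking Services feature). A minimal sketch, with illustrative app labels:
# Allow frontend pods to reach api-server pods on 8080, restricted to GET /healthz and GET /api/*
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-server-l7
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/healthz"
              - method: "GET"
                path: "/api/.*"
EOF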
kubenet (legacy, not recommended)¶
- Basic overlay networking
- Limited to 400 nodes per cluster
- No network policy support without Calico addon
- Only use when: migrating from kubenet-based clusters and not ready to change CNI
5. Kubelet configuration¶
AKS supports custom kubelet configuration via JSON. This replaces the kubelet flags and configuration files in self-managed clusters.
# Create kubelet config file
cat > kubelet-config.json << 'EOF'
{
"cpuManagerPolicy": "static",
"cpuCfsQuota": true,
"cpuCfsQuotaPeriod": "100ms",
"topologyManagerPolicy": "best-effort",
"allowedUnsafeSysctls": [
"net.core.somaxconn",
"net.ipv4.tcp_keepalive_time"
],
"containerLogMaxSizeMB": 100,
"containerLogMaxFiles": 5,
"podMaxPids": 4096,
"imageGcHighThreshold": 85,
"imageGcLowThreshold": 80
}
EOF
# Apply to a new node pool
az aks nodepool add \
  --resource-group rg-aks-prod \
  --cluster-name aks-prod-eastus2 \
  --name highperf \
  --node-vm-size Standard_D16s_v5 \
  --node-count 3 \
  --kubelet-config ./kubelet-config.json
Common kubelet configurations for data workloads¶
| Setting | Effect | Use case |
|---|---|---|
| `cpuManagerPolicy: static` | Guaranteed QoS pods get dedicated CPUs | Spark executors, database pods (see the pod sketch after this table) |
| `topologyManagerPolicy: best-effort` | NUMA-aware scheduling | GPU workloads, high-performance computing |
| `podMaxPids: 4096` | Higher PID limit per pod | Java applications, Spark (many threads) |
| `containerLogMaxSizeMB: 100` | Larger container log files | Debug scenarios, verbose logging |
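The static CPU manager policy only pins CPUs for containers in the Guaranteed QoS class, i.e. identical integer CPU requests and limits. A minimal sketch of a pod that would receive exclusive cores on the highperf pool (pod name and image are placeholders):
# Guaranteed QoS: requests == limits, integer CPU count
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pinned-worker
spec:
  nodeSelector:
    kubernetes.azure.com/agentpool: highperf
  containers:
    - name: worker
      image: mcr.microsoft.com/azurelinux/base/core:3.0   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        requests:
          cpu: "4"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 8Gi
EOF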
6. Cluster autoscaler configuration¶
Basic autoscaler¶
# Enable autoscaler on a node pool
az aks nodepool update \
--resource-group rg-aks-prod \
--cluster-name aks-prod-eastus2 \
--name workload \
--enable-cluster-autoscaler \
--min-count 3 \
--max-count 20
Autoscaler profile (cluster-wide settings)¶
az aks update \
--resource-group rg-aks-prod \
--name aks-prod-eastus2 \
--cluster-autoscaler-profile \
scan-interval=10s \
scale-down-delay-after-add=10m \
scale-down-delay-after-delete=10s \
scale-down-unneeded-time=10m \
scale-down-utilization-threshold=0.5 \
max-graceful-termination-sec=600 \
balance-similar-node-groups=true \
expander=least-waste \
skip-nodes-with-local-storage=false \
skip-nodes-with-system-pods=true \
max-node-provision-time=15m \
max-total-unready-percentage=45 \
ok-total-unready-count=3 \
new-pod-scale-up-delay=0s
Autoscaler profile mapping from self-managed¶
| Self-managed flag | AKS profile parameter | Notes |
|---|---|---|
| `--scan-interval` | `scan-interval` | How often the autoscaler checks for pending pods |
| `--scale-down-delay-after-add` | `scale-down-delay-after-add` | Wait time before scale-down after a scale-up |
| `--scale-down-utilization-threshold` | `scale-down-utilization-threshold` | Node utilization below which a node is a candidate for removal |
| `--expander` | `expander` | Options: random, most-pods, least-waste, priority |
| `--max-node-provision-time` | `max-node-provision-time` | Timeout for a new node to become ready |
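To compare your current self-managed flags against what the cluster is actually running, the effective profile can be read back:
# Inspect the cluster's current autoscaler profile
az aks show \
  --resource-group rg-aks-prod \
  --name aks-prod-eastus2 \
  --query autoScalerProfile \
  --output yaml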
7. Node auto-provisioning (NAP)¶
NAP, built on Karpenter, automatically selects the optimal VM size for pending pods based on their resource requests, node selectors, and tolerations. This replaces the manual VM size selection in self-managed clusters.
# Enable NAP on the cluster (requires Azure CNI Overlay with the Cilium dataplane,
# as configured in the cluster create command above)
az aks update \
  --resource-group rg-aks-prod \
  --name aks-prod-eastus2 \
  --node-provisioning-mode Auto
NAP automatically:
- Selects the cheapest VM size that satisfies pod requirements
- Uses Spot VMs when pods tolerate the kubernetes.azure.com/scalesetpriority=spot taint
- Consolidates underutilized nodes by rescheduling pods and removing nodes
- Respects pod topology spread constraints and anti-affinity rules
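For example, a fault-tolerant batch workload that accepts Spot capacity only needs the matching toleration and its resource requests; with NAP enabled there is no Spot node pool to pre-create. A sketch with illustrative names and a placeholder image:
# Batch workload that NAP can satisfy with Spot capacity
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      tolerations:
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      containers:
        - name: worker
          image: mcr.microsoft.com/azurelinux/base/core:3.0   # placeholder image
          command: ["sleep", "3600"]
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
EOF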
8. Maintenance windows¶
AKS maintenance windows replace the manual upgrade scheduling in self-managed clusters.
# Configure planned maintenance window for cluster auto-upgrades
az aks maintenanceconfiguration add \
  --resource-group rg-aks-prod \
  --cluster-name aks-prod-eastus2 \
  --name aksManagedAutoUpgradeSchedule \
  --schedule-type Weekly \
  --day-of-week Saturday \
  --interval-weeks 1 \
  --start-time 02:00 \
  --duration 4 \
  --utc-offset -05:00
# Configure maintenance window for node OS image updates
az aks maintenanceconfiguration add \
  --resource-group rg-aks-prod \
  --cluster-name aks-prod-eastus2 \
  --name aksManagedNodeOSUpgradeSchedule \
  --schedule-type Weekly \
  --day-of-week Sunday \
  --interval-weeks 1 \
  --start-time 02:00 \
  --duration 4 \
  --utc-offset -05:00
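To confirm what is configured on the cluster:
# List maintenance configurations on the cluster
az aks maintenanceconfiguration list \
  --resource-group rg-aks-prod \
  --cluster-name aks-prod-eastus2 \
  --output table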
9. Bicep deployment template¶
For teams using infrastructure as code (recommended for CSA-in-a-Box deployments), here is a representative Bicep template:
@description('AKS cluster configuration for CSA-in-a-Box data platform')
param clusterName string = 'aks-csa-prod'
param location string = resourceGroup().location
param kubernetesVersion string = '1.30'
param systemNodeCount int = 3
param workloadNodeCount int = 5
param managedIdentityResourceId string
resource aksCluster 'Microsoft.ContainerService/managedClusters@2024-06-02-preview' = {
name: clusterName
location: location
identity: {
type: 'UserAssigned'
userAssignedIdentities: {
'${managedIdentityResourceId}': {}
}
}
sku: {
name: 'Base'
tier: 'Standard'
}
properties: {
kubernetesVersion: kubernetesVersion
dnsPrefix: clusterName
enableRBAC: true
aadProfile: {
managed: true
enableAzureRBAC: true
tenantID: subscription().tenantId
}
disableLocalAccounts: true
networkProfile: {
networkPlugin: 'azure'
networkPluginMode: 'overlay'
networkDataplane: 'cilium'
networkPolicy: 'cilium'
podCidr: '10.244.0.0/16'
serviceCidr: '10.0.0.0/16'
dnsServiceIP: '10.0.0.10'
outboundType: 'userDefinedRouting'
loadBalancerSku: 'standard'
}
apiServerAccessProfile: {
enablePrivateCluster: true
privateDNSZone: 'system'
}
autoUpgradeProfile: {
upgradeChannel: 'patch'
nodeOSUpgradeChannel: 'NodeImage'
}
securityProfile: {
defender: {
securityMonitoring: {
enabled: true
}
}
workloadIdentity: {
enabled: true
}
}
oidcIssuerProfile: {
enabled: true
}
agentPoolProfiles: [
{
name: 'system'
count: systemNodeCount
vmSize: 'Standard_D4s_v5'
osDiskSizeGB: 128
osDiskType: 'Managed'
osType: 'Linux'
mode: 'System'
availabilityZones: ['1', '2', '3']
enableAutoScaling: true
minCount: 3
maxCount: 5
nodeTaints: ['CriticalAddonsOnly=true:NoSchedule']
nodeLabels: { nodepool: 'system' }
maxPods: 110
}
{
name: 'workload'
count: workloadNodeCount
vmSize: 'Standard_D8s_v5'
osDiskSizeGB: 256
osDiskType: 'Managed'
osType: 'Linux'
mode: 'User'
availabilityZones: ['1', '2', '3']
enableAutoScaling: true
minCount: 3
maxCount: 20
nodeLabels: { 'workload-type': 'general' }
maxPods: 110
}
]
}
}
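A representative deployment command, assuming the template is saved as aks-cluster.bicep and the identity created earlier (filename and names are illustrative):
# Resolve the user-assigned identity ID, then deploy the template
IDENTITY_ID=$(az identity show --resource-group rg-aks-prod --name umi-aks-prod --query id --output tsv)
az deployment group create \
  --resource-group rg-aks-prod \
  --template-file aks-cluster.bicep \
  --parameters clusterName=aks-prod-eastus2 managedIdentityResourceId=$IDENTITY_ID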
10. Post-creation cluster configuration¶
After creating the AKS cluster, apply these configurations:
# Get cluster credentials
az aks get-credentials --resource-group rg-aks-prod --name aks-prod-eastus2
# Install NGINX Ingress Controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx --create-namespace \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-internal"="true" \
--set controller.nodeSelector."kubernetes\.io/os"=linux
# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager --create-namespace \
--set installCRDs=true \
--set nodeSelector."kubernetes\.io/os"=linux
# Enable Azure Key Vault Secrets Provider
az aks enable-addons \
--resource-group rg-aks-prod \
--name aks-prod-eastus2 \
--addons azure-keyvault-secrets-provider
# Enable Azure Monitor Container Insights
az aks enable-addons \
--resource-group rg-aks-prod \
--name aks-prod-eastus2 \
--addons monitoring \
--workspace-resource-id /subscriptions/.../resourceGroups/.../providers/Microsoft.OperationalInsights/workspaces/law-csa-prod
# Enable Flux GitOps
az k8s-configuration flux create \
--resource-group rg-aks-prod \
--cluster-name aks-prod-eastus2 \
--cluster-type managedClusters \
--name cluster-config \
--url https://github.com/org/aks-cluster-config \
--branch main \
--kustomization name=infra path=./infrastructure prune=true \
--kustomization name=apps path=./applications prune=true dependsOn=infra
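A quick verification pass, assuming the namespaces and configuration name used above:
# Verify node and add-on health
kubectl get nodes -o wide
kubectl get pods -n ingress-nginx
kubectl get pods -n cert-manager
# Check GitOps reconciliation status
az k8s-configuration flux show \
  --resource-group rg-aks-prod \
  --cluster-name aks-prod-eastus2 \
  --cluster-type managedClusters \
  --name cluster-config \
  --query complianceState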
Maintainers: CSA-in-a-Box core team
Last updated: 2026-04-30
Related: Workload Migration | Networking Migration | Feature Mapping