Pattern — AKS & Container Apps for Data Workloads

TL;DR: Use Container Apps for stateless, event-driven, KEDA-scaled data workloads (the new default). Use AKS when you need Argo, Flyte, dbt with Airflow, Spark on K8s, GPU pools, pod-level networking control, or multi-tenant isolation. Avoid AKS for "we just need to run a container"; Container Apps is simpler.

Problem

Modern data platforms have workloads that don't fit neatly into Synapse / Databricks / Functions:

Stream consumers that scale on queue depth (KEDA)
Bioinformatics pipelines (Nextflow, WDL, Snakemake)
Spark on K8s for cost optimization vs Databricks
Argo Workflows for DAG orchestration
dbt + Airflow / Dagster orchestration
Custom ML training that doesn't fit Azure ML
Long-running stateful agents (multi-step LLM workflows)

Azure offers three container platforms: Container Instances (simplest), Container Apps (managed serverless), and AKS (full Kubernetes). Pick the simplest one that meets the workload's requirements.

Decision tree

flowchart TD
    Start[Need to run a container] --> Q1{Stateless +<br/>HTTP / queue triggered?}
    Q1 -->|Yes| Q2{Need Azure-native<br/>scale on events?}
    Q2 -->|Yes| ACA1[Container Apps<br/>+ KEDA scaler]
    Q2 -->|HTTP only| ACA2[Container Apps<br/>+ HTTP scaler]

    Q1 -->|No, complex orchestration| Q3{Need Argo / Flyte<br/>/ Airflow / Spark on K8s?}
    Q3 -->|Yes| AKS1[AKS<br/>+ chosen orchestrator]
    Q3 -->|No| Q4{GPU workload?}
    Q4 -->|Yes, training| AML[Azure ML compute<br/>not container platform]
    Q4 -->|Yes, inference| ACA3[Container Apps with GPU<br/>preview, or AKS GPU node pool]
    Q4 -->|No| Q5{One-shot job?}
    Q5 -->|Yes| ACI[Container Instances<br/>or Container Apps Job]
    Q5 -->|No| ACA4[Container Apps<br/>default]

    style ACA1 fill:#0078d4,color:#fff
    style ACA2 fill:#0078d4,color:#fff
    style ACA3 fill:#0078d4,color:#fff
    style ACA4 fill:#0078d4,color:#fff
    style AKS1 fill:#ff6b35,color:#fff
    style AML fill:#13a3b5,color:#fff
    style ACI fill:#888,color:#fff
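
The same branching can be written as a small function, which is handy for encoding the choice in an intake form or an architecture review checklist. This is a minimal sketch: the `Workload` fields and the `choose_platform` name are illustrative assumptions, and the returned strings simply mirror the leaf nodes of the flowchart above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    # Field names are hypothetical; they map one-to-one to the flowchart questions.
    stateless_triggered: bool              # stateless and HTTP- or queue-triggered
    event_scaled: bool = False             # needs to scale on events (queue depth, lag)
    needs_k8s_orchestrator: bool = False   # Argo / Flyte / Airflow / Spark on K8s
    gpu: Optional[str] = None              # None, "training", or "inference"
    one_shot: bool = False                 # runs once and exits


def choose_platform(w: Workload) -> str:
    """Walk the decision tree top to bottom and return the matching leaf."""
    if w.stateless_triggered:
        if w.event_scaled:
            return "Container Apps + KEDA scaler"
        return "Container Apps + HTTP scaler"
    if w.needs_k8s_orchestrator:
        return "AKS + chosen orchestrator"
    if w.gpu == "training":
        return "Azure ML compute"
    if w.gpu == "inference":
        return "Container Apps with GPU or AKS GPU node pool"
    if w.one_shot:
        return "Container Instances or Container Apps Job"
    return "Container Apps"


print(choose_platform(Workload(stateless_triggered=True, event_scaled=True)))
print(choose_platform(Workload(stateless_triggered=False, needs_k8s_orchestrator=True)))
```

The order of the checks matters: orchestration needs are tested before GPU and one-shot questions, exactly as in the flowchart, so a workload that needs Argo lands on AKS even if it is also a one-shot job.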

Pattern 1: KEDA-driven stream consumer (Container Apps)

For Event Hubs / Service Bus / Cosmos change feed consumers that should scale to zero when idle:

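// Assumes an existing Container Apps environment; the parameter names are illustrative.
param environmentId string
param location string = resourceGroup().location

resource consumer 'Microsoft.App/containerApps@2024-03-01' = {
  name: 'eh-consumer'
  location: location
  // A system-assigned identity is required for the Key Vault secret reference below
  identity: { type: 'SystemAssigned' }
  properties: {
    environmentId: environmentId
    configuration: {
      ingress: null  // No ingress: the consumer pulls from Event Hubs
      secrets: [
        {
          name: 'eh-conn'
          keyVaultUrl: 'https://kv.../secrets/eh-conn'
          identity: 'system'
        }
      ]
    }
    template: {
      containers: [
        {
          name: 'consumer'
          image: 'mcr.microsoft.com/yourorg/eh-consumer:1.0'  // Placeholder image reference
          env: [
            { name: 'EH_CONNECTION', secretRef: 'eh-conn' }
            { name: 'EH_NAME', value: 'orders' }
          ]
          // Bicep has no decimal literals, so fractional CPU goes through json()
          resources: { cpu: json('0.5'), memory: '1Gi' }
        }
      ]
      scale: {
        minReplicas: 0
        maxReplicas: 30
        rules: [
          {
            name: 'eh-scaler'
            custom: {
              type: 'azure-eventhub'
              metadata: {
                eventHubName: 'orders'
                consumerGroup: 'consumer-group'
                unprocessedEventThreshold: '1000'
              }
              // The KEDA azure-eventhub scaler also reads checkpoints from Blob Storage
              // (blobContainer metadata plus a storageConnection auth entry); omitted here.
              auth: [{ secretRef: 'eh-conn', triggerParameter: 'connection' }]
            }
          }
        ]
      }
    }
  }
}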

Container Apps with KEDA scales to zero when there are no events and scales out toward maxReplicas (30 here) as the backlog of unprocessed events grows.
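
To get a feel for how the threshold and replica limits interact, the scaling behaviour can be approximated as backlog divided by threshold, rounded up and clamped. This is a simplified model, not the actual KEDA implementation: `desired_replicas` and its parameters are hypothetical names, and the optional partition cap reflects the assumption that consumers beyond the partition count would have nothing to read.

```python
import math
from typing import Optional


def desired_replicas(
    unprocessed_events: int,
    threshold: int = 1000,        # mirrors unprocessedEventThreshold above
    min_replicas: int = 0,
    max_replicas: int = 30,
    partitions: Optional[int] = None,
) -> int:
    """Approximate replica count for a given backlog (simplified model)."""
    if unprocessed_events <= 0:
        return min_replicas
    wanted = math.ceil(unprocessed_events / threshold)
    if partitions is not None:
        # Assumption: extra consumers beyond the partition count sit idle.
        wanted = min(wanted, partitions)
    return max(min_replicas, min(wanted, max_replicas))


for backlog in (0, 1, 2500, 100_000):
    print(backlog, desired_replicas(backlog))
```

With the settings from the Bicep above, an empty hub yields 0 replicas, a backlog of 2,500 events yields 3, and anything past 30,000 events is capped at 30.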

Pattern 2: AKS for Argo Workflows (DAG orchestration)

When ADF, Fabric Data Pipelines, or dbt with Airflow aren't a fit (bioinformatics, complex retry/branching, file-based DAGs):

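# Argo Workflow manifest, submitted to an AKS cluster running Argo Workflows
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
    # generateName lets Argo append a unique suffix to each run
    generateName: variant-calling-
spec:
    entrypoint: variant-pipeline
    templates:
        # Each "- -" entry is its own step group, so the three steps run sequentially
        - name: variant-pipeline
          steps:
              - - name: align
                  template: bwa
              - - name: call-variants
                  template: gatk
              - - name: annotate
                  template: vep

        # The gatk and vep templates referenced above are not shown
        - name: bwa
          container:
              image: biocontainers/bwa:latest
              command: [bwa, mem, ...]
              resources:
                  requests: { memory: 16Gi, cpu: "8" }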

AKS with Argo gives you K8s-native DAG orchestration and direct access to the bioinformatics container ecosystem.
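
Because the manifest above references the `gatk` and `vep` templates without defining them, a quick pre-submit check can catch dangling references before a run fails. The sketch below works on the manifest as a plain Python dict; `undefined_templates` is a hypothetical helper, not part of Argo, and it only inspects `steps`-style templates.

```python
def undefined_templates(workflow: dict) -> set[str]:
    """Return template names that are referenced but never defined."""
    templates = workflow["spec"]["templates"]
    defined = {t["name"] for t in templates}
    referenced = {workflow["spec"]["entrypoint"]}
    for t in templates:
        # "steps" is a list of groups; groups run in order, steps within a group in parallel.
        for group in t.get("steps", []):
            for step in group:
                referenced.add(step["template"])
    return referenced - defined


# Same structure as the manifest above, with only the fields the check needs.
workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "spec": {
        "entrypoint": "variant-pipeline",
        "templates": [
            {
                "name": "variant-pipeline",
                "steps": [
                    [{"name": "align", "template": "bwa"}],
                    [{"name": "call-variants", "template": "gatk"}],
                    [{"name": "annotate", "template": "vep"}],
                ],
            },
            {"name": "bwa", "container": {"image": "biocontainers/bwa:latest"}},
        ],
    },
}

print(sorted(undefined_templates(workflow)))  # ['gatk', 'vep']
```

An empty result means every step points at a defined template; here the check reports the two templates that still need to be written.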

Pattern 3: AKS for Spark on K8s

Cost-optimization alternative to Databricks for Spark workloads:

Spot node pools (60-90% discount)
Per-second billing, no DBU markup
Full control over Spark config
Kubernetes-native scheduling

Trade-offs:

You operate it (vs Databricks managing for you)
No Photon (Databricks-proprietary)
No managed Unity Catalog
More YAML, more breakage

Use only when: cost is the dominant driver, Spark expertise is in-house, and you have K8s ops capacity.

Pattern 4: Container Apps Jobs for batch

For one-shot batch jobs (e.g., nightly aggregation, data quality scan):

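// Assumes an existing Container Apps environment; the parameter names are illustrative.
param environmentId string
param location string = resourceGroup().location

resource job 'Microsoft.App/jobs@2024-03-01' = {
  name: 'nightly-dq-scan'
  location: location
  properties: {
    environmentId: environmentId
    configuration: {
      triggerType: 'Schedule'
      replicaTimeout: 3600  // Required: max seconds a replica may run (value is illustrative)
      scheduleTriggerConfig: {
        cronExpression: '0 2 * * *'  // 02:00 UTC daily
        parallelism: 1
        replicaCompletionCount: 1
      }
    }
    template: {
      containers: [{
        name: 'dq-scanner'
        image: 'mcr.../dq-scanner:1.0'
        resources: { cpu: 1, memory: '2Gi' }
      }]
    }
  }
}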

A job is billed only while it runs, so it is cheaper than keeping a Container App running 24/7. It is also a better fit than Azure Functions when execution time can exceed Functions limits.
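
A job container is expected to run to completion and signal the outcome through its exit code, so a failed scan shows up as a failed execution rather than a silent success. The sketch below shows one way a data quality scanner entrypoint could be shaped; `scan`, `main`, the column names, and the sample rows are all hypothetical and stand in for whatever the real `dq-scanner` image does.

```python
import sys


def scan(rows, required=("order_id", "amount")):
    """Return human-readable violations; an empty list means the data is clean."""
    problems = []
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: missing {col}")
    return problems


def main(rows) -> int:
    """Run the scan, log violations to stderr, and return a process exit code."""
    problems = scan(rows)
    for p in problems:
        print(p, file=sys.stderr)
    return 1 if problems else 0


# In the container entrypoint this would end with sys.exit(main(rows)),
# where rows come from the real data source (not shown here).
sample = [{"order_id": 1, "amount": 9.5}, {"order_id": 2, "amount": None}]
print("exit code:", main(sample))
```

Returning a non-zero code for any violation is a deliberate choice for this sketch; a real scanner might distinguish warnings from hard failures.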

When NOT to use containers

Workload	Better choice
Spark jobs at scale	Databricks (managed) or Synapse Spark
Functions / event handlers <10min	Azure Functions
ML training	Azure ML compute clusters
Stored procs / SQL transforms	Synapse / Fabric / Databricks SQL — not in containers
Long-running stateful (>24h)	AKS StatefulSets; not a fit for Container Apps

Cost comparison (rough, 2026)

Platform	$/cpu-hour	Notes
Container Instances	~$0.045	Simplest, no orchestration
Container Apps (Consumption)	~$0.04 + per-request	No compute charge while scaled to zero
Container Apps (Dedicated)	~$0.05	Fixed capacity; no cold starts from scale-to-zero
AKS (B2ms node)	~$0.025 (with bin-packing)	You manage cluster ops
AKS (Spot D4s_v5)	~$0.01 (varies)	Eviction risk; great for batch
AKS (GPU NC6s_v3)	~$3.06 per node-hour (6 vCPU, 1 GPU)	Billed per whole GPU node

AKS is cheaper per CPU when you bin-pack well; Container Apps wins on operational simplicity and scale-to-zero.
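
The trade-off comes down to how busy the workload is: a scale-to-zero platform bills only for active hours, while an always-on cluster bills for every hour. The sketch below works through that arithmetic using the rough per-CPU rates from the table; it is compute-only and ignores per-request charges, memory, and cluster operations effort. The function names, the 4-CPU sizing, and the 25% busy fraction are illustrative assumptions.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12


def monthly_cost(rate_per_cpu_hour: float, cpus: float, busy_fraction: float = 1.0) -> float:
    """Compute-only monthly cost; busy_fraction < 1 models scale-to-zero billing."""
    return rate_per_cpu_hour * cpus * HOURS_PER_MONTH * busy_fraction


def break_even_busy_fraction(always_on_rate: float, scale_to_zero_rate: float) -> float:
    """Busy fraction above which the always-on platform becomes cheaper."""
    return always_on_rate / scale_to_zero_rate


ACA_CONSUMPTION = 0.04  # $/cpu-hour, from the table above
AKS_BIN_PACKED = 0.025  # $/cpu-hour, from the table above

aca = monthly_cost(ACA_CONSUMPTION, cpus=4, busy_fraction=0.25)
aks = monthly_cost(AKS_BIN_PACKED, cpus=4)
break_even = break_even_busy_fraction(AKS_BIN_PACKED, ACA_CONSUMPTION)

print(f"Container Apps, 25% busy: ${aca:.2f}/month")
print(f"AKS, always on:           ${aks:.2f}/month")
print(f"Break-even busy fraction: {break_even:.1%}")
```

With these rates the break-even sits at 0.025 / 0.04, or about 62.5% busy: below that, Container Apps Consumption is cheaper on compute alone, and above it a well-packed AKS cluster wins.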

Common pitfalls

Pitfall	Mitigation
Choosing AKS "for flexibility" without ops capacity	Default to Container Apps unless a concrete requirement calls for AKS
Container Apps for long-running stateful workloads	Use AKS StatefulSets
Expecting Photon-level performance from Spark on K8s	If performance matters, Databricks is faster despite the cost
GPU on AKS without GPU node pool taints/tolerations	Taint the GPU pool and give GPU pods matching tolerations and GPU resource requests; otherwise pods schedule on CPU nodes and training is silently CPU-bound
KEDA scaler with min replicas of 0 on a cold-start-sensitive workload	Keep min=1 to avoid cold starts on first event
Not using node taints for spot pools	Keep spot pools tainted and add tolerations only to interruptible workloads; otherwise critical workloads land on spot, get evicted, and fail
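
The taint-related pitfalls are easier to reason about with a toy model of the scheduling rule: a pod can land on a node only if it tolerates every blocking taint on that node. The sketch below is a simplified version of the Kubernetes matching logic (it skips edge cases such as empty keys and `PreferNoSchedule`); `tolerates` and `schedulable` are hypothetical helpers. The spot taint shown is the one AKS applies to spot node pools.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified match: same key, compatible effect, and Exists or equal value."""
    if toleration.get("key") != taint["key"]:
        return False
    if toleration.get("effect") not in (None, taint["effect"]):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value") == taint.get("value")


def schedulable(pod_tolerations: list, node_taints: list) -> bool:
    """A pod fits only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations) for taint in blocking)


SPOT_TAINT = {"key": "kubernetes.azure.com/scalesetpriority", "value": "spot", "effect": "NoSchedule"}

critical_pod = []  # no tolerations: stays off spot nodes
batch_pod = [{"key": "kubernetes.azure.com/scalesetpriority", "operator": "Equal",
              "value": "spot", "effect": "NoSchedule"}]

print("critical on spot:", schedulable(critical_pod, [SPOT_TAINT]))
print("batch on spot:   ", schedulable(batch_pod, [SPOT_TAINT]))
print("critical on regular node:", schedulable(critical_pod, []))
```

The same reasoning applies to GPU pools: tainting them keeps ordinary pods off expensive nodes, and GPU pods additionally need a GPU resource request so the scheduler places them on a node that actually has one.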

Related

Reference Architecture — Hub-Spoke (where AKS / Container Apps fit)
Pattern — Streaming & CDC (KEDA-scaled consumers)
Best Practices — Cost Optimization
Industries — Manufacturing (bioinformatics-style pipelines)
Azure Container Apps docs: https://learn.microsoft.com/azure/container-apps/
AKS Production Baseline: https://learn.microsoft.com/azure/architecture/reference-architectures/containers/aks/baseline-aks