Reference Architecture: Identity & Secrets Flow

TL;DR: Humans authenticate to Entra ID and get RBAC; workloads use managed identities (never service principals with passwords); every secret lives in Key Vault and is referenced by URI; nothing — nothing — has a static credential in code, config, or pipeline.

The problem

Identity and secrets are among the most common sources of compromise in cloud platforms. Service-principal secrets get checked into git, connection strings get pasted into ADF parameter files, and shared admin accounts make audit forensics impossible. The only sustainable answer is: workloads have identity, not credentials.

Architecture

flowchart TB
    subgraph Humans[Humans]
        Dev[Developer]
        SRE[SRE / Operator]
        Admin[Privileged Admin]
        Auditor[Auditor]
    end

    subgraph Entra[Entra ID Tenant]
        Users[User accounts<br/>+ MFA]
        Groups[Security groups<br/>+ PIM-eligible]
        SPN[Service principals<br/>federated only]
        ConditionalAccess[Conditional Access<br/>policies]
        PIM[Privileged Identity<br/>Management]
    end

    subgraph Workloads[Azure Workloads]
        ADF[ADF system MI]
        DBX[Databricks system MI]
        Func[Function App MI]
        Synapse[Synapse MI]
        AKS[AKS workload identity]
        Portal[Portal API MI]
    end

    subgraph KV[Azure Key Vault]
        Secrets[Connection strings,<br/>API keys,<br/>certificates]
        Keys[Encryption keys CMK]
        Cert[TLS certs]
    end

    subgraph Resources[Azure Data Plane]
        Storage[Storage<br/>RBAC]
        Cosmos[Cosmos<br/>RBAC]
        SQL[SQL Server<br/>Entra-only]
        AOAI[Azure OpenAI<br/>RBAC]
        Search[AI Search<br/>RBAC]
    end

    subgraph CICD[CI / CD]
        GH[GitHub Actions<br/>OIDC federated]
        AzDO[Azure DevOps<br/>workload identity federation]
    end

    subgraph Audit[Audit Trail]
        SignInLogs[Entra Sign-in Logs]
        ActivityLog[Azure Activity Log]
        DataPlaneLogs[Data plane diagnostic logs]
        DefenderCloud[Defender for Cloud]
    end

    Dev --> Users
    SRE --> Users
    Admin --> PIM
    PIM -. JIT activation .-> Groups
    Users --> Groups
    Groups --> Resources
    Groups -. RBAC .-> KV

    GH -. OIDC token<br/>no secrets .-> SPN
    AzDO -. WIF .-> SPN
    SPN -. RBAC .-> Resources
    SPN -. RBAC .-> KV

    ADF --> Workloads
    DBX --> Workloads
    Func --> Workloads
    Synapse --> Workloads
    AKS --> Workloads
    Portal --> Workloads

    Workloads -. system MI .-> KV
    Workloads -. system MI .-> Resources
    KV -. CMK .-> Storage
    KV -. CMK .-> SQL

    ConditionalAccess -. enforces .-> Users
    ConditionalAccess -. enforces .-> SPN

    Users --> SignInLogs
    SPN --> SignInLogs
    Workloads --> ActivityLog
    Resources --> DataPlaneLogs
    SignInLogs --> DefenderCloud
    ActivityLog --> DefenderCloud
    DataPlaneLogs --> DefenderCloud

    Auditor -. read-only .-> SignInLogs
    Auditor -. read-only .-> ActivityLog
    Auditor -. read-only .-> DataPlaneLogs

Three identity types: when to use which

| Identity | Use for | Forbidden uses |
|---|---|---|
| User account (with MFA + Conditional Access) | Humans logging into the portal, running `az` commands, accessing dashboards | Workloads. Ever. |
| Managed Identity (system-assigned preferred, user-assigned for shared workloads) | Every Azure workload (ADF, Functions, Databricks, Synapse, AKS, Portal API) accessing other Azure resources | Cross-tenant scenarios (use federated SP) |
| Service Principal (federated identity only, no client secret) | CI/CD pipelines (GitHub Actions OIDC, Azure DevOps WIF), cross-tenant access | Anything where a managed identity would work |

Service principals with client secrets are an anti-pattern. Period. If you find one, rotate it, replace it with federated credentials, and write a runbook entry about why it existed.
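Finding the offenders can be partly automated. The sketch below assumes you have exported app registrations as JSON (for example with `az ad app list --all`), where each application object carries `appId`, `displayName`, and a `passwordCredentials` array. The function name and the sample data are hypothetical, for illustration only.

```python
import json
from typing import Any


def find_apps_with_client_secrets(apps: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return one finding per app registration that still holds a client secret."""
    findings = []
    for app in apps:
        # A missing or null passwordCredentials field is treated as "no secrets".
        secrets = app.get("passwordCredentials") or []
        if secrets:
            findings.append(
                {
                    "appId": app.get("appId"),
                    "displayName": app.get("displayName"),
                    "secretCount": len(secrets),
                }
            )
    return findings


# Hypothetical sample shaped like the exported JSON; every value is made up.
SAMPLE_EXPORT = json.loads("""
[
  {"appId": "11111111-1111-1111-1111-111111111111", "displayName": "ci-federated",
   "passwordCredentials": []},
  {"appId": "22222222-2222-2222-2222-222222222222", "displayName": "legacy-etl",
   "passwordCredentials": [{"keyId": "aaaa", "endDateTime": "2030-01-01T00:00:00Z"}]}
]
""")

if __name__ == "__main__":
    for finding in find_apps_with_client_secrets(SAMPLE_EXPORT):
        print(f"{finding['displayName']} ({finding['appId']}): "
              f"{finding['secretCount']} client secret(s)")
```

Each finding is a candidate for the rotate, federate, and document routine described above.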

Secrets: Key Vault is the only answer

flowchart LR
    Code[Application code<br/>or pipeline] -->|reads URI<br/>not value| Config[App config<br/>or pipeline param<br/>has only KV URI]
    Config -.URI ref.-> KV[Key Vault]
    Workload[Workload MI] -.system MI auth.-> KV
    KV -->|short-lived value| Workload

- Connection strings → Key Vault → workload reads via MI
- API keys → Key Vault → workload reads via MI
- TLS certs → Key Vault Certificates → auto-rotation enabled
- Encryption keys (CMK) → Key Vault → CMK on Storage / SQL / Cosmos
- Database passwords → don't have them; use Entra-only auth on SQL, Cosmos, Postgres
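For workloads that read secrets in code rather than through app settings, the "workload reads via MI" step might look like the sketch below. It assumes the `azure-identity` and `azure-keyvault-secrets` packages are installed and that the vault lives in the Azure public cloud (`vault.azure.net`). The helper names `vault_url` and `read_secret` are illustrative, not part of any SDK.

```python
from typing import Optional


def vault_url(vault_name: str, dns_suffix: str = "vault.azure.net") -> str:
    """Build the vault endpoint. The default suffix assumes the Azure public cloud."""
    return f"https://{vault_name}.{dns_suffix}"


def read_secret(vault_name: str, secret_name: str) -> Optional[str]:
    """Fetch a secret value using whatever identity the workload runs as."""
    # Imported lazily so this module still loads where the Azure SDK is absent.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    # DefaultAzureCredential tries several credential sources, including the
    # managed identity when the code runs on an Azure host. No secret is
    # supplied by the caller at any point.
    client = SecretClient(vault_url=vault_url(vault_name),
                          credential=DefaultAzureCredential())
    return client.get_secret(secret_name).value
```

Called as `read_secret("kv-csa-prod", "db-conn")`, this only succeeds if the workload's identity has been granted read access to secrets on that vault.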

App settings reference Key Vault by URI, e.g.:

DB_CONNECTION_STRING = @Microsoft.KeyVault(VaultName=kv-csa-prod;SecretName=db-conn)

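The workload's MI must have Key Vault Secrets User on the vault. Rotation in Key Vault is picked up automatically when the reference omits a secret version, but not instantly: App Service and Functions cache the resolved value and refetch it on a schedule (within 24 hours, per Microsoft's documentation). When a rotation must take effect immediately, restart the app or force a reference refresh.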
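A cheap guardrail is to lint app settings before deployment and list anything that is not a Key Vault reference. The sketch below is a simplified parser for the `@Microsoft.KeyVault(...)` syntax shown above, covering the `VaultName`/`SecretName` form and the `SecretUri` form. The function names are hypothetical, and the parser is not a full implementation of the reference grammar.

```python
import re
from typing import Optional

_REFERENCE = re.compile(r"^@Microsoft\.KeyVault\((?P<body>.+)\)$")


def parse_key_vault_reference(value: str) -> Optional[dict[str, str]]:
    """Return the key/value parts of a Key Vault reference, or None if it is not one."""
    match = _REFERENCE.match(value.strip())
    if match is None:
        return None
    parts: dict[str, str] = {}
    for pair in match.group("body").split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, sep, val = pair.partition("=")
        if not sep:
            return None  # malformed "key=value" pair
        parts[key.strip()] = val.strip()
    has_name_form = "VaultName" in parts and "SecretName" in parts
    has_uri_form = "SecretUri" in parts
    return parts if (has_name_form or has_uri_form) else None


def literal_settings(settings: dict[str, str]) -> list[str]:
    """List setting names whose values are literals rather than Key Vault references."""
    return sorted(name for name, value in settings.items()
                  if parse_key_vault_reference(value) is None)


if __name__ == "__main__":
    settings = {
        "DB_CONNECTION_STRING": "@Microsoft.KeyVault(VaultName=kv-csa-prod;SecretName=db-conn)",
        "LOG_LEVEL": "info",
    }
    print(literal_settings(settings))
```

Not every literal is a secret (a log level is harmless), so treat the output as a review queue rather than a hard failure.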

Privilege escalation: PIM, not standing access

Standing Owner / Contributor access is the bug, not the feature:

sequenceDiagram
    participant Admin
    participant PIM
    participant Approver
    participant Sub
    Admin->>PIM: Request "Owner" activation<br/>(2 hours, ticket #1234)
    PIM->>Approver: Approval request
    Approver-->>PIM: Approve
    PIM->>Sub: Grant role for 2 hours
    Sub-->>Admin: Action allowed
    Note over Admin,Sub: 2 hours later, role auto-removed
    PIM->>Admin: Notification: role expired

Configure PIM-eligible (not active) for:

- Subscription Owner / Contributor
- Key Vault Administrator
- User Access Administrator
- Any role with */write on production resources

Standing access is for read-only roles only.
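The rule above is simple enough to check mechanically against an export of role assignments. The sketch below is a minimal policy check under assumed inputs: the `Assignment` shape, the `standing` flag, and the sample principals are hypothetical and do not match the schema of any Azure API.

```python
from dataclasses import dataclass

# Role names taken from the list above.
PIM_ONLY_ROLES = {
    "Owner",
    "Contributor",
    "Key Vault Administrator",
    "User Access Administrator",
}


@dataclass(frozen=True)
class Assignment:
    principal: str
    role: str
    actions: tuple[str, ...]  # actions granted by the role definition
    standing: bool            # True = permanently active, False = PIM-eligible
    production: bool = True


def grants_write(actions: tuple[str, ...]) -> bool:
    """True if any action is a full wildcard or ends in /write."""
    return any(action == "*" or action.endswith("/write") for action in actions)


def violations(assignments: list[Assignment]) -> list[Assignment]:
    """Standing assignments that should have been PIM-eligible instead."""
    return [
        a for a in assignments
        if a.standing
        and (a.role in PIM_ONLY_ROLES or (a.production and grants_write(a.actions)))
    ]


if __name__ == "__main__":
    sample = [
        Assignment("alice", "Owner", ("*",), standing=True),
        Assignment("bob", "Owner", ("*",), standing=False),
        Assignment("carol", "Reader", ("*/read",), standing=True),
    ]
    for bad in violations(sample):
        print(f"{bad.principal}: standing {bad.role} should be PIM-eligible")
```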

CI/CD: federated, never client-secret

GitHub Actions:

# The job must declare `permissions: id-token: write` so GitHub can mint the OIDC token.
- uses: azure/login@v2
  with:
      client-id: ${{ secrets.AZURE_CLIENT_ID }} # SP app ID, not a secret
      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      # No client-secret. OIDC token issued by GitHub, validated by Entra.

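The SP has a federated credential whose subject is pinned to a specific repo and branch (repo:fgarofalo56/csa-inabox:ref:refs/heads/main). The GitHub OIDC token expires within minutes, and the Entra access token it is exchanged for is short-lived (about an hour by default). Neither can be obtained from outside that trust.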
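The trust itself is a small JSON document attached to the app registration. The sketch below builds one for a branch-scoped GitHub trust, using the issuer and audience values that GitHub Actions and Entra use for this exchange. The credential name `github-main` and the helper names are hypothetical.

```python
import json

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"
ENTRA_AUDIENCE = "api://AzureADTokenExchange"


def branch_subject(repo: str, branch: str) -> str:
    """Subject claim GitHub places in OIDC tokens for a run on a given branch."""
    return f"repo:{repo}:ref:refs/heads/{branch}"


def federated_credential(name: str, repo: str, branch: str) -> dict:
    """Build the federated credential body for one repo + branch trust."""
    return {
        "name": name,  # hypothetical display name for the credential
        "issuer": GITHUB_ISSUER,
        "subject": branch_subject(repo, branch),
        "audiences": [ENTRA_AUDIENCE],
    }


if __name__ == "__main__":
    body = federated_credential("github-main", "fgarofalo56/csa-inabox", "main")
    print(json.dumps(body, indent=2))
```

The printed JSON can be saved to a file and passed to `az ad app federated-credential create --parameters`. Because the subject must match exactly, a workflow running on any other branch or repo gets no token.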

This eliminates the entire class of "the SP secret leaked from CI" incidents.

What goes in audit (and how long)

| Log | Source | Retention | Where it lands |
|---|---|---|---|
| Entra Sign-in Logs | Every authentication | 7 days (Free) or 30 days (P1/P2) in Entra; longer once exported | Log Analytics + Storage archive |
| Entra Audit Logs | Every directory change | 7 days (Free) or 30 days (P1/P2) in Entra; longer once exported | Log Analytics + Storage archive |
| Azure Activity Log | Every control-plane action | 90 days free | Log Analytics workspace |
| Resource diagnostic logs | Every data-plane action (Storage reads, KV reads, SQL queries) | Configurable per resource | Log Analytics + Storage archive |
| Defender for Cloud alerts | Threat detection | 90 days hot | Log Analytics + Sentinel optional |

Auditors get read-only Log Analytics + Storage archive access. They never get write access to anything.
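Built-in retention rarely matches what a compliance regime asks for, so it helps to compute the gap explicitly. The sketch below is a trivial planning helper: the function name and the 365-day requirement are assumptions for illustration, and the built-in windows are passed in by the caller rather than hard-coded.

```python
def archive_gaps(builtin_days: dict[str, int], required_days: int) -> dict[str, int]:
    """Map each log to the number of days its built-in window falls short."""
    return {
        name: required_days - days
        for name, days in sorted(builtin_days.items())
        if days < required_days
    }


if __name__ == "__main__":
    # 90-day windows as listed in the table above; 365 is a hypothetical requirement.
    builtin = {"Azure Activity Log": 90, "Defender for Cloud alerts": 90}
    for log, missing in archive_gaps(builtin, required_days=365).items():
        print(f"{log}: export needed to cover {missing} more day(s)")
```

Any log that appears in the result needs a diagnostic setting or export to Log Analytics or Storage before the built-in window expires.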

Trade-offs

✅ What this gives you

- Zero static credentials in code or pipelines
- Every action traceable to a human or workload identity
- Privilege escalation requires JIT activation + approval + reason
- Secret rotation is a Key Vault config change, not a code deploy
- Compliance auditors get a clean story

⚠️ What you give up

- More upfront setup (PIM config, federated SP setup, Key Vault wiring)
- Workloads need MI: every workload gets an identity block in Bicep, and user-assigned identities add a dependency on Microsoft.ManagedIdentity
- PIM activation has a 0–5 minute delay; on-call gets used to it
- A Key Vault outage (rare, but possible) takes down everything that reads secrets at startup. Mitigate with regional KV pairs + cached secrets at workload startup.
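The Key Vault outage mitigation can be sketched as a small reader that tries each vault in order and falls back to the last known value. This is an in-memory illustration under assumed names (`CachedSecretReader`, the fetcher callables). A production version would narrow the caught exceptions and decide deliberately how stale a secret it is willing to serve.

```python
import time
from typing import Callable

Fetcher = Callable[[str], str]  # takes a secret name, returns its value


class CachedSecretReader:
    """Read secrets from the first healthy vault, falling back to a cached value."""

    def __init__(self, fetchers: list[Fetcher], max_age_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetchers = fetchers        # e.g. [primary_region, paired_region]
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._max_age:
            return cached[0]             # fresh enough, skip the network
        for fetch in self._fetchers:
            try:
                value = fetch(name)
            except Exception:            # sketch only: narrow this in real code
                continue
            self._cache[name] = (value, now)
            return value
        if cached is not None:
            return cached[0]             # stale, but better than failing outright
        raise RuntimeError(f"secret {name!r} unavailable from every vault and not cached")
```

Each fetcher would wrap a Key Vault client for one region. Ordering the list primary-first gives the regional-pair behavior, and the stale fallback covers the case where both regions are unreachable after a successful startup.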

Variants

| Scenario | Variant |
|---|---|
| Cross-tenant (e.g., partner access) | Federated SPs with cross-tenant federation, scoped to specific resource groups |
| Air-gapped / sovereign | Same pattern; Key Vault is regional and CMK uses Managed HSM rather than software-protected keys |
| Hybrid (on-prem AD + Entra) | Entra Connect Sync; on-prem accounts get Entra identity; same RBAC model |
| Workload identity for AKS | AKS workload identity (OIDC) instead of pod-managed-identity; same trust model as GH Actions |