Architecture Decision Records (ADRs)

Audience: architects, platform engineers, auditors, and federal customers preparing ATO packages.

Quick Summary: This directory captures the why behind the core technology choices in CSA-in-a-Box. Each record follows the MADR format. Records are immutable once accepted — revisions happen by adding a superseding ADR with an incremented number.


Index

| # | Title | Status | Date | One-line summary |
|---|---|---|---|---|
| 0001 | ADF (+ dbt) over Airflow as primary orchestration | accepted | 2026-04-19 | Managed Gov PaaS orchestrator with SQL-native transforms and Purview lineage. |
| 0002 | Azure Databricks over open-source Spark-on-AKS for heavy compute | accepted | 2026-04-19 | Managed Spark + Unity Catalog + Photon in Gov; OSS Spark on AKS is too operationally expensive. |
| 0003 | Delta Lake over Iceberg and Parquet as canonical table format | accepted | 2026-04-19 | Delta is Databricks- and Fabric-OneLake-native with ACID MERGE and Purview lineage. |
| 0004 | Bicep over Terraform as primary IaC (for now; Terraform path planned) | accepted | 2026-04-19 | Day-one Azure API coverage, no state-file custody, aligned to Enterprise-Scale Landing Zone. |
| 0005 | Event Hubs over open-source Kafka for streaming ingestion | accepted | 2026-04-19 | Managed PaaS broker with Kafka-protocol endpoint, Capture to Bronze, Gov-GA. |
| 0006 | Microsoft Purview over Apache Atlas for data catalog and lineage | accepted | 2026-04-19 | Gov-GA catalog with MIP label propagation and native Azure scanners. |
| 0007 | Azure OpenAI over self-hosted LLM for AI integration | accepted | 2026-04-19 | FedRAMP High inference endpoint with Private Endpoints; self-hosted fallback remains open. |
| 0008 | dbt Core over dbt Cloud for transformations | accepted | 2026-04-19 | Open-source CLI keeps metadata inside the tenant boundary; no SaaS FedRAMP surface to clear. |
| 0009 | SQLite (portal dev) → Postgres (portal prod) phased database strategy | accepted | 2026-04-19 | Zero-install dev loop; managed Postgres PaaS in Gov for production durability. |
| 0010 | Microsoft Fabric as strategic target; current build as Fabric-parity on Azure PaaS | accepted | 2026-04-19 | Every primitive (Delta, Purview, dbt, Spark) maps forward into Fabric when Gov GA lands. |
| 0011 | Multi-cloud scope: OneLake shortcuts + Purview scans only; defer federated compute | accepted | 2026-04-20 | Honest scope: ships governance story for S3/GCS/Snowflake/BigQuery/Redshift; defers cross-cloud compute. |
| 0012 | Data-mesh federation model: contract-driven, Purview-governed, portal-surfaced | accepted | 2026-04-20 | Contract-first in-monorepo mesh: contract.yaml → CI validates → Purview registers → marketplace surfaces; per-domain CODEOWNERS. |
| 0013 | dbt Core as the canonical transformation layer | accepted | 2026-04-20 | Deduplicates Bronze → Silver → Gold logic: dbt owns medallion transforms; Spark notebooks are deprecated for that path and reserved for exploration / provisioning / ML. |
| 0014 | MSAL Backend-for-Frontend (BFF) auth pattern | accepted | 2026-04-20 | Phased CSA-0020 remediation: Phase 1 strict CSP + Trusted Types on the SPA; Phase 2 server-side Auth Code + PKCE flow with an httpOnly csa_sid session cookie. Tokens never reach the browser. |
| 0015 | Portal persistence: StoreBackend protocol + SQLite (dev) + Postgres Flexible Server (prod) | accepted | 2026-04-20 | CSA-0046 implementation: dual-backend Protocol, managed-identity AAD tokens for Azure Database for PostgreSQL Flexible Server, Alembic migrations, SQLite kept as the zero-install dev loop. |
| 0016 | Async StoreBackend canonical; sync layer transitional | accepted | 2026-04-20 | CSA-0046 follow-on: AsyncStoreBackend Protocol + AsyncSqliteStore (aiosqlite) + AsyncPostgresStore (asyncpg + SQLAlchemy AsyncEngine); FastAPI routers go async def via Depends; migration CLI ships; sync layer kept one release for backward compat. |
| 0017 | RAG pipeline service-layer extraction (CSA-0133) | accepted | 2026-04-20 | Split the 1,285-line pipeline.py god-class into six submodules behind a RAGService facade; legacy pipeline module is preserved as a compat shim for one release. |
| 0018 | Fabric Real-Time Intelligence adapter (pre-GA, env-gated) | accepted | 2026-04-20 | CSA-0137 follow-on: ship FabricRTISource today behind FABRIC_RTI_ENABLED; raise-with-pointer when the flag is unset so Gov tenants fail loudly until RTI Gov-GA lands. |
| 0019 | BFF reverse-proxy + HMAC-sealed MSAL token cache | accepted | 2026-04-20 | CSA-0020 Phase 3: mount /api/* proxy behind BFF_PROXY_ENABLED; persist MSAL SerializableTokenCache to Redis with HMAC sealing so a Redis compromise is tamper-evident. Completes AQ-0012's long-term column: tokens never reach the browser. |
| 0020 | Portal observability (OTel + Prometheus) and per-principal rate limiting | accepted | 2026-04-20 | CSA-0042 / CSA-0061 / CSA-0030: OpenTelemetry with OTLP exporter + W3C trace-context, Prometheus /metrics on a private registry, per-principal sliding-window rate limiter on every write endpoint. All three feature-flagged and lazy-imported so the portal still boots without the optional extras. |
| 0021 | Two rate limiters are intentional, not duplicates | accepted | 2026-04-23 | The portal write-path limiter (per-principal, sliding-window, observability) and the AI router limiter (per-IP, fixed-window, abuse-defense) protect orthogonal failure modes; do not collapse them. |
| 0022 | Copilot surfaces vs. docs-site widget are intentional, not duplicates | accepted | 2026-04-23 | The Azure Function (func-csa-inabox-copilot-fg) is the production chat backend; the docs-site widget (docs/javascripts/copilot-chat.js) is a thin client that talks to it. Two artifacts, one service. |
| 0023 | release-please PRs auto-pass required status checks | accepted | 2026-04-27 | GITHUB_TOKEN-created PRs don't trigger downstream workflows, leaving release PRs permanently BLOCKED. The release-please workflow itself posts success commit statuses for each required check on the PR head SHA, gated on a strict allow-list of three version-metadata files. |

Format

All records follow MADR 3.x. Each ADR has frontmatter (status, date, deciders, consulted, informed) and these sections:

  • Context and Problem Statement
  • Decision Drivers
  • Considered Options
  • Decision Outcome
  • Consequences (positive and negative)
  • Pros and Cons of the Options
  • Validation (how we'll know the decision was right)
  • References (decision trees, concrete code, compliance-control mappings)
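
The frontmatter keys and section list above lend themselves to a mechanical check. What follows is a minimal sketch, not part of the repository: the function name `check_adr` is hypothetical, and it assumes each ADR opens with a `---`-delimited frontmatter block and marks sections with Markdown `#` headings.

```python
import re

# Frontmatter keys and section titles taken from the Format list above.
REQUIRED_KEYS = ("status", "date", "deciders", "consulted", "informed")
REQUIRED_SECTIONS = (
    "Context and Problem Statement",
    "Decision Drivers",
    "Considered Options",
    "Decision Outcome",
    "Consequences",
    "Pros and Cons of the Options",
    "Validation",
    "References",
)


def check_adr(text: str) -> list[str]:
    """Return a list of problems found in one ADR's text (empty if none)."""
    # Assumption: frontmatter is a leading block delimited by '---' lines.
    match = re.match(r"---\n(.*?)\n---\n", text, flags=re.DOTALL)
    if match is None:
        return ["missing frontmatter block"]
    keys = {
        line.split(":", 1)[0].strip()
        for line in match.group(1).splitlines()
        if ":" in line
    }
    problems = [f"missing frontmatter key: {k}" for k in REQUIRED_KEYS if k not in keys]
    # Assumption: sections are Markdown headings of any level.
    body = text[match.end():]
    headings = re.findall(r"^#+[ \t]+(.*?)[ \t]*$", body, flags=re.MULTILINE)
    for section in REQUIRED_SECTIONS:
        # Prefix match so a heading with a trailing qualifier still counts.
        if not any(h.startswith(section) for h in headings):
            problems.append(f"missing section: {section}")
    return problems
```

The sketch checks only that each key and section is present; it does not check section order or the contents of the frontmatter values.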

Status lifecycle

proposed  →  accepted  →  deprecated
                   └──────→  superseded by NNNN
  • proposed — open for comment on a PR.
  • accepted — merged; the decision is in effect.
  • deprecated — no longer in effect; no replacement chosen yet.
  • superseded by NNNN — replaced by a newer ADR. The superseding ADR states what changed and why; the superseded ADR stays on disk as history.

ADRs are immutable once accepted. Corrections are made by authoring a new ADR that supersedes the old one. Typos and broken links may be fixed without superseding.
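
The lifecycle can be read as a small state machine. The sketch below encodes the transitions drawn in the diagram; the names `TRANSITIONS`, `normalize`, and `can_transition` are hypothetical, and treating deprecated and superseded as terminal states is an assumption drawn from the diagram rather than a stated rule.

```python
import re

# Allowed transitions, read off the lifecycle diagram above.
# Assumption: 'deprecated' and 'superseded' are terminal.
TRANSITIONS = {
    "proposed": {"accepted"},
    "accepted": {"deprecated", "superseded"},
    "deprecated": set(),
    "superseded": set(),
}


def normalize(status: str) -> str:
    """Map 'superseded by 0042' to 'superseded'; reject unknown statuses."""
    status = status.strip().lower()
    if status.startswith("superseded"):
        # The superseding ADR is referenced by its 4-digit number.
        if not re.fullmatch(r"superseded by \d{4}", status):
            raise ValueError(f"expected 'superseded by NNNN', got {status!r}")
        return "superseded"
    if status not in TRANSITIONS:
        raise ValueError(f"unknown status: {status!r}")
    return status


def can_transition(old: str, new: str) -> bool:
    """True if moving from status `old` to status `new` follows the lifecycle."""
    return normalize(new) in TRANSITIONS[normalize(old)]
```

For example, `can_transition("accepted", "superseded by 0024")` is true, while `can_transition("accepted", "proposed")` is false.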

Authoring a new ADR

  1. Copy 0001-adf-dbt-over-airflow.md as a template.
  2. Increment the number (4-digit, zero-padded).
  3. Slugify the title (NNNN-short-slug.md).
  4. Keep the frontmatter + section order identical to 0001.
  5. Cite at least one concrete artifact in the repo and at least one compliance-control mapping from governance/compliance/*.yaml when the decision maps to a control family.
  6. Link forward to any relevant decision tree under docs/decisions/.
  7. Open a PR. Reviewers from security, governance, and dev-loop are expected. Status stays proposed until the PR merges.
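
Steps 2 and 3 can be sketched as a small helper that picks the next number and builds the file name. The functions `slugify` and `next_adr_filename` are hypothetical, not repository tooling, and the generated slug is only a starting point: existing records use short, hand-chosen slugs.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_adr_filename(existing: list[str], title: str) -> str:
    """Build NNNN-short-slug.md using the next 4-digit, zero-padded number."""
    numbers = [
        int(m.group(1))
        for name in existing
        if (m := re.match(r"(\d{4})-", name))  # skip README.md and similar files
    ]
    return f"{max(numbers, default=0) + 1:04d}-{slugify(title)}.md"


if __name__ == "__main__":
    files = ["0001-adf-dbt-over-airflow.md", "0023-example.md", "README.md"]
    print(next_adr_filename(files, "Short Slug"))  # prints 0024-short-slug.md
```

Passing the directory listing rather than reading the disk keeps the helper easy to test; a caller could supply `os.listdir()` of the ADR directory.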

Cross-references

Upstream references