# Architecture Decision Records (ADRs)
**Audience:** architects, platform engineers, auditors, and federal customers preparing ATO packages.
> **Quick summary:** This directory captures the *why* behind the core technology choices in CSA-in-a-Box. Each record follows the MADR format. Records are immutable once accepted; revisions happen by adding a superseding ADR with an incremented number.
## Index
| # | Title | Status | Date | One-line summary |
|---|---|---|---|---|
| 0001 | ADF (+ dbt) over Airflow as primary orchestration | accepted | 2026-04-19 | Managed Gov PaaS orchestrator with SQL-native transforms and Purview lineage. |
| 0002 | Azure Databricks over open-source Spark-on-AKS for heavy compute | accepted | 2026-04-19 | Managed Spark + Unity Catalog + Photon in Gov; OSS Spark on AKS is too operationally expensive. |
| 0003 | Delta Lake over Iceberg and Parquet as canonical table format | accepted | 2026-04-19 | Delta is Databricks- and Fabric-OneLake-native with ACID MERGE and Purview lineage. |
| 0004 | Bicep over Terraform as primary IaC (for now; Terraform path planned) | accepted | 2026-04-19 | Day-one Azure API coverage, no state-file custody, aligned to Enterprise-Scale Landing Zone. |
| 0005 | Event Hubs over open-source Kafka for streaming ingestion | accepted | 2026-04-19 | Managed PaaS broker with Kafka-protocol endpoint, Capture to Bronze, Gov-GA. |
| 0006 | Microsoft Purview over Apache Atlas for data catalog and lineage | accepted | 2026-04-19 | Gov-GA catalog with MIP label propagation and native Azure scanners. |
| 0007 | Azure OpenAI over self-hosted LLM for AI integration | accepted | 2026-04-19 | FedRAMP High inference endpoint with Private Endpoints; self-hosted fallback remains open. |
| 0008 | dbt Core over dbt Cloud for transformations | accepted | 2026-04-19 | Open-source CLI keeps metadata inside the tenant boundary; no SaaS FedRAMP surface to clear. |
| 0009 | SQLite (portal dev) → Postgres (portal prod) phased database strategy | accepted | 2026-04-19 | Zero-install dev loop; managed Postgres PaaS in Gov for production durability. |
| 0010 | Microsoft Fabric as strategic target; current build as Fabric-parity on Azure PaaS | accepted | 2026-04-19 | Every primitive (Delta, Purview, dbt, Spark) maps forward into Fabric when Gov GA lands. |
| 0011 | Multi-cloud scope: OneLake shortcuts + Purview scans only; defer federated compute | accepted | 2026-04-20 | Honest scope — ships governance story for S3/GCS/Snowflake/BigQuery/Redshift; defers cross-cloud compute. |
| 0012 | Data-mesh federation model: contract-driven, Purview-governed, portal-surfaced | accepted | 2026-04-20 | Contract-first in-monorepo mesh — contract.yaml → CI validates → Purview registers → marketplace surfaces; per-domain CODEOWNERS. |
| 0013 | dbt Core as the canonical transformation layer | accepted | 2026-04-20 | Deduplicates Bronze → Silver → Gold logic — dbt owns medallion transforms; Spark notebooks are deprecated for that path and reserved for exploration / provisioning / ML. |
| 0014 | MSAL Backend-for-Frontend (BFF) auth pattern | accepted | 2026-04-20 | Phased CSA-0020 remediation — Phase 1 strict CSP + Trusted Types on the SPA; Phase 2 server-side Auth Code + PKCE flow with an httpOnly csa_sid session cookie. Tokens never reach the browser. |
| 0015 | Portal persistence: StoreBackend protocol + SQLite (dev) + Postgres Flexible Server (prod) | accepted | 2026-04-20 | CSA-0046 implementation — dual-backend Protocol, managed-identity AAD tokens for Azure Database for PostgreSQL Flexible Server, Alembic migrations, SQLite kept as the zero-install dev loop. |
| 0016 | Async StoreBackend canonical; sync layer transitional | accepted | 2026-04-20 | CSA-0046 follow-on — AsyncStoreBackend Protocol + AsyncSqliteStore (aiosqlite) + AsyncPostgresStore (asyncpg + SQLAlchemy AsyncEngine); FastAPI routers go async def via Depends; migration CLI ships; sync layer kept one release for backward compat. |
| 0017 | RAG pipeline service-layer extraction (CSA-0133) | accepted | 2026-04-20 | Split the 1,285-line pipeline.py god-class into six submodules behind a RAGService facade; legacy pipeline module is preserved as a compat shim for one release. |
| 0018 | Fabric Real-Time Intelligence adapter (pre-GA, env-gated) | accepted | 2026-04-20 | CSA-0137 follow-on — ship FabricRTISource today behind FABRIC_RTI_ENABLED; raise-with-pointer when the flag is unset so Gov tenants fail loudly until RTI Gov-GA lands. |
| 0019 | BFF reverse-proxy + HMAC-sealed MSAL token cache | accepted | 2026-04-20 | CSA-0020 Phase 3 — mount /api/* proxy behind BFF_PROXY_ENABLED; persist MSAL SerializableTokenCache to Redis with HMAC sealing so a Redis compromise is tamper-evident. Completes AQ-0012's long-term column — tokens never reach the browser. |
| 0020 | Portal observability (OTel + Prometheus) and per-principal rate limiting | accepted | 2026-04-20 | CSA-0042 / CSA-0061 / CSA-0030 — OpenTelemetry with OTLP exporter + W3C trace-context, Prometheus /metrics on a private registry, per-principal sliding-window rate limiter on every write endpoint. All three feature-flagged and lazy-imported so the portal still boots without the optional extras. |
| 0021 | Two rate limiters are intentional, not duplicates | accepted | 2026-04-23 | The portal write-path limiter (per-principal, sliding-window, observability) and the AI router limiter (per-IP, fixed-window, abuse-defense) protect orthogonal failure modes; do not collapse them. |
| 0022 | Copilot surfaces vs. docs-site widget are intentional, not duplicates | accepted | 2026-04-23 | The Azure Function (func-csa-inabox-copilot-fg) is the production chat backend; the docs-site widget (docs/javascripts/copilot-chat.js) is a thin client that talks to it. Two artifacts, one service. |
| 0023 | release-please PRs auto-pass required status checks | accepted | 2026-04-27 | GITHUB_TOKEN-created PRs don't trigger downstream workflows, leaving release PRs permanently BLOCKED. The release-please workflow itself posts success commit statuses for each required check on the PR head SHA, gated on a strict allow-list of three version-metadata files. |
## Format
All records follow MADR 3.x. Each ADR has frontmatter (status, date, deciders, consulted, informed) and these sections:
- Context and Problem Statement
- Decision Drivers
- Considered Options
- Decision Outcome
- Consequences (positive and negative)
- Pros and Cons of the Options
- Validation (how we'll know the decision was right)
- References (decision trees, concrete code, compliance-control mappings)
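As a rough illustration of the structure above, the sketch below checks an ADR's Markdown text for the five frontmatter keys and the eight sections in the listed order. It assumes YAML-style `key: value` frontmatter between `---` fences and that each section is a `## ` heading whose text starts with the section name; neither assumption is confirmed by this page, and `check_adr_structure` is a hypothetical helper, not a tool shipped in the repo.

```python
"""Hypothetical structure check for a MADR-style ADR (illustrative sketch)."""
from typing import List

# Frontmatter keys and section names as listed on this page.
REQUIRED_KEYS = ["status", "date", "deciders", "consulted", "informed"]
REQUIRED_SECTIONS = [
    "Context and Problem Statement",
    "Decision Drivers",
    "Considered Options",
    "Decision Outcome",
    "Consequences",
    "Pros and Cons of the Options",
    "Validation",
    "References",
]


def check_adr_structure(text: str) -> List[str]:
    """Return a list of problems; an empty list means the ADR looks well-formed."""
    problems: List[str] = []
    lines = text.splitlines()

    # Assumption: frontmatter is delimited by '---' lines at the top of the file.
    if not lines or lines[0].strip() != "---":
        return ["missing frontmatter"]
    try:
        end = lines.index("---", 1)
    except ValueError:
        return ["unterminated frontmatter"]
    keys = {line.split(":", 1)[0].strip() for line in lines[1:end] if ":" in line}
    problems += [f"missing frontmatter key: {k}" for k in REQUIRED_KEYS if k not in keys]

    # Assumption: sections are level-2 headings; prefix matching lets
    # "Consequences (positive and negative)" count as "Consequences".
    headings = [line[3:].strip() for line in lines[end + 1:] if line.startswith("## ")]
    position = 0
    for section in REQUIRED_SECTIONS:
        matches = [i for i, h in enumerate(headings) if h.startswith(section)]
        if not matches:
            problems.append(f"missing section: {section}")
        elif matches[0] < position:
            problems.append(f"section out of order: {section}")
        else:
            position = matches[0]
    return problems
```

A check like this would fit naturally in CI next to the PR review step, but whether the repo runs any such validation is not stated here.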
## Status lifecycle
- `proposed`: open for comment on a PR.
- `accepted`: merged; the decision is in effect.
- `deprecated`: no longer in effect; no replacement chosen yet.
- `superseded by NNNN`: replaced by a newer ADR. The superseding ADR states what changed and why; the superseded ADR stays on disk as history.
ADRs are immutable once accepted. Corrections are made by authoring a new ADR that supersedes the old one. Typos and broken links may be fixed without superseding.
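The lifecycle above can be read as a small state machine. The sketch below encodes one reading of it. The page does not spell out every allowed transition, so the `ALLOWED` table (in particular `deprecated` later becoming `superseded by NNNN`) is an assumption, and `can_transition` is a hypothetical helper. The rule that a superseding ADR carries a higher number follows from the "incremented number" convention stated in the summary.

```python
"""Sketch of the ADR status lifecycle as a state machine (one possible reading)."""
import re

# Matches the "superseded by NNNN" status, where NNNN is a 4-digit ADR number.
SUPERSEDED = re.compile(r"^superseded by (\d{4})$")

# Assumed transitions: proposed -> accepted and accepted -> deprecated/superseded
# follow from the lifecycle list; deprecated -> superseded is an assumption.
ALLOWED = {
    "proposed": {"accepted"},
    "accepted": {"deprecated", "superseded"},
    "deprecated": {"superseded"},
    "superseded": set(),  # terminal: the record stays on disk as history
}


def normalize(status: str) -> str:
    """Collapse 'superseded by NNNN' to the 'superseded' state name."""
    return "superseded" if SUPERSEDED.match(status) else status


def can_transition(current: str, new: str, adr_number: int) -> bool:
    """Return True if moving ADR `adr_number` from `current` to `new` is allowed."""
    match = SUPERSEDED.match(new)
    # A superseding ADR must carry a higher number than the one it replaces.
    if match and int(match.group(1)) <= adr_number:
        return False
    return normalize(new) in ALLOWED.get(normalize(current), set())
```

Note that typo and broken-link fixes are not status changes, so they fall outside this model entirely.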
## Authoring a new ADR
- Copy `0001-adf-dbt-over-airflow.md` as a template.
- Increment the number (4-digit, zero-padded).
- Slugify the title (`NNNN-short-slug.md`).
- Keep the frontmatter and section order identical to 0001.
- Cite at least one concrete artifact in the repo and at least one compliance-control mapping from `governance/compliance/*.yaml` when the decision maps to a control family.
- Link forward to any relevant decision tree under `docs/decisions/`.
- Open a PR. Reviewers from security, governance, and dev-loop are expected. Status stays `proposed` until the PR merges.
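The numbering and slug steps can be sketched as follows. The sketch assumes existing records are named `NNNN-slug.md` and that a slug is the lowercased title with runs of non-alphanumeric characters collapsed to single hyphens; the page does not define slug rules beyond the `NNNN-short-slug.md` pattern, and `next_adr_filename` is a hypothetical helper, not part of the repo or of adr-tools.

```python
"""Hypothetical helper: derive the next ADR filename from existing ones."""
import re
from typing import Iterable

# Existing ADR files look like "0001-adf-dbt-over-airflow.md"; others are ignored.
ADR_FILE = re.compile(r"^(\d{4})-[a-z0-9-]+\.md$")


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_adr_filename(existing: Iterable[str], title: str) -> str:
    """Return 'NNNN-short-slug.md' using the highest existing number plus one."""
    numbers = [int(m.group(1)) for name in existing if (m := ADR_FILE.match(name))]
    next_number = max(numbers, default=0) + 1
    return f"{next_number:04d}-{slugify(title)}.md"
```

Taking the maximum rather than counting files keeps the number monotonic even if a gap appears in the sequence; files such as `README.md` that do not match the pattern are skipped.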
## Cross-references
- Decision trees (scenario-driven, "which option for what situation"): `docs/decisions/`.
- Architecture reference (current-stack narrative): `docs/ARCHITECTURE.md`.
- Compliance control matrices: `governance/compliance/` (`nist-800-53-rev5.yaml`, `hipaa-security-rule.yaml`, `cmmc-2.0-l2.yaml`).
## Upstream references
- MADR documentation
- adr-tools (CLI for managing ADRs)
- Michael Nygard's original ADR essay: "Documenting Architecture Decisions"