Getting Started: 30-Minute Tour
Goal: by the end of this page you will (1) understand what CSA-in-a-Box is, (2) know which deployment path matches your scenario, and (3) have a working dev environment that can run the platform locally.
If you only have 5 minutes, jump to Quickstart. If you want to deploy to Azure now, jump to Tutorial 01 — Foundation Platform.
1. What is CSA-in-a-Box?
CSA-in-a-Box (Cloud-Scale Analytics in a Box) is a reference implementation of an enterprise analytics + AI + data-management platform on Microsoft Azure. It bundles:
| Layer | What ships in the box |
|---|---|
| Landing zone (IaC) | 95 Bicep modules across DLZ (Data Landing Zone), DMLZ (Data Management LZ), gov (Azure Government overlay), and an ALZ fork |
| Data engineering | ADF + dbt-core, Delta Lake medallion, Synapse + Databricks side-by-side, Fabric strategic target |
| Streaming | Event Hubs + Stream Analytics + Fabric RTI adapter, Lambda + Kappa patterns |
| AI / GenAI | Azure OpenAI + AI Search RAG, GraphRAG, MCP server, Semantic Kernel agents, in-docs Copilot widget |
| Governance | Microsoft Purview sync, data product contracts (YAML), Great Expectations, Unity Catalog pattern |
| APIs / data apps | Data API Builder over Lakehouse, FastAPI BFF + React portal, MSAL auth, Power Apps starter |
| Compliance | NIST 800-53 r5, FedRAMP Moderate, CMMC 2.0 L2, HIPAA, SOC 2, PCI-DSS, GDPR crosswalks |
| Examples | 10 examples (9 verticals + iot-streaming cross-cutting pattern) spanning federal agencies, tribal, casino, ML lifecycle, IoT, AI agents |
| Operations | 8 production runbooks, DR drill automation, supply-chain security (SBOM + signing) |
It is not a turnkey SaaS: it is opinionated, open-source IaC and reference code that you fork into your own tenant.
2. Pick your path (decision tree)
flowchart TD
Start[I want to...] --> A{Evaluate the<br/>platform?}
A -->|Yes| Eval[Read ARCHITECTURE.md<br/>+ Use Cases section<br/>+ skim 2-3 ADRs]
A -->|No| B{Deploy to Azure?}
B -->|Yes| C{Commercial<br/>or Government?}
C -->|Commercial| Comm[Tutorial 01<br/>Foundation Platform<br/>+ DLZ Bicep]
C -->|Government| Gov[deploy/bicep/gov<br/>+ Gov Service Matrix]
B -->|No| D{Contribute or<br/>extend?}
D -->|Yes| Dev[CONTRIBUTING.md<br/>+ Developer Pathways<br/>+ local dev below]
D -->|No| E{Migrate from<br/>another platform?}
E -->|Yes| Mig[See Migrations section:<br/>AWS, GCP, Snowflake,<br/>Databricks→Fabric,<br/>Teradata, Hadoop, Palantir]
E -->|No| Q[Open an issue<br/>or chat with the docs Copilot]
3. Local development setup (10 minutes)
Prerequisites
| Tool | Version | Why |
|---|---|---|
| Python | 3.11+ (3.12 recommended) | Platform code, dbt, AI integration |
| Node.js | 20 LTS | React portal, MSAL frontend |
| Azure CLI | 2.60+ | All az commands in tutorials |
| Bicep CLI | latest | Compiles the Bicep modules; keep current with az bicep upgrade |
| Docker | 24+ | Local Postgres, optional dbt CI |
| Make | any | Wraps every common task |
| gh (GitHub CLI) | optional | For PR workflow |
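To confirm a workstation meets the table above before cloning, a small check script can help. The following is a minimal sketch, not part of the repo: the minimum versions are copied from the table, while the exact commands used to query each tool (`python --version`, `node --version`, `az --version`, `docker --version`) are assumptions about a typical install. Make and the Bicep CLI are left out because the table lists no minimum for them.

```python
import re
import shutil
import subprocess

# Minimum versions copied from the prerequisites table.
# The query commands are assumptions about a typical install.
REQUIRED = {
    "python": ((3, 11), ["python", "--version"]),
    "node": ((20, 0), ["node", "--version"]),
    "az": ((2, 60), ["az", "--version"]),
    "docker": ((24, 0), ["docker", "--version"]),
}


def parse_version(text):
    """Return the first MAJOR.MINOR pair found in text, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_tool(name, minimum, command):
    """Return a one-line status string for a single tool."""
    path = shutil.which(command[0])
    if path is None:
        return f"{name}: not found on PATH"
    try:
        result = subprocess.run(
            [path, *command[1:]],
            capture_output=True, text=True, check=False, timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{name}: could not run ({exc.__class__.__name__})"
    found = parse_version(result.stdout + result.stderr)
    if found is None:
        return f"{name}: could not parse a version"
    status = "ok" if found >= minimum else f"too old, need {minimum[0]}.{minimum[1]}+"
    return f"{name}: {found[0]}.{found[1]} ({status})"


if __name__ == "__main__":
    for tool, (minimum, command) in REQUIRED.items():
        print(check_tool(tool, minimum, command))
```

Versions are compared as `(major, minor)` tuples, so `3.12` correctly ranks above `3.11` where a string comparison could mislead.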
Clone + bootstrap
git clone https://github.com/fgarofalo56/csa-inabox.git
cd csa-inabox
# Python venv with all platform deps
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev,ai,governance]"
# Verify
make typecheck # mypy: should report 0 errors
make test # pytest: ~1271 tests, all pass
mkdocs serve # docs at http://localhost:8000
Authenticate to Azure (only needed if deploying)
az login --tenant <YOUR_TENANT_ID>
az account set --subscription <YOUR_SUBSCRIPTION_ID>
# If deploying to Azure Government:
az cloud set --name AzureUSGovernment
az login --tenant <YOUR_GOV_TENANT_ID>
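The order of the commands above matters for Azure Government: the sign-in goes to whichever cloud is currently selected, so the cloud is set first. As an illustrative sketch, a helper that assembles the same `az` invocations in the right order could look like this. The function name and parameters are hypothetical; only the `az` subcommands and flags come from the commands above.

```python
import shlex

GOV_CLOUD = "AzureUSGovernment"  # cloud name used in the commands above


def az_login_commands(tenant_id, subscription_id=None, government=False):
    """Build the az invocations for one sign-in, in the order they should run."""
    commands = []
    if government:
        # Select the cloud first so the login targets its endpoints.
        commands.append(["az", "cloud", "set", "--name", GOV_CLOUD])
    commands.append(["az", "login", "--tenant", tenant_id])
    if subscription_id:
        commands.append(["az", "account", "set", "--subscription", subscription_id])
    return commands


if __name__ == "__main__":
    # Placeholders only; substitute real IDs before running the printed commands.
    for argv in az_login_commands("<YOUR_GOV_TENANT_ID>", government=True):
        print(shlex.join(argv))
```

Returning argument lists rather than one shell string keeps the helper safe to pass to `subprocess.run` without shell quoting concerns.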
4. Pick your path through the docs
Architect / Evaluator
- Architecture Overview
- Reference Architectures
- ADRs — 22 decisions, ~1 page each
- Best Practices — 9 guides
- Use Cases & White Papers
Data Engineer
- Tutorial 01 — Foundation Platform
- Tutorial 05 — Streaming (Lambda)
- Best Practices: Medallion, Data Engineering, Performance
- Patterns: Cosmos DB, Streaming & CDC
- Examples — pick a vertical close to yours
AI / GenAI Engineer
Platform / DevOps Engineer
Compliance / Security
- Compliance Overview
- Pick your framework: NIST 800-53 r5, FedRAMP, CMMC 2.0 L2, HIPAA, SOC 2, PCI-DSS, GDPR
- Best Practices — Security & Compliance
- Runbooks: Security Incident, Break-Glass
- Government Service Matrix
5. Common first-day questions
How long does it take to deploy a real DLZ?
With pre-existing networking and identity (typical enterprise), a fresh DLZ deploys in ~45 minutes end-to-end. From a blank subscription with no parent ALZ, plan half a day for the first deploy because you'll iterate on parameter files. Subsequent deploys are idempotent and run in 5–10 minutes.
What does this cost to run?
A dev tier (small Synapse pool, single-node Databricks Standard, B-series Functions, basic AI Search) runs roughly $800–1,500/month if left up 24×7. Most teams pause Synapse + Databricks outside business hours and run closer to $300–500/month for dev. Prod tiers scale with workload; see each example's deploy/params.prod.json for sizing assumptions.
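To see how pausing compute changes the monthly figure, a back-of-the-envelope calculation is enough. The sketch below assumes cost scales linearly with uptime for the pausable services and stays flat for everything else. The 10% always-on share and the 10 hours on 22 working days are hypothetical inputs, not measurements from the repo, so adjust them to your own usage.

```python
HOURS_PER_MONTH = 730.0  # 365 days * 24 hours / 12 months


def paused_monthly_cost(full_cost, fixed_share, active_hours):
    """Estimate monthly cost when the pausable share only runs for active_hours.

    full_cost    : monthly cost if everything runs 24x7
    fixed_share  : fraction of full_cost that cannot be paused (0.0 to 1.0)
    active_hours : hours per month the pausable services stay up
    """
    if not 0.0 <= fixed_share <= 1.0:
        raise ValueError("fixed_share must be between 0 and 1")
    active_fraction = min(active_hours / HOURS_PER_MONTH, 1.0)
    return full_cost * (fixed_share + (1.0 - fixed_share) * active_fraction)


if __name__ == "__main__":
    # Hypothetical inputs: 10 h/day on 22 working days, 10% of spend always on.
    hours = 10 * 22
    for full_cost in (800, 1500):
        estimate = paused_monthly_cost(full_cost, fixed_share=0.10, active_hours=hours)
        print(f"{full_cost} USD/month at 24x7 -> about {estimate:.0f} USD/month paused")
```

Treat the result as a rough planning number; actual billing depends on SKU, region, and storage that keeps accruing while compute is paused.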
Can I run this in Azure Government?
Yes. The deploy/bicep/gov/ overlay handles MAG/USGov-specific service availability. See the Government Service Matrix for which features are GA / preview / unavailable per cloud (Public, USGov, USGov Secret, China).
Where does Fabric fit vs Synapse vs Databricks?
See Reference Architecture — Fabric vs Synapse vs Databricks and ADR 0010 — Fabric Strategic Target. Short version: Synapse + Databricks are the production backbone today; Fabric is the strategic forward path for net-new workloads, particularly Real-Time Intelligence and Direct Lake semantic models.
How is governance / data mesh handled?
Microsoft Purview is the catalog of record (see ADR 0006). Each data product owns its YAML contract under examples/<vertical>/contracts/ and a sync job pushes contracts → Purview classifications + lineage. See Best Practices — Data Governance and ADR 0012 — Data Mesh Federation.
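As a rough illustration of the contract layout, the sketch below walks `examples/<vertical>/contracts/` and groups contract files by vertical. It only discovers files; it is not the repo's sync job and does not talk to Purview. The `.yaml` / `.yml` extensions and the `find_contracts` name are assumptions, and the vertical and file names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path

CONTRACT_SUFFIXES = {".yaml", ".yml"}  # assumed extensions for YAML contracts


def find_contracts(repo_root):
    """Map each vertical name to its sorted list of contract files."""
    examples = Path(repo_root) / "examples"
    if not examples.is_dir():
        return {}
    contracts = {}
    for path in sorted(examples.glob("*/contracts/*")):
        if path.is_file() and path.suffix in CONTRACT_SUFFIXES:
            vertical = path.parent.parent.name
            contracts.setdefault(vertical, []).append(path)
    return contracts


if __name__ == "__main__":
    # Build a throwaway tree with a hypothetical vertical to show the result shape.
    with tempfile.TemporaryDirectory() as root:
        folder = Path(root) / "examples" / "demo-vertical" / "contracts"
        folder.mkdir(parents=True)
        (folder / "orders.yaml").write_text("name: orders\n")
        for vertical, files in find_contracts(root).items():
            print(vertical, [f.name for f in files])
```

Pointing `find_contracts` at a real checkout gives a quick inventory of which verticals own contracts before any sync runs.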
What about non-Azure clouds?
Out of scope — see ADR 0011 — Multi-Cloud Scope for the rationale. We document virtualization patterns (query AWS/GCP from Azure) but do not maintain parallel IaC for other clouds.
How current is this with the Azure roadmap?
The repo is actively maintained against Azure GA features. Preview-only services are documented but gated behind feature flags. ADRs are revisited annually. The Platform Research Report is a snapshot of strategic direction.
6. Next steps
| If you want to... | Go to |
|---|---|
| See it running | Quickstart (5 min) → Tutorial 01 (45 min) |
| Pick a vertical | End-to-End Examples — 10 examples (9 verticals + iot-streaming cross-cutting pattern) |
| Understand the design | Architecture → ADRs |
| Deploy to production | Production Checklist → IaC & CI/CD Best Practices |
| Migrate from another platform | Migrations |
| Talk to the docs | AI Copilot (in-page chat widget) |
7. Getting help
- Open an issue: https://github.com/fgarofalo56/csa-inabox/issues
- GitHub Discussions: https://github.com/fgarofalo56/csa-inabox/discussions (Q&A + feature requests)
- Docs Copilot: in-page chat widget on every docs page (powered by Azure OpenAI in our DLZ — see ADR 0022)
- Security issues: see SECURITY.md (private disclosure)