
Getting Started — 30-Minute Tour

Goal: by the end of this page you will (1) understand what CSA-in-a-Box is, (2) know which deployment path matches your scenario, and (3) have a working dev environment that can run the platform locally.

If you only have 5 minutes, jump to Quickstart. If you want to deploy to Azure now, jump to Tutorial 01 — Foundation Platform.


1. What is CSA-in-a-Box?

CSA-in-a-Box (Cloud-Scale Analytics in a Box) is a reference implementation of an enterprise analytics + AI + data-management platform on Microsoft Azure. It bundles:

| Layer | What ships in the box |
| --- | --- |
| Landing zone (IaC) | 95 Bicep modules across DLZ (Data Landing Zone), DMLZ (Data Management LZ), gov (Azure Government overlay), and an ALZ fork |
| Data engineering | ADF + dbt-core, Delta Lake medallion, Synapse + Databricks side-by-side, Fabric strategic target |
| Streaming | Event Hubs + Stream Analytics + Fabric RTI adapter, Lambda + Kappa patterns |
| AI / GenAI | Azure OpenAI + AI Search RAG, GraphRAG, MCP server, Semantic Kernel agents, in-docs Copilot widget |
| Governance | Microsoft Purview sync, data product contracts (YAML), Great Expectations, Unity Catalog pattern |
| APIs / data apps | Data API Builder over Lakehouse, FastAPI BFF + React portal, MSAL auth, Power Apps starter |
| Compliance | NIST 800-53 r5, FedRAMP Moderate, CMMC 2.0 L2, HIPAA, SOC 2, PCI-DSS, GDPR crosswalks |
| Examples | 10 examples (9 verticals + iot-streaming cross-cutting pattern) spanning federal agencies, tribal, casino, ML lifecycle, IoT, AI agents |
| Operations | 8 production runbooks, DR drill automation, supply-chain security (SBOM + signing) |

It is not a turnkey SaaS — it is opinionated open-source IaC + reference code you fork into your tenant.


2. Pick your path (decision tree)

flowchart TD
    Start[I want to...] --> A{Evaluate the<br/>platform?}
    A -->|Yes| Eval[Read ARCHITECTURE.md<br/>+ Use Cases section<br/>+ skim 2-3 ADRs]
    A -->|No| B{Deploy to Azure?}
    B -->|Yes| C{Commercial<br/>or Government?}
    C -->|Commercial| Comm[Tutorial 01<br/>Foundation Platform<br/>+ DLZ Bicep]
    C -->|Government| Gov[deploy/bicep/gov<br/>+ Gov Service Matrix]
    B -->|No| D{Contribute or<br/>extend?}
    D -->|Yes| Dev[CONTRIBUTING.md<br/>+ Developer Pathways<br/>+ local dev below]
    D -->|No| E{Migrate from<br/>another platform?}
    E -->|Yes| Mig[See Migrations section:<br/>AWS, GCP, Snowflake,<br/>Databricks→Fabric,<br/>Teradata, Hadoop, Palantir]
    E -->|No| Q[Open an issue<br/>or chat with the docs Copilot]

3. Local development setup (10 minutes)

Prerequisites

| Tool | Version | Why |
| --- | --- | --- |
| Python | 3.11+ (3.12 recommended) | Platform code, dbt, AI integration |
| Node.js | 20 LTS | React portal, MSAL frontend |
| Azure CLI | 2.60+ | All az commands in tutorials |
| Bicep CLI | latest | az bicep upgrade |
| Docker | 24+ | Local Postgres, optional dbt CI |
| Make | any | Wraps every common task |
| gh (GitHub CLI) | optional | For PR workflow |
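
Before cloning, it can help to confirm that the installed tools meet the minimums in the table. The script below is a minimal sketch and not part of the repository: the function names are illustrative, it assumes each tool prints its version in response to `--version`, and it assumes the interpreter is on PATH as `python` (on some systems it is `python3`).

```python
import re
import shutil
import subprocess

# Minimum versions copied from the prerequisites table.
# Tools without a numeric minimum (Bicep CLI, Make, gh) are left out.
MINIMUMS = {
    "python": (3, 11),  # may be "python3" on some systems
    "node": (20,),
    "az": (2, 60),
    "docker": (24,),
}


def parse_version(text):
    """Return the first dotted number in text as a tuple of ints, or None."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare only as many version components as the minimum specifies."""
    return found is not None and found[: len(minimum)] >= minimum


def check_tool(command, minimum):
    """Run `<command> --version` and return (ok, detail)."""
    path = shutil.which(command)
    if path is None:
        return False, "not found on PATH"
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=60, check=False
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"could not run: {exc}"
    output = (result.stdout or result.stderr).strip()
    detail = output.splitlines()[0] if output else "no version output"
    return meets_minimum(parse_version(output), minimum), detail


if __name__ == "__main__":
    for command, minimum in MINIMUMS.items():
        ok, detail = check_tool(command, minimum)
        print(f"{'OK  ' if ok else 'FAIL'} {command}: {detail}")
```

Comparing only the leading components means that 20 LTS accepts any 20.x release, while 2.60+ rejects 2.59.x.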

Clone + bootstrap

git clone https://github.com/fgarofalo56/csa-inabox.git
cd csa-inabox

# Python venv with all platform deps
python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install -e ".[dev,ai,governance]"

# Verify
make typecheck                      # mypy: should report 0 errors
make test                           # pytest: ~1271 tests, all pass
mkdocs serve                        # docs at http://localhost:8000

Authenticate to Azure (only needed if deploying)

az login --tenant <YOUR_TENANT_ID>
az account set --subscription <YOUR_SUBSCRIPTION_ID>

# If deploying to Azure Government:
az cloud set --name AzureUSGovernment
az login --tenant <YOUR_GOV_TENANT_ID>
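
The order matters for Government: the cloud must be selected before logging in. The sketch below only builds the command sequence as argument lists so a wrapper script can print or run them. The function name is hypothetical. Setting the cloud back to `AzureCloud` for commercial deployments is an addition beyond the commands shown above; it covers the case where the CLI was previously pointed at Government.

```python
import shlex

# Cloud names as registered in the Azure CLI.
CLOUD_NAMES = {"commercial": "AzureCloud", "government": "AzureUSGovernment"}


def az_login_commands(tenant_id, subscription_id, cloud="commercial"):
    """Return the az invocations, in the order they should run, as argv lists."""
    if cloud not in CLOUD_NAMES:
        raise ValueError(f"unknown cloud: {cloud!r}")
    return [
        # Select the cloud first so the login targets the right endpoints.
        ["az", "cloud", "set", "--name", CLOUD_NAMES[cloud]],
        ["az", "login", "--tenant", tenant_id],
        ["az", "account", "set", "--subscription", subscription_id],
    ]


if __name__ == "__main__":
    # Placeholders only; substitute real IDs before running the commands.
    for argv in az_login_commands("<YOUR_GOV_TENANT_ID>", "<YOUR_SUBSCRIPTION_ID>", "government"):
        print(shlex.join(argv))
```

Returning argument lists rather than one shell string keeps the IDs from being re-parsed by a shell if the list is later passed to `subprocess.run`.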

4. Pick your path through the docs


5. Common first-day questions

How long does it take to deploy a real DLZ?

With pre-existing networking and identity (typical enterprise), a fresh DLZ deploys in ~45 minutes end-to-end. From a blank subscription with no parent ALZ, plan half a day for the first deploy because you'll iterate on parameter files. Subsequent deploys are idempotent and run in 5–10 minutes.

What does this cost to run?

A dev tier (small Synapse pool, single-node Databricks Standard, B-series Functions, basic AI Search) runs roughly $800–1,500/month if left up 24×7. Most teams pause Synapse + Databricks outside business hours and run closer to $300–500/month in dev. Prod tiers scale with workload; see each example's deploy/params.prod.json for sizing assumptions.
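
The saving from pausing compute can be approximated with simple arithmetic. The sketch below is illustrative only. The 90% pausable share and the 220 active hours per month (about 10 hours on each of 22 working days) are assumptions, not figures from the repository. Substitute your own numbers.

```python
HOURS_PER_MONTH = 730  # average hours in a calendar month (8760 / 12)


def paused_monthly_cost(full_cost, pausable_share, active_hours):
    """Estimate the monthly bill when pausable compute runs only active_hours.

    full_cost      -- monthly cost if everything runs 24x7
    pausable_share -- fraction of full_cost that stops billing when paused
    active_hours   -- hours per month the pausable compute is running
    """
    if not 0 <= pausable_share <= 1:
        raise ValueError("pausable_share must be between 0 and 1")
    if not 0 <= active_hours <= HOURS_PER_MONTH:
        raise ValueError("active_hours must be between 0 and HOURS_PER_MONTH")
    always_on = full_cost * (1 - pausable_share)
    pausable = full_cost * pausable_share * (active_hours / HOURS_PER_MONTH)
    return always_on + pausable


if __name__ == "__main__":
    # Hypothetical inputs: 90% of the bill is pausable, 220 active hours.
    for full_cost in (800, 1500):
        estimate = paused_monthly_cost(full_cost, pausable_share=0.9, active_hours=220)
        print(f"24x7: ${full_cost}/month -> paused: ${estimate:.0f}/month")
```

With these hypothetical inputs the estimate comes out near $297 at the low end and $557 at the high end, in the same range as the figures quoted above.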

Can I run this in Azure Government?

Yes. The deploy/bicep/gov/ overlay handles MAG/USGov-specific service availability. See the Government Service Matrix for which features are GA / preview / unavailable per cloud (Public, USGov, USGov Secret, China).

Where does Fabric fit vs Synapse vs Databricks?

See Reference Architecture — Fabric vs Synapse vs Databricks and ADR 0010 — Fabric Strategic Target. Short version: Synapse + Databricks are the production backbone today; Fabric is the strategic forward path for net-new workloads, particularly Real-Time Intelligence and Direct Lake semantic models.

How is governance / data mesh handled?

Microsoft Purview is the catalog of record (see ADR 0006). Each data product owns its YAML contract under examples/<vertical>/contracts/ and a sync job pushes contracts → Purview classifications + lineage. See Best Practices — Data Governance and ADR 0012 — Data Mesh Federation.
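
To see which contracts a sync job would pick up, the directory convention above can be walked with a few lines of Python. This is a minimal sketch, not the repository's sync job: the function name is hypothetical, and it assumes contract files end in `.yaml` or `.yml`. It only lists files and does not parse them or call Purview.

```python
from collections import defaultdict
from pathlib import Path


def find_contracts(repo_root):
    """Map each vertical to its contract files under examples/<vertical>/contracts/."""
    examples = Path(repo_root) / "examples"
    contracts = defaultdict(list)
    # The file extensions are an assumption; adjust to match the repository.
    for pattern in ("*/contracts/*.yaml", "*/contracts/*.yml"):
        for path in examples.glob(pattern):
            vertical = path.relative_to(examples).parts[0]
            contracts[vertical].append(path)
    return {vertical: sorted(paths) for vertical, paths in sorted(contracts.items())}


if __name__ == "__main__":
    # Run from the repository root.
    for vertical, paths in find_contracts(".").items():
        print(f"{vertical}: {len(paths)} contract(s)")
        for path in paths:
            print(f"  {path}")
```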

What about non-Azure clouds?

Out of scope — see ADR 0011 — Multi-Cloud Scope for the rationale. We document virtualization patterns (query AWS/GCP from Azure) but do not maintain parallel IaC for other clouds.

How current is this with the Azure roadmap?

The repo is actively maintained against Azure GA features. Preview-only services are documented but gated behind feature flags. ADRs are revisited annually. The Platform Research Report is a snapshot of strategic direction.


6. Next steps

| If you want to... | Go to |
| --- | --- |
| See it running | Quickstart (5 min) → Tutorial 01 (45 min) |
| Pick a vertical | End-to-End Examples: 10 examples (9 verticals + iot-streaming cross-cutting pattern) |
| Understand the design | Architecture, ADRs |
| Deploy to production | Production Checklist, IaC & CI/CD Best Practices |
| Migrate from another platform | Migrations |
| Talk to the docs AI | Copilot (in-page chat widget) |

7. Getting help

  • Open an issue: https://github.com/fgarofalo56/csa-inabox/issues
  • GitHub Discussions: https://github.com/fgarofalo56/csa-inabox/discussions (Q&A + feature requests)
  • Docs Copilot: in-page chat widget on every docs page (powered by Azure OpenAI in our DLZ — see ADR 0022)
  • Security issues: see SECURITY.md (private disclosure)

See also: