
Platform Services Guide

Note

Quick Summary: Detailed guide to 10 platform services that deliver Fabric-parity capabilities on Azure PaaS — Unity Catalog Pattern (formerly OneLake pattern, CSA-0132), Data Activator, Semantic Model (formerly Direct Lake, CSA-0132), Data Marketplace, Governance Framework, Multi-Synapse (legacy — see CSA-0139), Metadata Framework, AI Integration, Shared Services, and OSS alternatives. Intended for Azure Government (where Fabric is forecast, not GA) and for Commercial workloads that need a composable IaC stack as a stepping stone toward a future Fabric migration.

Platform services are the Fabric-parity capabilities that extend the base landing zones. Each service is independently deployable, has its own README with detailed usage instructions, and maps to a Microsoft Fabric equivalent so workloads can migrate incrementally as Fabric becomes available in their cloud/region.


🏗️ Services Overview

graph LR
    subgraph "Core Services"
        OL[Unity Catalog Pattern]
        MF[Metadata Framework]
        SS[Shared Services]
    end

    subgraph "Intelligence"
        AI[AI Integration]
        DA[Data Activator]
        DL[Semantic Model]
    end

    subgraph "Governance"
        DM[Data Marketplace]
        GV[Governance Framework]
        MS[Multi-Synapse]
    end

    subgraph "Gap Fillers"
        OSS[OSS Alternatives]
    end

    MF --> OL
    SS --> MF
    AI --> OL
    DA --> SS
    DL --> OL
    DM --> GV
    GV --> OL

1. 🗄️ Unity Catalog Pattern

Location: csa_platform/unity_catalog_pattern/ (renamed from onelake_pattern/ in CSA-0132; this pattern implements Databricks Unity Catalog with ADLS Gen2, not Microsoft OneLake.) Fabric Equivalent (conceptual): OneLake — a future csa_platform/fabric/ module (CSA-0129) will own the real OneLake integration.

Implements a unified data lake using ADLS Gen2 with Databricks Unity Catalog providing the shared metadata layer. All domain data lives in a single logical lake with physical separation via containers and folders.

What it does:

  • Provides a standardized storage layout (Bronze / Silver / Gold) per domain
  • Configures Unity Catalog for cross-domain metadata and access control
  • Sets up storage lifecycle policies (hot → cool → archive)
  • Creates shared Delta Lake tables accessible across Databricks and Synapse
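The standardized layout can be sketched as a small path helper (illustrative only — the container and folder conventions below are assumptions, not the module's actual naming rules):

```python
# Illustrative sketch of the per-domain Bronze/Silver/Gold layout.
# Container/path conventions here are assumptions for illustration.

LAYERS = ("bronze", "silver", "gold")

def lake_path(account: str, domain: str, layer: str, dataset: str) -> str:
    """Build an abfss:// path for one dataset in one medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return (
        f"abfss://{layer}@{account}.dfs.core.windows.net/"
        f"{domain}/{dataset}"
    )

print(lake_path("stdatalake", "agriculture", "bronze", "usda/crop_data"))
# abfss://bronze@stdatalake.dfs.core.windows.net/agriculture/usda/crop_data
```

One container per layer (rather than per domain) keeps lifecycle policies simple, since hot → cool → archive rules attach at the container level.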

Deploy:

az deployment group create \
  --resource-group rg-datalake \
  --template-file csa_platform/unity_catalog_pattern/deploy/onelake.bicep \
  --parameters @csa_platform/unity_catalog_pattern/deploy/params.json

Dependencies: ADLS Gen2 (from DLZ deployment), Databricks workspace


2. ⚡ Data Activator

Location: csa_platform/data_activator/ Fabric Equivalent: Data Activator

Event-driven alerting and automation triggered by data conditions. Replaces Fabric Data Activator using Event Grid, Logic Apps, and Azure Functions.

What it does:

  • Monitors data lake events (new files, schema changes, quality violations)
  • Triggers alerts via Teams webhooks, email, or PagerDuty
  • Executes remediation workflows (re-run pipeline, quarantine bad data)
  • Provides configurable thresholds and notification routing
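The threshold-and-routing logic can be sketched as pure Python (event shape and channel names are assumptions; in the real module, Event Grid delivers the event and Logic Apps / Functions do the fan-out):

```python
# Minimal sketch of severity-based alert routing. The rules dict stands in
# for the module's configurable thresholds and notification routing.

def route_alert(event: dict, rules: dict) -> list[str]:
    """Return the notification channels that should fire for an event."""
    severity = event.get("severity", "info")
    return [
        channel
        for channel, severities in rules.items()
        if severity in severities
    ]

rules = {"teams": ["warning", "critical"], "pagerduty": ["critical"]}
print(route_alert({"severity": "critical"}, rules))  # ['teams', 'pagerduty']
print(route_alert({"severity": "info"}, rules))      # []
```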

Deploy:

az deployment group create \
  --resource-group rg-platform \
  --template-file csa_platform/data_activator/deploy/activator.bicep \
  --parameters @csa_platform/data_activator/deploy/params.json

Dependencies: Event Grid (from DLZ), Logic Apps, Azure Functions, Key Vault


3. 📊 Semantic Model

Location: csa_platform/semantic_model/ (renamed from direct_lake/ in CSA-0132; this pattern implements Power BI semantic models over Databricks SQL, not Microsoft Fabric Direct Lake.) Fabric Equivalent (conceptual): Direct Lake mode in Power BI — a future csa_platform/fabric/ module (CSA-0129) will own the real Direct Lake integration.

Enables Power BI to query Delta Lake files directly from ADLS Gen2 via Databricks SQL endpoints, eliminating the need to import data into Power BI datasets.

What it does:

  • Configures Databricks SQL Serverless endpoints for Power BI consumption
  • Provides DAX measures and M query templates for common patterns
  • Sets up row-level security passthrough from Entra ID to Unity Catalog
  • Optimizes Delta tables for query performance (file size, Z-ordering)

Deploy:

# Databricks SQL endpoint (a "SQL warehouse" in newer CLI versions) is
# created via workspace configuration
databricks sql-endpoints create \
  --name "powerbi-direct-lake" \
  --cluster-size "Small" \
  --auto-stop-mins 30

Dependencies: Databricks workspace with Unity Catalog, Power BI Pro/Premium


4. 🛒 Data Marketplace

Location: csa_platform/data_marketplace/ Fabric Equivalent: Data Sharing / OneLake Data Hub

A self-service portal for discovering, requesting access to, and consuming data products published across the organization.

What it does:

  • Exposes a FastAPI-based catalog of data products with search and filtering
  • Integrates with Purview for asset metadata and lineage
  • Provides an access request and approval workflow (owner-based, time-bound)
  • Tracks data product quality scores and SLA compliance
  • Publishes usage metrics and consumer analytics

Deploy:

Important

CSA-0067 / CSA-0131. The legacy marketplace under csa_platform/data_marketplace/ is deprecated. It does not ship a --init CLI; the previously documented command never existed. Use the actively-served marketplace in portal.shared.api.routers.marketplace instead.

# Recommended — the portal seeds demo products on startup when
# ENVIRONMENT=local or DEMO_MODE=true.
cd portal/kubernetes/docker && docker compose up --build

# Browsable at:
#   http://localhost:3000/marketplace             (React frontend)
#   http://localhost:8000/api/v1/marketplace/...  (JSON API)

Dependencies: Purview, API Management, SQLite or Postgres for catalog state (see portal/shared/api/persistence_factory.py).


5. 📋 Governance Framework

Location: csa_platform/csa_platform/governance/purview/ (Python automation) + top-level csa_platform/governance/ (shared logging, contracts, dataquality, finops) Fabric Equivalent: Purview-integrated governance Note: These two trees overlap today and are scheduled for consolidation (see AQ-0025 / CSA-0126 in the audit approval queue). Both are canonical until that decision is made.

Extends Microsoft Purview with automated data governance workflows including classification, sensitivity labeling, and master data management.

What it does:

  • Automatically classifies new assets using built-in and custom classifiers
  • Applies sensitivity labels (Public, Internal, Confidential, CUI, PHI)
  • Captures lineage from ADF, Databricks, dbt, and Synapse
  • Enforces data product contracts (schema, SLA, quality thresholds)
  • Provides a master data management (MDM) framework for reference data
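The classification-to-label step can be sketched as a "most restrictive wins" mapping (the classification names and label ordering below are illustrative assumptions, not the framework's actual ruleset):

```python
# Sketch: map detected classifications to a single sensitivity label,
# picking the most restrictive. Ordering and mappings are illustrative.

LABEL_RANK = ["Public", "Internal", "Confidential", "CUI", "PHI"]

CLASSIFICATION_LABEL = {
    "US_SOCIAL_SECURITY_NUMBER": "PHI",
    "EMAIL_ADDRESS": "Confidential",
    "EXPORT_CONTROLLED": "CUI",
}

def label_for(classifications: list[str]) -> str:
    """Most restrictive label implied by the detected classifications."""
    labels = [CLASSIFICATION_LABEL.get(c, "Internal") for c in classifications]
    return max(labels, key=LABEL_RANK.index) if labels else "Public"

print(label_for(["EMAIL_ADDRESS", "US_SOCIAL_SECURITY_NUMBER"]))  # PHI
print(label_for([]))  # Public
```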

Deploy:

# Bootstrap Purview with glossary, classifications, and scan rules
python scripts/purview/bootstrap_catalog.py \
  --purview-account <purview-name> \
  --config csa_platform/governance/purview/catalog-config.yaml

Dependencies: Microsoft Purview, Key Vault


6. 🔄 Multi-Synapse

Location: csa_platform/multi_synapse/ (legacy / migration-only — see csa_platform/multi_synapse/README.md and csa_platform/multi_synapse/MIGRATION.md; CSA-0139 / AQ-0034) Fabric Equivalent: Multi-workspace Synapse Status: Legacy. New work should target Databricks + Unity Catalog (ADR-0002) or Fabric where GA (ADR-0010). This module stays deployable for existing Synapse footprints only.

Provides a shared Synapse Analytics environment with per-organization or per-domain isolation using workspace-level RBAC and network segmentation.

What it does:

  • Deploys multiple Synapse workspaces with shared managed VNet
  • Configures per-workspace SQL pools (dedicated and serverless)
  • Sets up cross-workspace linked services for shared data access
  • Implements workspace-level RBAC and audit logging

Deploy:

az deployment group create \
  --resource-group rg-synapse \
  --template-file csa_platform/multi_synapse/deploy/synapse.bicep \
  --parameters @csa_platform/multi_synapse/deploy/params.json

Dependencies: DLZ VNet, ADLS Gen2, Key Vault


7. ⚙️ Metadata Framework

Location: csa_platform/metadata_framework/ Fabric Equivalent: Metadata-driven Data Factory pipelines

Auto-generates ADF pipelines from YAML-based source registration metadata. Register a source once and the framework creates copy activities, Bronze ingestion, scheduling, and error handling automatically.

What it does:

  • Reads source registration YAML files with connection, schema, schedule metadata
  • Generates parameterized ADF pipeline JSON
  • Deploys pipelines via ARM/Bicep or ADF REST API
  • Supports incremental load watermarking and change data capture

Configuration:

# Example source registration
source:
    name: usda_crop_data
    type: rest_api
    connection:
        base_url: https://quickstats.nass.usda.gov/api/api_GET
        auth_type: api_key
        key_vault_secret: nass-api-key
    schedule:
        frequency: daily
        time: "06:00"
    destination:
        container: bronze
        folder: usda/crop_data
        format: parquet
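A registration like the one above gets expanded into pipeline JSON. The sketch below shows the idea; the JSON shape is a simplified illustration, not the framework's actual generator output:

```python
# Sketch: expand a source registration dict into an ADF pipeline skeleton.
import json

def generate_pipeline(reg: dict) -> dict:
    """Turn one source registration into a copy-to-Bronze pipeline dict."""
    src = reg["source"]
    dest = src["destination"]
    return {
        "name": f"pl_ingest_{src['name']}",
        "properties": {
            "activities": [{
                "name": f"copy_{src['name']}_to_bronze",
                "type": "Copy",
                "typeProperties": {
                    "sinkPath": f"{dest['container']}/{dest['folder']}",
                    "sinkFormat": dest["format"],
                },
            }],
            "annotations": [f"schedule:{src['schedule']['frequency']}"],
        },
    }

registration = {
    "source": {
        "name": "usda_crop_data",
        "schedule": {"frequency": "daily"},
        "destination": {
            "container": "bronze",
            "folder": "usda/crop_data",
            "format": "parquet",
        },
    }
}
print(json.dumps(generate_pipeline(registration), indent=2))
```

The generated dict can then be deployed through ARM/Bicep or the ADF REST API, as listed above.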

Dependencies: Azure Data Factory, Key Vault, ADLS Gen2


8. 🤖 AI Integration

Location: csa_platform/ai_integration/ Fabric Equivalent: Copilot / AI features

Provides domain-aware AI capabilities including document enrichment, entity extraction, text summarization, and RAG-based question answering.

What it does:

  • Document Classifier — Categorizes incoming documents using Azure OpenAI
  • Entity Extractor — Extracts named entities (people, orgs, locations) from text
  • Text Summarizer — Generates concise summaries of data product descriptions
  • RAG Patterns — Retrieval-augmented generation over gold-layer data products
  • Model Serving — Deploys custom ML models as API endpoints
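The RAG pattern boils down to grounding the model's answer in retrieved chunks. A minimal sketch of the prompt-assembly step (retrieval and the Azure OpenAI call are stubbed out; the prompt wording is an assumption):

```python
# Sketch: assemble a grounded prompt from retrieved gold-layer chunks.

def build_rag_prompt(question: str, chunks: list[str], max_chunks: int = 3) -> str:
    """Join the top retrieved chunks into a context-grounded prompt."""
    context = "\n---\n".join(chunks[:max_chunks])
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

chunks = ["Corn yields rose 4% in 2023.", "Soy acreage fell in the Midwest."]
prompt = build_rag_prompt("How did corn yields change?", chunks)
print(prompt.splitlines()[0])  # Answer using only the context below.
```

In the real module, this prompt would be sent to the Azure OpenAI deployment configured via the environment variables shown under Deploy.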

Deploy:

pip install -r csa_platform/ai_integration/requirements.txt

# Configure Azure OpenAI connection
export AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/
export AZURE_OPENAI_API_KEY=<key>
export AZURE_OPENAI_DEPLOYMENT=gpt-4

Dependencies: Azure OpenAI, Azure ML (optional), ADLS Gen2


9. 🔧 Shared Services

Location: csa_platform/functions/ (validation, aiEnrichment, eventProcessing, secretRotation) Fabric Equivalent: Shared utility functions

A library of reusable Azure Functions for common data operations used across pipelines and platform services.

Available Functions:

  • detect_pii — Scans text columns for PII using regex and AI classification
  • validate_schema — Validates incoming data against registered JSON/Avro schemas
  • validate_quality — Runs Great Expectations checkpoints and returns results
  • send_teams_alert — Posts formatted alerts to Microsoft Teams via webhook
  • Dead-letter pattern — Canonical per-pipeline DLQ (container + Event Grid + alert); see deploy/bicep/shared/modules/deadletter/ + runbooks/dead-letter.md (CSA-0138)
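For a feel of the simpler functions, here is a sketch of the regex half of detect_pii (patterns are deliberately simple illustrations; the real function also uses AI classification):

```python
# Sketch of regex-based PII detection. Patterns are illustrative only.
import re

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def detect_pii(text: str) -> dict[str, list[str]]:
    """Return matches per PII category found in the text."""
    return {
        name: pattern.findall(text)
        for name, pattern in PII_PATTERNS.items()
        if pattern.findall(text)
    }

print(detect_pii("Contact jane@contoso.gov, SSN 123-45-6789."))
# {'ssn': ['123-45-6789'], 'email': ['jane@contoso.gov']}
```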

Deploy:

cd csa_platform/functions/validation

# Deploy to Azure Functions
func azure functionapp publish <function-app-name> --python

# Or deploy via Bicep
az deployment group create \
  --resource-group rg-platform \
  --template-file csa_platform/functions/deploy/functions.bicep

Dependencies: Azure Functions runtime, Key Vault, Teams webhook URL


10. 🔓 OSS Alternatives

Location: csa_platform/oss_alternatives/ Fabric Equivalent: N/A (fills Azure Government gaps)

Containerized open-source alternatives for services that are unavailable or restricted in Azure Government at certain impact levels.

Available Alternatives:

  • Entra ID B2C (not in Gov) — Keycloak, deployed via Helm chart on AKS
  • AI Search (no IL5) — OpenSearch, deployed via Helm chart on AKS
  • Azure ML (no IL5) — MLflow + Kubeflow, deployed via Helm chart on AKS
  • Cognitive Services (limited) — Hugging Face Inference, deployed via Docker on AKS

Deploy:

# Example: deploy Keycloak on AKS
helm install keycloak csa_platform/oss_alternatives/keycloak/chart \
  --namespace identity \
  --values csa_platform/oss_alternatives/keycloak/values-gov.yaml

Dependencies: AKS cluster, Azure Container Registry


📦 Service Dependency Map

Deploy platform services in this recommended order:

  1. Unity Catalog Pattern — Storage + metadata
  2. Shared Services — Reusable functions
  3. Governance Framework — Classification + lineage
  4. Metadata Framework — Auto-pipeline generation
  5. Data Marketplace — Discovery + access
  6. AI Integration — Enrichment + RAG
  7. Data Activator — Alerting + automation
  8. Semantic Model — Power BI consumption
  9. Multi-Synapse — Legacy; only if migrating an existing Synapse footprint (CSA-0139)
  10. OSS Alternatives — Only if Gov gaps exist

⚙️ Configuration

All platform services read shared configuration from:

  • Key Vault — Connection strings, API keys, secrets
  • App Configuration — Feature flags, service endpoints, environment settings
  • Environment Variables — Local development overrides

See the root .env.example for all required environment variables.


See also: