Cost Management Guide¶
Note
Quick Summary: FinOps practices for CSA-in-a-Box covering cost estimation (Bicep today; Terraform on the roadmap), budget guardrails per environment, CI/CD cost comments, required tagging strategy, optimization tips (reserved instances, auto-pause, storage tiering, right-sizing), and a FinOps maturity roadmap (Crawl → Walk → Run).
This document covers cost estimation, budget guardrails, and FinOps practices for the CSA-in-a-Box platform.
📑 Table of Contents¶
- 💰 Cost Estimation Approach
- 🔧 Running Cost Estimates Locally
- 🔄 CI/CD Integration
- 🚨 Budget Thresholds and Alerts
- 🏷️ Tagging Strategy
- 📉 Cost Optimization Tips
- 📊 FinOps Maturity Model
- 📋 Resource-Specific Pricing Reference
- 📚 Further Reading
💰 Cost Estimation Approach¶
CSA-in-a-Box supports two IaC paths, each with its own cost estimation strategy.
Bicep Path (Primary)¶
The primary deployment path uses Azure Bicep templates under deploy/bicep/. Because Infracost does not natively support Bicep, we provide a custom script that:
- Compiles Bicep to ARM JSON using az bicep build --stdout
- Extracts resource types and counts with jq
- Queries the Azure Retail Prices API for each resource type
- Produces a formatted cost estimate with budget comparison
Script: scripts/deploy/estimate-costs.sh
Important
Bicep estimates are best-effort. The Azure Retail Prices API returns list prices — actual costs depend on EA/CSP agreements, reserved instances, and consumption-based meters.
Terraform Path (Roadmap — not implemented)¶
Warning
A parallel Terraform deployment is on the roadmap, not available today (CSA-0015 / audit approval queue item AQ-0024). deploy/terraform/ does not exist in the repository. The .infracost/terraform.yml configuration is a scaffold that will activate once Terraform modules ship. Until then, use the Bicep path (above) for all cost analyses.
🔧 Running Cost Estimates Locally¶
Prerequisites¶
- Azure CLI with Bicep extension (az bicep install)
- jq (JSON processor)
- curl (HTTP client)
- bc (calculator, usually pre-installed)
Basic Usage¶
# Estimate costs for the DLZ
./scripts/deploy/estimate-costs.sh deploy/bicep/DLZ/main.bicep
# With parameters file
./scripts/deploy/estimate-costs.sh deploy/bicep/DLZ/main.bicep \
--params deploy/bicep/DLZ/params.dev.json
# JSON output for scripting
./scripts/deploy/estimate-costs.sh deploy/bicep/DLZ/main.bicep \
--format json
# Compare against a budget
./scripts/deploy/estimate-costs.sh deploy/bicep/DLZ/main.bicep \
--budget 5000
# Specify environment (loads budget from .infracost/policy.yml)
./scripts/deploy/estimate-costs.sh deploy/bicep/DLZ/main.bicep \
--environment dev
# Different region and currency
./scripts/deploy/estimate-costs.sh deploy/bicep/DLZ/main.bicep \
--region westus2 --currency EUR
Understanding the Output¶
The table output shows:
| Column | Description |
|---|---|
| Resource Type | ARM resource type (e.g., Storage/storageAccounts) |
| SKU | Pricing SKU returned by the API |
| Unit $ | Per-unit retail price |
| Qty | Number of instances in the template |
| Monthly $ | Estimated monthly cost (unit price × hours/month × qty) |
Resources that have no pricing mapping, or that cannot be found in the API, are flagged as warnings and are not included in the monthly figures.
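As a worked example of the Monthly $ formula, the sketch below assumes 730 hours per month (8,760 hours divided by 12), a common convention for hourly Azure meters. The value the script actually uses is not stated here, and meters billed per GB or per operation do not follow this formula.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def monthly_cost(unit_price_per_hour: float, qty: int,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost = unit price x hours/month x quantity, rounded to cents."""
    return round(unit_price_per_hour * hours * qty, 2)


# Two instances of an hourly meter priced at 0.12 per hour:
# 0.12 x 730 x 2 = 175.20
example = monthly_cost(0.12, 2)
```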
Exit Codes¶
| Code | Meaning |
|---|---|
| 0 | Success, within budget (or no budget set) |
| 1 | Script error (missing file, failed build) |
| 2 | Over budget (table mode with --budget) |
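A caller can branch on these exit codes. The sketch below shows one possible decision rule; the function name and the `block_on_over_budget` switch are hypothetical and not part of the script.

```python
EXIT_MEANINGS = {
    0: "within budget (or no budget set)",
    1: "script error (missing file, failed build)",
    2: "over budget",
}


def should_fail_pipeline(exit_code: int, block_on_over_budget: bool = False) -> bool:
    """Decide whether a calling pipeline should fail for a given exit code."""
    if exit_code == 0:
        return False
    if exit_code == 2:
        # Over budget is advisory unless the caller opts in to blocking.
        return block_on_over_budget
    # Exit code 1 and any unexpected code are treated as hard failures.
    return True
```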
🔄 CI/CD Integration¶
GitHub Actions — Cost Estimate Job¶
The bicep-whatif.yml workflow includes a cost-estimate job that runs after the what-if analysis on every PR that modifies Bicep files. It:
- Compiles each changed landing zone's Bicep to ARM JSON
- Runs estimate-costs.sh in JSON mode
- Posts a cost summary as a PR comment
The cost estimate job uses the environment-specific budget from .infracost/policy.yml. If the estimate exceeds the budget, the job logs a warning but does not block the PR (to avoid false-positive rejections from list-price estimates).
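The warn-but-do-not-block behaviour can be illustrated with a small comment renderer. This is a sketch under assumptions: the function and its parameters are hypothetical, and the real workflow's comment layout and the JSON schema emitted by the script are not shown in this guide.

```python
def render_cost_comment(landing_zone: str, total_monthly: float,
                        budget: float, currency: str = "USD") -> str:
    """Render a PR comment body; over-budget estimates add a warning line only."""
    lines = [
        f"### Cost estimate: {landing_zone}",
        "",
        f"Estimated monthly cost: {total_monthly:,.2f} {currency}",
        f"Budget: {budget:,.2f} {currency}",
    ]
    if total_monthly > budget:
        overage = total_monthly - budget
        lines.append("")
        lines.append(
            f"Warning: estimate exceeds budget by {overage:,.2f} {currency} "
            "(list prices; this does not block the PR)."
        )
    return "\n".join(lines)
```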
Adding Cost Estimates to Other Workflows¶
- name: Run Cost Estimate
  run: |
    chmod +x scripts/deploy/estimate-costs.sh
    ./scripts/deploy/estimate-costs.sh deploy/bicep/DLZ/main.bicep \
      --format json \
      --environment ${{ vars.ENVIRONMENT || 'dev' }} \
      --budget ${{ vars.COST_BUDGET || '5000' }}
Infracost (Terraform Path)¶
When the Terraform modules are available:
- name: Setup Infracost
  uses: infracost/actions/setup@v3
  with:
    api-key: ${{ secrets.INFRACOST_API_KEY }}
- name: Run Infracost
  run: |
    infracost breakdown \
      --config-file .infracost/terraform.yml \
      --format json \
      --out-file /tmp/infracost.json
- name: Post Infracost Comment
  uses: infracost/actions/comment@v1
  with:
    path: /tmp/infracost.json
    behavior: update
🚨 Budget Thresholds and Alerts¶
Budget thresholds are defined in .infracost/policy.yml:
| Environment | Monthly Budget | Alert Threshold |
|---|---|---|
| dev | $5,000 | 80% |
| staging | $10,000 | 80% |
| prod | $25,000 | 75% |
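The table implies alert levels of $4,000 (dev), $8,000 (staging), and $18,750 (prod). A minimal sketch of that comparison follows. The names are hypothetical, and whether the threshold is applied to estimates or to actual spend is an assumption here.

```python
# (monthly budget in USD, alert threshold in percent), copied from the table above
BUDGETS = {
    "dev": (5_000, 80),
    "staging": (10_000, 80),
    "prod": (25_000, 75),
}


def alert_level(environment: str) -> float:
    """Amount at which the alert threshold is reached."""
    budget, threshold_pct = BUDGETS[environment]
    return budget * threshold_pct / 100


def budget_status(environment: str, amount: float) -> str:
    """Classify an amount as 'ok', 'alert', or 'over_budget'."""
    budget, threshold_pct = BUDGETS[environment]
    if amount > budget:
        return "over_budget"
    # Integer percent arithmetic avoids float rounding at the boundary.
    if amount * 100 >= budget * threshold_pct:
        return "alert"
    return "ok"
```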
Policy Rules¶
The policy file also enforces cost guardrails:
| Rule | Environment | Action | Description |
|---|---|---|---|
| no-premium-in-dev | dev | warn | Flag Premium SKUs in dev |
| no-multi-region-dev | dev | warn | Flag RA-GZRS storage in dev |
| enforce-serverless-dev | dev | warn | Prefer serverless Cosmos DB in dev |
| max-databricks-nodes-dev | dev | deny | Limit Databricks cluster nodes to 4 in dev |
| max-adx-sku-dev | dev | warn | Require Dev/Test ADX SKU in dev |
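The difference between warn and deny actions can be sketched as follows. This is not the policy engine behind .infracost/policy.yml: the `Rule` class, the `evaluate` function, and the resource keys (`kind`, `sku`, `max_nodes`) are hypothetical, and only two of the rules from the table are modelled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    environment: str
    action: str  # "warn" or "deny"
    violated: Callable[[dict], bool]  # True when a resource breaks the rule


# Hypothetical resource shape; the real policy.yml schema is not shown here.
RULES = [
    Rule("no-premium-in-dev", "dev", "warn",
         lambda r: str(r.get("sku", "")).lower().startswith("premium")),
    Rule("max-databricks-nodes-dev", "dev", "deny",
         lambda r: r.get("kind") == "databricks-cluster"
         and r.get("max_nodes", 0) > 4),
]


def evaluate(resources: list[dict], environment: str) -> dict[str, list[str]]:
    """Return the names of violated rules, grouped by action."""
    result: dict[str, list[str]] = {"warn": [], "deny": []}
    for rule in RULES:
        if rule.environment != environment:
            continue
        if any(rule.violated(r) for r in resources):
            result[rule.action].append(rule.name)
    return result
```

A caller would typically print the `warn` entries and stop only when the `deny` list is non-empty.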
Azure Cost Management Alerts¶
In addition to pre-deployment estimates, configure Azure Cost Management alerts for runtime monitoring:
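# Create a budget in Azure Cost Management.
# --resource-group takes one exact resource group name; wildcards such as
# "rg-dlz-dev-*" are not expanded, so create one budget per resource group.
# Run `az consumption budget create --help` to confirm that your CLI version
# accepts --notifications on this command.
az consumption budget create \
  --budget-name "csa-dev-monthly" \
  --amount 5000 \
  --category cost \
  --time-grain monthly \
  --start-date "2024-01-01" \
  --end-date "2025-12-31" \
  --resource-group "<dev-resource-group-name>" \
  --notifications '[{
    "enabled": true,
    "operator": "GreaterThanOrEqualTo",
    "threshold": 80,
    "contactEmails": ["platform-team@contoso.com"],
    "thresholdType": "Actual"
  }]'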
🏷️ Tagging Strategy¶
All CSA-in-a-Box resources must include cost-attribution tags. These are enforced in the Bicep templates via the tagsDefault variable in each landing zone's main.bicep.
Required Tags¶
| Tag | Purpose | Example Values |
|---|---|---|
| environment | Deployment environment | dev, staging, prod |
| CostCenter | Billing/chargeback code | CSA-Platform, DataEng |
| Owner | Team or project owner | Platform Team |
| Project | Project name | Azure Demo ALZ & CSA |
| PrimaryContact | Technical contact email | platform-team@contoso.com |
| Toolkit | IaC tool used | Bicep, Terraform |
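A pre-deployment check for the required tags in the table above might look like the sketch below. The function name is hypothetical, and the comparison is case-sensitive for simplicity, whereas Azure treats tag names as case-insensitive.

```python
REQUIRED_TAGS = (
    "environment",
    "CostCenter",
    "Owner",
    "Project",
    "PrimaryContact",
    "Toolkit",
)


def missing_tags(tags: dict) -> list[str]:
    """Return required tag names that are absent or blank, in table order."""
    return [name for name in REQUIRED_TAGS
            if not str(tags.get(name) or "").strip()]
```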
Enforcement¶
Tags are defined in the tagsDefault variable in each landing zone and merged with resource-specific tags:
var tagsDefault = {
  Owner: 'Azure Landing Zone & Cloud Scale Analytics Scenario'
  Project: 'Azure Demo ALZ & CSA'
  environment: environment
  Toolkit: 'Bicep'
  PrimaryContact: primaryContact
  CostCenter: costCenter
}
Azure Policy can further enforce tagging at the subscription or management group level:
{
  "if": {
    "field": "[concat('tags[', 'CostCenter', ']')]",
    "exists": "false"
  },
  "then": {
    "effect": "deny"
  }
}
📉 Cost Optimization Tips¶
Reserved Instances & Savings Plans¶
| Service | Savings Opportunity |
|---|---|
| Databricks | Pre-purchase DBU commit (1-year: ~25%, 3-year: ~40%) |
| Cosmos DB | Reserved capacity for provisioned throughput |
| Data Explorer | Reserved capacity for cluster compute |
| VMs (SHIR) | Reserved Instances for always-on Integration Runtime |
Auto-Pause and Auto-Stop¶
// Synapse SQL Pools — auto-pause after 60 minutes of inactivity
autopauseDelayInMinutes: 60
// ADX Dev clusters — auto-stop enabled
enableAutoStop: true
In dev/staging, always enable auto-pause for:
- Synapse dedicated SQL pools
- Databricks clusters (via cluster policies)
- Data Explorer clusters (Dev SKU auto-stop)
Spot VMs and Low-Priority Compute¶
- Use Spot VMs for Databricks worker nodes in dev/test
- Use Low-Priority nodes for Synapse Spark pools in dev/test
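- Typical savings: 60-90% compared with pay-as-you-go pricing; Spot capacity can be evicted at any time, so reserve it for interruptible workloads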
Storage Tiering¶
| Tier | Use Case | Relative Cost |
|---|---|---|
| Hot | Frequently accessed data | 1.0x |
| Cool | Infrequent access (30+ days) | ~0.5x |
| Archive | Rarely accessed (180+ days) | ~0.1x |
Implement lifecycle management policies for each lake zone:
{
  "rules": [
    {
      "name": "archive-old-data",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToCool": {
              "daysAfterModificationGreaterThan": 30
            },
            "tierToArchive": {
              "daysAfterModificationGreaterThan": 180
            }
          }
        },
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["raw/", "enriched/"]
        }
      }
    }
  ]
}
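To make the effect of this rule concrete, the sketch below mirrors its thresholds and prefixes and applies the approximate relative costs from the tier table. The function names are hypothetical, and the multipliers cover storage only: retrieval, transaction, and early-deletion charges are ignored.

```python
RELATIVE_COST = {"Hot": 1.0, "Cool": 0.5, "Archive": 0.1}  # approximate, from the table
PREFIXES = ("raw/", "enriched/")


def target_tier(blob_name: str, days_since_modification: int,
                current_tier: str = "Hot") -> str:
    """Tier a block blob would move to under the archive-old-data rule."""
    if not blob_name.startswith(PREFIXES):
        return current_tier  # outside the rule's prefix filter
    if days_since_modification > 180:
        return "Archive"
    if days_since_modification > 30:
        return "Cool"
    return current_tier


def hot_equivalent_gb(gb_by_tier: dict) -> float:
    """Stored GB weighted by relative cost, expressed as Hot-tier GB."""
    return round(sum(gb * RELATIVE_COST[tier] for tier, gb in gb_by_tier.items()), 2)
```

For instance, 100 GB Hot, 200 GB Cool, and 700 GB Archive cost roughly as much to store as 270 GB kept entirely in Hot.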
Right-Sizing¶
- Event Hubs: Start with Standard tier (1 TU); scale up only when throughput exceeds 1 MB/s ingress
- Stream Analytics: Start with 3 SUs; monitor SU% utilization and adjust
- Functions: Use Consumption plan in dev; switch to Premium only for VNet integration or sustained load
- Data Explorer: Use Dev(No SLA)_Standard_E2a_v4 for dev; move to Standard_E8ads_v5 for production
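The Event Hubs guidance can be turned into a quick sizing calculation. The sketch below uses the published Standard-tier ingress limits of 1 MB/s or 1,000 events/s per throughput unit; the function name is hypothetical, and egress limits and burst headroom are not modelled.

```python
import math

INGRESS_MB_PER_TU = 1.0       # MB/s of ingress per throughput unit
INGRESS_EVENTS_PER_TU = 1000  # events/s of ingress per throughput unit


def required_throughput_units(ingress_mb_per_s: float,
                              events_per_s: float = 0.0) -> int:
    """Smallest TU count that covers both the byte rate and the event rate."""
    by_bytes = math.ceil(ingress_mb_per_s / INGRESS_MB_PER_TU)
    by_events = math.ceil(events_per_s / INGRESS_EVENTS_PER_TU)
    return max(1, by_bytes, by_events)
```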
📊 FinOps Maturity Model¶
Stage 1: Crawl (Current)¶
- Pre-deployment cost estimation via estimate-costs.sh
- Budget thresholds in .infracost/policy.yml
- Required cost-attribution tags on all resources
- PR-level cost impact comments
- Azure Cost Management budgets and alerts
Stage 2: Walk¶
- Terraform path with native Infracost support
- Historical cost tracking (Infracost Cloud or Azure Cost Export)
- Automated anomaly detection (Azure Cost Alerts)
- Monthly cost review cadence with team
- Showback reports by CostCenter tag
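A showback report by CostCenter is essentially a grouped sum over cost records. The sketch below assumes each record is a dict with a `cost` value and a `tags` mapping; that shape is hypothetical and would need adapting to the columns of an actual Azure cost export.

```python
from collections import defaultdict


def showback(rows: list[dict]) -> dict[str, float]:
    """Sum cost per CostCenter tag; untagged resources get their own bucket."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        cost_center = (row.get("tags") or {}).get("CostCenter") or "(untagged)"
        totals[cost_center] += row["cost"]
    return {name: round(total, 2) for name, total in sorted(totals.items())}
```

Surfacing the untagged bucket explicitly shows how much spend still escapes attribution.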
Stage 3: Run¶
- Chargeback model across domains/teams
- Reserved instance and savings plan optimization
- Automated right-sizing recommendations
- Cost-per-pipeline / cost-per-query attribution
- Integration with organizational FinOps tooling
📋 Resource-Specific Pricing Reference¶
Quick reference for the CSA services tracked by estimate-costs.sh:
| Service | ARM Type | Default SKU (Dev) | Pricing Model |
|---|---|---|---|
| Storage Account | Microsoft.Storage/storageAccounts | Standard_LRS | Per GB stored + ops |
| Event Hubs | Microsoft.EventHub/namespaces | Standard (1 TU) | Per TU/hour + events |
| Data Factory | Microsoft.DataFactory/factories | N/A (pay-per-pipeline) | Per activity run |
| Databricks | Microsoft.Databricks/workspaces | Premium | Per DBU/hour |
| Data Explorer | Microsoft.Kusto/clusters | Dev(No SLA)_Standard_E2a_v4 | Per cluster/hour |
| Key Vault | Microsoft.KeyVault/vaults | Standard | Per operation |
| Cosmos DB | Microsoft.DocumentDB/databaseAccounts | Serverless | Per RU + storage |
| Azure Functions | Microsoft.Web/sites | Consumption / EP1 | Per execution + GB-s |
| Stream Analytics | Microsoft.StreamAnalytics/streamingjobs | Standard (3 SU) | Per SU/hour |
| Log Analytics | Microsoft.OperationalInsights/workspaces | Per GB | Per GB ingested |
| Machine Learning | Microsoft.MachineLearningServices/workspaces | Basic | Per compute/hour |
| Synapse Analytics | Microsoft.Synapse/workspaces | Serverless SQL Pool | Per TB processed |
📚 Further Reading¶
- Azure Pricing Calculator
- Azure Retail Prices API
- Infracost Documentation
- FinOps Foundation
- Azure Cost Management Best Practices