🚀 Tenant Migration: Dev → Staging → Prod Promotion

Last Updated: 2026-04-27 | Phase: 14 (Wave 1) | Feature 1.6

Audience: Release engineers, Platform Lead, on-call engineers, deployment approvers

Purpose: End-to-end procedure for promoting Fabric workspace items (Lakehouses, Notebooks, Pipelines, Power BI datasets) across Dev → Staging → Prod using Deployment Pipelines + fabric-cicd, including rollback for bad deployments

Category: Platform | Type: Promotion


📑 Table of Contents

  1. Use Cases
  2. Pre-Requisites
  3. Standard Promotion Procedure
  4. Rollback Procedure
  5. Hotfix Procedure
  6. Verification Checklist
  7. Post-Promotion Actions
  8. Escalation
  9. Quick-Reference Commands
  10. Promotion & Rollback Flow Diagrams
  11. Common Failure Modes
  12. Related Runbooks

Use Cases

Pick the path that matches your scenario. Each path links to the section that handles it.

| Scenario | Path | Approval Gate | Typical Duration |
|---|---|---|---|
| Promoting a tested feature from Dev → Staging → Prod (normal release) | Standard Promotion | Required for Prod | 30–90 min |
| Rolling back a bad deployment in Prod (data or behavior regression) | Rollback Procedure | Notify, then act | 15–45 min |
| Hotfix promotion (skip Dev, urgent Prod fix) | Hotfix Procedure | Required for Prod | 30–60 min |
| Disaster recovery: rebuilding Prod from Staging after capacity loss | Rollback / DR Variant | SEV1: page Platform Lead | 1–4 hr |
| Routine bi-weekly release train | Standard Promotion | Required for Prod | 30–90 min |
| Federal agency promotion (per-agency pipelines, FedRAMP audit trail) | Standard Promotion (run per agency) | Required for Prod, dual-approver | 60–120 min |

⚠️ Cross-reference: This runbook focuses on promotion mechanics. For incident handling during deployment, see Incident Response Template. For change-control policy (freeze windows, RFC, CAB), see Change Management.


Pre-Requisites

Before any promotion runs, verify the following are configured. Missing prerequisites are the #1 cause of failed promotions.

1. Workspace Pairs Configured

| Stage | Workspace | Capacity | Connected To |
|---|---|---|---|
| Dev | casino-fabric-dev | F2 (shared) | Git branch dev |
| Staging | casino-fabric-staging | F4 (dedicated) | Git branch staging (or via Deployment Pipelines) |
| Prod | casino-fabric-prod | F64 (dedicated) | Git branch main (or via Deployment Pipelines) |

Each workspace can belong to only one Deployment Pipeline. Verify with the REST API: GET https://api.fabric.microsoft.com/v1/workspaces/{id} → check capacityId and pipeline assignment.

2. Git Integration Enabled

  • Each workspace connected to its branch (Workspace Settings → Git Integration)
  • Branch protection on main (required PR, ≥1 reviewer, build validation, comment resolution)
  • .platform files committed for every Fabric item
  • Fabric folder convention: casino-fabric-{stage}.Workspace/ per workspace

See: Git Integration for connection steps.

3. fabric-cicd Installed in CI Runners

# Verify in workflow logs after first run
pip install "fabric-cicd==1.0.0" azure-identity
python -c "from fabric_cicd import FabricWorkspace; print('OK')"

4. Service Principal & OIDC

| Secret / Var | GitHub Scope | Purpose |
|---|---|---|
| AZURE_CLIENT_ID | Repo secret | App Registration client ID |
| AZURE_TENANT_ID | Repo secret | Microsoft Entra ID tenant |
| AZURE_SUBSCRIPTION_ID | Repo secret | Subscription |
| FABRIC_DEV_WORKSPACE_ID | Environment dev | Dev workspace GUID |
| FABRIC_STAGING_WORKSPACE_ID | Environment staging | Staging workspace GUID |
| FABRIC_PROD_WORKSPACE_ID | Environment production | Prod workspace GUID |

The Service Principal must be Contributor (minimum) on each workspace and have Fabric API permissions granted with admin consent.
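A preflight check along these lines can catch a missing secret before a deploy starts. This is a minimal sketch: the variable names come from the table above, while `missing_vars` and the idea of running it as an early CI step are assumptions, not part of the existing workflow.

```python
import os
from typing import List, Mapping

# Shared secrets from the table above; each stage adds its own workspace GUID.
COMMON_VARS = ["AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_SUBSCRIPTION_ID"]


def missing_vars(stage: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank for a stage.

    `stage` is expected to be "dev", "staging", or "prod" (hypothetical helper).
    """
    required = COMMON_VARS + [f"FABRIC_{stage.upper()}_WORKSPACE_ID"]
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_vars("prod")` inside a runner returns an empty list when everything is set, so a CI step could fail fast on a non-empty result.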

5. Branch Protection on main

Settings → Branches → Branch protection rule for "main"
  ✅ Require a pull request before merging
  ✅ Require approvals: 1 (or 2 for Prod-impacting changes)
  ✅ Require status checks: validate, test, lint
  ✅ Require conversation resolution before merging
  ✅ Do not allow bypassing the above settings

6. GitHub Environments with Approval Gates

| Environment | Required Reviewers | Wait Timer | Deployment Branches |
|---|---|---|---|
| dev | 0 (auto) | 0 | main, manual dispatch |
| staging | 1 | 0 | manual dispatch only |
| production | 2 (Platform Lead + Service Owner) | 5 min | manual dispatch from main only |

7. Pre-Promotion Checklist

  • All open PRs targeting the release have been merged to main
  • Test suite green: pytest validation/unit_tests/ -v (612 tests)
  • Bicep validation green: az bicep build --file infra/main.bicep
  • Great Expectations checkpoints pass for affected Lakehouses
  • Release notes drafted (link in PR description)
  • Stakeholders notified of promotion window (Teams + email for Prod)
  • No active SEV1/SEV2 incidents (check incident channel)
  • Outside change-freeze window (no Friday/holiday Prod deploys without VP approval)
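The change-freeze item in the checklist above can be expressed as a small date check. The sketch below assumes the freeze rule is exactly "Fridays and listed holidays"; the holiday list and the function name `in_change_freeze` are hypothetical, and the real policy lives in Change Management.

```python
from datetime import date
from typing import Iterable


def in_change_freeze(deploy_date: date, holidays: Iterable[date] = ()) -> bool:
    """True when a Prod deploy on this date would need VP approval."""
    is_friday = deploy_date.weekday() == 4  # Monday is 0, so Friday is 4
    return is_friday or deploy_date in set(holidays)
```

For example, `in_change_freeze(date(2026, 5, 1))` is true because that date is a Friday, while `date(2026, 4, 27)` (a Monday) passes unless it appears in the holiday list.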

Standard Promotion Procedure

Goal: Move a tested change set from main → Dev → Staging → Prod. Time budget: 30–90 min.

Step 1: PR Merge to main

  1. Author opens PR against main with passing checks.
  2. Reviewer approves; PR is merged via squash-and-merge (preserves linear history for rollback).
  3. Merge triggers Deploy Fabric Items workflow (.github/workflows/deploy-fabric.yml) on push.

Step 2: fabric-cicd Workflow Runs (Validate → Deploy Dev)

The pipeline auto-runs on push:

validate → deploy-dev (dry-run by default on push)

Watch the workflow run:

gh run watch --repo fgarofalo56/Suppercharge_Microsoft_Fabric

If validate fails, stop. Do not bypass β€” fix the underlying issue and re-merge.

Step 3: Validate Dev → Staging via Deployment Pipelines

Two paths are supported. Pick one and stick with it for the release.

Path A: fabric-cicd Workflow (GitHub Actions)

Manually trigger the staging deploy:

gh workflow run deploy-fabric.yml \
  -f target_environment=staging \
  -f dry_run=false

Approval is enforced by the GitHub Environment staging (1 reviewer).

Path B: Deployment Pipelines (Visual / Portal)

  1. Open the Fabric portal → Deployment Pipelines → casino-analytics-pipeline.
  2. Review the side-by-side comparison (Dev vs. Staging).
  3. Select all changed items (or specific items for partial release).
  4. Click Deploy → confirm deployment rules (Lakehouse rebind, connection overrides).
  5. Wait for completion, typically 2–10 min depending on item count.

Step 4: Smoke Tests in Staging

Run the smoke test script and verify each control point:

python scripts/fabric_smoke_test.py \
  --workspace-id "$FABRIC_STAGING_WORKSPACE_ID"

Verification gates (must all pass before proceeding):

  • All notebooks executable (test-run a sample bronze, silver, gold notebook)
  • Lakehouse schemas match Dev schemas (DESCRIBE EXTENDED on key tables)
  • Pipeline test-run completes end-to-end without error
  • Power BI dataset refresh succeeds
  • Sample queries return expected row counts (within ±5% of Dev)
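The ±5% row-count gate above can be checked mechanically. This is a minimal sketch, not the logic of `scripts/fabric_smoke_test.py` (which is not shown here); `within_tolerance` and its handling of empty tables are assumptions.

```python
def within_tolerance(dev_rows: int, staging_rows: int, tolerance: float = 0.05) -> bool:
    """True when the Staging row count is within +/- tolerance of the Dev count."""
    if dev_rows == 0:
        # Assumption: an empty Dev table only matches an empty Staging table.
        return staging_rows == 0
    return abs(staging_rows - dev_rows) / dev_rows <= tolerance
```

With 1,000 rows in Dev, a Staging count of 1,040 passes and 1,060 fails.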

Step 5: Manual Approval Gate for Prod

The production GitHub Environment requires 2 approvers (Platform Lead + Service Owner). The environment also applies a 5-min wait timer, so the deploy job starts only once both approvals are in and the timer has elapsed.

  1. Trigger the workflow:
gh workflow run deploy-fabric.yml \
  -f target_environment=prod \
  -f dry_run=false
  2. Both approvers receive a notification. Each must:
     • Re-verify smoke tests passed in Staging
     • Confirm release notes are accurate
     • Confirm no active incidents
     • Approve in GitHub UI (or via gh api)
  3. After both approvals, the deploy-prod job runs.

Step 6: Deploy to Prod via Deployment Pipelines / fabric-cicd

The workflow executes:

python scripts/fabric-cicd-deploy.py \
  --workspace-id "$FABRIC_PROD_WORKSPACE_ID" \
  --environment prod \
  --item-type-in-scope Notebook Lakehouse SemanticModel

Do not close the terminal. Watch the logs for these markers:

  • Authenticating with Azure... → OIDC succeeded
  • Connected to workspace: ... → SP has access
  • Publishing items to workspace... → fabric-cicd is acting
  • Deployment to prod COMPLETE → success
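If the run log is saved to a file (for example from `gh run view --log`), the four markers can be checked in order rather than by eye. The sketch below uses the marker strings quoted above; `first_missing_marker` is a hypothetical helper, and it assumes the markers appear verbatim in the log.

```python
from typing import Optional, Sequence

# Marker strings as quoted in this runbook, in the order they should appear.
EXPECTED_MARKERS = (
    "Authenticating with Azure",
    "Connected to workspace",
    "Publishing items to workspace",
    "Deployment to prod COMPLETE",
)


def first_missing_marker(log_text: str, markers: Sequence[str] = EXPECTED_MARKERS) -> Optional[str]:
    """Return the first marker not found in sequence, or None if all appear in order."""
    position = 0
    for marker in markers:
        found = log_text.find(marker, position)
        if found == -1:
            return marker
        position = found + len(marker)
    return None
```

The returned marker indicates where the deploy stopped: a missing "Connected to workspace" points at workspace access, for instance, while a missing final marker means the publish did not finish.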

If errors appear, see Common Failure Modes.

Step 7: Post-Deployment Verification

Run the Verification Checklist immediately after deploy. If anything fails, proceed to Rollback Procedure.


Rollback Procedure

Goal: Restore Prod to a known-good state when a deployment introduces a regression. Time budget: 15–45 min.

Decision rule: Roll back within 30 min if customer impact is detected. Don't try to forward-fix in Prod under pressure.

Step 1: Declare Rollback

  1. Open incident channel (#incident-{YYYY-MM-DD}-deploy-rollback-sev{N}); see Incident Response Template.
  2. Page Platform Lead + on-call. Severity is at minimum SEV2 for Prod rollback.
  3. Communicate intent to stakeholders: "Rolling back deploy {commit-sha} due to {symptom}."

Step 2: Identify the Bad Commit

# Show last 10 commits to main
git log --oneline -10 origin/main

# Identify the deploy commit (squash-merged from a PR)
gh pr list --state merged --limit 10 --base main

# Confirm which commit was deployed (cross-check workflow run)
gh run list --workflow=deploy-fabric.yml --limit 5

Capture the last-known-good commit SHA β€” this is your rollback target.
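The lookup can be partly automated by exporting run history as JSON, for example with `gh run list --workflow=deploy-fabric.yml --json headSha,conclusion,createdAt`, and scanning it. This is a rough sketch: `last_known_good_sha` is a hypothetical helper, and it assumes that runs are listed newest first and that a run with conclusion `success` deployed a good state, which should still be confirmed by a human.

```python
from typing import Dict, List, Optional


def last_known_good_sha(runs: List[Dict[str, str]], bad_sha: str) -> Optional[str]:
    """Return the SHA of the newest successful run that is older than the bad deploy.

    `runs` is the parsed JSON list, ordered newest first.
    """
    seen_bad = False
    for run in runs:
        if run["headSha"] == bad_sha:
            seen_bad = True
            continue
        if seen_bad and run.get("conclusion") == "success":
            return run["headSha"]
    return None  # bad SHA not in the list, or no earlier successful run
```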

Step 3: Choose Rollback Mechanism

| Mechanism | When to Use | Speed | Risk |
|---|---|---|---|
| Deployment Pipelines "previous version" | Item-definition regression only, no data side-effects | Fast (5–10 min) | Low |
| fabric-cicd redeploy from prior commit | Git is source of truth, want full state revert | Medium (10–20 min) | Low |
| Delta time-travel RESTORE | Data-affecting change (bad transformation wrote to Gold) | Medium (10–20 min per table) | Medium: verify FK chain |
| Workspace restore from backup | Catastrophic loss (workspace corruption, SEV1 DR) | Slow (1–4 hr) | High: last resort |
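One way to read the table above is as a short decision function. The sketch below is an interpretation, not policy: the three boolean inputs and the function name `choose_rollback` are assumptions, and it pairs a data restore with an item rollback because a data-affecting deploy usually changes item definitions too.

```python
from typing import List


def choose_rollback(data_written: bool, workspace_lost: bool, git_is_source_of_truth: bool) -> List[str]:
    """Return the rollback mechanisms to run, in order, for a given situation."""
    if workspace_lost:
        return ["Workspace restore from backup"]
    if git_is_source_of_truth:
        steps = ["fabric-cicd redeploy from prior commit"]
    else:
        steps = ['Deployment Pipelines "previous version"']
    if data_written:
        # Item definitions are reverted first, then the affected tables.
        steps.append("Delta time-travel RESTORE")
    return steps
```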

Step 4a: Rollback via Deployment Pipelines (Item Definitions)

  1. Fabric portal → Deployment Pipelines → select pipeline.
  2. Click the Production stage → Deployment History.
  3. Find the prior successful deployment (before the bad one).
  4. Click ... → Redeploy this version (backward or forward as needed). Deployment Pipelines support back-deploy from Staging or roll-forward from a previous Dev snapshot.
  5. Confirm the diff. Deploy. Verify.

Step 4b: Rollback via fabric-cicd (Re-deploy Prior Commit)

# Check out last-known-good commit on a rollback branch
# (compute the branch name once so every command uses the same value)
BRANCH="rollback/INC-$(date +%Y%m%d)"
git checkout -b "$BRANCH" <last-known-good-sha>
git push origin "$BRANCH"

# Trigger deploy from rollback branch (workflow_dispatch supports any ref)
gh workflow run deploy-fabric.yml \
  --ref "$BRANCH" \
  -f target_environment=prod \
  -f dry_run=false

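After verification, bring main back in line with a revert PR (branch protection on main blocks direct pushes):

git checkout main && git pull origin main
git checkout -b revert/<bad-commit-sha>
git revert <bad-commit-sha> --no-edit
git push origin revert/<bad-commit-sha>
gh pr create --base main --fill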

Step 4c: Restore Delta Tables via Time-Travel (Data-Affecting)

If the bad deployment wrote to Delta tables (silver/gold), data must also be reverted. Run in a Fabric notebook:

# Identify last-good version per affected table
spark.sql("DESCRIBE HISTORY lh_gold.fact_daily_revenue").show(20, False)

# Restore by version
spark.sql("RESTORE TABLE lh_gold.fact_daily_revenue TO VERSION AS OF 145")

# Or restore by timestamp (use deploy time as cutoff)
spark.sql("""
    RESTORE TABLE lh_gold.fact_daily_revenue
    TO TIMESTAMP AS OF '2026-04-27 06:00:00'
""")

⚠️ Order matters. Restore Bronze → Silver → Gold in dependency order. Afterwards, refresh the Power BI dataset so its cache reflects the restored data.
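When several tables need restoring, generating the statements in layer order helps avoid restoring Gold before its upstream tables. A minimal sketch follows: it assumes lakehouses are named `lh_bronze`, `lh_silver`, and `lh_gold` as elsewhere in this runbook, `restore_statements` is a hypothetical helper, and the example table names other than `fact_daily_revenue` are made up. It orders by layer only and does not resolve dependencies between tables within one layer.

```python
from typing import Dict, List

LAYER_ORDER = {"lh_bronze": 0, "lh_silver": 1, "lh_gold": 2}


def restore_statements(versions: Dict[str, int]) -> List[str]:
    """Build RESTORE statements ordered Bronze, then Silver, then Gold.

    `versions` maps "lakehouse.table" to the last-good Delta version.
    Raises KeyError for a lakehouse outside the three known layers.
    """
    def sort_key(table: str):
        lakehouse = table.split(".", 1)[0]
        return (LAYER_ORDER[lakehouse], table)

    return [
        f"RESTORE TABLE {table} TO VERSION AS OF {versions[table]}"
        for table in sorted(versions, key=sort_key)
    ]
```

In a Fabric notebook each returned string could be passed to `spark.sql(...)` in turn, after the versions have been confirmed with `DESCRIBE HISTORY`.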

Step 5: Notify Stakeholders

Subject: [INC-{YYYYMMDD}-deploy-rollback] RESOLVED: Prod rolled back to {sha}

Status: RESOLVED
Started: {HH:MM UTC}
Resolved: {HH:MM UTC}

Impact:
- {feature/dataset} regressed from {bad-deploy-time} to {rollback-time}
- {N} customers affected / {N} reports stale

Action taken:
- Rolled back to commit {last-known-good-sha}
- {Delta tables restored if applicable}

Forward plan:
- Postmortem scheduled for {date}
- Root cause investigation owned by {name}
- Forward-fix PR tracked at {link}

Hotfix Procedure

Goal: Apply an urgent Prod fix without going through the full Dev → Staging cycle. Use sparingly.

Threshold for hotfix: SEV1/SEV2 incident, customer impact > 30 min projected, no safe rollback path. All other fixes go through Standard Promotion.

Step 1: Open Hotfix Branch from main

git checkout main
git pull origin main
git checkout -b hotfix/INC-$(date +%Y%m%d)-{short-desc}

Branch naming convention: hotfix/INC-YYYYMMDD-{slug} (e.g., hotfix/INC-20260427-bronze-null-fix).
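A naming check like the following could run in CI or a pre-push hook to enforce the convention. It is a sketch under assumptions: the slug is taken to be lowercase kebab-case (the runbook only shows one example), and `is_valid_hotfix_branch` is a hypothetical helper.

```python
import re
from datetime import datetime

# hotfix/INC-YYYYMMDD-{slug}; the lowercase kebab-case slug rule is an assumption.
HOTFIX_BRANCH = re.compile(r"^hotfix/INC-(\d{8})-([a-z0-9]+(?:-[a-z0-9]+)*)$")


def is_valid_hotfix_branch(name: str) -> bool:
    """True when the name matches the convention and the date part is a real date."""
    match = HOTFIX_BRANCH.match(name)
    if not match:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True
```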

Step 2: Targeted Fix + Minimal Test

  • Change the smallest possible surface to fix the incident.
  • Add a regression test that fails without the fix and passes with it.
  • Run pytest validation/unit_tests/ for the affected module only.
  • Get one approver review (Platform Lead or designate). Hotfix PRs do NOT need full CAB.

Step 3: Direct Staging → Prod Deployment

Hotfix flow skips Dev but must hit Staging for at least a 5-minute smoke before Prod (no exceptions, even for SEV1).

# Push hotfix branch
git push origin hotfix/INC-20260427-bronze-null-fix

# Deploy to Staging from hotfix branch
gh workflow run deploy-fabric.yml \
  --ref hotfix/INC-20260427-bronze-null-fix \
  -f target_environment=staging \
  -f dry_run=false

# Smoke test in Staging (5 minutes minimum)
python scripts/fabric_smoke_test.py --workspace-id "$FABRIC_STAGING_WORKSPACE_ID"

# Deploy to Prod (still requires 2 approvers; incident commander can be one)
gh workflow run deploy-fabric.yml \
  --ref hotfix/INC-20260427-bronze-null-fix \
  -f target_environment=prod \
  -f dry_run=false

Step 4: Required Followup (Within 24 Hours)

A hotfix deploy is incomplete until the fix is back-merged to all environments and main:

  • Open PR to merge hotfix/... → main (squash merge to keep history linear)
  • Verify Dev workspace receives the fix on next push-to-main deploy
  • Verify Staging is in sync (re-deploy to staging if drift)
  • Close the hotfix branch after merge

Step 5: Postmortem Requirement

Every hotfix triggers a mandatory postmortem within 48 hours, regardless of incident severity:

  • Use Blameless Postmortem Template
  • File at docs/postmortems/{YYYY-MM-DD}-{slug}.md
  • Include: why hotfix was justified, why standard process couldn't be used, what process change would prevent recurrence

Verification Checklist

Run this checklist after every deploy to Staging and Prod. All boxes must be checked before declaring promotion complete.

Items Deployed

  • Item count match: GET /workspaces/{id}/items?type=Notebook returns expected count vs. source
  • Lakehouse count matches (lh_bronze, lh_silver, lh_gold + agency lakehouses)
  • Pipeline count matches
  • Semantic Model count matches
  • Reports count matches
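The count checks above can be done in one pass over the item lists returned for the source and target workspaces. This sketch assumes each item is a dict with a `type` field, as in the list-items responses referenced above; `item_count_mismatches` is a hypothetical helper and the fetching step is left out.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def item_count_mismatches(
    source_items: Iterable[Mapping[str, str]],
    target_items: Iterable[Mapping[str, str]],
) -> Dict[str, Tuple[int, int]]:
    """Return {item type: (source count, target count)} for types whose counts differ."""
    source = Counter(item["type"] for item in source_items)
    target = Counter(item["type"] for item in target_items)
    return {
        item_type: (source[item_type], target[item_type])
        for item_type in sorted(set(source) | set(target))
        if source[item_type] != target[item_type]
    }
```

An empty result means every item type has the same count on both sides; it does not prove the items themselves are identical.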

Schemas

  • DESCRIBE EXTENDED lh_bronze.{table} matches expected schema for affected tables
  • No unexpected new columns or dropped columns in Silver/Gold
  • Delta table version incremented as expected (one per deploy)
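For the schema checks, comparing column-to-type maps gives an explicit list of what drifted. The sketch below assumes the schemas have already been collected into plain dicts (for example from `DESCRIBE EXTENDED` output); `schema_drift` and the column names in the usage note are hypothetical.

```python
from typing import Dict, List


def schema_drift(expected: Dict[str, str], actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare {column: type} maps and report added, dropped, and retyped columns."""
    shared = set(expected) & set(actual)
    return {
        "added": sorted(set(actual) - set(expected)),
        "dropped": sorted(set(expected) - set(actual)),
        "type_changed": sorted(col for col in shared if expected[col] != actual[col]),
    }
```

For instance, comparing `{"player_id": "string", "amount": "double"}` with `{"player_id": "string", "amount": "decimal(18,2)", "loaded_at": "timestamp"}` reports `loaded_at` as added and `amount` as retyped.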

Data Freshness

  • Latest partition timestamp on bronze tables is within SLA (<24 hr)
  • Silver/Gold pipelines have run successfully post-deploy
  • No backlog of failed pipeline activities

Power BI

  • Semantic model refresh succeeds (POST /datasets/{id}/refreshes)
  • Top 5 reports render with current data
  • No "Cannot connect to data source" errors in dataset settings

Customer Queries

  • Sample compliance query returns expected count (CTR, SAR thresholds)
  • Sample BI query latency within SLA (<2s p95)
  • Direct Lake fallback path tested (Power BI report load)

Operational

  • No new error alerts firing in Workspace Monitoring
  • Capacity utilization in green band (<70%) within 15 min of deploy
  • No new entries in pipeline failure KQL query (last 1 hr)

Post-Promotion Actions

After successful Prod deployment:

  1. Update CHANGELOG.md with release notes (link the PR / commit SHA)
  2. Tag the release:
git tag -a v$(date +%Y.%m.%d) -m "Release {description}"
git push origin v$(date +%Y.%m.%d)
  3. Notify stakeholders (deploy-success template):
✅ Prod deploy complete: {commit-sha}
Items deployed: {N notebooks, M lakehouses, K pipelines}
Verification: all checks passed
Tag: v2026.04.27
Release notes: {link}
  4. Update Archon Session Context document with deploy outcome
  5. Monitor for 2 hours post-deploy: capacity, pipeline runs, Power BI errors
  6. Close the deploy ticket in Archon (status = done)

Escalation

| Trigger | Action | Who |
|---|---|---|
| Validate stage fails 2x consecutively | Page Release Engineer | Release Engineer |
| Staging smoke test fails | Notify Platform Lead, halt promotion | Platform Lead |
| Prod deploy fails midway | Page on-call, open incident channel SEV2 | On-call + IC |
| Verification check fails post-Prod | Begin Rollback Procedure | IC + Platform Lead |
| Approver unavailable for Prod gate | Escalate to VP Eng for delegation | Platform Lead → VP Eng |
| Service Principal auth failure | See Auth Failure Playbook | On-call |
| Data corruption suspected | SEV1, page VP Eng + Compliance | IC → VP Eng |

Quick-Reference Commands

fabric-cicd CLI

# Dry-run preview to Staging
python scripts/fabric-cicd-deploy.py \
  --workspace-id "$FABRIC_STAGING_WORKSPACE_ID" \
  --environment staging \
  --item-type-in-scope Notebook Lakehouse SemanticModel \
  --dry-run

# Live deploy to Prod
python scripts/fabric-cicd-deploy.py \
  --workspace-id "$FABRIC_PROD_WORKSPACE_ID" \
  --environment prod \
  --item-type-in-scope Notebook Lakehouse SemanticModel

GitHub Actions workflow_dispatch

# Trigger staging deploy from main
gh workflow run deploy-fabric.yml \
  -f target_environment=staging \
  -f dry_run=false

# Trigger prod deploy (requires 2 approvers post-trigger)
gh workflow run deploy-fabric.yml \
  -f target_environment=prod \
  -f dry_run=false

# Trigger from a rollback branch
gh workflow run deploy-fabric.yml \
  --ref rollback/INC-20260427 \
  -f target_environment=prod \
  -f dry_run=false

# Watch the latest run
gh run watch

# View prior run logs
gh run view --log

Deployment Pipelines REST API

# Get token
TOKEN=$(az account get-access-token \
  --resource https://api.fabric.microsoft.com \
  --query accessToken -o tsv)

# List deployment pipelines
curl -X GET \
  "https://api.fabric.microsoft.com/v1/deploymentPipelines" \
  -H "Authorization: Bearer $TOKEN"

# Get pipeline stages
curl -X GET \
  "https://api.fabric.microsoft.com/v1/deploymentPipelines/${PIPELINE_ID}/stages" \
  -H "Authorization: Bearer $TOKEN"

# Deploy Staging (stage 1) → Prod (stage 2)
curl -X POST \
  "https://api.fabric.microsoft.com/v1/deploymentPipelines/${PIPELINE_ID}/deploy" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sourceStageOrder": 1,
    "targetStageOrder": 2,
    "options": {
      "allowOverwriteArtifact": true,
      "allowCreateArtifact": true
    }
  }'

# Check deploy operation status
curl -X GET \
  "https://api.fabric.microsoft.com/v1/deploymentPipelines/${PIPELINE_ID}/operations/${OPERATION_ID}" \
  -H "Authorization: Bearer $TOKEN"
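The status check above is usually repeated until the operation finishes. The following is a minimal polling sketch with an injectable status function so it can be tested without network access. The terminal status names `Succeeded` and `Failed` are an assumption about the operation response and should be checked against the API reference; `wait_for_operation` is a hypothetical helper.

```python
import time
from typing import Callable

# Assumed terminal states of a deploy operation; verify against the API docs.
TERMINAL_STATES = {"Succeeded", "Failed"}


def wait_for_operation(
    get_status: Callable[[], str],
    timeout_s: float = 600,
    interval_s: float = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"operation still '{status}' after {timeout_s}s")
        sleep(interval_s)
```

In practice `get_status` would wrap the GET request shown above and return the status field from its JSON body; a `Failed` result should route to Common Failure Modes rather than a blind retry.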

Delta RESTORE (Data Rollback)

# Inspect history
spark.sql("DESCRIBE HISTORY lh_gold.fact_daily_revenue").show(20, False)

# Restore by version (preferred: exact)
spark.sql("RESTORE TABLE lh_gold.fact_daily_revenue TO VERSION AS OF 145")

# Restore by timestamp (use deploy timestamp)
spark.sql("""
    RESTORE TABLE lh_gold.fact_daily_revenue
    TO TIMESTAMP AS OF '2026-04-27 06:00:00'
""")

# Verify post-restore row count
spark.sql("SELECT COUNT(*) AS row_count FROM lh_gold.fact_daily_revenue").show()

Power BI Dataset Refresh (Post-Deploy)

curl -X POST \
  "https://api.powerbi.com/v1.0/myorg/groups/${WORKSPACE_ID}/datasets/${DATASET_ID}/refreshes" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"notifyOption": "MailOnFailure"}'

Promotion & Rollback Flow Diagrams

Standard Promotion Flow

flowchart TD
    PR[PR opened against main] -->|Approve + Merge| MAIN[main branch]
    MAIN -->|push trigger| VAL{Validate}
    VAL -->|Pass| DEV[Deploy to Dev<br/>auto, dry-run by default]
    VAL -->|Fail| FIX[Fix + new PR]
    DEV -->|Manual dispatch + 1 approver| STG[Deploy to Staging]
    STG --> SMOKE{Smoke Test}
    SMOKE -->|Pass| GATE{Prod Approval Gate<br/>2 approvers + 5 min timer}
    SMOKE -->|Fail| HALT[HALT - investigate<br/>do not promote]
    GATE -->|Approved| PROD[Deploy to Prod]
    GATE -->|Rejected| HALT
    PROD --> VERIFY{Verification Checklist}
    VERIFY -->|All pass| DONE[Tag release<br/>notify stakeholders<br/>monitor 2hr]
    VERIFY -->|Any fail| RB[Rollback Procedure]

    style DONE fill:#27AE60,stroke:#1E8449,color:#fff
    style HALT fill:#E74C3C,stroke:#C0392B,color:#fff
    style RB fill:#E74C3C,stroke:#C0392B,color:#fff
    style PROD fill:#F39C12,stroke:#D68910,color:#fff

Rollback Flow

flowchart TD
    DETECT[Customer impact / verification fail] --> DECL[Declare rollback<br/>open incident channel]
    DECL --> IDENT[Identify last-known-good commit]
    IDENT --> CHOOSE{Rollback type}
    CHOOSE -->|Item-only regression| DP[Deployment Pipelines<br/>redeploy prior version]
    CHOOSE -->|Git-native revert| FCD[fabric-cicd<br/>deploy prior commit]
    CHOOSE -->|Data-affecting| DELTA[Delta RESTORE<br/>Bronze→Silver→Gold]
    CHOOSE -->|Catastrophic| DR[Workspace restore<br/>SEV1 DR procedure]
    DP --> VERIFY[Re-run Verification Checklist]
    FCD --> VERIFY
    DELTA --> VERIFY
    DR --> VERIFY
    VERIFY -->|Pass| NOTIFY[Notify stakeholders<br/>schedule postmortem]
    VERIFY -->|Fail| ESCALATE[Escalate SEV → page VP Eng]

    style DECL fill:#E74C3C,stroke:#C0392B,color:#fff
    style NOTIFY fill:#27AE60,stroke:#1E8449,color:#fff
    style ESCALATE fill:#922B21,stroke:#641E16,color:#fff

Hotfix Flow

flowchart LR
    INC[Active SEV1/SEV2 incident] --> HF[Open hotfix/ branch from main]
    HF --> FIX[Targeted fix + regression test]
    FIX --> APV[1 approver review]
    APV --> STG[Deploy to Staging]
    STG --> SMOKE[5-min smoke minimum]
    SMOKE --> PROD[Deploy to Prod<br/>2 approvers]
    PROD --> BACK[Backmerge to main within 24hr]
    BACK --> PM[Mandatory postmortem within 48hr]

    style INC fill:#E74C3C,stroke:#C0392B,color:#fff
    style PM fill:#F39C12,stroke:#D68910,color:#fff

Common Failure Modes

| Failure Mode | Symptom | Diagnosis | Resolution |
|---|---|---|---|
| Item not deploying | publish_all_items skips an item, no error | Item type not in --item-type-in-scope, or .platform file missing | Add type to scope; commit .platform; re-run |
| Schema drift | DESCRIBE EXTENDED shows columns not in source | Manual portal edit in Prod (out-of-band change) | Detect via Git Integration "incoming changes"; revert via Deployment Pipelines back-deploy |
| Auth failure | DefaultAzureCredential returns 401 | OIDC token expired, SP lost workspace permission | Verify SP is Contributor on workspace; re-grant if needed; see Auth Failure Playbook |
| Lakehouse rebind failure | Notebook in Prod references lh_bronze_dev | Deployment rule missing on Prod stage | Configure Deployment Rule for Lakehouse rebind; redeploy |
| Connection string drift | Semantic Model points to dev SQL server | Connection rule missing | Configure connection rule per stage |
| Item dependency error | Notebook deploys before Lakehouse exists | fabric-cicd ordering edge case | Deploy Lakehouses in a separate publish_all_items call first (see fabric-cicd doc) |
| Stale items not cleaned up | Old notebook present in Prod after rename | fabric-cicd does not delete | Manually delete via portal or REST API: DELETE /items/{id} |
| Pipeline schedule still active in Dev | Dev pipeline triggers on Prod data | Dev pipeline binding wrong | Disable schedule in Dev workspace; enable only in target environment |
| Power BI dataset refresh fails post-deploy | Refresh job errors in PBI | Connection/credentials not bound on deployed model | Open dataset settings → Edit credentials; re-run refresh |
| Workflow approver unavailable | Prod deploy stuck pending approval | Approver OOO | Use designate list; escalate to Platform Lead |
| Backwards-deploy collision | Deployment Pipelines blocks back-deploy | Item exists in target with newer version | Use allowOverwriteArtifact: true in REST body or selectively deploy |
| Concurrent deploy conflict | Workflow waits or fails | concurrency.group already running | Wait for in-flight deploy to finish; do not cancel mid-deploy |
| Capacity throttling during deploy | Items publish slowly or time out | F-SKU CU pressure from concurrent workloads | Schedule deploys outside peak; see Capacity Throttling Response |

Related Runbooks

| Runbook | When to Use |
|---|---|
| Incident Response Template | Master template: open incident channel, classify SEV, run PIR |
| Pipeline Failure Triage | Pipeline activity failed during/after deploy |
| Auth Failure Playbook | Workspace Identity / Service Principal failures |
| Capacity Throttling Response | Deploy slow due to capacity CU pressure |
| Data Quality Incident | GE failure post-deploy, downstream consumer impact |
| Multi-Region Failover | Region outage requires DR rebuild from Staging |

| Document | Description |
|---|---|
| Deployment Pipelines | Stage-based promotion feature reference |
| Git Integration | Workspace ↔ Git source control |
| fabric-cicd Deployment | Programmatic deployment with the Python library |
| Change Management | RFC, freeze windows, rollback policy |
| Identity & RBAC Patterns | Service Principal scoping for deploys |
| Multi-Tenant Workspace Architecture | Cross-tenant promotion considerations |
| Migration Patterns | Larger structural migrations beyond standard promotion |
| Disaster Recovery (BCDR) | Workspace restore, backup retention |
