Power BI Guide¶

Power BI is the visualization and reporting layer for CSA-in-a-Box — connecting the Gold layer of the medallion architecture to interactive dashboards, embedded analytics, and regulatory reports.

Why Power BI¶

Power BI is the native BI tool in the Microsoft analytics stack. It consumes the Gold layer (star-schema Delta tables) via Import, DirectQuery, or Direct Lake, and inherits sensitivity labels from Purview automatically. For federal tenants, Power BI is available in Azure Government with FedRAMP High. As Fabric reaches Gov GA (ADR-0010), Power BI semantic models transition to Direct Lake with zero rewrite.

Architecture Overview¶

graph LR
    subgraph "CSA Gold Layer"
        DIM1[dim_customer]
        DIM2[dim_product]
        DIM3[dim_date]
        FACT[fact_sales]
    end

    subgraph "Power BI Service"
        SM[Semantic Model<br/>Star Schema + DAX]
        RPT[Reports]
        DASH[Dashboards]
        PAG[Paginated Reports<br/>Regulatory]
    end

    subgraph "Distribution"
        APP[Power BI App]
        EMBED[Embedded Analytics]
        TEAMS[Microsoft Teams Tab]
        SHARE[SharePoint Web Part]
        EMAIL[Email Subscriptions]
    end

    DIM1 --> SM
    DIM2 --> SM
    DIM3 --> SM
    FACT --> SM

    SM --> RPT
    SM --> DASH
    SM --> PAG

    RPT --> APP
    RPT --> EMBED
    RPT --> TEAMS
    DASH --> SHARE
    PAG --> EMAIL

Connectivity Modes¶

Choosing the right connectivity mode is one of the most consequential decisions in a Power BI deployment. The table below maps each mode to the CSA-in-a-Box context.

Mode	Data freshness	Performance	Max size	When to use	When to avoid
Import	Scheduled refresh (min 30 min)	Fastest queries	~10 GB (Pro) / ~400 GB (Premium)	<100 GB models, complex DAX, batch-refresh OK	Real-time freshness required
DirectQuery	Real-time (per-query)	Backend-dependent	Unlimited	Real-time dashboards, very large datasets	Complex DAX, high-concurrency reports
Composite	Mixed (Import dims + DQ facts)	Good with aggregations	Mixed	Large fact tables + small dimensions	Simple models where Import suffices
Direct Lake	Near real-time (Fabric only)	Import-tier perf, DQ-tier freshness	Fabric capacity limits	New models on Fabric/OneLake	Non-Fabric environments
Live Connection	Real-time	Depends on source AS model	Source-defined	Shared enterprise semantic models	When you need local DAX measures

flowchart TD
    START[New Semantic Model] --> Q1{Data in Fabric OneLake?}
    Q1 -->|Yes| DL[Direct Lake]
    Q1 -->|No| Q2{Model size?}
    Q2 -->|"< 100 GB"| Q3{Real-time needed?}
    Q2 -->|"> 100 GB"| Q4[Composite Model<br/>Import dims + DQ facts]
    Q3 -->|No| IMP[Import Mode]
    Q3 -->|Yes| DQ[DirectQuery]

    style DL fill:#2e7d32,color:#fff
    style IMP fill:#1565c0,color:#fff
    style DQ fill:#e65100,color:#fff
    style Q4 fill:#6a1b9a,color:#fff

CSA-in-a-Box Default

For current PaaS deployments, use Import mode against the Gold layer Delta tables in ADLS (via Databricks SQL endpoint or Synapse Serverless). When migrating to Fabric, switch to Direct Lake — the semantic model DAX and schema remain identical; only the data source connection changes.

Semantic Models¶

Star Schema from Gold Layer¶

CSA-in-a-Box Gold tables are modeled as star schemas in dbt. The Power BI semantic model maps directly to these tables.

Semantic Model
├── dim_customer     (Type: Dimension, Cardinality: ~100K)
├── dim_product      (Type: Dimension, Cardinality: ~10K)
├── dim_date         (Type: Dimension, Cardinality: ~10 years)
├── dim_geography    (Type: Dimension, Cardinality: ~50K)
├── fact_sales       (Type: Fact, Cardinality: ~50M)
└── fact_inventory   (Type: Fact, Cardinality: ~10M)

DAX Measures¶

Centralize business logic in DAX measures rather than report-level calculations.

// Revenue — base measure
Revenue = SUMX(fact_sales, fact_sales[quantity] * fact_sales[unit_price])

// Revenue YTD — time intelligence
Revenue YTD =
    TOTALYTD([Revenue], dim_date[date])

// Revenue vs Prior Year — comparison
Revenue vs PY =
    VAR _CurrentRevenue = [Revenue]
    VAR _PriorYear = CALCULATE([Revenue], SAMEPERIODLASTYEAR(dim_date[date]))
    RETURN
        DIVIDE(_CurrentRevenue - _PriorYear, _PriorYear)

// Customer Count — distinct count
Active Customers =
    DISTINCTCOUNT(fact_sales[customer_key])

Calculation Groups¶

Use calculation groups for time intelligence patterns that apply to multiple measures (YTD, QTD, MTD, Prior Year).

// Calculation group: Time Intelligence
// Items: Current, YTD, QTD, MTD, PY, PY YTD
// Applied to: Revenue, Cost, Margin, Customer Count

// YTD item
TOTALYTD(SELECTEDMEASURE(), dim_date[date])

// Prior Year item
CALCULATE(SELECTEDMEASURE(), SAMEPERIODLASTYEAR(dim_date[date]))

Row-Level Security (RLS)¶

// Role: Regional Manager
// Table: dim_geography
// DAX filter:
[region] = USERPRINCIPALNAME()

// Role: Department Head
// Table: dim_department
// DAX filter:
[department_id] IN
    VALUES(
        FILTER(security_mapping,
            security_mapping[user_email] = USERPRINCIPALNAME()
        )[department_id]
    )

RLS + DirectQuery

RLS filters are pushed to the backend on every query in DirectQuery mode. Ensure the backend data source has indexes on the filtered columns to avoid full table scans.

Performance¶

Aggregation Tables¶

For large fact tables (>100M rows), create pre-aggregated tables that Power BI queries first.

Aggregation level	Source table	Grain	Row count
`agg_sales_daily`	`fact_sales`	Date + Product + Region	~5M
`agg_sales_monthly`	`fact_sales`	Month + Product Category	~100K
`fact_sales` (detail)	`fact_sales`	Transaction-level	~500M

Power BI automatically routes queries to the appropriate aggregation level.

Composite Models¶

Combine Import and DirectQuery for the best of both:

Import: dimension tables, aggregation tables (fast filters, fast slicers)
DirectQuery: large fact tables (avoid importing 500M+ rows)

Query Folding¶

What is Query Folding?

Query folding translates Power Query (M) transformations into native SQL queries pushed to the source. Folded queries run at the source; unfolded queries download raw data and transform in-memory. Always verify folding in the Power Query editor (right-click step > View Native Query).

Incremental Refresh¶

Configure incremental refresh for large Import models to refresh only recent data.

Policy:
  Archive period:  3 years
  Refresh period:  30 days
  Detect changes:  dim_date[last_modified] (optional, for real-time)
  Only refresh complete periods: Yes

Parameter	Setting	Rationale
RangeStart	`DateTime.From(Date.AddDays(DateTime.FixedLocalNow(), -1095))`	3-year archive
RangeEnd	`DateTime.From(Date.AddDays(DateTime.FixedLocalNow(), -1))`	Yesterday
Refresh rows	Last 30 days	Balance between freshness and refresh time

Development Workflow¶

Power BI Desktop¶

The standard authoring tool for semantic models and reports.

PBIP (Power BI Project) Format¶

PBIP serializes .pbix files into human-readable JSON/TMDL for Git version control.

my-report/
├── .gitignore
├── my-report.pbip                    # Project file
├── definition.pbir                   # Report definition
├── my-semantic-model/
│   ├── definition.pbism              # Semantic model bindings
│   ├── model.tmdl                    # Model-level config
│   ├── tables/
│   │   ├── dim_customer.tmdl
│   │   ├── dim_product.tmdl
│   │   ├── fact_sales.tmdl
│   │   └── _measures.tmdl
│   └── relationships.tmdl
└── report/
    ├── report.json
    └── pages/
        ├── page1.json
        └── page2.json

Do: Use PBIP format for all new reports

PBIP enables PR reviews for semantic model changes. A DAX measure change is a one-line diff in _measures.tmdl — reviewable like any code change.

Deployment Pipelines¶

Stage	Purpose	Refresh schedule	Access
Development	Author and test	Manual or on-demand	Report authors
Test	UAT and validation	Scheduled (matches prod)	QA team + stakeholders
Production	End-user consumption	Scheduled (8x/day typical)	All authorized users

graph LR
    DEV[Development<br/>Workspace] -->|Deploy| TEST[Test<br/>Workspace]
    TEST -->|Deploy| PROD[Production<br/>Workspace]

    DEV -.->|Git sync| GIT[(Git Repository)]

ALM Toolkit¶

For environments without deployment pipelines, ALM Toolkit compares and deploys semantic model changes between workspaces.

Governance¶

Workspace Organization¶

Workspace pattern	Contents	Naming convention
Domain workspace	Domain-specific reports + semantic models	`WS-{Domain}-{Env}`
Shared models	Enterprise semantic models consumed cross-domain	`WS-SharedModels-{Env}`
Embedded	Reports embedded in external apps	`WS-Embedded-{Env}`
Self-service	User-created reports (sandboxed)	`WS-SelfService-{Domain}`

Endorsement¶

Level	Meaning	Who can apply
Certified	Approved for organization-wide use	Power BI admin or delegated certifier
Promoted	Recommended by the author; not formally certified	Report author
No endorsement	Default state	N/A

Certify Gold-Layer Models

Semantic models built on the CSA Gold layer and reviewed by data stewards should be certified. This signals to users that the model is the authoritative source for that domain.

Sensitivity Labels from Purview¶

Sensitivity labels assigned to Gold layer tables in Purview propagate to Power BI semantic models and reports automatically. Users see the label in the Power BI service and cannot export data beyond the label's protection policy.

Usage Metrics¶

Enable usage metrics on production workspaces to track:

Report views per day / per user
Most/least used reports (candidates for deprecation)
Query performance (slow queries indicate missing aggregations)

Embedded Analytics¶

Embed for Your Organization¶

Uses the user's own Entra ID identity. Best for internal portals.

// Power BI Client (JavaScript SDK)
const embedConfig = {
    type: "report",
    id: reportId,
    embedUrl: embedUrl,
    accessToken: aadToken, // User's Entra ID token
    tokenType: models.TokenType.Aad,
    settings: {
        filterPaneEnabled: false,
        navContentPaneEnabled: false,
    },
};

const report = powerbi.embed(container, embedConfig);

Embed for Your Customers (Power BI Embedded)¶

Uses a service principal; end users do not need Power BI licenses.

SKU	v-cores	Use case	Approx. monthly cost
A1	1	Dev/test	~$750
A2	2	Small production	~$1,500
A4	8	Medium production	~$6,000
F2 (Fabric)	2	Fabric-based embedding	~$500

Capacity Planning¶

Estimate concurrent users and average render time per report
Use the Power BI Embedded Capacity Calculator
Start with A2 for production; scale up based on usage metrics

Fabric Integration¶

Direct Lake Mode¶

Direct Lake reads Delta/Parquet files directly from OneLake — no import refresh, no DirectQuery backend hits. It delivers import-tier query performance with near real-time data freshness.

graph LR
    ONELAKE[(OneLake<br/>Delta Tables)] -->|Direct read| DL[Direct Lake<br/>Semantic Model]
    DL --> RPT[Reports & Dashboards]

Migration Path (PaaS to Fabric)¶

Current (PaaS)	Fabric equivalent	Migration effort
Import from Databricks SQL	Direct Lake from OneLake	Change data source; keep DAX
DirectQuery to Synapse Serverless	Direct Lake from Lakehouse	Change data source; keep DAX
Scheduled refresh (8x/day)	No refresh needed (Direct Lake)	Remove refresh schedule
PBIX in SharePoint	PBIP in Git + Fabric workspace	One-time conversion

Real-Time Reporting¶

Streaming Datasets¶

For operational dashboards that need second-level freshness:

Create a streaming dataset in Power BI
Push data via REST API from Azure Functions or Stream Analytics
Pin streaming tiles to a dashboard

Paginated Reports¶

For regulatory and compliance reporting that requires exact formatting, page breaks, and PDF export:

Feature	Interactive report	Paginated report
Layout	Responsive canvas	Fixed-pixel layout
Data volume	Aggregated	Row-level detail
Export	PDF, PPTX, PNG	PDF, Word, Excel, CSV
Use case	Exploration	Regulatory filing
Authoring tool	Power BI Desktop	Power BI Report Builder

Security¶

Access Control Layers¶

graph TD
    WS[Workspace Roles<br/>Admin / Member / Contributor / Viewer] --> SM2[Semantic Model Permissions<br/>Build / Read]
    SM2 --> RLS2[Row-Level Security<br/>DAX filters per role]
    RLS2 --> OLS[Object-Level Security<br/>Hide columns per role]
    OLS --> LABELS[Sensitivity Labels<br/>Export restrictions]

Layer	Scope	Managed by
Workspace roles	Who can see/edit reports	Power BI admin
Semantic model permissions	Who can build reports on a model	Model owner
Row-Level Security (RLS)	Which rows a user can see	DAX filter expressions
Object-Level Security (OLS)	Which columns/tables are hidden	Tabular Editor or TMDL
Sensitivity labels	Export and sharing restrictions	Purview / MIP

Cost Optimization¶

Cost driver	Optimization	Impact
Pro licenses	Use Free + Embedded for view-only consumers	Significant per-user savings
Import refresh frequency	Reduce from 8x/day to 4x/day where acceptable	50% fewer refresh-hours
Premium capacity	Right-size SKU; use autoscale in Fabric	Avoid over-provisioning
Paginated reports	Use only for regulatory — not ad hoc queries	Reduce compute consumption
Large models	Add aggregation tables; switch to composite model	Smaller import footprint

Anti-Patterns¶

Don't: Import > 100 GB without aggregations

Large Import models consume premium memory and take hours to refresh. Use aggregation tables or composite models to keep the in-memory footprint manageable.

Don't: Build reports without a semantic model layer

Reports built directly on SQL queries bypass DAX measures, RLS, and governance. Always build reports on top of a published semantic model.

Don't: Refresh imports every 30 minutes

Frequent refreshes consume capacity and overlap with user query time. If you need sub-hour freshness, switch to DirectQuery or Direct Lake.

Do: Use PBIP format and deploy through pipelines

Version-controlled semantic models are reviewable, auditable, and recoverable. Treat Power BI artifacts like code.

Do: Certify authoritative semantic models

Endorsement prevents report sprawl by guiding users to trusted models.

Checklist¶

Pattern — Power BI & Fabric Roadmap
ADR-0010 — Fabric Strategic Target
Best Practices — Data Governance
Guide — Microsoft Purview (sensitivity labels, catalog)
Guide — Azure Data Explorer (real-time dashboards)

Power BI Guide¶

Why Power BI¶

Architecture Overview¶

Connectivity Modes¶

Semantic Models¶

Star Schema from Gold Layer¶

DAX Measures¶

Calculation Groups¶

Row-Level Security (RLS)¶

Performance¶

Aggregation Tables¶

Composite Models¶

Query Folding¶

Incremental Refresh¶

Development Workflow¶

Power BI Desktop¶

PBIP (Power BI Project) Format¶

Deployment Pipelines¶

ALM Toolkit¶

Governance¶

Workspace Organization¶

Endorsement¶

Sensitivity Labels from Purview¶

Usage Metrics¶

Embedded Analytics¶

Embed for Your Organization¶

Embed for Your Customers (Power BI Embedded)¶

Capacity Planning¶

Fabric Integration¶

Direct Lake Mode¶

Migration Path (PaaS to Fabric)¶

Real-Time Reporting¶

Streaming Datasets¶

Paginated Reports¶

Security¶

Access Control Layers¶

Cost Optimization¶

Anti-Patterns¶

Checklist¶

Related Documentation¶