🚀 Performance & Parallelism Best Practices¶
Last Updated: 2026-04-15 | Version: 2.0 | Status: ✅ Final | Maintainer: Documentation Team
📖 Overview¶
Performance optimization in Microsoft Fabric spans multiple workloads: Copy Activities, Spark notebooks, pipeline orchestration, Direct Lake semantic models, and KQL queries. This guide provides concrete configuration recommendations, benchmark guidelines, and cost-performance trade-off analysis for the casino gaming, federal agency, and healthcare workloads in this project.
🧭 Table of Contents¶
- 📦 Copy Activity Optimization
- ✨ Spark Notebook Performance
- 🔄 Pipeline Parallelism
- ⚡ Direct Lake Performance
- 🔍 KQL Query Performance
- 📈 Benchmark Guidelines by Data Volume
- 💰 Cost vs Performance Trade-Offs
📦 Copy Activity Optimization¶
Data Integration Units (DIU)¶
DIUs (formerly Data Movement Units) determine the compute power allocated to a Copy Activity. Higher DIU counts increase throughput for large data moves but also increase CU consumption.
Default behavior:
| Source/Sink Combination | Default DIU | Auto-Tune |
|---|---|---|
| Cloud to Cloud | 4 | Yes (up to 256) |
| On-premises to Cloud (via Gateway) | 4 | No |
| Cloud to On-premises (via Gateway) | 4 | No |
When to increase DIUs beyond default:
| Scenario | Recommended DIU | Reason |
|---|---|---|
| Files > 1 GB from blob/ADLS | 16-32 | Parallel file reads |
| Single large file (> 10 GB) | 32-64 | Parallel block reads |
| Many small files (> 10,000) | 16-32 | Parallel file listing and transfer |
| Cross-region copy | 16-64 | Overcome network latency |
| USDA/NOAA bulk historical load | 32 | Large initial backfill |
| Casino daily slot telemetry | 8-16 | Moderate daily volume |
Configuration:
```json
{
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "LakehouseTableSink" },
    "dataIntegrationUnits": 32,
    "enableStaging": true
  }
}
```
Monitoring DIU effectiveness:
After a Copy Activity run, check `usedDataIntegrationUnits` and `throughput` in the run output. If `usedDataIntegrationUnits` is lower than the configured `dataIntegrationUnits`, the source cannot saturate the allocated DIUs and you should reduce them.
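For a quick programmatic check, a sketch along these lines works (the values are illustrative; the field names follow the ADF-style Copy Activity output schema):

```python
# Illustrative values copied from a Copy Activity's monitoring output JSON.
run_output = {
    "dataIntegrationUnits": 32,       # configured allocation
    "usedDataIntegrationUnits": 12,   # actually consumed
    "throughput": 48500.0,            # reported throughput (KB/s)
}

if run_output["usedDataIntegrationUnits"] < run_output["dataIntegrationUnits"]:
    # The source could not saturate the allocation; paying for idle DIUs.
    print(f"Consider lowering dataIntegrationUnits toward "
          f"{run_output['usedDataIntegrationUnits']}")
```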
Parallel Copy Settings¶
The parallelCopies setting controls how many threads read from the source simultaneously within a single Copy Activity.
Default: Auto (typically 4-5 for cloud sources)
When to increase beyond default:
| Source Type | Recommended parallelCopies | Condition |
|---|---|---|
| Partitioned SQL table | 8-32 | Table has good partition column |
| File-based (many files) | 16-32 | > 1,000 files per batch |
| REST API (paginated) | 1-4 | API rate limits apply |
| Oracle via Gateway | 4-8 | Gateway node has sufficient CPU |
| USDA NASS API | 2-4 | Respect API rate limits |
| NOAA weather stations | 8-16 | Many endpoints, low per-request cost |
```json
{
  "typeProperties": {
    "parallelCopies": 16,
    "source": {
      "type": "SqlSource",
      "partitionOption": "DynamicRange",
      "partitionSettings": {
        "partitionColumnName": "transaction_date",
        "partitionUpperBound": "2026-03-12",
        "partitionLowerBound": "2026-01-01"
      }
    }
  }
}
```
Source Partitioning Strategies¶
Partitioning the source read is the single most impactful optimization for large table extraction.
Physical partitioning:
Use when the source table already has partitions (e.g., Oracle RANGE partitions, SQL Server partition schemes).
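A minimal source sketch (ADF-style Copy source; the enum value matches the decision matrix below):

```json
{
  "source": {
    "type": "SqlSource",
    "partitionOption": "PhysicalPartitionsOfTable"
  }
}
```

Copy then reads each physical partition in parallel, up to the configured `parallelCopies`.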
Dynamic range partitioning:
Use when the source has a good distribution column (date, ID) but is not physically partitioned.
```json
{
  "partitionOption": "DynamicRange",
  "partitionSettings": {
    "partitionColumnName": "player_id",
    "partitionUpperBound": "10000000",
    "partitionLowerBound": "1"
  }
}
```
Decision matrix:
```mermaid
flowchart TD
    A[Source Table] --> B{Has Physical Partitions?}
    B -->|Yes| C[Use PhysicalPartitionsOfTable]
    B -->|No| D{Has Good Distribution Column?}
    D -->|Yes| E{Column Type?}
    D -->|No| F[No Partitioning - Consider Gateway Scale-Up]
    E -->|Integer/BigInt| G[DynamicRange on Integer Column]
    E -->|Date/DateTime| H[DynamicRange on Date Column]
    E -->|String| I[Avoid - Poor Distribution]
    G --> J[Set Lower/Upper Bounds]
    H --> J
    J --> K[parallelCopies = Partition Count]
```
Staging with Compression¶
For large cross-network transfers, enable staging through an intermediate blob storage with compression.
```json
{
  "typeProperties": {
    "enableStaging": true,
    "stagingSettings": {
      "linkedServiceName": {
        "referenceName": "ls_staging_blob",
        "type": "LinkedServiceReference"
      },
      "path": "staging/fabric-copy",
      "enableCompression": true
    }
  }
}
```
When to use staging:
| Scenario | Staging Recommended | Compression |
|---|---|---|
| On-premises to Lakehouse (via Gateway) | Yes | Yes (gzip) |
| Cross-region cloud transfer | Yes | Yes |
| REST API with large payloads | Optional | Yes |
| Cloud same-region | No | Not needed |
| Small tables (< 100 MB) | No | Overhead exceeds benefit |
Network Optimization¶
| Optimization | Configuration | Impact |
|---|---|---|
| Use Managed VNet integration | Workspace settings > Networking | Eliminates public internet hops |
| Self-hosted IR placement | Deploy in same region as source | Reduces latency |
| ExpressRoute for on-prem | Azure networking | Dedicated bandwidth |
| Parallel gateway nodes | Add nodes to IR cluster | Linear throughput scaling |
✨ Spark Notebook Performance¶
Partition Strategy¶
The number of partitions determines parallelism in Spark. Too few partitions underutilize the cluster; too many create overhead from task scheduling and small file problems.
Rule of thumb: partition count = 2-4x total cores
| Cluster Size | Total Cores | Recommended Partitions | Use Case |
|---|---|---|---|
| Small (4 nodes x 8 cores) | 32 | 64-128 | Bronze ingestion |
| Medium (8 nodes x 8 cores) | 64 | 128-256 | Silver transformations |
| Large (16 nodes x 8 cores) | 128 | 256-512 | Gold aggregations, ML |
Setting partition count:
```python
# Read with explicit partition count
df = spark.read.format("delta").load("abfss://bronze@onelake.dfs.fabric.microsoft.com/slot_telemetry/")

# Check current partition count
print(f"Current partitions: {df.rdd.getNumPartitions()}")

# Repartition for processing (use coalesce to reduce, repartition to increase)
df = df.repartition(128)   # Shuffle-based, use for increasing partitions
df = df.coalesce(64)       # No shuffle, use for reducing partitions

# Repartition by key column for join optimization
df = df.repartition(128, "player_id")
```
When to repartition:
```mermaid
flowchart TD
    A[Read Data] --> B{Partition Count?}
    B -->|< 2x cores| C[Repartition Up]
    B -->|2x-4x cores| D[Good - No Change]
    B -->|> 10x cores| E[Coalesce Down]
    C --> F{Need Key-Based Joins?}
    F -->|Yes| G["repartition(n, 'key_col')"]
    F -->|No| H["repartition(n)"]
    E --> I["coalesce(target_partitions)"]
```
Broadcast Joins for Small Tables¶
When joining a large table with a small table, use broadcast joins to avoid expensive shuffles. Spark broadcasts automatically when the smaller side is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default; commonly raised to ~100 MB as below), and larger tables can still be broadcast explicitly with a hint.
```python
from pyspark.sql import functions as F

# Automatic broadcast (Spark decides based on table size)
# Default threshold: spark.sql.autoBroadcastJoinThreshold = 10485760 (10 MB)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 100 * 1024 * 1024)  # 100 MB

# Explicit broadcast hint
from pyspark.sql.functions import broadcast

large_df = spark.read.format("delta").load(".../slot_telemetry/")
small_df = spark.read.format("delta").load(".../machine_reference/")

result = large_df.join(
    broadcast(small_df),
    on="machine_id",
    how="left"
)
```
Broadcast join candidates in this project:
| Small Table | Typical Size | Joined With |
|---|---|---|
| `machine_reference` | ~5 MB | Slot telemetry (billions of rows) |
| `player_demographics` | ~50 MB | Transaction records |
| `usda_state_codes` | < 1 MB | Crop production data |
| `noaa_station_metadata` | ~10 MB | Weather observations |
| `epa_monitor_locations` | ~5 MB | AQI readings |
| `icd10_codes` | ~20 MB | Healthcare encounter records |
| `faa_airport_codes` | ~2 MB | Flight/incident records |
Caching Strategies¶
Cache DataFrames that are read multiple times in the same notebook.
```python
from pyspark.sql import functions as F

# Cache when a DataFrame is used in multiple downstream operations
silver_df = spark.read.format("delta").load(".../silver_player_transactions/")
silver_df.cache()
silver_df.count()  # Trigger materialization

# Use in multiple operations
daily_summary = silver_df.groupBy("transaction_date").agg(F.sum("amount"))
player_summary = silver_df.groupBy("player_id").agg(F.sum("amount"))

# Unpersist when done
silver_df.unpersist()
```
Cache vs Persist:
| Method | Storage Level | Use When |
|---|---|---|
| `cache()` | MEMORY_AND_DISK (default) | DataFrame used 2-5 times |
| `persist(StorageLevel.MEMORY_ONLY)` | Memory only | Small DataFrame, no spill tolerance |
| `persist(StorageLevel.DISK_ONLY)` | Disk only | Large DataFrame, memory pressure |
| No caching | Recompute each time | DataFrame used once |
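A brief sketch of the `persist` variants from the table above (the table path is illustrative):

```python
from pyspark import StorageLevel

# Large DataFrame under memory pressure: disk-only persistence avoids eviction churn
big_df = spark.read.format("delta").load(".../silver_slot_telemetry/")
big_df.persist(StorageLevel.DISK_ONLY)
big_df.count()  # Trigger materialization
# ... reuse big_df in several downstream operations ...
big_df.unpersist()
```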
Cache management rules:
- Always `unpersist()` when done -- cached data consumes cluster memory
- Never cache DataFrames larger than 50% of cluster memory
- Cache after filters and projections, not before
- Trigger materialization with `.count()` after caching
Predicate Pushdown with Delta Lake¶
Delta Lake supports predicate pushdown: filters are evaluated against file-level statistics, so entire files that cannot contain matching rows are skipped.
```python
# Good: Filter pushes down to Delta file-level statistics
df = spark.read.format("delta") \
    .load(".../bronze_slot_telemetry/") \
    .filter(F.col("event_date") == "2026-03-12")  # Only reads relevant files

# Bad: UDF prevents pushdown
from pyspark.sql.types import BooleanType

@F.udf(BooleanType())
def custom_filter(date_val):
    return str(date_val) == "2026-03-12"

df = spark.read.format("delta") \
    .load(".../bronze_slot_telemetry/") \
    .filter(custom_filter(F.col("event_date")))  # Reads ALL files
```
Maximizing pushdown effectiveness:
| Technique | Impact | Implementation |
|---|---|---|
| Filter early | High | Place .filter() immediately after .read() |
| Use built-in functions | High | Avoid UDFs in filter conditions |
| Partition by date | High | PARTITIONED BY (event_date) |
| Maintain file statistics | Medium | OPTIMIZE regularly |
| Use column pruning | Medium | Select only needed columns after filter |
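A combined sketch of the filter-early and column-pruning rows above, using the telemetry columns from earlier examples:

```python
from pyspark.sql import functions as F

# Filter first so Delta can skip files, then select only the needed columns
df = (
    spark.read.format("delta")
    .load(".../bronze_slot_telemetry/")
    .filter(F.col("event_date") == "2026-03-12")  # file-level skipping
    .select("machine_id", "coin_in", "coin_out")  # column pruning
)
```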
Z-ORDER Optimization¶
Z-ORDER co-locates related data in the same files, dramatically improving query performance for filtered reads.
```sql
-- Optimize slot telemetry for queries filtered by machine_id and event_date
OPTIMIZE bronze_slot_telemetry
ZORDER BY (machine_id, event_date);

-- Optimize USDA data for queries filtered by state and commodity
OPTIMIZE silver_usda_crop_production
ZORDER BY (state_code, commodity_code);

-- Optimize EPA data for geographic queries
OPTIMIZE silver_epa_aqi_readings
ZORDER BY (state_code, county_code, parameter_code);
```
Z-ORDER strategy by table:
| Table | Z-ORDER Columns | Query Pattern |
|---|---|---|
| `bronze_slot_telemetry` | machine_id, event_date | Machine-specific time range queries |
| `silver_player_transactions` | player_id, transaction_date | Player activity history |
| `gold_daily_revenue` | property_id, revenue_date | Property-level reporting |
| `silver_usda_crop_production` | state_code, year, commodity_code | State/commodity analysis |
| `silver_noaa_observations` | station_id, observation_date | Station time series |
| `silver_epa_aqi` | state_code, monitor_id, date | Regional air quality trends |
| `silver_healthcare_encounters` | facility_id, encounter_date | Facility-level analytics |
V-ORDER for Direct Lake¶
V-ORDER is a write-time optimization that sorts data within Parquet files for optimal Direct Lake performance. It is enabled by default in Fabric Lakehouses.
```python
# V-ORDER is applied automatically when writing to Fabric Lakehouse tables
# Verify it is enabled:
spark.conf.get("spark.sql.parquet.vorder.enabled")  # Should be "true"

# Force V-ORDER on external writes:
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
df.write.mode("overwrite").format("delta").saveAsTable("lh_gold.daily_revenue")
```
V-ORDER impact:
| Metric | Without V-ORDER | With V-ORDER | Improvement |
|---|---|---|---|
| Direct Lake query time | Baseline | 20-50% faster | Significant |
| VertiPaq scan efficiency | Standard | Optimized column segments | High |
| Write time | Baseline | 5-15% slower | Minor trade-off |
| Storage size | Baseline | Similar or slightly smaller | Neutral |
Adaptive Query Execution (AQE)¶
AQE dynamically optimizes query execution based on runtime statistics. It is enabled by default in Fabric Spark.
```python
# Verify AQE is enabled (should be by default)
spark.conf.get("spark.sql.adaptive.enabled")  # "true"

# Key AQE settings
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.localShuffleReader.enabled", "true")

# AQE automatically:
# 1. Coalesces small post-shuffle partitions
# 2. Splits skewed partitions for balanced processing
# 3. Converts sort-merge joins to broadcast joins when runtime stats show small tables
```
AQE benefits by scenario:
| Scenario | AQE Optimization | Benefit |
|---|---|---|
| Highly filtered data | Coalesces small partitions | Fewer tasks, less overhead |
| Skewed join keys (e.g., popular players) | Splits skewed partitions | Even execution time |
| Small table after filters | Converts to broadcast join | Eliminates shuffle |
| Uncertain data volumes | Adapts at runtime | No manual tuning needed |
🔄 Pipeline Parallelism¶
ForEach Activity with Batch Count¶
The ForEach activity processes items in parallel up to the configured batchCount.
```json
{
  "name": "ForEach_Ingest_Sources",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": false,
    "batchCount": 20,
    "items": "@pipeline().parameters.source_tables"
  }
}
```
Batch count recommendations:
| Scenario | Recommended Batch Count | Reasoning |
|---|---|---|
| Copy from multiple API endpoints | 5-10 | API rate limits |
| Copy from multiple database tables | 10-20 | Database connection limits |
| Process multiple files | 20-50 | I/O bound, high parallelism safe |
| Execute multiple notebooks | 5-10 | Each notebook consumes Spark capacity |
| USDA multi-dataset ingestion | 4-8 | NASS API rate limits |
| NOAA station data collection | 10-20 | Many stations, light per-call cost |
| EPA monitor data collection | 10-15 | Moderate API limits |
Maximum batch count: 50 (Fabric limit)
Concurrent Pipeline Runs¶
Control how many instances of the same pipeline can run simultaneously.
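A minimal sketch, assuming the ADF-style pipeline-level `concurrency` property (the pipeline name is hypothetical; activities omitted):

```json
{
  "name": "pl_master_orchestration",
  "properties": {
    "concurrency": 1,
    "activities": []
  }
}
```

With `concurrency: 1`, a new trigger fires only after the previous run finishes, which matches the master-orchestration recommendation below.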
Concurrency by pipeline type:
| Pipeline Type | Recommended Concurrency | Reason |
|---|---|---|
| Master orchestration | 1 | Prevent overlapping runs |
| Source-specific ingestion | 3-5 | Allow multiple sources in parallel |
| Notebook execution | 2-3 | Limited by Spark capacity |
| Data quality checks | 5-10 | Lightweight, I/O bound |
Dependency Optimization: Fan-Out / Fan-In¶
Design pipeline dependencies to maximize parallel execution.
```mermaid
flowchart TD
    subgraph FanOut["Fan-Out: Parallel Ingestion"]
        A[Start] --> B[Copy: USDA]
        A --> C[Copy: NOAA]
        A --> D[Copy: EPA]
        A --> E[Copy: DOI]
        A --> F[Copy: SBA]
    end
    subgraph FanIn["Fan-In: Wait for All"]
        B --> G[Checkpoint: All Sources Complete]
        C --> G
        D --> G
        E --> G
        F --> G
    end
    subgraph Sequential["Sequential: Transform"]
        G --> H[Notebook: Silver USDA]
        G --> I[Notebook: Silver NOAA]
        G --> J[Notebook: Silver EPA]
        H --> K[Notebook: Gold Federal Summary]
        I --> K
        J --> K
    end
```
Anti-pattern: unnecessary sequential dependencies
```text
BAD:  Copy A -> Copy B -> Copy C -> Transform A -> Transform B -> Transform C

GOOD: Copy A ─┐               Transform A ─┐
      Copy B ─┼─ Checkpoint ─ Transform B ─┼─ Gold Aggregation
      Copy C ─┘               Transform C ─┘
```
Pipeline Activity Limits¶
Be aware of Fabric pipeline limits when designing parallel flows:
| Limit | Value | Mitigation |
|---|---|---|
| Activities per pipeline | 80 | Split into child pipelines |
| ForEach batch count | 50 | Use multiple ForEach activities |
| Nested pipeline depth | 8 levels | Flatten where possible |
| Concurrent pipeline runs (workspace) | Capacity-dependent | Stagger schedules |
| Variables per pipeline | 50 | Use JSON parameters for complex config |
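For the first mitigation, a parent pipeline can call children through Execute Pipeline activities; a sketch assuming the ADF-style activity schema (names hypothetical):

```json
{
  "name": "Invoke_Child_Federal_Ingest",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": {
      "referenceName": "pl_child_federal_ingest",
      "type": "PipelineReference"
    },
    "waitOnCompletion": true
  }
}
```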
⚡ Direct Lake Performance¶
Framing Optimization¶
Direct Lake reads Delta Lake files directly without importing data. "Framing" is the process of identifying which files to read. Optimize framing by keeping the number of Delta files manageable.
```sql
-- Check file count and total size for a table
DESCRIBE DETAIL gold_daily_revenue;
-- Returns numFiles and sizeInBytes (divide sizeInBytes by 1024^3 for GB)
```
Framing optimization targets:
| Table Size | Target File Count | Target File Size | Optimization |
|---|---|---|---|
| < 1 GB | 1-10 | 100-200 MB | OPTIMIZE weekly |
| 1-10 GB | 10-50 | 200 MB - 1 GB | OPTIMIZE daily |
| 10-100 GB | 50-200 | 500 MB - 1 GB | OPTIMIZE daily + ZORDER |
| > 100 GB | 100-500 | 1 GB | OPTIMIZE + partitioning |
```sql
-- Optimize and compact files
OPTIMIZE gold_daily_revenue;

-- Vacuum old files (default retention: 7 days)
VACUUM gold_daily_revenue RETAIN 168 HOURS;
```
Column Pruning¶
Direct Lake performance improves when queries request fewer columns. Design semantic models with this in mind.
Best practices:
- Create Gold layer views with only the columns needed for reporting
- Avoid `SELECT *` in semantic model table definitions
- Use calculated columns sparingly -- prefer measures
- Split wide tables into fact and dimension tables (star schema)
```sql
-- Instead of one wide table with 50+ columns, create focused Gold tables

-- Fact: daily_slot_revenue (narrow, high-row)
CREATE TABLE gold_daily_slot_revenue AS
SELECT
    revenue_date,
    machine_id,
    property_id,
    coin_in,
    coin_out,
    gross_revenue,
    net_revenue,
    handle_pulls
FROM silver_slot_summary;

-- Dimension: machine_details (wide, low-row)
CREATE TABLE gold_dim_machine AS
SELECT
    machine_id,
    machine_type,
    manufacturer,
    denomination,
    floor_location,
    install_date,
    property_id
FROM silver_machine_master;
```
Aggregation Tables¶
Pre-aggregate common query patterns to reduce Direct Lake scan volume.
```sql
-- Pre-aggregated table for common dashboard queries
CREATE TABLE gold_agg_monthly_revenue AS
SELECT
    DATE_TRUNC('month', revenue_date) AS revenue_month,
    property_id,
    machine_type,
    SUM(gross_revenue) AS total_gross_revenue,
    SUM(net_revenue) AS total_net_revenue,
    SUM(handle_pulls) AS total_handle_pulls,
    COUNT(DISTINCT machine_id) AS active_machines,
    AVG(gross_revenue / NULLIF(handle_pulls, 0)) AS avg_revenue_per_pull
FROM gold_daily_slot_revenue
GROUP BY DATE_TRUNC('month', revenue_date), property_id, machine_type;
```
Aggregation table strategy:
| Aggregation Level | Refresh Frequency | Use Case |
|---|---|---|
| Hourly | Every hour | Real-time floor monitoring |
| Daily | Daily at 6 AM | Standard reporting |
| Weekly | Weekly on Monday | Trend analysis |
| Monthly | 1st of month | Executive dashboards |
| YTD | Daily | Year-over-year comparison |
Composite Models for Large Datasets¶
When a single Direct Lake model cannot hold all required data, use composite models.
```mermaid
flowchart LR
    subgraph DirectLake["Direct Lake (Hot Data)"]
        A[Current Year Facts]
        B[Active Dimensions]
    end
    subgraph Import["Import Mode (Cold Data)"]
        C[Historical Aggregates]
        D[Reference Tables]
    end
    subgraph DualStorage["Dual Storage"]
        E[Shared Dimensions]
    end
    DirectLake --> F[Composite Model]
    Import --> F
    DualStorage --> F
    F --> G[Power BI Reports]
```
When to use composite models:
| Scenario | Approach | Reason |
|---|---|---|
| Current + historical data | Direct Lake (current) + Import (historical) | Historical data rarely changes |
| Detail + aggregates | Direct Lake (detail) + Import (pre-aggregated) | Aggregates are small |
| Multi-source reporting | Direct Lake (primary) + DirectQuery (live) | Some data needs real-time |
🔍 KQL Query Performance¶
Materialized Views¶
Materialized views pre-compute and store aggregation results, dramatically improving repeated query patterns.
```kusto
// Create materialized view for hourly slot telemetry aggregates
.create materialized-view SlotTelemetryHourly on table slot_telemetry
{
    slot_telemetry
    | summarize
        avg_coin_in = avg(coin_in),
        total_coin_in = sum(coin_in),
        total_coin_out = sum(coin_out),
        handle_pulls = count(),
        active_machines = dcount(machine_id)
      by bin(event_time, 1h), property_id, machine_type
}

// Create materialized view for EPA AQI hourly readings
.create materialized-view AqiHourly on table epa_aqi_readings
{
    epa_aqi_readings
    | summarize
        avg_aqi = avg(aqi_value),
        max_aqi = max(aqi_value),
        reading_count = count()
      by bin(reading_time, 1h), state_code, county_code, parameter_code
}
```
Update Policies¶
Update policies transform data at ingestion time, avoiding repeated transformation costs on query.
```kusto
// Define a function that transforms raw data
.create function ParseSlotTelemetry()
{
    raw_slot_events
    | extend
        machine_id = tostring(parsed_data.machine_id),
        event_type = tostring(parsed_data.event_type),
        coin_in = todouble(parsed_data.coin_in),
        coin_out = todouble(parsed_data.coin_out),
        event_time = todatetime(parsed_data.timestamp)
    | project-away parsed_data
}

// Attach update policy to target table
.alter table slot_telemetry policy update
@'[{"IsEnabled": true, "Source": "raw_slot_events", "Query": "ParseSlotTelemetry()", "IsTransactional": true}]'
```
Hot / Warm Cache Tiers¶
Configure cache policies to keep frequently accessed data in hot (SSD) cache while older data moves to warm (blob) storage.
```kusto
// Hot cache: last 30 days (fast queries)
.alter table slot_telemetry policy caching hot = 30d

// Retention: keep 2 years total
.alter table slot_telemetry policy retention softdelete = 730d

// Different policies by table importance
.alter table compliance_ctr_filings policy caching hot = 365d    // Keep compliance data hot for 1 year
.alter table noaa_weather_observations policy caching hot = 90d  // 3 months hot for weather
.alter table epa_aqi_readings policy caching hot = 90d           // 3 months hot for AQI
.alter table usda_crop_production policy caching hot = 180d      // 6 months hot for crop data
```
Cache tier sizing:
| Data Volume (Hot) | Recommended Eventhouse SKU | Use Case |
|---|---|---|
| < 100 GB | Small | Dev/test, low-volume monitoring |
| 100 GB - 1 TB | Medium | Production streaming + federal data |
| 1-10 TB | Large | Full casino floor + all federal agencies |
| > 10 TB | Extra Large | Multi-property enterprise |
KQL Query Best Practices¶
```kusto
// GOOD: Filter early, project early
slot_telemetry
| where event_time > ago(24h)                        // Filter first
| where machine_type == "slot"                       // Narrow further
| project event_time, machine_id, coin_in, coin_out  // Select only needed columns
| summarize total_coin_in = sum(coin_in) by bin(event_time, 1h), machine_id

// BAD: Late filtering, wide scan
slot_telemetry
| summarize total_coin_in = sum(coin_in) by bin(event_time, 1h), machine_id
| where event_time > ago(24h)  // Filter AFTER aggregation -- scans all data

// GOOD: Use has/has_cs instead of contains for string matching
slot_telemetry
| where machine_id has "SLOT-A"  // Uses term index, fast

// BAD: contains scans all strings
slot_telemetry
| where machine_id contains "SLOT-A"  // Full string scan, slow

// GOOD: Limit joins to filtered datasets
let active_machines = machine_reference | where status == "active" | project machine_id;
slot_telemetry
| where event_time > ago(1h)
| where machine_id in (active_machines)

// GOOD: Use materialized views for repeated patterns
SlotTelemetryHourly  // Pre-computed materialized view
| where event_time > ago(7d)
| summarize daily_revenue = sum(total_coin_in - total_coin_out) by bin(event_time, 1d)
```
Query performance checklist:
| Rule | Impact | Example |
|---|---|---|
| Filter by time first | High | `where event_time > ago(24h)` |
| Project early | High | `project col1, col2` before aggregation |
| Use `has` over `contains` | High | `has` uses the term index |
| Avoid `*` in `join` | Medium | Specify explicit columns |
| Use the `materialized_view()` hint | Medium | Forces use of the materialized view |
| Limit `dcount` to necessary precision | Medium | Use `dcount(col, 1)` for approximate counts |
| Batch time-series queries | Medium | Use `bin()` instead of exact timestamps |
📈 Benchmark Guidelines by Data Volume¶
Small (< 1 GB daily ingestion)¶
Typical for: SBA loan data (weekly), DOI resource metadata, reference tables.
| Component | Configuration | Expected Performance |
|---|---|---|
| Copy Activity | DIU: 4, parallelCopies: auto | < 2 minutes |
| Spark Transform | Starter pool, single node | < 5 minutes |
| Pipeline total | Sequential OK | < 15 minutes |
| Direct Lake refresh | Automatic framing | < 30 seconds |
Medium (1-10 GB daily ingestion)¶
Typical for: USDA crop data, NOAA weather, EPA AQI, casino daily summaries.
| Component | Configuration | Expected Performance |
|---|---|---|
| Copy Activity | DIU: 16, parallelCopies: 8-16 | 5-15 minutes |
| Spark Transform | Medium pool, 4-8 nodes | 10-20 minutes |
| Pipeline total | Parallel where possible | 30-60 minutes |
| Direct Lake refresh | OPTIMIZE weekly | < 2 minutes |
| KQL ingestion | Streaming ingestion | Near real-time |
Large (10-100 GB daily ingestion)¶
Typical for: Casino slot telemetry (all properties), NOAA historical backfill, healthcare encounter data.
| Component | Configuration | Expected Performance |
|---|---|---|
| Copy Activity | DIU: 32-64, parallelCopies: 16-32 | 15-45 minutes |
| Spark Transform | Large pool, 8-16 nodes, AQE enabled | 20-45 minutes |
| Pipeline total | Full parallelism, fan-out/fan-in | 1-2 hours |
| Direct Lake refresh | OPTIMIZE daily + ZORDER | 2-5 minutes |
| Delta OPTIMIZE | Run after major loads | 10-30 minutes |
Very Large (> 100 GB daily ingestion)¶
Typical for: Multi-property casino telemetry, real-time streaming at scale, full federal data warehouse.
| Component | Configuration | Expected Performance |
|---|---|---|
| Copy Activity | DIU: 64-256, staging enabled | 30-90 minutes |
| Spark Transform | Custom pool, 16+ nodes, partitioned writes | 30-60 minutes |
| Pipeline total | Micro-batch pattern, hourly increments | Continuous |
| Direct Lake | Composite model, aggregation tables | 5-15 minutes framing |
| Delta OPTIMIZE | Scheduled, partitioned tables | 30-60 minutes |
| Eventhouse | Materialized views, update policies | Sub-second query |
💰 Cost vs Performance Trade-Offs¶
CU Consumption by Activity Type¶
Microsoft Fabric uses Capacity Units (CUs) as the universal billing metric. Understanding CU consumption helps optimize cost.
| Activity | CU Consumption | Optimization |
|---|---|---|
| Copy Activity (high DIU) | High during copy | Reduce DIUs if throughput is not bottleneck |
| Spark Notebook (large cluster) | Very high | Right-size cluster, use autoscale |
| Direct Lake query | Low (read-only) | Optimize file layout to reduce scan |
| KQL query | Medium | Materialized views reduce query CUs |
| Pipeline orchestration | Minimal | Negligible cost |
| Data Activator Reflex | Low | Minimal cost per evaluation |
Cost Optimization Strategies¶
```mermaid
flowchart TD
    A[Cost Optimization] --> B[Right-Size Compute]
    A --> C[Optimize Data Layout]
    A --> D[Schedule Intelligently]
    A --> E[Monitor and Adjust]
    B --> B1[Match Spark pool to workload]
    B --> B2[Use Starter Pools for dev]
    B --> B3[Autoscale for variable loads]
    C --> C1[OPTIMIZE and VACUUM regularly]
    C --> C2[Partition by query patterns]
    C --> C3[V-ORDER for Direct Lake]
    D --> D1[Off-peak scheduling]
    D --> D2[Stagger competing workloads]
    D --> D3[Incremental over full loads]
    E --> E1[Monitor CU utilization]
    E --> E2[Review throttling events]
    E --> E3[Track cost per pipeline]
```
Cost-Performance Matrix¶
| Optimization | Performance Gain | Cost Impact | Priority |
|---|---|---|---|
| OPTIMIZE + ZORDER | 2-10x query speed | Moderate CU cost | High |
| Broadcast joins | 2-5x join speed | No additional cost | High |
| Incremental loads | 5-50x faster than full | Major CU savings | High |
| Materialized views (KQL) | 10-100x query speed | Storage + ingestion CU | High |
| Right-size Spark pool | 0-2x (depends on current sizing) | Significant savings | Medium |
| V-ORDER | 1.2-1.5x Direct Lake speed | Minor write overhead | Medium |
| AQE | 1.5-3x for skewed workloads | No additional cost | Medium |
| Pre-aggregation tables | 5-20x dashboard speed | Moderate storage + compute | Medium |
| Staging with compression | 1.5-2x copy speed | Additional storage cost | Low |
| Higher DIU count | 1-4x copy speed | Linear CU increase | Low |
F64 Capacity Budget Planning¶
For this project's F64 SKU:
| Workload | Estimated CU % | Schedule | Notes |
|---|---|---|---|
| Casino Bronze ingestion | 15% | Continuous (5-min micro-batches) | Slot telemetry, transactions |
| Federal data ingestion | 10% | Daily (6 AM UTC) | USDA, NOAA, EPA, DOI, SBA |
| Healthcare ingestion | 5% | Daily (2 AM UTC) | Tribal healthcare records |
| Silver transformations | 20% | After ingestion completes | Spark notebooks |
| Gold aggregations | 10% | After silver completes | Spark + SQL |
| Direct Lake / BI queries | 20% | Business hours | User queries, dashboards |
| KQL (Eventhouse) | 10% | Continuous | Real-time monitoring |
| Maintenance (OPTIMIZE, VACUUM) | 5% | Off-peak (midnight UTC) | Delta table maintenance |
| Buffer | 5% | - | Headroom for spikes |
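As a sanity check, a short sketch converting the budget shares into sustained CUs (an F64 capacity provides 64 CUs; the keys abbreviate the workload column above):

```python
# F64 capacity = 64 CUs; convert each budgeted percentage into sustained CUs.
F64_CUS = 64
budget = {
    "casino_bronze_ingestion": 0.15, "federal_ingestion": 0.10,
    "healthcare_ingestion": 0.05, "silver_transformations": 0.20,
    "gold_aggregations": 0.10, "direct_lake_bi": 0.20,
    "kql_eventhouse": 0.10, "maintenance": 0.05, "buffer": 0.05,
}
assert abs(sum(budget.values()) - 1.0) < 1e-9  # shares must total 100%
for workload, share in budget.items():
    print(f"{workload}: {share * F64_CUS:.1f} CUs sustained")
```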
⭐ Summary¶
Performance optimization in Microsoft Fabric follows a layered approach:
- Copy Activity: Right-size DIUs, enable parallel copies with source partitioning, use staging for cross-network transfers
- Spark: Optimize partitions (2-4x cores), broadcast small tables, cache reused DataFrames, enable AQE
- Delta Lake: OPTIMIZE + ZORDER for query patterns, V-ORDER for Direct Lake, VACUUM to manage file count
- Pipelines: Maximize parallelism with ForEach batch count and fan-out/fan-in dependency patterns
- Direct Lake: Column pruning, aggregation tables, composite models for mixed hot/cold data
- KQL: Materialized views, update policies, hot/warm cache tiers, filter-first query patterns
- Cost: Right-size compute, schedule off-peak, prefer incremental loads, monitor CU utilization
📚 Related Documents¶
| Document | Description |
|---|---|
| Pipelines & Data Movement | Pipeline configuration details |
| Spark & Notebooks | Spark configuration and NEE |
| Lakehouse Setup | Delta Lake table management |
| Warehouse Configuration | SQL optimization |
| Error Handling & Monitoring | Performance-related error handling |