# Frequently Asked Questions (FAQ)
Last Updated: 2026-04-27 | Version: 3.0 | Status: Final | Maintainer: Documentation Team
## Table of Contents

- General Questions
- Prerequisites & Setup
- Deployment Questions
- Data Generation
- Tutorial Questions
- Power BI & Reporting
- Security & Compliance
- Troubleshooting
- Cost & Licensing
- Docker & Dev Containers
- Architecture Deep Dive
- Performance Tuning
- MLOps & AI
- Migrations
- Dev Experience
- Compliance Frameworks
- Additional Resources
- Quick Reference
## General Questions

### What is this POC?
This is a production-ready proof-of-concept environment for Microsoft Fabric, purpose-built for the casino and gaming industry. It demonstrates:
- Complete medallion architecture (Bronze/Silver/Gold)
- Real-time slot machine telemetry
- Player 360 analytics
- Regulatory compliance (NIGC MICS, FinCEN BSA)
- Direct Lake Power BI dashboards
- Data governance with Microsoft Purview
Key Value: Provides a working reference implementation that can be customized for your specific casino operations.
### Who is this POC for?

| Audience | Use Case |
|----------|----------|
| Data Architects | Evaluate Fabric for enterprise data platforms |
| Data Engineers | Learn medallion architecture patterns |
| BI Developers | Build Direct Lake Power BI solutions |
| Gaming Industry | Implement analytics for casino operations |
| Solution Architects | Design cloud-native analytics platforms |
| Students/Learners | Hands-on experience with Microsoft Fabric |

### What data domains are covered?
| Domain | Description |
|--------|-------------|
| Slot Machines | Telemetry, meter readings, jackpots, performance |
| Table Games | Hand results, chip tracking, dealer analytics |
| Player/Loyalty | Profiles, rewards, Player 360 insights |
| Financial/Cage | Transactions, fills, credits, cash management |
| Security | Surveillance, access control, incident tracking |
| Compliance | CTR/SAR reporting, W-2G forms, regulatory filings |

### How long does it take to complete the full POC?
**3-Day Workshop Format:**

- Day 1 (8 hours): Medallion foundation (Bronze + Silver)
- Day 2 (8 hours): Gold layer + real-time analytics
- Day 3 (8 hours): Power BI + governance
Self-Paced Learning: 2-4 weeks (2-3 hours per week)
See the 3-Day POC Agenda for detailed schedules.
### Can I use this for non-gaming industries?
Yes! While this POC is casino-focused, the architecture patterns apply to many industries:
| Industry | Adaptations |
|---|---|
| Healthcare | Patient analytics, HIPAA compliance |
| Federal Government | DOT/FAA datasets, FedRAMP compliance |
| Retail/E-commerce | Customer 360, supply chain optimization |
| Financial Services | Transaction monitoring, fraud detection |
See the tutorial modules (14โ37) for vertical-specific examples.
## Prerequisites & Setup

### What Azure resources do I need?
**Required:**

- Azure subscription (Owner or Contributor access)
- Microsoft Fabric capacity (F64 recommended for POC)
- Resource providers registered (see the sketch below)

**Optional:**

- Microsoft Purview account (for governance features)
- Azure Key Vault (for secrets management)
- Private endpoints (for production security)

**Resource Providers to Register:** See [Prerequisites Guide](PREREQUISITES.md) for complete details.
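A hedged sketch of registering providers with the Azure CLI; the full provider list lives in the Prerequisites Guide, and `Microsoft.Fabric` and `Microsoft.Purview` are shown here as representative examples:

```bash
# Register representative resource providers
az provider register --namespace Microsoft.Fabric
az provider register --namespace Microsoft.Purview

# Poll until the state reports "Registered"
az provider show --namespace Microsoft.Fabric --query "registrationState" -o tsv
```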
### What is the minimum Fabric capacity SKU?

| SKU | Use Case | Monthly Cost (24/7) |
|---|---|---|
| F2 | Minimal testing | ~$265 |
| F4 | Development (recommended minimum) | ~$530 |
| F64 | POC recommended | ~$8,480 |
**Why F64 for POC?**

- Sufficient compute for parallel data processing
- Can handle real-time streaming workloads
- Supports multiple concurrent users
**Can I start smaller?** Yes, but expect slower performance with F2/F4; these SKUs are fine for working through the tutorials, but not representative of production performance.
See Cost Estimation Guide for detailed pricing.
### Do I need to install anything locally?
Quick Answer: No, if using Dev Container or Codespaces.
Local Installation Option:
| Tool | Version | Required? |
|---|---|---|
| Azure CLI | 2.50+ | Yes |
| Bicep | 0.22+ | Yes |
| Git | 2.40+ | Yes |
| PowerShell | 7.0+ | Yes |
| Python | 3.10+ | For data generation |
| Docker | Latest | For containerized generators |
Easiest Setup: Use GitHub Codespaces (zero installation) or VS Code Dev Container.
See Prerequisites - Dev Container Setup.
### How do I enable Fabric in my tenant?
**Requirements:** Microsoft Entra ID Global Administrator or Fabric Administrator role.

**Steps:**

1. Navigate to the [Microsoft Fabric Admin Portal](https://app.fabric.microsoft.com/admin-portal)
2. Select **Tenant settings**
3. Under **Microsoft Fabric**, enable:
   - Users can create Fabric items
   - Users can use OneLake
4. (Optional) Restrict to specific security groups
5. Click **Apply**

**Verification:**

1. Go to [app.fabric.microsoft.com](https://app.fabric.microsoft.com)
2. You should see the Fabric home page
3. Click **+ New**; you should see Lakehouse, Warehouse, and other item types

**Troubleshooting:** If Fabric options don't appear, contact your Microsoft Entra ID admin to verify tenant settings.

### What permissions do I need?
**Azure Subscription:**

- Minimum: **Contributor** role
- Recommended: **Owner** role (for initial setup)

**Fabric Workspace:**

- **Admin**: Full control (workspace owners)
- **Member**: Can create and edit items (data engineers)
- **Contributor**: Can create/edit but not share (developers)
- **Viewer**: Read-only (business users)

**Why Owner for setup?**

- Configure RBAC and resource providers
- Create service principals for CI/CD
- Set up managed identities

After initial setup, Contributor is sufficient for day-to-day operations.

## Deployment Questions
### What are the deployment options?
| Method | Best For | Time to Deploy |
|---|---|---|
| Docker Quick Start | Generate sample data, test generators | ~5 minutes |
| Azure Bicep | Full infrastructure deployment | ~30 minutes |
| PowerShell Scripts | Automated CI/CD workflows | ~30 minutes |
| GitHub Actions | Continuous deployment pipelines | One-time setup |
See Deployment Guide for detailed instructions.
### How do I deploy to Azure?
**Prerequisites:**

- Azure CLI logged in
- Bicep extension installed
- `.env` file configured

**Quick Deployment:** Run the subscription-scope Bicep deployment (the full command is in the Command Cheat Sheet at the end of this FAQ).

**Deployment Time:** ~30 minutes for complete infrastructure.

**What Gets Deployed:**

- Fabric capacity
- Purview account
- ADLS Gen2 storage
- Key Vault
- Log Analytics workspace
- Network security groups

### Common deployment errors?
#### Error: `Microsoft.Fabric/capacities resource provider not registered`

**Fix:** Register the provider (see the registration sketch in the Prerequisites section above) and wait for "Registered" status (can take 5-10 minutes).

#### Error: `AuthorizationFailed`

**Cause:** Insufficient permissions

**Fix:** Ensure you have the Owner or Contributor role on the subscription; a quick check follows below.

#### Error: `SKU F64 not available in region`

**Cause:** Capacity not available in the selected region

**Fix:** Check [Fabric capacity availability](https://learn.microsoft.com/fabric/enterprise/region-availability) and choose a supported region.

#### Error: `Purview account name already exists`

**Cause:** Purview names are globally unique

**Fix:** Choose a different, globally unique name in your `.env` file.
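For the `AuthorizationFailed` case, a hedged way to confirm your role assignments with the Azure CLI (the sign-in name is illustrative):

```bash
# List the roles assigned to you on the current subscription
az role assignment list --assignee "you@example.com" \
  --query "[].roleDefinitionName" -o tsv
```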
### How do I verify deployment succeeded?

**Automated Verification:** Run `./scripts/verify-deployment.sh` (the sketch below shows the kind of checks it performs).

**Manual Verification Checklist:**

- [ ] Fabric capacity shows in Azure Portal
- [ ] Fabric capacity shows in [Fabric Admin Portal](https://app.fabric.microsoft.com/admin-portal)
- [ ] Purview account accessible
- [ ] Storage account has ADLS Gen2 enabled
- [ ] Key Vault accessible
- [ ] Log Analytics receiving logs

**Quick Portal Check:** Open the resource group in the Azure Portal and confirm the resources listed under "What Gets Deployed" above are present.
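A hedged sketch of equivalent CLI checks; resource names are illustrative and `./scripts/verify-deployment.sh` remains the authoritative script:

```bash
# All resources in the POC resource group
az resource list --resource-group "rg-fabric-poc-dev" -o table

# Hierarchical namespace must be enabled for ADLS Gen2
az storage account show --name "stfabricpocdev" --query "isHnsEnabled" -o tsv
```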
### How do I delete everything?

**Warning:** This is irreversible. Ensure you have backups.
```bash
# Remove lock first if resources are locked
az lock delete --name "CanNotDelete" --resource-group "rg-fabric-poc-dev"

# Delete resource group (removes all resources)
az group delete --name "rg-fabric-poc-dev" --yes --no-wait
```
See Deployment Guide - Cleanup for details.
## Data Generation

### How do I generate sample data?
**Option 1: Docker (easiest):**

```bash
# Quick demo dataset (7 days, small)
docker-compose run --rm demo-generator

# Full dataset (30 days, production-like)
docker-compose run --rm data-generator

# Custom parameters
docker-compose run --rm data-generator --slots 100000 --players 5000 --days 14
```
### What data volumes are generated by default?
| Data Type | Records | Size | Bronze Table |
|---|---|---|---|
| Slot Events | 500,000 | ~500 MB | bronze_slot_telemetry |
| Table Games | 100,000 | ~100 MB | bronze_table_games |
| Players | 10,000 | ~10 MB | bronze_player_profile |
| Financial | 50,000 | ~50 MB | bronze_financial_txn |
| Security | 25,000 | ~25 MB | bronze_security_events |
| Compliance | 10,000 | ~10 MB | bronze_compliance |
| Total | ~700,000 | ~700 MB | |
Customization:
```bash
# Scale up for larger POCs
docker-compose run --rm data-generator --all --days 90

# Scale down for quick testing (7 days, smaller volumes)
docker-compose run --rm demo-generator
```
### Is the generated data realistic?
Yes! The data generators include:
- Realistic distributions based on industry patterns
- Referential integrity (player IDs match across tables)
- Compliance logic (CTR $10K threshold, W-2G $1,200)
- Time-series patterns (hourly/daily seasonality)
- PII protection (hashed SSN, masked credit cards)
**Example realistic patterns:**

- Slot machine hold percentage: 8-12%
- Player loyalty tiers: Bronze (60%), Silver (30%), Gold (8%), Platinum (2%)
- Peak gaming hours: 7pm-2am (higher on weekends)
- CTR generation: ~0.5% of transactions
See Data Generation - Data Quality Features.
### How do I customize the generated data?
**Command Line Options:** Pass flags such as `--slots`, `--players`, and `--days` to the generator (see the Docker examples above).

**Programmatic Customization:** Instantiate the generator classes directly in Python; a sketch follows below.

**Configuration Files:** Edit the YAML files under `data_generation/config/` for domain-specific customization.
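A hedged sketch of programmatic use, assuming the classes exposed by the `generators` package; the constructor arguments are illustrative, so check the package for the actual signatures:

```python
import itertools

from generators import SlotMachineGenerator

# Hypothetical default construction; pass custom parameters as the package allows
generator = SlotMachineGenerator()

# generate_stream() yields events continuously (see the streaming example below),
# so cap how many are taken when sampling locally
sample = list(itertools.islice(generator.generate_stream(events_per_second=10), 100))
print(f"Sampled {len(sample)} slot telemetry events")
```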
### How do I stream data to Event Hub?

**Prerequisites:**

- Azure Event Hub created
- Connection string obtained

**Docker Streaming:**

```bash
EVENTHUB_CONNECTION_STRING="Endpoint=sb://..." \
EVENTHUB_NAME="slot-telemetry" \
STREAMING_RATE=10 \
docker-compose up streaming-generator
```

**Python Streaming:**
```python
import os

from generators import SlotMachineGenerator
from streaming import EventHubStreamer

# Configure streamer
streamer = EventHubStreamer(
    connection_string=os.getenv("EVENTHUB_CONNECTION_STRING"),
    eventhub_name="slot-telemetry"
)

# Stream events
generator = SlotMachineGenerator()
for event in generator.generate_stream(events_per_second=10):
    streamer.send(event)
```
## Tutorial Questions

### What's the learning path?
```mermaid
graph LR
    T00[00-Setup] --> T01[01-Bronze]
    T01 --> T02[02-Silver]
    T02 --> T03[03-Gold]
    T03 --> T04[04-Real-Time]
    T04 --> T05[05-Power BI]
    T05 --> T06[06-Pipelines]
    T06 --> T07[07-Governance]
    T07 --> T08[08-Mirroring]
    T08 --> T09[09-AI/ML]
```

**Recommended Path:**

1. Foundation (00-01): Environment setup, Bronze layer
2. Core (02-03): Silver and Gold layers
3. Advanced (04-05): Real-time analytics, Power BI
4. Enterprise (06-09): Pipelines, governance, AI/ML
See Tutorials README for complete learning path.
### Can I skip tutorials?
Not Recommended. Each tutorial builds on the previous one:
| Tutorial | Can Skip? | Notes |
|---|---|---|
| 00-Setup | No | Creates workspace and Lakehouses |
| 01-Bronze | No | Required for Silver layer |
| 02-Silver | No | Required for Gold layer |
| 03-Gold | No | Required for Power BI |
| 04-Real-Time | Yes | Optional for basic POC |
| 05-Power BI | Partial | Can use pre-built reports |
| 06-Pipelines | Yes | Optional for manual workflows |
| 07-Governance | Yes | Optional for POC |
| 08-Mirroring | Yes | Optional feature |
| 09-AI/ML | Yes | Advanced feature |
Minimum POC: Complete tutorials 00-03 + 05 (Power BI).
### Tutorial 00: Environment setup issues?
#### Issue: Can't create workspace

**Cause:** Fabric not enabled in tenant

**Fix:** Ask your Microsoft Entra ID admin to enable the Fabric tenant settings.

#### Issue: Capacity appears paused

**Cause:** Auto-pause enabled or manually paused

**Fix:**

1. Go to the [Fabric Admin Portal](https://app.fabric.microsoft.com/admin-portal)
2. Navigate to **Capacity settings**
3. Click **Resume** on your capacity

#### Issue: Can't create Lakehouse

**Cause:** Insufficient workspace permissions

**Fix:** Ensure you have the Member or Admin role in the workspace.

### Tutorial 01: Bronze layer issues?
#### Issue: Data not loading into Lakehouse

**Cause:** File path or format issues

**Fix:** Verify the file path under `Files/` and the format option on your read.

#### Issue: Schema mismatch errors

**Cause:** Generated data doesn't match the expected schema

**Fix:** Use the `mergeSchema` option (see the sketch below).

#### Issue: Large files causing timeouts

**Cause:** File too large for a single operation

**Fix:** Process in batches or use a streaming read.
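A hedged PySpark sketch of the `mergeSchema` fix on the write path, assuming a Fabric notebook with a Lakehouse attached; the source path is illustrative:

```python
# Allow new columns in incoming data to be merged into the Delta table schema
(spark.read.parquet("Files/raw/slot_telemetry/")
    .write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("bronze_slot_telemetry"))
```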
### Tutorial 05: Direct Lake not working?

#### Issue: Semantic model falls back to DirectQuery

**Cause:** Delta table not V-Order optimized or too complex

**Fix:** Re-optimize the Gold tables with V-Order (see the sketch below).

#### Issue: "Not supported in Direct Lake mode" error

**Cause:** Using unsupported DAX features

**Supported in Direct Lake:**

- Most DAX functions
- Calculated columns
- Measures
- Row-level security

**Not Supported:**

- Calculated tables
- Some complex M queries
- Composite models with Import

**Fix:** Simplify the DAX or move the calculation to the Gold layer.

#### Issue: Performance is slow

**Cause:** Query complexity or missing optimization

**Fix:**

1. Run `OPTIMIZE` on Delta tables
2. Ensure table partitioning
3. Check that the Fabric capacity is active
4. Review DAX query performance in Performance Analyzer
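A hedged notebook sketch of the re-optimization fix; the table name is illustrative, and the command mirrors the retroactive `OPTIMIZE` described under Performance Tuning below:

```python
# Rewrite the Gold table's Parquet files with V-Order so Direct Lake can serve it
spark.sql("OPTIMIZE gold_slot_performance VORDER")
```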
## Power BI & Reporting

### What is Direct Lake mode?
Direct Lake is a data connectivity mode that combines the strengths of Import and DirectQuery:
| Mode | Speed | Freshness | Data Duplication |
|---|---|---|---|
| Import | Fast | Stale (requires refresh) | Yes |
| DirectQuery | Slow | Fresh | No |
| Direct Lake | Fast | Fresh | No |
**How it works:**

- Queries execute directly against Delta tables in OneLake
- Uses V-Order optimization for sub-second performance
- No data import or scheduled refresh required
- Automatic fallback to DirectQuery if needed
See Tutorial 05: Direct Lake & Power BI.
### Do I need a Power BI license?
**To Create Content:**

- Power BI Pro, or
- Power BI Premium Per User (PPU)

**To Consume Content:**

- Power BI Free (if content is in Premium/Fabric capacity)
- Power BI Pro (if not in Premium capacity)

**Fabric Capacity Advantage:** Users with free licenses can view reports published to Fabric capacity workspaces.

**POC Recommendation:**

- Get a Power BI Pro trial (60 days free)
- Or use Fabric capacity with free licenses for viewers

### How often does Direct Lake refresh?
Answer: It doesn't! That's the beauty of Direct Lake.
**Traditional Import Mode:**

- Requires scheduled refresh (e.g., every 8 hours)
- Data is stale between refreshes
- Consumes refresh capacity

**Direct Lake:**

- Always queries the latest data
- No refresh schedule needed
- Updates appear immediately when Delta tables update
Exception: If you have calculated tables or composite models, those components might need refresh.
### What reports are included?
| Report | Description | Key Visuals |
|---|---|---|
| Casino Executive Dashboard | High-level KPIs | Revenue trends, floor performance, player metrics |
| Slot Performance Analysis | Machine-level analytics | Hold %, utilization, jackpot frequency |
| Player 360 View | Customer analytics | Segments, lifetime value, visit patterns |
| Compliance Monitoring | Regulatory reporting | CTR/SAR status, W-2G tracking, audit trails |
| Real-Time Floor Monitor | Live casino status | Machine status, alerts, occupancy |
**Location:** `reports/report-definitions/`
See Reports README for import instructions.
### Can I customize the reports?
Absolutely! The reports are provided as starting templates.
**Customization Options:**

1. **Edit in Power BI Desktop:**
   - Open the `.pbip` files
   - Modify visuals, add pages
   - Adjust DAX measures
2. **Create New Reports:**
   - Connect to the existing semantic model
   - Build custom visuals
   - Apply your branding
3. **Add Custom DAX:** Extend the semantic model with your own measures.

**Best Practice:** Copy the template first, then customize.
### How do I implement Row-Level Security (RLS)?
**Use Case:** Users should only see data for their casino property.

**Step 1: Create Role in Semantic Model**

1. Open the semantic model in Power BI Desktop
2. Go to **Modeling** > **Manage roles**
3. Create role: `PropertyFilter`
4. Add a DAX filter (see the sketch below)

**Step 2: Test Role**

1. Click **Modeling** > **View as**
2. Select the role and a test user
3. Verify data is filtered correctly

**Step 3: Assign Users**

1. Publish the report to a Fabric workspace
2. Go to the semantic model security settings
3. Add users/groups to roles

**Row-Level Security Patterns:**

- Filter by region: `[Region] = "West"`
- Filter by user email: `USERPRINCIPALNAME()`
- Dynamic filtering from a lookup table

See [Security Guide - Row-Level Security](SECURITY.md#row-level-security-rls) for complete examples.
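A hedged example of a `PropertyFilter` role filter using the dynamic-lookup pattern above; the table and column names are illustrative:

```dax
-- On the dim_property table: show only properties mapped to the
-- signed-in user's email in a hypothetical security lookup table
dim_property[property_id]
    IN CALCULATETABLE(
        VALUES(user_property_map[property_id]),
        user_property_map[user_email] = USERPRINCIPALNAME()
    )
```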
## Security & Compliance

### What compliance frameworks are covered?
| Framework | Description | Implementation |
|-----------|-------------|----------------|
| NIGC MICS | Minimum Internal Control Standards | Meter accuracy validation, drop count verification |
| FinCEN BSA | Bank Secrecy Act | CTR/SAR reporting, $10K threshold detection |
| PCI-DSS | Payment Card Industry | Card number masking, access controls |
| State Gaming | Jurisdiction requirements | Configurable audit trails, retention policies |

**Compliance Features:**

- Automated CTR generation (>= $10,000)
- SAR pattern detection (structuring)
- W-2G auto-generation ($1,200 slots, $600 keno)
- 5-year data retention policies

See [Security Guide - Compliance Requirements](SECURITY.md#-compliance-requirements).

### How is PII protected?
**Default PII Protection:**

| PII Type | Method | Example |
|----------|--------|---------|
| SSN | Hashed (SHA-256) + masked | `XXX-XX-1234` |
| Names | First initial only | `J*** S***` |
| Credit Cards | Masked except last 4 digits | `****-****-****-1234` |
| Phone | Partial mask | `(***) ***-4567` |
| Email | Domain only | `j***@example.com` |

**Bronze Layer:** Raw PII hashed/masked on ingestion
**Silver Layer:** Only hashed values, no raw PII
**Gold Layer:** No PII, only aggregated/anonymized data

**Sample Data:** All provided sample data has PII pre-masked.

**Testing Only:** Use the `--include-pii` flag for development (never in production).

See [Security Guide - PII Handling](SECURITY.md#pii-handling).

### How are secrets managed?
Never commit secrets to Git! This repository has multiple protections:
- `.gitignore`: Blocks common secret files
- Pre-commit hook: Scans for high-risk patterns
- `.env.sample`: Provides a template without secrets
- Sample data: All PII masked
**Best Practices:**

1. Use `.env` files locally (gitignored)
2. Store production secrets in Azure Key Vault
3. Use managed identities for Azure authentication
4. Enable the pre-commit hook (see the sketch below)
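A hedged sketch of wiring up a repository pre-commit hook, assuming the hook script ships in the repo; the path `scripts/pre-commit` is illustrative:

```bash
# Link the repo's hook script into .git/hooks and make it executable
ln -sf ../../scripts/pre-commit .git/hooks/pre-commit
chmod +x scripts/pre-commit
```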
**If you accidentally commit a secret:**

1. Assume it's compromised and rotate it immediately
2. Remove it from Git history with BFG Repo-Cleaner
3. Report it per your security policy
See Security Guide - Repository Security.
### What network security options are available?
**Private Endpoint Support:**

- Azure Storage (ADLS Gen2)
- Key Vault
- Microsoft Purview
- Log Analytics

**Network Security Groups (NSG):**

- Restrict inbound/outbound traffic
- Segment subnets by function
- Deny-by-default rules

**Fabric Network Isolation:**

- Connect the Fabric workspace to a VNet (Preview)
- Private endpoints for OneLake
- Firewall rules for managed endpoints

**Configuration:** Edit `infra/modules/network.bicep` to enable private endpoints.

See [Security Guide - Network Security](SECURITY.md#-network-security).

## Troubleshooting
### Notebook fails with "Capacity not available"
Cause: Fabric capacity is paused or inactive.
**Fix:**

1. Go to the [Fabric Admin Portal](https://app.fabric.microsoft.com/admin-portal)
2. Navigate to **Capacity settings**
3. Ensure the capacity status is **Active**
4. If paused, click **Resume**
Prevention: Disable auto-pause for POC demos.
"Cannot connect to Lakehouse" error¶
๐ Click to see debugging steps
**Check 1: Lakehouse exists** (see the sketch below)

**Check 2: Workspace permissions**

- Verify you have the Member or Admin role
- Check workspace settings > Users & permissions

**Check 3: Lakehouse attached to notebook**

1. Open the notebook
2. Click **Add Lakehouse** in the left pane
3. Select the existing Lakehouse
4. Click **Add**

**Check 4: Capacity active**

Ensure the Fabric capacity is not paused.
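A hedged notebook sketch for Check 1, run inside Fabric where `spark` and `mssparkutils` are pre-defined:

```python
# List tables visible through the attached Lakehouse
spark.sql("SHOW TABLES").show()

# List the Lakehouse file system to confirm the attachment is present
mssparkutils.fs.ls("Tables/")
```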
### Delta table "file not found" errors

**Cause:** Delta transaction log corruption or incomplete writes.
Fix:
```python
from delta.tables import DeltaTable

# Regenerate the symlink manifest for the table
DeltaTable.forPath(spark, "Tables/bronze_slot_telemetry").generate("symlink_format_manifest")

# Or vacuum old files (default retention: 7 days)
DeltaTable.forPath(spark, "Tables/bronze_slot_telemetry").vacuum()
```
Prevention: Use proper DataFrame writes with checkpoints.
### Power BI reports show "Unable to connect"
**Check 1: Semantic model exists**

1. Go to the workspace
2. Verify the semantic model is published
3. Click the semantic model > **Settings**
4. Check data source credentials

**Check 2: Direct Lake requirements**

- Gold tables must be Delta format
- Tables must be in OneLake
- Workspace must be on Fabric capacity

**Check 3: Permissions**

- User must have Build permission on the semantic model
- Or Viewer permission for read-only

**Check 4: Refresh semantic model**

1. Open the semantic model
2. Click **Refresh now**
3. Check refresh history for errors

### Data generator produces "invalid schema" warnings
Cause: Schema mismatch between generator and expected Bronze schema.
Fix:
```bash
# Update to the latest generator code
git pull origin main

# Or specify a schema version
python generate.py --all --schema-version 1.1
```
**Workaround:** Use `mergeSchema` when reading (see below).
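A hedged PySpark sketch of the read-side workaround; the path is illustrative:

```python
# Tolerate schema drift across generator output files
df = (spark.read
    .option("mergeSchema", "true")
    .parquet("Files/raw/slot_telemetry/"))
```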
"Out of memory" errors during data processing¶
๐พ Click to see memory optimization
**Cause:** Processing too much data at once or inefficient code.

**Fix 1: Process in batches** (see the sketch below)

**Fix 2: Repartition data**

**Fix 3: Increase capacity**

- Use a larger Fabric SKU temporarily
- Or reduce the data volume for the POC

**Fix 4: Optimize DataFrame operations**

- Use `select()` to limit columns early
- Avoid `collect()` on large datasets
- Use `coalesce()` instead of `repartition()` when reducing partitions
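A hedged PySpark sketch of Fixes 1 and 2, assuming a date-partitioned Bronze table; the target table name is illustrative:

```python
from pyspark.sql import functions as F

src = spark.table("bronze_slot_telemetry")

# Fix 1: process one day at a time instead of the whole table
days = [r["event_date"] for r in src.select("event_date").distinct().collect()]
for day in days:
    batch = src.filter(F.col("event_date") == day)
    batch.write.format("delta").mode("append").saveAsTable("silver_slot_telemetry")

# Fix 2: repartition a skewed DataFrame before a wide operation
balanced = src.repartition(8, "property_id")
```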
## Cost & Licensing

### What will this POC cost me?

**Quick Estimates:**
| Scenario | Duration | Capacity | Estimated Cost |
|---|---|---|---|
| POC Demo | 3 days | F64 (24 hrs/day) | $35-50 |
| Development | 1 month | F4 (8 hrs/day, weekdays) | $175-265 |
| Production Pilot | 1 month | F64 (24/7) | $8,500-9,500 |
**Cost Breakdown (1 month, F64 24/7):**

- Fabric Capacity: ~$8,500 (80%)
- ADLS Gen2 Storage: ~$500 (5%)
- Purview: ~$800 (8%)
- Other services: ~$700 (7%)
Major Cost Driver: Fabric capacity (75-80% of total cost).
See Cost Estimation Guide for detailed scenarios.
### How can I reduce costs?

**Strategy 1: Pause/Resume Capacity**

```bash
# Pause capacity when not in use
az fabric capacity pause --name "fabric-casino-poc"

# Resume when needed
az fabric capacity resume --name "fabric-casino-poc"
```
### Do I need to pay for Power BI separately?
Short Answer: Maybe, depends on your usage.
Licensing Options:
| Scenario | License Required | Cost |
|---|---|---|
| Create reports/semantic models | Power BI Pro or PPU | $10-20/user/month |
| View reports (Fabric capacity) | Free license | $0 |
| View reports (non-Fabric) | Power BI Pro | $10/user/month |
| Enterprise distribution | Premium capacity or Fabric | Capacity cost |
**POC Recommendation:**

1. Use the Power BI Pro trial (60 days free)
2. Publish to a Fabric workspace
3. Viewers can use free licenses
Note: Fabric capacity is already included in your F64 cost estimate - it provides Power BI Premium features.
### What about Azure free tier/credits?
**Azure Free Tier:**

- Microsoft Fabric is not included in the free tier
- Some supporting services (Storage, Key Vault) have free allowances

**Azure Credits:**

- You can run Fabric with Azure credits (students, startups)
- Visual Studio subscriptions include monthly credits

**Free Trials:**

- Power BI Pro: 60-day trial
- Fabric capacity: trial available via Microsoft
- Azure subscription: $200 credit for 30 days (new customers)
**POC on a Budget:**

- Use F2 capacity ($265/month, or ~$9/day)
- Pause when not in use
- Limit to a 3-day POC demo ($27-35 total)
## Docker & Dev Containers

### What's the difference between Docker and Dev Container?
| Feature | Docker | Dev Container |
|---|---|---|
| Purpose | Run data generators | Full development environment |
| Requires | Docker Desktop only | Docker + VS Code |
| What's Inside | Python + generators | Python + Azure CLI + Bicep + extensions |
| Use Case | Generate data quickly | Complete coding environment |
| Persistent | No | Yes (VS Code workspace) |
**Docker:** Run generators, validate data, stream to Event Hub.
**Dev Container:** Complete development setup with all tools pre-installed.
### How do I use Docker for data generation?
```bash
# Quick demo (7 days, small dataset)
docker-compose run --rm demo-generator

# Full dataset (30 days, production-like volumes)
docker-compose run --rm data-generator

# Custom parameters
docker-compose run --rm data-generator --all --days 14 --format csv

# Specific data domains
docker-compose run --rm data-generator --slots 100000 --players 5000
```
**Output:** `./output` directory
See Docker Support in main README.
### How do I use Dev Containers?
**Option 1: VS Code Local**

1. Install Docker Desktop
2. Install the Dev Containers extension
3. Open the repository in VS Code
4. Click "Reopen in Container" when prompted

**Option 2: GitHub Codespaces (Zero Installation)**

1. Go to the repository on GitHub
2. Click **Code** > **Codespaces** tab
3. Click **Create codespace on main**
4. Wait ~2 minutes for the environment to build
**What You Get:**

- Python 3.11 with all dependencies
- Azure CLI + Bicep
- PowerShell 7
- Git configured
- All VS Code extensions pre-installed
### Docker commands are failing?
**Issue: `docker-compose: command not found`**

**Cause:** Docker Compose not installed, or using the wrong command

**Fix:**

```bash
# Docker Compose V2 (preferred)
docker compose run --rm data-generator

# Docker Compose V1 (legacy)
docker-compose run --rm data-generator
```
## Architecture Deep Dive

### Why Lakehouse instead of Warehouse for this POC?
The POC chose Lakehouse as the primary store for three reasons: (1) the diverse data formats across 9 industry verticals (Parquet, CSV, JSON) favor schema-on-read flexibility; (2) the PySpark-first notebook workflow aligns naturally with Lakehouse's Spark engine; and (3) Direct Lake mode provides zero-copy Power BI connectivity without Import refresh schedules. Warehouse is the better choice for T-SQL-heavy teams or migrations from Synapse Dedicated SQL Pool.
See: DECISION_TREES.md | Lakehouse/Warehouse/SQL DB Decision Guide
### What goes in each medallion layer?
| Layer | Content | Schema | Retention |
|---|---|---|---|
| Bronze | Raw ingested data, append-only, minimal transformation | Schema-on-read, source schema preserved | Full history |
| Silver | Cleansed, deduplicated, validated, enriched data | Schema-on-write, enforced constraints | Full history |
| Gold | Business aggregations, KPIs, star schema fact/dim tables | Star schema, V-Order optimized for Direct Lake | Rolling window or full |
The key principle: Bronze is append-only (never modify source records), Silver deduplicates and validates (MERGE upserts), Gold aggregates for consumption (overwrite on refresh).
See: Medallion Architecture Deep Dive
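A hedged PySpark sketch of the three write patterns just described; `raw_df`, `clean_df`, and `agg_df` stand in for DataFrames built earlier in a notebook, and the table names are illustrative:

```python
from delta.tables import DeltaTable

# Bronze: append-only ingestion, source schema preserved
raw_df.write.format("delta").mode("append").saveAsTable("bronze_slot_telemetry")

# Silver: deduplicate and validate via MERGE upsert
silver = DeltaTable.forName(spark, "silver_slot_telemetry")
(silver.alias("t")
    .merge(clean_df.alias("s"), "t.event_id = s.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Gold: overwrite aggregates on each refresh
agg_df.write.format("delta").mode("overwrite").saveAsTable("gold_slot_performance")
```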
### How should I design workspaces?
The recommended pattern for this POC is a per-environment workspace layout:
| Workspace | Purpose | Capacity |
|---|---|---|
| `ws-fabric-poc-dev` | Development, notebook authoring | F4 (dev) |
| `ws-fabric-poc-staging` | Integration testing, UAT | F16 (staging) |
| `ws-fabric-poc-prod` | Production workloads | F64 (prod) |
Each workspace contains three Lakehouses (`lh_bronze`, `lh_silver`, `lh_gold`), one Warehouse (for T-SQL consumers), and one Eventhouse (for real-time). For multi-tenant scenarios, see Multi-Tenant Workspace Architecture.
See: Workspace Naming
### When should I use shortcuts vs. copying data?
Use shortcuts when you want to query data in-place without storage duplication (e.g., referencing ADLS Gen2 landing zones or cross-workspace tables). Use copy (pipeline Copy Activity) when you need to transform data during ingestion, the source requires a data gateway, or you want full control over the data lifecycle in OneLake. Shortcuts are free (no storage cost); copies consume storage.
See: DECISION_TREES.md | Shortcut Transformations Notebook
### What is Workspace Identity and when do I need it?
Workspace Identity is a managed identity scoped to a Fabric workspace. It enables credential-free authentication to Azure resources (Storage, Key Vault, Purview) from notebooks and pipelines -- no service principal secrets to rotate. Use it whenever your notebooks access Azure resources. The POC deploys it via `infra/modules/security/workspace-identity.bicep`.
See: OneLake Security | Workspace Identity Module
## Performance Tuning

### What is V-Order and do I need it?
V-Order is a write-time optimization for Parquet files that dramatically improves Direct Lake query performance. It reorders data within row groups for optimal column compression and scan efficiency. You need it on every Gold table that feeds a Power BI semantic model via Direct Lake. Enable it with the session setting shown below:
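A hedged notebook sketch; `gold_df` stands in for a DataFrame built earlier, and the table name is illustrative:

```python
# Enable V-Order for all subsequent Parquet/Delta writes in this session
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

gold_df.write.format("delta").mode("overwrite").saveAsTable("gold_slot_performance")
```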
Or apply it retroactively: `OPTIMIZE gold_table_name VORDER`.
See: Direct Lake | Performance & Parallelism
### How should I partition large tables?

Partition by the most common filter column (typically a date column). For this POC, Bronze and Silver tables partition by `event_date` for efficient time-range queries. Rules of thumb:
- Partition size target: 256 MB - 1 GB per partition
- Do not over-partition: avoid partitioning by high-cardinality columns (`player_id`); it creates too many small files
- Combine with Z-Order: `OPTIMIZE table ZORDER BY (property_id)` within each partition for multi-column filtering
See: Performance & Parallelism | Medallion Deep Dive
### What Spark settings should I tune first?
For POC-scale data (~700K-1M records per table), the most impactful settings are:
| Setting | POC Value | Default | Why |
|---|---|---|---|
| `spark.sql.shuffle.partitions` | 8 | 200 | POC data is small; 200 partitions creates too many tiny files |
| `spark.sql.parquet.vorder.enabled` | true | false | Required for Direct Lake performance |
| `spark.sql.autoBroadcastJoinThreshold` | 10485760 | 10485760 | The 10 MB default is fine for POC dimension tables |
| `spark.sql.adaptive.enabled` | true | true | AQE auto-tunes at runtime |
See: CHEAT_SHEETS.md | Spark Notebooks Best Practices
### How do I prevent Direct Lake fallback to DirectQuery?
Direct Lake falls back to DirectQuery when: (1) the model contains calculated tables; (2) column cardinality exceeds guardrails; (3) the query uses unsupported DAX patterns. To prevent fallback:
- Move all calculated tables into Gold notebooks (materialize as Delta tables)
- Pre-aggregate high-cardinality columns in Gold layer
- Monitor fallback using Power BI Performance Analyzer
- Keep Gold tables V-Order optimized
See: Direct Lake | CHEAT_SHEETS.md
## MLOps & AI

### What ML models does this POC include?
The POC includes three ML notebooks:
| Notebook | Model | Purpose | Algorithm |
|---|---|---|---|
| `01_ml_player_churn_prediction.py` | Player Churn | Predict player attrition risk | Gradient Boosted Trees |
| `02_ml_fraud_detection.py` | Fraud Detection | Identify anomalous transactions | Isolation Forest |
| `03_ml_automl_weather_forecasting.py` | Weather Forecast | Predict weather patterns (NOAA data) | AutoML |
All models use MLflow for experiment tracking and model registry.
See: ML Notebooks | AutoML Model Endpoints
### How does model versioning work in Fabric?
Fabric uses MLflow's model registry natively. Models are logged during training with `mlflow.log_model()`, registered in the workspace model registry, and versioned automatically. Fabric's ML model item provides a UI for version comparison, stage transitions (Staging/Production), and deployment to endpoints.
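A hedged sketch of that flow; the experiment and model names are illustrative, and `X_train`/`y_train` stand in for features prepared earlier:

```python
import mlflow
from sklearn.ensemble import GradientBoostingClassifier

mlflow.set_experiment("player_churn")  # illustrative experiment name

with mlflow.start_run():
    model = GradientBoostingClassifier().fit(X_train, y_train)
    # Log and register in one call; the registry assigns the next version automatically
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="player-churn-model")
```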
### Can I use AI Functions in notebooks?
Yes. Fabric AI Functions (`ai_summarize`, `ai_classify`, `ai_translate`, etc.) are available in Spark SQL for inline LLM-powered transformations. The POC demonstrates compliance-aware usage in `17_gold_ai_functions_compliance.py`, including token cost estimation and PII guardrails.
See: AI Copilot Configuration | AI Functions Notebook
### What about Data Agents?
Data Agents are autonomous AI-powered analytics assistants that can answer natural language questions about your data. They run inside Fabric workspaces with governed access to Lakehouses and Warehouses. The POC documents configuration patterns but does not deploy a live agent (requires tenant admin enablement).
See: Data Agents | Fabric IQ
## Migrations

### How do I migrate from Synapse Analytics?
The migration path depends on your current Synapse component:
| Synapse Component | Fabric Equivalent | Migration Approach |
|---|---|---|
| Dedicated SQL Pool | Warehouse | T-SQL compatible; CTAS scripts transfer directly |
| Serverless SQL Pool | Lakehouse SQL endpoint | Repoint external tables to OneLake |
| Spark Pool | Fabric Spark | Notebooks largely compatible; update `dbutils` to `mssparkutils` |
| Pipelines | Fabric Pipelines | JSON-compatible with minor activity type changes |
| Data Explorer | Eventhouse | KQL fully compatible; export/import databases |
See: Migration Patterns | Tutorial 13: Migration Planning
### How do I migrate from Databricks?
Key differences to address:
- **Runtime**: Replace `dbutils` with `mssparkutils` (file system, credentials, notebook orchestration)
- **Unity Catalog**: Map to Fabric OneLake + Purview for governance
- **Delta Lake**: Fully compatible -- Delta tables work as-is in OneLake
- **MLflow**: Supported natively in Fabric
- **Notebook format**: Databricks notebook source format imports directly
The POC notebooks already use the Databricks notebook format with `# COMMAND ----------` separators. Phase 11 remediation ensured all `dbutils` references were replaced with `mssparkutils`.
See: Migration Patterns
### How do I migrate from Snowflake?

Use Fabric Mirroring for continuous replication from Snowflake into OneLake (Delta format). This provides near-real-time sync without building custom ETL. Alternatively, use Snowflake's `COPY INTO` to export to ADLS Gen2, then create Lakehouse shortcuts to the exported data.
See: Mirroring | Tutorial 24: Snowflake to Fabric
### What about Teradata and IBM DB2?
Both are covered in the POC:
- **Teradata**: Tutorial 10 covers TPT export patterns and migration planning
- **IBM DB2**: Streaming notebook `04_ibm_db2_cdc.py` demonstrates CDC from DB2 z/OS and LUW with EBCDIC handling
For both, the typical pattern is: set up an on-premises Data Gateway, configure a pipeline Copy Activity, and land data in the Bronze Lakehouse.
See: Tutorial 10: Teradata Migration | IBM DB2 CDC Notebook
## Dev Experience

### Can I develop notebooks locally?
Yes, but with caveats. Notebooks use the Databricks notebook format (`.py` files with `# COMMAND ----------` separators) and can be edited in any IDE. However, `mssparkutils` and `spark` are only available inside Fabric. The POC includes a `_get_arg` shim at the top of every notebook so code can run in both Fabric and local pytest:
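A hedged reconstruction of what such a shim can look like; the actual implementation lives in the notebooks, and the `spark.poc.` parameter namespace is hypothetical:

```python
def _get_arg(name: str, default: str) -> str:
    """Read a notebook parameter inside Fabric, or fall back to a default under pytest."""
    try:
        # 'spark' exists only inside a Fabric Spark session
        return spark.conf.get(f"spark.poc.{name}", default)
    except NameError:
        # Local pytest run: no Spark session, so use the default
        return default
```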
The 612 unit tests in validation/unit_tests/ validate notebook logic locally without a Fabric session.
See: Testing Strategies
### How does Git integration work with Fabric?
Fabric workspaces can connect to Azure DevOps or GitHub repos. Each Fabric item (notebook, pipeline, semantic model) is serialized as a JSON/YAML/Python file and synced bi-directionally. Best practice: establish a one-way flow (edit in IDE, push to Git, sync to Fabric) to avoid merge conflicts.
See: Git Integration | fabric-cicd Deployment
### What CI/CD tool should I use?
The POC uses two complementary approaches:
| Tool | Purpose | Configuration |
|---|---|---|
| GitHub Actions | Bicep IaC deployment, testing | .github/workflows/deploy-fabric.yml |
| fabric-cicd (Python) | Fabric item deployment (notebooks, pipelines) | scripts/fabric-cicd-deploy.py |
fabric-cicd is the Microsoft-recommended tool for deploying Fabric workspace items. It handles notebook uploads, pipeline definitions, and semantic model refreshes.
See: fabric-cicd Deployment | Tutorial 12: CI/CD DevOps
### How do I run tests?
```bash
# All 612 unit tests
pytest validation/unit_tests/ -v

# By category
pytest validation/unit_tests/test_generators.py -v   # Casino (30 tests)
pytest validation/unit_tests/federal/ -v             # Federal (54 tests)
pytest validation/unit_tests/streaming/ -v           # Streaming (20 tests)
pytest validation/unit_tests/analytics/ -v           # Analytics (30 tests)

# Data quality (Great Expectations)
great_expectations checkpoint run bronze_checkpoint
```
See: Testing Strategies
## Compliance Frameworks

### What compliance frameworks does this POC address?
| Framework | Domain | POC Implementation |
|---|---|---|
| NIGC MICS | Casino/Gaming | Meter accuracy validation, drop count verification, audit trails |
| FinCEN BSA | Casino/Financial | CTR (>$10K), SAR (structuring detection), W-2G auto-generation |
| HIPAA | Tribal Healthcare | PHI masking, audit logging, 42 CFR Part 2 substance abuse protections |
| FedRAMP | Federal (DOT/FAA) | Encryption at rest (CMK), private endpoints, audit logging |
| SOX | Financial | Immutable audit trails, access controls, data retention |
| GDPR | General | Data subject access rights, right to erasure (Delta DELETE) |
| CCPA | California | Consumer data inventory, opt-out mechanisms |
| PCI-DSS | Payment | Card number masking, Key Vault (HSM-backed) for card data |
See: Security | SQL Audit Logs | CMK
### How are CTR and SAR reports generated?
**Currency Transaction Reports (CTR):** Any cash transaction >= $10,000 triggers automatic CTR flagging in the Bronze compliance notebook (`04_bronze_compliance.py`). The Silver layer validates amounts and deadlines. The Gold layer (`03_gold_compliance_reporting.py`) produces FinCEN-ready reports.
**Suspicious Activity Reports (SAR):** The Silver layer detects structuring patterns -- multiple transactions between $8,000 and $9,999 by the same individual within 24 hours. The fraud detection ML model (`02_ml_fraud_detection.py`) provides additional anomaly scoring; a sketch of the structuring rule follows below.
See: Compliance Reporting Notebook
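A hedged PySpark sketch of the structuring rule described above; the Silver table and column names are illustrative:

```python
from pyspark.sql import functions as F

txns = spark.table("silver_financial_txn")

suspects = (txns
    .filter(F.col("amount").between(8000, 9999))
    # 24-hour buckets; a production rule would likely use a sliding window
    .groupBy("player_id", F.window("txn_timestamp", "24 hours"))
    .agg(F.count("*").alias("txn_count"), F.sum("amount").alias("total"))
    .filter(F.col("txn_count") > 1))
```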
### How is HIPAA compliance handled?
Tribal Healthcare notebooks implement HIPAA safeguards:
- **PHI Masking**: Silver layer (`07_silver_tribal_health.py`) masks protected health information
- **Audit Logging**: Every data access is logged with user ID, timestamp, and data accessed
- **FHIR R4 Mapping**: Data mapped to the standardized FHIR R4 format for interoperability
- **42 CFR Part 2**: Substance abuse treatment records have additional access restrictions
- **Retention**: Log Analytics configured for >= 6 years (HIPAA requirement) via `log-analytics.bicep`
See: Tribal Health Analytics | Tutorial 30: Tribal Healthcare
### What encryption options are available?
| Layer | Mechanism | Configuration |
|---|---|---|
| At Rest (default) | Microsoft-managed keys (MMK) | Automatic, no config needed |
| At Rest (enhanced) | Customer-managed keys (CMK) | `infra/modules/storage/storage-account.bicep` with `enableCmk=true` |
| In Transit | TLS 1.2+ | Automatic for all Fabric endpoints |
| Key Storage | Azure Key Vault (HSM-backed for PCI-DSS) | `infra/modules/security/security.bicep` with `skuName='premium'` |
| PII Fields | Application-level hashing (SHA-256) | Implemented in Bronze notebooks (SSN, card numbers) |
See: Customer-Managed Keys | Network Security
## Additional Resources

### Where can I learn more about Microsoft Fabric?
**Official Documentation:**

- Microsoft Fabric Documentation
- Fabric Architecture Center
- Fabric Pricing

**Community Resources:**

- Microsoft Fabric Blog
- Microsoft Fabric Community
- Power BI Community

**Training:**

- Microsoft Learn: Fabric Learning Path
- Data Engineering with Fabric
### Where do I report issues or contribute?
**GitHub Repository:**

- Report bugs: Open an issue
- Feature requests: Start a discussion
- Pull requests: See the Contributing Guide
**Before Opening an Issue:**

1. Check existing issues for duplicates
2. Review this FAQ
3. Include reproduction steps
4. Provide error messages and logs
### How do I stay updated?
**Watch the Repository:**

- Click **Watch** on GitHub
- Choose notification preferences

**Release Notes:**

- Check CHANGELOG.md for version history
- Subscribe to releases on GitHub

**Social Media:**

- Follow Microsoft Fabric on Twitter
- Join the LinkedIn Fabric community
## Quick Reference

### Essential Links
| Resource | Link |
|---|---|
| Main README | README.md |
| Architecture | ARCHITECTURE.md |
| Deployment | DEPLOYMENT.md |
| Prerequisites | PREREQUISITES.md |
| Security | SECURITY.md |
| Cost Estimation | COST_ESTIMATION.md |
| Tutorials | tutorials/ |
| Data Generation | data_generation/ |
| Reports | reports/ |
| POC Agenda | poc-agenda/ |
### Command Cheat Sheet
```bash
# Deployment
az deployment sub create --location eastus2 --template-file infra/main.bicep --parameters infra/environments/dev/dev.bicepparam

# Data Generation
docker-compose run --rm data-generator --all --days 30

# Verify
./scripts/verify-deployment.sh

# Cleanup
az group delete --name "rg-fabric-poc-dev" --yes
```
### Support
Need help? Try these resources in order:
1. Check this FAQ
2. Search existing issues
3. Ask in GitHub Discussions
4. Open a new issue
Documentation maintained by: Microsoft Fabric POC Team | Repository: Suppercharge_Microsoft_Fabric | Last Updated: 2025-01-21