Home > Best Practices > Data Governance Deep Dive
🛡️ Data Governance Deep Dive¶
Last Updated: 2026-04-15 | Version: 2.0 Status: ✅ Final | Maintainer: Documentation Team
📖 Overview¶
Data governance in Microsoft Fabric spans classification, protection, access control, quality assurance, and compliance enforcement. This guide provides a comprehensive governance framework for the casino gaming, federal agency, and tribal healthcare domains in this project, covering Microsoft Purview integration, sensitivity labels, row-level and column-level security, data quality governance, compliance frameworks, data sharing, audit logging, and retention lifecycle management.
🧭 Table of Contents¶
- 🟣 Microsoft Purview in Fabric
- 🏷️ Classification Taxonomy
- 🔐 Row-Level Security
- 👁️ Column-Level Security
- ✅ Data Quality Governance
- ⚖️ Compliance Frameworks
Data Sharing and External Access
- 🔎 Audit and Monitoring
Default Domain Sensitivity Labels (GA 2026)
Data Retention and Lifecycle Management
🟣 Microsoft Purview in Fabric¶
Workspace-Level Governance¶
Microsoft Purview provides governance capabilities that are natively integrated with Fabric workspaces. Governance is configured at the workspace and item levels.
flowchart TB
subgraph Purview["Microsoft Purview"]
A[Data Catalog]
B[Data Map]
C[Sensitivity Labels]
D[Information Protection]
E[Data Lineage]
end
subgraph Fabric["Microsoft Fabric Workspaces"]
F[ws_casino_prod]
G[ws_federal_prod]
H[ws_healthcare_prod]
I[ws_shared_gold]
end
A --> F
A --> G
A --> H
B --> E
C --> F
C --> G
C --> H
D --> I
E --> F
E --> G
E --> H Workspace governance configuration:
| Workspace | Domain | Sensitivity Default | Admin Group | Purpose |
|---|---|---|---|---|
ws_casino_prod | Gaming | Confidential | sg-casino-admins | Casino gaming analytics |
ws_casino_dev | Gaming | Internal | sg-casino-dev | Casino development |
ws_federal_prod | Government | Internal | sg-federal-admins | Federal agency data |
ws_federal_dev | Government | Internal | sg-federal-dev | Federal development |
ws_healthcare_prod | Healthcare | Highly Confidential | sg-healthcare-admins | HIPAA-governed data |
ws_healthcare_dev | Healthcare | Confidential | sg-healthcare-dev | Healthcare development |
ws_shared_gold | Analytics | Internal | sg-analytics-admins | Cross-domain reporting |
Sensitivity Labels¶
Sensitivity labels classify and protect data based on its confidentiality requirements. Configure labels in Microsoft Purview and apply them to Fabric items.
Label hierarchy:
flowchart TD
A[Public] --> B[Internal]
B --> C[Confidential]
C --> D[Highly Confidential]
A -.- A1["Publicly available data
USDA crop stats, NOAA weather,
EPA AQI, DOI public data"]
B -.- B1["Internal business data
Aggregated reports, KPIs,
non-PII operational metrics"]
C -.- C1["Sensitive business data
Player profiles, financial transactions,
CTR/SAR filings, federal CUI"]
D -.- D1["Regulated data
PHI, SSN, substance abuse records,
HIPAA data, 42 CFR Part 2"] Label configuration in Purview:
| Label | Encryption | Watermark | Access Restriction | Auto-Apply Rule |
|---|---|---|---|---|
| Public | None | None | None | Content from public APIs (USDA, NOAA, EPA) |
| Internal | None | None | Organization only | Default for Fabric items |
| Confidential | Azure RMS | Header/footer | Named users/groups | Contains PII patterns (SSN, player ID) |
| Highly Confidential | Azure RMS + DLP | Header/footer + visual marking | Named users only | Contains PHI, substance abuse, or HIPAA data |
Applying labels to Fabric items:
Fabric Workspace > Item > ... (more options) > Sensitivity Label > Select Label
Applicable to:
- Lakehouses
- Warehouses
- Semantic models (Power BI datasets)
- Reports
- Pipelines
- Notebooks
- Eventstreams
Information Protection Policies¶
Configure policies that enforce sensitivity label requirements:
| Policy | Scope | Enforcement |
|---|---|---|
| Mandatory labeling | All workspaces | Users must assign a label before saving |
| Label inheritance | Data pipelines | Downstream items inherit the highest label from sources |
| DLP for Fabric | Highly Confidential items | Block external sharing, require justification for download |
| Default label | New items | Auto-apply "Internal" to new Fabric items |
Data Lineage Tracking¶
Purview automatically tracks lineage across Fabric items, showing data flow from source through medallion layers to reports.
flowchart LR
subgraph Sources["External Sources"]
S1[USDA NASS API]
S2[Casino POS System]
S3[Healthcare EHR]
end
subgraph Bronze["Bronze Layer"]
B1[bronze_usda_crop_production]
B2[bronze_slot_telemetry]
B3[bronze_healthcare_encounters]
end
subgraph Silver["Silver Layer"]
V1[silver_usda_crop_production]
V2[silver_slot_telemetry]
V3[silver_healthcare_encounters]
end
subgraph Gold["Gold Layer"]
G1[gold_crop_summary]
G2[gold_daily_revenue]
G3[gold_patient_outcomes]
end
subgraph Reports["Power BI"]
R1[USDA Dashboard]
R2[Casino Ops Report]
R3[Clinical Analytics]
end
S1 --> B1 --> V1 --> G1 --> R1
S2 --> B2 --> V2 --> G2 --> R2
S3 --> B3 --> V3 --> G3 --> R3 Lineage benefits:
- Impact analysis before making schema changes
- Compliance auditing (trace data from source to report)
- Root cause analysis when data quality issues arise
- Regulatory evidence for FISMA, HIPAA, and NIGC audits
🏷️ Classification Taxonomy¶
Unified Classification Framework¶
This project spans multiple regulatory domains. A unified taxonomy ensures consistent classification across all data.
| Classification | Abbreviation | Regulatory Basis | Examples |
|---|---|---|---|
| Personally Identifiable Information | PII | Various | SSN, name, address, DOB, email, phone |
| Protected Health Information | PHI | HIPAA | Medical records, diagnoses, treatment plans, insurance IDs |
| Controlled Unclassified Information | CUI | NIST 800-171 / FISMA | Federal agency operational data with access controls |
| For Official Use Only | FOUO | Federal policy | Internal federal reports, pre-decisional data |
| Substance Abuse Records | SAR-42CFR | 42 CFR Part 2 | Substance abuse treatment records (strictest federal protection) |
| Gaming Financial Data | GFD | NIGC MICS / BSA | Player financial transactions, CTR/SAR filings |
| Publicly Releasable | PUBLIC | FOIA / Agency policy | Published USDA stats, NOAA weather, EPA AQI, DOI public data |
Classification by Data Domain¶
Casino Gaming¶
| Data Element | Classification | Label | Handling |
|---|---|---|---|
| Player SSN | PII | Confidential | Hash before storage, never display in reports |
| Player name | PII | Confidential | Mask in non-compliance contexts |
| Credit card number | PII | Confidential | Truncate to last 4, tokenize |
| Transaction amounts | GFD | Confidential | Accessible to compliance and finance |
| Slot telemetry | Internal | Internal | Machine data, no PII |
| CTR filing data | GFD + PII | Confidential | Restricted to BSA compliance team |
| SAR filing data | GFD + PII | Highly Confidential | Restricted to BSA officer + legal |
| W-2G records | PII + GFD | Confidential | Tax records, restricted access |
| Aggregated revenue | Internal | Internal | Business metrics, no PII |
Federal Agencies¶
| Data Element | Classification | Label | Handling |
|---|---|---|---|
| USDA crop production stats | PUBLIC | Public | Publicly available via NASS |
| USDA food recall notices | PUBLIC | Internal (pre-release) | Public after release, internal before |
| NOAA weather observations | PUBLIC | Public | Publicly available via NCDC |
| NOAA severe weather alerts | PUBLIC | Public | Public safety data |
| EPA AQI readings | PUBLIC | Public | Available via AirNow |
| EPA facility emissions | CUI | Confidential | Facility-specific data may have restrictions |
| DOI resource boundaries | PUBLIC | Public | Available via USGS |
| DOI species data | PUBLIC/CUI | Internal | Some species location data is sensitive |
| SBA loan data | CUI | Confidential | Contains business financial information |
| SBA borrower details | PII + CUI | Confidential | Name, EIN, address of borrowers |
Tribal Healthcare¶
| Data Element | Classification | Label | Handling |
|---|---|---|---|
| Patient demographics | PHI + PII | Highly Confidential | HIPAA protected |
| Medical diagnoses (ICD-10) | PHI | Highly Confidential | De-identify for analytics |
| Treatment records | PHI | Highly Confidential | Minimum necessary access |
| Substance abuse records | SAR-42CFR | Highly Confidential | Strictest protections, separate consent |
| Lab results | PHI | Highly Confidential | De-identify for population health |
| Insurance claims | PHI + PII | Highly Confidential | Financial + health data |
| De-identified analytics | Internal | Internal | Safe for broader analytics use |
| Population health aggregates | Internal | Internal | No individual identifiers |
Auto-Classification Rules¶
Configure Purview auto-classification to detect and label sensitive data patterns:
Rule: Detect SSN Pattern
Pattern: \d{3}-\d{2}-\d{4}
Classification: PII
Label: Confidential
Action: Apply label, log detection
Rule: Detect Credit Card
Pattern: \d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}
Classification: PII
Label: Confidential
Action: Apply label, log detection
Rule: Detect PHI Indicators
Pattern: Column names containing: diagnosis, icd10, treatment, medication, lab_result
Classification: PHI
Label: Highly Confidential
Action: Apply label, alert Privacy Officer
Rule: Detect 42 CFR Part 2
Pattern: Tables/columns in substance abuse processing pipeline
Classification: SAR-42CFR
Label: Highly Confidential
Action: Apply label, enforce consent check
🔐 Row-Level Security¶
Implementation Patterns for Multi-Agency Data¶
Row-Level Security (RLS) in Fabric restricts which rows a user can see based on their identity and role assignments.
Pattern 1: Agency-Specific Views¶
Each federal agency team sees only their agency's data.
// RLS Role: USDA Team
[agency_code] = "USDA"
// RLS Role: NOAA Team
[agency_code] = "NOAA"
// RLS Role: EPA Team
[agency_code] = "EPA"
// RLS Role: DOI Team
[agency_code] = "DOI"
// RLS Role: SBA Team
[agency_code] = "SBA"
// RLS Role: All Agencies (Admin)
TRUE()
Role-to-group mapping:
| RLS Role | Microsoft Entra ID Group | Data Access |
|---|---|---|
| USDA Team | sg-fabric-usda | USDA rows only |
| NOAA Team | sg-fabric-noaa | NOAA rows only |
| EPA Team | sg-fabric-epa | EPA rows only |
| DOI Team | sg-fabric-doi | DOI rows only |
| SBA Team | sg-fabric-sba | SBA rows only |
| Federal Admin | sg-fabric-federal-admin | All agency rows |
| Executive | sg-fabric-executives | All rows, aggregated views |
Pattern 2: Casino Property-Based RLS¶
Multi-property casinos restrict data to the user's assigned property.
// RLS Role: Property Manager
// User's property_id is stored in a security mapping table
VAR CurrentUser = USERPRINCIPALNAME()
VAR UserProperties =
CALCULATETABLE(
VALUES(SecurityMapping[property_id]),
SecurityMapping[user_email] = CurrentUser
)
RETURN
[property_id] IN UserProperties
Security mapping table:
CREATE TABLE dbo.security_mapping (
user_email NVARCHAR(256),
property_id NVARCHAR(64),
access_level NVARCHAR(32), -- PROPERTY, REGION, ENTERPRISE
effective_date DATE,
expiry_date DATE
);
-- Example entries
INSERT INTO dbo.security_mapping VALUES
('manager.a@casino.com', 'PROP-001', 'PROPERTY', '2026-01-01', '2026-12-31'),
('manager.a@casino.com', 'PROP-002', 'PROPERTY', '2026-01-01', '2026-12-31'),
('vp.west@casino.com', 'REGION-WEST', 'REGION', '2026-01-01', '2026-12-31'),
('ceo@casino.com', 'ALL', 'ENTERPRISE', '2026-01-01', '2026-12-31');
Pattern 3: Healthcare Role-Based RLS¶
HIPAA minimum necessary access: users see only data relevant to their role.
// RLS Role: Clinical Staff (specific facility)
VAR CurrentUser = USERPRINCIPALNAME()
VAR UserFacilities =
CALCULATETABLE(
VALUES(HealthcareRoleMapping[facility_id]),
HealthcareRoleMapping[user_email] = CurrentUser,
HealthcareRoleMapping[role] IN {"clinician", "nurse", "admin"}
)
RETURN
[facility_id] IN UserFacilities
// RLS Role: Population Health Analyst (de-identified only)
[is_deidentified] = TRUE()
// RLS Role: Privacy Officer (audit access, all data)
TRUE()
Dynamic RLS with DAX¶
For complex scenarios where access rules change frequently, use a dynamic RLS pattern with a security configuration table.
// Dynamic RLS filter expression
VAR CurrentUser = USERPRINCIPALNAME()
VAR UserAccess =
CALCULATETABLE(
SELECTCOLUMNS(
AccessControl,
"dimension_key", AccessControl[dimension_key],
"dimension_value", AccessControl[dimension_value]
),
AccessControl[user_email] = CurrentUser,
AccessControl[is_active] = TRUE(),
AccessControl[effective_date] <= TODAY(),
AccessControl[expiry_date] >= TODAY()
)
RETURN
IF(
COUNTROWS(FILTER(UserAccess, [dimension_key] = "ALL")) > 0,
TRUE(), // User has ALL access
[agency_code] IN SELECTCOLUMNS(
FILTER(UserAccess, [dimension_key] = "agency_code"),
"value", [dimension_value]
)
)
Access control table design:
CREATE TABLE dbo.access_control (
access_id INT IDENTITY(1,1) PRIMARY KEY,
user_email NVARCHAR(256) NOT NULL,
dimension_key NVARCHAR(128) NOT NULL, -- "agency_code", "property_id", "facility_id", "ALL"
dimension_value NVARCHAR(256) NOT NULL, -- "USDA", "PROP-001", "FAC-101", "*"
access_level NVARCHAR(32) NOT NULL, -- "READ", "WRITE", "ADMIN"
is_active BIT DEFAULT 1,
effective_date DATE NOT NULL,
expiry_date DATE NOT NULL,
granted_by NVARCHAR(256),
grant_reason NVARCHAR(512),
created_at DATETIME2 DEFAULT GETUTCDATE()
);
RLS Testing¶
Always test RLS filters before deploying to production:
// Test RLS as a specific user in Power BI Desktop:
// Modeling > View as Roles > Select role > Enter username
// Verification query (DAX query in SSMS or DAX Studio):
EVALUATE
CALCULATETABLE(
SUMMARIZE(
FactTable,
DimAgency[agency_code],
"RowCount", COUNTROWS(FactTable)
),
USERELATIONSHIP(...),
KEEPFILTERS(...)
)
👁️ Column-Level Security¶
Masking Patterns for Sensitive Fields¶
Column-level security restricts visibility of specific columns based on user roles. In Fabric, this is implemented through a combination of warehouse-level security and semantic model column visibility.
Warehouse Column Masking¶
-- Dynamic Data Masking in Fabric Warehouse
-- Mask SSN: Show only last 4 digits
ALTER TABLE dbo.player_profiles
ALTER COLUMN ssn ADD MASKED WITH (FUNCTION = 'partial(0, "XXX-XX-", 4)');
-- Mask email: Show first character and domain
ALTER TABLE dbo.player_profiles
ALTER COLUMN email ADD MASKED WITH (FUNCTION = 'email()');
-- Mask credit card: Show last 4 digits
ALTER TABLE dbo.player_profiles
ALTER COLUMN credit_card_number ADD MASKED WITH (FUNCTION = 'partial(0, "XXXX-XXXX-XXXX-", 4)');
-- Mask phone: Full mask
ALTER TABLE dbo.player_profiles
ALTER COLUMN phone_number ADD MASKED WITH (FUNCTION = 'default()');
-- Grant UNMASK to authorized roles
GRANT UNMASK ON dbo.player_profiles TO [ComplianceTeam];
GRANT UNMASK ON dbo.player_profiles(ssn) TO [TaxReporting];
PySpark Column-Level Protection¶
For Lakehouse data processed through Spark notebooks:
from pyspark.sql import functions as F
from pyspark.sql import DataFrame
def apply_column_masking(df: DataFrame, user_role: str) -> DataFrame:
"""Apply column-level masking based on user role."""
masking_rules = {
"analyst": {
"ssn": F.concat(F.lit("XXX-XX-"), F.substring(F.col("ssn"), 8, 4)),
"player_name": F.concat(F.substring(F.col("player_name"), 1, 1), F.lit("****")),
"email": F.regexp_replace(F.col("email"), r"(.)(.*)(@.*)", r"$1***$3"),
"credit_card": F.concat(F.lit("****-****-****-"), F.substring(F.col("credit_card"), -4, 4)),
"phone": F.lit("***-***-****"),
"dob": F.date_trunc("year", F.col("dob")), # Show only year
},
"compliance": {
# Compliance team sees all fields unmasked
},
"executive": {
"ssn": F.lit("[REDACTED]"),
"credit_card": F.lit("[REDACTED]"),
"phone": F.lit("[REDACTED]"),
# Name and email visible for executive context
}
}
role_rules = masking_rules.get(user_role, masking_rules["analyst"])
for column, mask_expr in role_rules.items():
if column in df.columns:
df = df.withColumn(column, mask_expr)
return df
Healthcare De-Identification¶
HIPAA Safe Harbor de-identification removes or generalizes 18 identifier types:
def hipaa_safe_harbor_deidentify(df: DataFrame) -> DataFrame:
"""Apply HIPAA Safe Harbor de-identification method."""
# 1. Remove direct identifiers
columns_to_remove = [
"patient_name", "ssn", "medical_record_number",
"health_plan_id", "account_number", "certificate_number",
"vehicle_serial", "device_identifier", "web_url",
"ip_address", "biometric_id", "photo"
]
df = df.drop(*[c for c in columns_to_remove if c in df.columns])
# 2. Generalize quasi-identifiers
if "date_of_birth" in df.columns:
# Reduce to year only (or remove if age > 89)
df = df.withColumn("birth_year", F.year(F.col("date_of_birth")))
df = df.withColumn("birth_year",
F.when(F.datediff(F.current_date(), F.col("date_of_birth")) / 365.25 > 89,
F.lit(None))
.otherwise(F.col("birth_year")))
df = df.drop("date_of_birth")
if "zip_code" in df.columns:
# Truncate to 3 digits (or set to 000 if population < 20,000)
df = df.withColumn("zip3", F.substring(F.col("zip_code"), 1, 3))
df = df.drop("zip_code")
if "admission_date" in df.columns:
# Shift dates by random offset (consistent per patient)
df = df.withColumn("admission_date",
F.date_add(F.col("admission_date"),
F.abs(F.hash(F.col("patient_id"))) % 365 - 182))
if "phone" in df.columns:
df = df.drop("phone")
if "email" in df.columns:
df = df.drop("email")
if "address" in df.columns:
df = df.drop("address")
# 3. Replace patient_id with pseudonym
if "patient_id" in df.columns:
df = df.withColumn("pseudo_patient_id", F.sha2(F.col("patient_id"), 256))
df = df.drop("patient_id")
return df
Semantic Model Column Visibility¶
In Power BI semantic models, hide sensitive columns from specific roles:
Model Configuration:
Table: dim_player
Column: player_name -> Visible to: Compliance, Management
Column: player_id -> Visible to: All (pseudonym OK)
Column: ssn_hash -> Hidden from all report views
Column: email_masked -> Visible to: Compliance, Management
Column: tier_level -> Visible to: All
Table: fact_healthcare
Column: diagnosis_code -> Visible to: Clinical, Privacy Officer
Column: treatment_plan -> Visible to: Clinical only
Column: patient_pseudo -> Visible to: All (pseudonymized)
Column: encounter_type -> Visible to: All
✅ Data Quality Governance¶
DQ Score Thresholds and Enforcement¶
Data quality is measured and enforced at each medallion layer transition.
flowchart LR
A[Bronze] -->|DQ Score >= 50| B[Silver]
B -->|DQ Score >= 80| C[Gold]
C -->|DQ Score >= 95| D[Reports/BI]
A -->|DQ Score < 50| E[Quarantine]
B -->|DQ Score < 80| F[Review Queue]
C -->|DQ Score < 95| G[Manual Approval] DQ score components:
| Component | Weight | Measurement |
|---|---|---|
| Completeness | 25% | Percentage of non-null values in required fields |
| Accuracy | 25% | Percentage of values passing business rules |
| Timeliness | 20% | Data age vs expected freshness |
| Consistency | 15% | Cross-reference validation across related tables |
| Uniqueness | 15% | Percentage of unique records (no duplicates) |
Thresholds by layer and domain:
| Domain | Bronze Min | Silver Min | Gold Min | Compliance Min |
|---|---|---|---|---|
| Casino gaming | 50 | 80 | 95 | 99 |
| Federal (public data) | 40 | 75 | 90 | N/A |
| Federal (CUI) | 50 | 85 | 95 | 98 |
| Healthcare (PHI) | 60 | 90 | 98 | 99 |
Data Steward Workflows¶
flowchart TD
A[DQ Check Fails] --> B{Severity?}
B -->|Critical: Score < 50| C[Auto-quarantine + Alert Steward]
B -->|High: Score 50-70| D[Flag for Review + Continue]
B -->|Medium: Score 70-80| E[Log Issue + Continue]
C --> F[Data Steward Reviews]
D --> F
F --> G{Root Cause?}
G -->|Source issue| H[Contact Source Owner]
G -->|Schema drift| I[Update Schema Config]
G -->|Business rule change| J[Update Validation Rules]
G -->|One-time anomaly| K[Approve Override]
H --> L[Remediate + Reprocess]
I --> L
J --> L
K --> L
L --> M[Update DQ Issue Tracker] Issue Resolution Tracking¶
CREATE TABLE dbo.dq_issues (
issue_id INT IDENTITY(1,1) PRIMARY KEY,
detected_timestamp DATETIME2 DEFAULT GETUTCDATE(),
table_name NVARCHAR(256),
layer NVARCHAR(16), -- BRONZE, SILVER, GOLD
domain NVARCHAR(64), -- CASINO, FEDERAL, HEALTHCARE
dq_dimension NVARCHAR(32), -- COMPLETENESS, ACCURACY, TIMELINESS, CONSISTENCY, UNIQUENESS
dq_score DECIMAL(5,2),
threshold DECIMAL(5,2),
affected_records INT,
total_records INT,
check_name NVARCHAR(256),
check_details NVARCHAR(MAX),
assigned_steward NVARCHAR(256),
status NVARCHAR(32), -- OPEN, INVESTIGATING, REMEDIATED, ACCEPTED, WAIVED
root_cause NVARCHAR(512),
resolution NVARCHAR(MAX),
resolved_timestamp DATETIME2,
waiver_approved_by NVARCHAR(256),
waiver_reason NVARCHAR(512),
waiver_expiry DATE
);
Compliance Frameworks¶
This project operates under multiple compliance frameworks simultaneously. Each framework imposes specific data handling requirements.
Framework Summary¶
| Framework | Scope | Key Requirements | Affected Data |
|---|---|---|---|
| FISMA | Federal information systems | Risk categorization, security controls, continuous monitoring | All federal agency data |
| FedRAMP | Cloud services for federal | Baseline security controls, 3PAO assessment, continuous monitoring | DOT/FAA cloud workloads |
| HIPAA | Healthcare data | Privacy Rule, Security Rule, Breach Notification Rule | Tribal healthcare PHI |
| CIPSEA | Statistical data | Confidentiality of individually identifiable data | Census, survey, statistical data |
| 42 CFR Part 2 | Substance abuse records | Written consent for any disclosure, stricter than HIPAA | Substance abuse treatment records |
| NIGC MICS | Casino gaming | Minimum internal control standards, BSA compliance | Gaming operations data |
FISMA (Federal Information Systems)¶
Applies to all federal agency data in this project (USDA, SBA, NOAA, EPA, DOI).
Impact level determination:
| Impact Level | Confidentiality | Integrity | Availability | Example |
|---|---|---|---|---|
| Low | Limited adverse effect | Limited adverse effect | Limited adverse effect | Public USDA crop stats |
| Moderate | Serious adverse effect | Serious adverse effect | Serious adverse effect | SBA loan data, EPA facility data |
| High | Severe/catastrophic | Severe/catastrophic | Severe/catastrophic | CUI, enforcement data |
FISMA controls in Fabric:
| Control Family | Implementation |
|---|---|
| Access Control (AC) | Microsoft Entra ID + Conditional Access + RLS |
| Audit and Accountability (AU) | Purview audit logs + pipeline error table |
| Configuration Management (CM) | Bicep IaC, version-controlled configuration |
| Identification and Authentication (IA) | Microsoft Entra ID MFA, service principals |
| System and Communications Protection (SC) | Private endpoints, encryption at rest/transit |
| System and Information Integrity (SI) | Data quality checks, anomaly detection |
FedRAMP (Cloud Services)¶
Applies to DOT/FAA workloads processed in Fabric (a FedRAMP authorized cloud service).
FedRAMP requirements mapped to Fabric:
| Requirement | Fabric Implementation |
|---|---|
| Data encryption at rest | OneLake encryption (Microsoft-managed keys) |
| Data encryption in transit | TLS 1.2+ for all API/data traffic |
| Multi-factor authentication | Microsoft Entra ID Conditional Access with MFA |
| Audit logging | Fabric admin audit logs, Purview |
| Incident response | Data Activator alerts + runbooks |
| Continuous monitoring | Pipeline error table + KQL dashboards |
| Boundary protection | Managed VNet, private endpoints |
HIPAA (Healthcare)¶
Applies to all tribal healthcare data.
HIPAA Security Rule implementation:
| Safeguard | Category | Fabric Implementation |
|---|---|---|
| Administrative | Security management | Purview policies, access reviews |
| Administrative | Workforce security | Microsoft Entra ID groups, JIT access |
| Administrative | Information access management | RLS + Column masking |
| Physical | Facility access | Azure datacenter controls (inherited) |
| Physical | Workstation security | Intune device management |
| Technical | Access control | Microsoft Entra ID + Conditional Access |
| Technical | Audit controls | Fabric audit logs, Purview |
| Technical | Integrity controls | Data quality framework, checksums |
| Technical | Transmission security | TLS 1.2+, private endpoints |
HIPAA minimum necessary principle in Fabric:
Principle: Users should only access the minimum PHI necessary for their role.
Implementation:
1. RLS filters restrict rows to assigned facility
2. Column masking hides non-essential PHI fields
3. De-identified views for analytics users
4. Audit logging on all PHI access
5. Regular access reviews (quarterly)
CIPSEA (Statistical Data Confidentiality)¶
Applies to statistical data from federal agencies where individual identifiers could be reverse-engineered.
Requirements:
- Suppress cells with fewer than 3 respondents
- Apply noise injection for small cell sizes
- Prevent linkage attacks through k-anonymity (k >= 5)
- Restrict access to microdata
- Use only aggregated/suppressed data in public reports
42 CFR Part 2 (Substance Abuse Records)¶
The strictest federal data protection standard. More restrictive than HIPAA.
Key differences from HIPAA:
- Requires WRITTEN consent for ANY disclosure (HIPAA allows some without consent)
- Consent must specify: who, what, purpose, expiration
- No exception for treatment/payment/operations without consent
- Re-disclosure prohibition: recipients cannot re-share
- Criminal penalties for violations
Fabric implementation:
- Separate workspace with additional access controls
- Consent verification at data access time
- Audit logging with consent reference
- No data sharing outside designated workspace
- Quarterly access reviews with legal sign-off
NIGC MICS (Casino Gaming)¶
Minimum Internal Control Standards for gaming operations.
Key requirements:
- Transaction monitoring: Track all transactions > $3,000
- CTR filing: Report cash transactions > $10,000 within 15 days
- SAR filing: Report suspicious activity (e.g., structuring $8K-$9.9K)
- W-2G: Report gambling winnings above thresholds
- Player exclusion: Enforce self-exclusion and involuntary exclusion lists
- Audit trail: Maintain complete audit trail for all financial transactions
Fabric implementation:
- Real-time transaction monitoring via Eventstream
- Automated CTR/SAR threshold detection in Silver layer
- W-2G generation pipeline triggered by jackpot events
- Exclusion list sync and enforcement
- Complete audit trail in Bronze (append-only)
- Compliance dashboard with filing status and deadlines
Data Sharing and External Access¶
OneLake Data Sharing¶
OneLake shortcuts enable cross-workspace data access without copying data.
flowchart LR
subgraph CasinoWS["ws_casino_prod"]
A[gold_daily_revenue]
B[gold_player_activity]
end
subgraph FederalWS["ws_federal_prod"]
C[gold_crop_summary]
D[gold_weather_aggregates]
end
subgraph SharedWS["ws_shared_gold"]
E["Shortcut: gold_daily_revenue"]
F["Shortcut: gold_crop_summary"]
G["Shortcut: gold_weather_aggregates"]
end
A -->|Shortcut| E
C -->|Shortcut| F
D -->|Shortcut| G
SharedWS --> H[Cross-Domain Reports] Shortcut governance rules:
| Rule | Implementation |
|---|---|
| No shortcuts to PHI data outside healthcare workspace | Workspace admin policy |
| No shortcuts to SAR filing data | BSA officer approval required |
| Shortcuts inherit source sensitivity labels | Purview label inheritance |
| Shortcut access requires workspace membership | Microsoft Entra ID group membership |
| Audit shortcut creation and access | Purview audit logs |
Cross-Workspace Access Patterns¶
| Pattern | Use Case | Security |
|---|---|---|
| OneLake shortcut (read-only) | Shared Gold layer for reporting | Source workspace controls access |
| Warehouse cross-database query | SQL analytics across domains | Row-level security applies |
| Semantic model composite | Multi-domain Power BI reports | RLS at semantic model level |
| Pipeline cross-workspace | Orchestration across workspaces | Service principal with scoped permissions |
External Sharing with B2B¶
For sharing data with external partners (e.g., regulatory agencies, tribal councils):
External sharing tiers:
Tier 1: Public Data
- Method: Public Power BI published report (no sign-in)
- Data: Aggregated USDA, NOAA, EPA public stats
- Security: No PII, pre-aggregated only
Tier 2: Partner Access
- Method: Microsoft Entra ID B2B guest accounts
- Data: Agency-specific Gold layer views
- Security: RLS + column masking + MFA required
- Audit: Full access logging
Tier 3: Regulatory Reporting
- Method: Secure file transfer (SFTP) or direct API
- Data: Compliance filings (CTR, SAR, HIPAA reports)
- Security: Encrypted in transit + at rest
- Audit: Full chain of custody logging
Audit and Monitoring¶
Admin Portal Audit Logs¶
Fabric generates comprehensive audit logs accessible through the Admin Portal and the Microsoft 365 unified audit log.
Key audit events to monitor:
| Event Category | Events | Monitoring Frequency |
|---|---|---|
| Item access | ViewReport, GetDataset, ReadTable | Continuous (for PHI) |
| Item modification | UpdateDataset, UpdatePipeline, DeleteItem | Real-time alerts |
| Permission changes | AddWorkspaceMember, RemoveWorkspaceMember | Real-time alerts |
| Data export | ExportReport, DownloadDataset | Real-time alerts |
| Admin actions | UpdateCapacitySetting, UpdateTenantSetting | Real-time alerts |
| Sensitivity labels | ApplyLabel, RemoveLabel, DowngradeLabel | Real-time alerts |
Usage Metrics¶
Track usage patterns to identify anomalies and optimize governance:
// Unusual access patterns (potential security concern)
fabric_audit_log
| where timestamp > ago(24h)
| where operation in ("ViewReport", "GetDataset", "ReadTable")
| summarize access_count = count() by user_email, workspace_name, item_name
| join kind=leftouter (
fabric_audit_log
| where timestamp between (ago(31d) .. ago(1d))
| summarize avg_daily_access = count() / 30.0 by user_email, workspace_name, item_name
) on user_email, workspace_name, item_name
| extend anomaly_ratio = access_count / max_of(avg_daily_access, 1.0)
| where anomaly_ratio > 5.0 // 5x normal access rate
| order by anomaly_ratio desc
Compliance Reporting Dashboards¶
Build Power BI dashboards for compliance officers:
Dashboard: Governance Overview
| Page | Content | Audience |
|---|---|---|
| Data Classification | Label distribution across items, unlabeled item count | Data Steward |
| Access Patterns | Who accessed what, anomaly detection results | Security Team |
| DQ Scorecard | DQ scores by domain/layer, trend lines, open issues | Data Steward |
| Compliance Status | CTR/SAR filing status, HIPAA compliance checks | Compliance Officer |
| Audit Summary | Access logs, permission changes, export events | Auditor |
DAX measures for governance reporting:
// Percentage of items with sensitivity labels
Label Coverage =
DIVIDE(
CALCULATE(COUNTROWS(FabricItems), NOT ISBLANK(FabricItems[sensitivity_label])),
COUNTROWS(FabricItems),
0
)
// Average DQ score across production tables
Avg Production DQ Score =
CALCULATE(
AVERAGE(DQResults[dq_score]),
DQResults[environment] = "PROD",
DQResults[layer] IN {"SILVER", "GOLD"}
)
// Compliance filing on-time rate
Filing On-Time Rate =
DIVIDE(
CALCULATE(COUNTROWS(ComplianceFilings), ComplianceFilings[filed_on_time] = TRUE()),
COUNTROWS(ComplianceFilings),
0
)
Default Domain Sensitivity Labels (GA 2026)¶
Default Domain Sensitivity Labels (GA, February 2026) automatically apply sensitivity labels to new Fabric items based on the domain they belong to. This eliminates the risk of unclassified data items in production workspaces and ensures consistent data classification from creation.
How Default Domain Labels Work¶
When a Fabric domain is configured with a default sensitivity label, every new item created within that domain automatically inherits the label. This includes lakehouses, warehouses, notebooks, semantic models, reports, and all other Fabric item types.
flowchart LR
subgraph Domain["📋 Fabric Domain"]
DL["Default Label:<br/>Confidential"]
end
subgraph Items["🆕 New Items"]
LH["Lakehouse"]
WH["Warehouse"]
NB["Notebook"]
SM["Semantic Model"]
end
Domain -->|"Auto-apply"| Items
style Domain fill:#6C3483,stroke:#4A235A,color:#fff Domain-to-Label Mapping for This POC¶
| Domain | Default Label | Justification |
|---|---|---|
| Casino Gaming | Confidential | PII, financial data, compliance filings |
| Casino Development | Internal | Non-production data, synthetic datasets |
| Federal - USDA | Internal | Publicly available agricultural statistics |
| Federal - NOAA | Internal | Public weather/climate data |
| Federal - EPA | Internal | Environmental monitoring data |
| Federal - DOI | Internal | Public lands and resource data |
| Federal - SBA | Internal | Loan program aggregate data |
| Tribal Healthcare | Highly Confidential | HIPAA PHI, 42 CFR Part 2 substance abuse records |
| DOT/FAA | Confidential | Flight operations, security-sensitive |
| Shared Gold / BI | Confidential | Cross-domain aggregations may contain sensitive data |
Configuration via Admin Portal¶
Fabric Admin Portal → Domains → [Select Domain]
→ Governance Settings
→ Default Sensitivity Label: [Select Label]
→ Override Allowed: [Yes/No]
→ Label Inheritance: [Downstream items inherit from source]
Configuration via PowerShell¶
# Set default sensitivity label for a Fabric domain
# Requires: Microsoft.PowerBI.Admin module
Connect-PowerBIServiceAccount
# Get domain ID
$domain = Get-PowerBIDomain -Name "Casino Gaming"
# Set default label (label GUID from Microsoft Purview)
Set-PowerBIDomainDefaultSensitivityLabel `
-DomainId $domain.Id `
-LabelId "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
# Verify configuration
Get-PowerBIDomainDefaultSensitivityLabel -DomainId $domain.Id
Label Inheritance Rules¶
| Scenario | Behavior |
|---|---|
| New item in domain | Automatically gets domain default label |
| Item moved to domain | Label updated to domain default (if less restrictive) |
| Item moved out of domain | Label retained (never downgraded automatically) |
| Child item from labeled parent | Inherits parent label (downstream inheritance) |
| User manually sets higher label | Manual label takes precedence |
| User tries to set lower label | Blocked unless user has override permission |
Integration with OneLake Security¶
Default domain labels work in conjunction with OneLake Security (Preview) to provide layered protection:
- Domain label ensures all items are classified from creation
- OneLake Security roles enforce access based on classification
- Purview policies can auto-apply additional protections based on label
💡 Tip: For this POC, configure the Tribal Healthcare domain with "Highly Confidential" default label and set Override Allowed to "No". This ensures PHI data can never be downgraded, satisfying HIPAA minimum necessary requirements.
⚠️ Warning: Changing a domain's default label does NOT retroactively update existing items. Only new items receive the updated default. Run a bulk relabeling script for existing items when changing domain label policies.
Data Retention and Lifecycle Management¶
Retention Policies by Data Classification¶
| Classification | Retention Period | Archive After | Delete After | Regulatory Basis |
|---|---|---|---|---|
| Casino transactions (Bronze) | 7 years | 2 years | 7 years | BSA / IRS requirements |
| CTR/SAR filings | 10 years | 5 years | Never (permanent) | FinCEN requirements |
| W-2G records | 7 years | 3 years | 7 years | IRS requirements |
| Player PII | Active + 2 years | Upon deactivation | 2 years post-deactivation | Privacy policy |
| Federal public data | Indefinite | 5 years | Never | Public record |
| Federal CUI | Per agency policy | 3-7 years | Per agency schedule | NARA records schedule |
| Healthcare PHI | 6 years (minimum) | 3 years | 6 years post-last encounter | HIPAA |
| Substance abuse (42 CFR) | Per consent + state law | 3 years | Per consent terms | 42 CFR Part 2 |
| Audit logs | 7 years | 1 year | 7 years | Multiple frameworks |
| Aggregated analytics | Indefinite | 5 years | Never | Business value |
Lifecycle Implementation¶
flowchart LR
subgraph Hot["Hot Storage (Active)"]
A[Current Year Data]
B[Active Dimensions]
end
subgraph Warm["Warm Storage (Queryable Archive)"]
C[1-3 Year Old Data]
D[Infrequently Accessed]
end
subgraph Cold["Cold Storage (Compliance Archive)"]
E[3-7 Year Old Data]
F[Regulatory Retention]
end
subgraph Disposed["Secure Disposal"]
G[Past Retention Period]
end
Hot -->|After 1 year| Warm
Warm -->|After 3 years| Cold
Cold -->|After retention period| Disposed Implementation in Fabric:
from datetime import datetime, timedelta
# Lifecycle management notebook - runs weekly
# 1. Move old data from Hot to Warm (Lakehouse to archive Lakehouse)
archive_cutoff = datetime.now() - timedelta(days=365)
old_data = spark.sql(f"""
SELECT * FROM lh_gold.daily_revenue
WHERE revenue_date < '{archive_cutoff.strftime('%Y-%m-%d')}'
""")
old_data.write.mode("append").format("delta").saveAsTable("lh_archive.daily_revenue")
spark.sql(f"""
DELETE FROM lh_gold.daily_revenue
WHERE revenue_date < '{archive_cutoff.strftime('%Y-%m-%d')}'
""")
# 2. VACUUM archived tables
spark.sql("VACUUM lh_archive.daily_revenue RETAIN 168 HOURS")
# 3. Check for data past retention period
retention_cutoff = datetime.now() - timedelta(days=365 * 7) # 7-year retention
expired_count = spark.sql(f"""
SELECT COUNT(*) as cnt FROM lh_archive.daily_revenue
WHERE revenue_date < '{retention_cutoff.strftime('%Y-%m-%d')}'
""").collect()[0]["cnt"]
if expired_count > 0:
# Log for manual review before deletion (never auto-delete without approval)
print(f"WARNING: {expired_count} records past 7-year retention period. "
"Requires manual approval for deletion.")
Secure Disposal¶
When data reaches end-of-life:
Disposal procedure:
1. Verify retention period has expired (automated check)
2. Generate disposal request with:
- Table name and record count
- Retention policy reference
- Regulatory basis for disposal
3. Approval workflow:
- Data steward approval
- Compliance officer approval (for regulated data)
- Legal approval (for litigation-hold data)
4. Execute disposal:
- DELETE from Delta table
- VACUUM to remove physical files
- Verify removal from all copies (shortcuts, exports)
5. Document disposal:
- Log in disposal registry
- Retain disposal certificate
- Update data catalog
Disposal registry table:
CREATE TABLE dbo.data_disposal_registry (
disposal_id INT IDENTITY(1,1) PRIMARY KEY,
disposal_date DATETIME2 DEFAULT GETUTCDATE(),
table_name NVARCHAR(256),
record_count BIGINT,
date_range_start DATE,
date_range_end DATE,
retention_policy NVARCHAR(256),
regulatory_basis NVARCHAR(512),
approved_by NVARCHAR(256),
approval_date DATETIME2,
disposal_method NVARCHAR(64), -- DELETE_VACUUM, CRYPTO_ERASE, PHYSICAL_DESTROY
verification_status NVARCHAR(32), -- PENDING, VERIFIED, FAILED
verified_by NVARCHAR(256),
verified_date DATETIME2,
notes NVARCHAR(MAX)
);
Summary¶
Comprehensive data governance in Microsoft Fabric requires:
- Microsoft Purview for classification, labeling, lineage tracking, and information protection
- A unified classification taxonomy that maps PII, PHI, CUI, FOUO, and public data across all domains
- Row-Level Security with dynamic DAX filters for multi-agency, multi-property, and multi-facility access control
- Column-Level Security with dynamic data masking and HIPAA de-identification for sensitive fields
- Data quality governance with enforced thresholds at each medallion layer and steward workflows for issue resolution
- Multi-framework compliance addressing FISMA, FedRAMP, HIPAA, CIPSEA, 42 CFR Part 2, and NIGC MICS simultaneously
- Controlled data sharing through OneLake shortcuts, cross-workspace access patterns, and B2B for external partners
- Comprehensive audit logging with anomaly detection, usage metrics, and compliance dashboards
- Lifecycle management with retention policies aligned to regulatory requirements and secure disposal procedures
Related Documents¶
- Security Guide -- Security architecture and controls
- Compliance Templates -- CTR, SAR, W-2G, MICS templates
- Error Handling & Monitoring -- Error logging and alerting
- Alerting & Data Activator -- Compliance alert patterns
- Workspaces & Naming -- Workspace organization