Gold Layer¶

🥇 Gold Layer Notebooks¶

Purpose: Business-level aggregations, KPIs, star schema models, and analytics-ready datasets optimized for Direct Lake connectivity and Power BI consumption.

The Gold layer transforms cleansed Silver data into business intelligence assets. This layer creates dimensional models, calculates KPIs, builds aggregated views, and prepares compliance reports for regulatory submission.

📚 Notebook Inventory¶

Notebook	Purpose	Input	Output
`01_gold_slot_performance.py`	Slot machine KPIs, RTP, hold percentages	`silver.slot_cleansed`	`gold.fact_slot_performance`, `gold.dim_machine`
`02_gold_player_360.py`	Customer 360 view, lifetime value, segments	`silver.player_master`, `silver.financial_reconciled`	`gold.dim_player_360`
`03_gold_compliance_reporting.py`	CTR, SAR, W-2G regulatory reports	`silver.compliance_validated`, `silver.financial_reconciled`	`gold.fact_compliance_reports`
`04_gold_table_analytics.py`	Table game profitability, dealer performance	`silver.table_enriched`	`gold.fact_table_performance`
`05_gold_financial_summary.py`	Revenue, drop, cage operations summary	`silver.financial_reconciled`	`gold.fact_financial_summary`
`06_gold_security_dashboard.py`	Security incidents, response times, zone analysis	`silver.security_enriched`	`gold.fact_security_metrics`
`07_gold_tribal_health_360.py`	Population health KPIs, health equity metrics	`silver.tribal_health`	`gold.tribal_health_360`
`08_gold_dot_faa_analytics.py`	Aviation safety KPIs, delay analysis	`silver.dot_faa`	`gold.dot_faa_analytics`
`09_gold_video_security_kpis.py`	Video security KPIs, incident patterns	`silver.video_analytics`	`gold.video_security_kpis`
`10_gold_movement_analytics.py`	Foot traffic KPIs, zone performance	`silver.people_movement`	`gold.movement_analytics`
`11_gold_geolocation_insights.py`	Geospatial KPIs, proximity analytics	`silver.geolocation`	`gold.geolocation_insights`
`12_gold_usda_analytics.py`	Agricultural KPIs: crop rankings, yields, food safety index	`silver.usda_*`	`gold.usda_crop_summary`, `gold.usda_state_agriculture`, `gold.usda_food_safety_dashboard`, `gold.usda_executive_summary`
`13_gold_sba_analytics.py`	Small business KPIs: loan portfolio, economic impact, lender scorecards	`silver.sba_*`	`gold.sba_loan_portfolio`, `gold.sba_economic_impact`, `gold.sba_lender_scorecard`
`14_gold_noaa_analytics.py`	Weather/climate KPIs: storm impact, climate trends, severe weather risk	`silver.noaa_*`	`gold.noaa_weather_summary`, `gold.noaa_storm_impact`, `gold.noaa_climate_trends`, `gold.noaa_severe_weather_risk`
`15_gold_epa_analytics.py`	Environmental KPIs: AQI index, facility risk, water compliance, environmental scorecard	`silver.epa_*`	`gold.epa_air_quality_index`, `gold.epa_facility_risk`, `gold.epa_water_compliance`, `gold.epa_environmental_scorecard`
`16_gold_doi_analytics.py`	Natural resources KPIs: seismic risk, water resources, park performance	`silver.doi_*`	`gold.doi_seismic_risk`, `gold.doi_water_resources`, `gold.doi_park_performance`, `gold.doi_natural_resources_dashboard`

⚡ Execution Order¶

graph TD
    subgraph Silver Sources
        S1[silver.slot_cleansed]
        S2[silver.player_master]
        S3[silver.table_enriched]
        S4[silver.financial_reconciled]
        S5[silver.security_enriched]
        S6[silver.compliance_validated]
    end

    subgraph Gold Models
        G1[01_gold_slot_performance]
        G2[02_gold_player_360]
        G3[03_gold_compliance_reporting]
        G4[04_gold_table_analytics]
        G5[05_gold_financial_summary]
        G6[06_gold_security_dashboard]
    end

    subgraph Output Tables
        F1[fact_slot_performance]
        F2[dim_player_360]
        F3[fact_compliance_reports]
        F4[fact_table_performance]
        F5[fact_financial_summary]
        F6[fact_security_metrics]
    end

    S1 --> G1 --> F1
    S2 --> G2
    S4 --> G2 --> F2
    S6 --> G3
    S4 --> G3 --> F3
    S3 --> G4 --> F4
    S4 --> G5 --> F5
    S5 --> G6 --> F6

    style G1 fill:#ffd700
    style G2 fill:#ffd700
    style G3 fill:#ffd700
    style G4 fill:#ffd700
    style G5 fill:#ffd700
    style G6 fill:#ffd700

Note: Gold notebooks can run in parallel after Silver layer completes. Order shown is for logical grouping only.

🔧 Key Transformations¶

Transformation	Description	Business Value
Star Schema	Fact and dimension tables	Optimized for BI tools
KPI Calculation	Pre-aggregated metrics	Dashboard performance
Segmentation	Player tiers, machine categories	Marketing targeting
Time Intelligence	Period-over-period calculations	Trend analysis
Compliance Aggregation	Regulatory report formats	FinCEN submission
Direct Lake Optimization	Delta Lake with V-Order	Power BI performance

Common Gold Patterns¶

# Star Schema - Fact Table Pattern
from pyspark.sql.functions import sum, avg, count, date_trunc

fact_slot_performance = (spark.table("silver.slot_cleansed")
    # Aggregate to grain (machine, hour)
    .groupBy(
        col("machine_id"),
        date_trunc("hour", col("spin_timestamp")).alias("hour_key"),
        col("denomination")
    )
    .agg(
        count("*").alias("total_spins"),
        sum("wager_amount").alias("total_coin_in"),
        sum("payout_amount").alias("total_coin_out"),
        (sum("wager_amount") - sum("payout_amount")).alias("net_win"),
        avg("session_duration_seconds").alias("avg_session_duration")
    )
    # Calculate KPIs
    .withColumn("hold_percentage", 
        (col("net_win") / col("total_coin_in") * 100).cast("decimal(5,2)"))
    .withColumn("theoretical_rtp",
        (col("total_coin_out") / col("total_coin_in") * 100).cast("decimal(5,2)"))
)

# Write with V-Order optimization for Direct Lake
fact_slot_performance.write \
    .format("delta") \
    .option("delta.parquet.vorder.enabled", "true") \
    .mode("overwrite") \
    .saveAsTable("gold.fact_slot_performance")

Dimension Table Pattern¶

# SCD Type 1 Dimension with Business Attributes
dim_player_360 = (spark.table("silver.player_master")
    .select(
        col("player_id").alias("player_key"),
        col("player_id"),
        col("first_name"),
        col("last_name"),
        col("player_tier"),
        col("enrollment_date"),
        col("preferred_denomination"),
        col("home_casino")
    )
    # Add calculated attributes
    .join(lifetime_value_df, "player_id", "left")
    .join(visit_frequency_df, "player_id", "left")
    .withColumn("player_segment", 
        when(col("lifetime_value") > 100000, "VIP")
        .when(col("lifetime_value") > 10000, "HIGH")
        .when(col("lifetime_value") > 1000, "MEDIUM")
        .otherwise("CASUAL"))
)

📋 Dependencies¶

Dependency	Type	Description
Silver Layer	Data	All Silver tables must be validated
Date Dimension	Reference	`gold.dim_date` calendar table
Business Rules	Config	KPI formulas, threshold definitions
Direct Lake Semantic Model	BI	Power BI workspace connection

Pre-requisites Checklist¶

Silver layer validation passed
Date dimension table created
Power BI workspace provisioned
Direct Lake semantic model configured

✅ Validation Steps¶

Data Quality Checks¶

Check	Table	Query	Expected
KPI Ranges	`fact_slot_performance`	`hold_percentage BETWEEN 0 AND 20`	100%
Referential Integrity	`fact_*`	All dimension keys exist	0 orphans
Completeness	`fact_financial_summary`	Sum matches Silver totals	± 0.01%
Timeliness	All Gold tables	Last refresh timestamp	Within SLA
Compliance Accuracy	`fact_compliance_reports`	CTR count vs manual audit	Match

Validation Commands¶

# Validate KPI calculations
kpi_validation = spark.sql("""
    SELECT 
        SUM(CASE WHEN hold_percentage < 0 OR hold_percentage > 20 THEN 1 ELSE 0 END) as invalid_hold,
        SUM(CASE WHEN theoretical_rtp < 80 OR theoretical_rtp > 100 THEN 1 ELSE 0 END) as invalid_rtp
    FROM gold.fact_slot_performance
""").first()
assert kpi_validation.invalid_hold == 0, "Invalid hold percentages found!"
assert kpi_validation.invalid_rtp == 0, "Invalid RTP values found!"

# Verify totals match Silver layer
gold_total = spark.sql("SELECT SUM(total_coin_in) FROM gold.fact_slot_performance").first()[0]
silver_total = spark.sql("SELECT SUM(wager_amount) FROM silver.slot_cleansed").first()[0]
variance = abs(gold_total - silver_total) / silver_total
assert variance < 0.0001, f"Variance {variance:.4%} exceeds threshold!"

# Check Direct Lake compatibility
from delta import DeltaTable
dt = DeltaTable.forName(spark, "gold.fact_slot_performance")
history = dt.history(1).select("operationMetrics").first()
print(f"Files: {history.operationMetrics.get('numFiles', 'N/A')}")

Tutorial: Day 2: Gold Aggregation Tutorial
Previous Layer: Silver Layer Notebooks
Power BI: Direct Lake Semantic Model Setup
Dashboards: Power BI Report Templates

🎰 Casino Domain Context¶

Key Performance Indicators¶

KPI	Formula	Business Use
Hold %	(Coin In - Coin Out) / Coin In × 100	Machine profitability
Theoretical RTP	Coin Out / Coin In × 100	Regulatory compliance
Theoretical Win	Coin In × Theoretical Hold %	Revenue forecasting
Actual Win	Coin In - Coin Out	P&L reporting
ADT (Average Daily Theoretical)	Avg bet × Hands/Hr × Hours × House Edge	Player valuation

Player 360 Metrics¶

Metric	Description	Segmentation Use
Lifetime Value	Total net win from player	VIP identification
Visit Frequency	Visits per month	Engagement scoring
Average Bet	Mean wager per session	Risk assessment
Game Preference	Most played game type	Marketing targeting
Comp Ratio	Comps given / Theoretical win	Reinvestment tracking

Compliance Reporting Tables¶

-- CTR Summary for FinCEN
SELECT 
    report_date,
    player_id,
    player_name,
    ssn_hash,
    total_cash_in,
    total_cash_out,
    transaction_count,
    filing_status
FROM gold.fact_compliance_reports
WHERE report_type = 'CTR'
  AND total_cash_in >= 10000

-- W-2G Summary for IRS
SELECT 
    tax_year,
    player_id,
    game_type,
    win_amount,
    federal_withholding,
    state_withholding
FROM gold.fact_compliance_reports
WHERE report_type = 'W2G'

Star Schema Diagram¶

                          ┌─────────────────┐
                          │   dim_date      │
                          ├─────────────────┤
                          │ date_key (PK)   │
                          │ calendar_date   │
                          │ day_of_week     │
                          │ fiscal_period   │
                          └────────┬────────┘
                                   │
┌─────────────────┐       ┌────────┴────────┐       ┌─────────────────┐
│  dim_machine    │       │ fact_slot_perf  │       │  dim_player_360 │
├─────────────────┤       ├─────────────────┤       ├─────────────────┤
│ machine_key (PK)│◄──────│ machine_key(FK) │       │ player_key (PK) │
│ machine_id      │       │ date_key (FK)   │──────►│ player_id       │
│ denomination    │       │ player_key (FK) │       │ player_tier     │
│ manufacturer    │       │ total_spins     │       │ lifetime_value  │
│ floor_location  │       │ total_coin_in   │       │ player_segment  │
└─────────────────┘       │ hold_percentage │       └─────────────────┘
                          └─────────────────┘

🔌 Direct Lake Integration¶

Optimization for Power BI¶

# Enable V-Order for optimal Direct Lake performance
spark.conf.set("spark.databricks.delta.properties.defaults.parquet.vorder.enabled", "true")

# Optimize file sizes (target 128MB-256MB)
spark.sql("OPTIMIZE gold.fact_slot_performance ZORDER BY (date_key, machine_key)")

# Vacuum old versions (maintain 7 days for time travel)
spark.sql("VACUUM gold.fact_slot_performance RETAIN 168 HOURS")

Semantic Model Refresh¶

Gold tables are configured for automatic refresh to Direct Lake semantic model: - Refresh Frequency: Every 15 minutes for operational dashboards - Full Refresh: Nightly at 2:00 AM for historical recalculations

Next Steps: After Gold layer completes, configure Direct Lake semantic model and build Power BI reports.

← PreviousSilver NotebooksRead more →Next →ML NotebooksRead more →