🥈 Silver Layer Notebooks
Purpose: Data cleansing, validation, deduplication, and standardization. Enforces schema contracts and business rules while maintaining referential integrity.
The Silver layer transforms raw Bronze data into cleansed, validated datasets. This layer applies data quality rules, resolves duplicates, enriches with reference data, and enforces compliance constraints (PII masking, SSN hashing).
📚 Notebook Inventory
| Notebook | Purpose | Input | Output |
01_silver_slot_cleansed.py | Cleanse slot telemetry, validate ranges, dedupe | bronze.slot_telemetry | silver.slot_cleansed |
02_silver_player_master.py | Create player master with SCD Type 2, hash PII | bronze.player_profile | silver.player_master |
03_silver_table_enriched.py | Enrich table games with dealer/pit info | bronze.table_games | silver.table_enriched |
04_silver_financial_reconciled.py | Reconcile transactions, detect anomalies | bronze.financial_transactions | silver.financial_reconciled |
05_silver_security_enriched.py | Enrich security events with location/zone data | bronze.security_events | silver.security_enriched |
06_silver_compliance_validated.py | Validate compliance rules, flag violations | bronze.compliance_events | silver.compliance_validated |
07_silver_tribal_health.py | HIPAA-compliant healthcare data cleansing, PHI masking | bronze.tribal_health | silver.tribal_health |
08_silver_dot_faa.py | Aviation safety data validation and standardization | bronze.dot_faa | silver.dot_faa |
09_silver_video_analytics.py | Video analytics event cleansing and enrichment | bronze.video_analytics | silver.video_analytics |
10_silver_people_movement.py | Movement data validation and zone assignment | bronze.people_movement | silver.people_movement |
11_silver_geolocation.py | Geolocation data cleansing, H3 indexing | bronze.geolocation | silver.geolocation |
12_silver_usda.py | USDA crop production & food safety validation | bronze.usda_* | silver.usda_crop_production, silver.usda_food_safety |
13_silver_sba.py | SBA loan data cleansing, NAICS validation | bronze.sba_* | silver.sba_ppp_loans, silver.sba_7a_504_loans |
14_silver_noaa.py | NOAA weather/storm/climate data validation | bronze.noaa_* | silver.noaa_weather, silver.noaa_storm_events, silver.noaa_climate |
15_silver_epa.py | EPA environmental data validation, AQI classification | bronze.epa_* | silver.epa_air_quality, silver.epa_toxic_releases, silver.epa_water_quality |
16_silver_doi.py | DOI natural resources data validation | bronze.doi_* | silver.doi_earthquakes, silver.doi_water_data, silver.doi_park_visitation |
⚡ Execution Order
graph TD
subgraph Bronze Sources
B1[bronze.slot_telemetry]
B2[bronze.player_profile]
B3[bronze.table_games]
B4[bronze.financial_transactions]
B5[bronze.security_events]
B6[bronze.compliance_events]
end
subgraph Silver Processing
S1[01_silver_slot_cleansed]
S2[02_silver_player_master]
S3[03_silver_table_enriched]
S4[04_silver_financial_reconciled]
S5[05_silver_security_enriched]
S6[06_silver_compliance_validated]
end
B1 --> S1
B2 --> S2
B3 --> S3
B4 --> S4
B5 --> S5
B6 --> S6
S2 --> S4
S2 --> S6
style S1 fill:#c0c0c0
style S2 fill:#c0c0c0
style S3 fill:#c0c0c0
style S4 fill:#c0c0c0
style S5 fill:#c0c0c0
style S6 fill:#c0c0c0 Note: 02_silver_player_master must complete before 04_silver_financial_reconciled and 06_silver_compliance_validated (player key dependency).
| Transformation | Description | Example |
| Deduplication | Remove duplicate records by business key | Dedupe by (machine_id, spin_timestamp) |
| Data Type Enforcement | Cast to proper types with validation | wager_amount → DECIMAL(10,2) |
| Null Handling | Apply default values or flag for review | player_tier → 'UNKNOWN' if null |
| PII Masking | Hash SSN, mask card numbers | SSN → SHA-256 hash |
| Range Validation | Validate business rules | wager_amount between 0 and max denomination |
| Referential Integrity | Validate foreign keys exist | player_id exists in silver.player_master |
| SCD Type 2 | Track historical changes | Player address, tier changes |
Common Silver Patterns
# Standard Silver cleansing pattern
from pyspark.sql.functions import sha2, regexp_replace, when, col
df_cleansed = (spark.table("bronze.player_profile")
# Deduplicate by business key
.dropDuplicates(["player_id", "_ingested_at"])
# Hash SSN for privacy
.withColumn("ssn_hash", sha2(col("ssn"), 256))
.drop("ssn")
# Mask card numbers (keep last 4)
.withColumn("card_masked", regexp_replace(col("card_number"), r"\d(?=\d{4})", "*"))
# Validate and default null tiers
.withColumn("player_tier", when(col("player_tier").isNull(), "BRONZE").otherwise(col("player_tier")))
# Add processing metadata
.withColumn("_processed_at", current_timestamp())
.withColumn("_is_valid", lit(True))
)
df_cleansed.write.format("delta").mode("overwrite").saveAsTable("silver.player_master")
📋 Dependencies
| Dependency | Type | Description |
| Bronze Layer | Data | All Bronze tables must be populated |
| Reference Data | Lookup | Denomination codes, player tiers, zone mappings |
| Compliance Rules | Config | CTR thresholds, W-2G limits |
| Schema Registry | Governance | Expected schema definitions |
Pre-requisites Checklist
✅ Validation Steps
Data Quality Checks
| Check | Table | Query | Expected |
| No Duplicates | slot_cleansed | Count vs distinct by key | Equal counts |
| Valid Ranges | slot_cleansed | wager_amount > 0 | 100% |
| PII Masked | player_master | ssn IS NULL | All records |
| Referential Integrity | financial_reconciled | Orphan player_ids | 0 |
| Compliance Flags | compliance_validated | CTR flagged ≥ $10,000 | Correct flagging |
Validation Commands
# Check for duplicates
slot_count = spark.sql("SELECT COUNT(*) FROM silver.slot_cleansed").first()[0]
distinct_count = spark.sql("""
SELECT COUNT(*) FROM (
SELECT DISTINCT machine_id, spin_timestamp
FROM silver.slot_cleansed
)
""").first()[0]
assert slot_count == distinct_count, "Duplicates detected!"
# Validate PII masking
pii_exposed = spark.sql("""
SELECT COUNT(*)
FROM silver.player_master
WHERE ssn IS NOT NULL OR LENGTH(card_number) = 16
""").first()[0]
assert pii_exposed == 0, "PII not properly masked!"
# Check referential integrity
orphans = spark.sql("""
SELECT COUNT(*)
FROM silver.financial_reconciled f
LEFT JOIN silver.player_master p ON f.player_id = p.player_id
WHERE p.player_id IS NULL
""").first()[0]
assert orphans == 0, f"Found {orphans} orphan records!"
Great Expectations Integration
# Run Great Expectations checkpoint
import great_expectations as gx
context = gx.get_context()
result = context.run_checkpoint(checkpoint_name="silver_checkpoint")
assert result.success, "Data quality checks failed!"
🎰 Casino Domain Context
Compliance Thresholds
| Rule | Threshold | Action |
| CTR (Currency Transaction Report) | ≥ $10,000 | Auto-flag for FinCEN reporting |
| Structuring Detection | Multiple \(8K-\)9.9K transactions | Flag for SAR review |
| W-2G (Gambling Winnings) | Slots ≥ $1,200, Keno ≥ $600, Poker ≥ $5,000 | IRS reporting required |
| Title 31 | Cash transactions by known players | Enhanced due diligence |
PII Handling Requirements
| Field | Treatment | Storage |
SSN | SHA-256 hash | Only hash stored |
Card Number | Mask (show last 4) | ****-****-****-1234 |
Date of Birth | Keep full date | Required for age verification |
Address | Encrypt at rest | Delta Lake encryption |
Phone/Email | Tokenize | Lookup table with restricted access |
Player Master SCD Type 2
# Track player tier changes over time
player_history = spark.sql("""
SELECT
player_id,
player_tier,
effective_from,
effective_to,
is_current
FROM silver.player_master_history
WHERE player_id = 'PLY-12345'
ORDER BY effective_from
""")
Next Steps: After Silver layer validation passes, proceed to Gold Layer for business aggregations and analytics.