🔗 Shared Metadata Store - Azure Synapse Analytics¶
Unified metadata catalog that enables seamless table sharing across Spark, SQL, and Data Explorer engines in Azure Synapse Analytics.
🌟 Overview¶
The Shared Metadata Store in Azure Synapse Analytics provides a unified catalog that allows tables created in one compute engine (Spark, SQL, Data Explorer) to be automatically discovered and queried by other engines. This eliminates data silos and enables true polyglot analytics.
🔥 Key Benefits¶
- Cross-Engine Discovery: Tables automatically visible across all engines
- Schema Synchronization: Schema changes propagate automatically
- Security Inheritance: Access controls apply consistently
- Data Lineage: Track data flow across engines
- Simplified Governance: Single source of truth for metadata
🏗️ Architecture¶
graph TB
subgraph "Shared Metadata Store"
MetaDB[(Metadata<br/>Database)]
end
subgraph "Compute Engines"
Spark[Spark Pools<br/>Python, Scala, SQL]
ServerlessSQL[Serverless SQL<br/>T-SQL]
DedicatedSQL[Dedicated SQL<br/>T-SQL]
DataExplorer[Data Explorer<br/>KQL]
end
subgraph "Storage"
DataLake[Data Lake<br/>Storage Gen2]
end
Spark -->|Register Tables| MetaDB
ServerlessSQL -->|Query Metadata| MetaDB
DedicatedSQL -->|Query Metadata| MetaDB
DataExplorer -->|Query Metadata| MetaDB
MetaDB -->|Table Locations| DataLake
Spark -->|Read/Write| DataLake
ServerlessSQL -->|Read| DataLake 🎯 Cross-Engine Table Sharing¶
Create Table in Spark, Query in SQL¶
# Create Delta table in Spark
df = spark.read.parquet("/data/sales/*.parquet")
df.write.format("delta") \
.mode("overwrite") \
.saveAsTable("sales.customer_orders")
# Table automatically visible in SQL pools
-- Query Spark-created table in Serverless SQL
SELECT
customer_id,
COUNT(*) as order_count,
SUM(order_amount) as total_spent
FROM sales.customer_orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
ORDER BY total_spent DESC;
Metadata Synchronization Example¶
# Spark: Create managed Delta table
spark.sql("""
CREATE TABLE IF NOT EXISTS analytics.sales_summary (
sale_date DATE,
region STRING,
product_category STRING,
total_revenue DECIMAL(18,2),
order_count INT
)
USING DELTA
PARTITIONED BY (sale_date)
LOCATION '/delta/sales_summary'
""")
# Insert data
spark.sql("""
INSERT INTO analytics.sales_summary
SELECT
order_date as sale_date,
region,
product_category,
SUM(amount) as total_revenue,
COUNT(*) as order_count
FROM sales.customer_orders
GROUP BY order_date, region, product_category
""")
-- Serverless SQL: Table immediately available
SELECT TOP 10 *
FROM analytics.sales_summary
ORDER BY total_revenue DESC;
-- Check table metadata
SELECT
TABLE_SCHEMA,
TABLE_NAME,
TABLE_TYPE
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'sales_summary';
🔒 Unified Security¶
Row-Level Security Across Engines¶
# Define security policy in Spark
spark.sql("""
CREATE OR REPLACE FUNCTION get_user_region()
RETURNS STRING
RETURN current_user()
""")
# Create secure view
spark.sql("""
CREATE OR REPLACE VIEW sales.secure_customer_orders AS
SELECT *
FROM sales.customer_orders
WHERE region = get_user_region()
""")
-- Same security applies in SQL
-- Users only see their region's data
SELECT * FROM sales.secure_customer_orders;
📊 Best Practices¶
1. Use Managed Tables for Shared Access¶
# ✅ GOOD: Managed table (metadata tracked)
df.write.format("delta") \
.mode("overwrite") \
.saveAsTable("sales.managed_table")
# ❌ LESS IDEAL: External path (not in catalog)
df.write.format("delta") \
.mode("overwrite") \
.save("/delta/external_path")
2. Consistent Naming Conventions¶
# Use database.table naming
spark.sql("CREATE DATABASE IF NOT EXISTS sales")
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
spark.sql("CREATE DATABASE IF NOT EXISTS ml_features")
# Organize by domain
# sales.customer_orders
# analytics.sales_summary
# ml_features.customer_features
3. Schema Evolution Management¶
# Enable schema evolution for Delta tables
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
# Add new columns safely
spark.sql("""
ALTER TABLE sales.customer_orders
ADD COLUMNS (
customer_segment STRING COMMENT 'Customer tier: Gold, Silver, Bronze',
lifetime_value DECIMAL(18,2) COMMENT 'Total customer value'
)
""")
# Changes visible across all engines immediately
📚 Related Resources¶
🎓 Implementation Guides¶
- Delta Lake Metadata
- Security Best Practices
- Data Governance
Last Updated: 2025-01-28 Feature: Shared Metadata Catalog Documentation Status: Complete