# 🔗 Delta Sharing Setup Guide
Configure Delta Sharing for secure cross-organization data sharing.
## 🎯 Overview
Delta Sharing is an open protocol for secure data sharing across organizations, platforms, and clouds without copying data.
### Key Features
- Open Protocol: Works with any client supporting Delta Sharing
- No Data Copying: Share data in place from Delta Lake
- Fine-Grained Access: Control access at table and partition level
- Audit Trail: Track all data access
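Because Delta Sharing is an open REST protocol, even a generic HTTP client can talk to a sharing server. The sketch below builds the request for the protocol's standard `GET {endpoint}/shares` listing call; the endpoint and token values are placeholders, and a real recipient should normally use an official connector rather than raw HTTP.

```python
import urllib.request


def build_list_shares_request(endpoint: str, bearer_token: str) -> urllib.request.Request:
    """Build the REST request that lists shares visible to a recipient token.

    The Delta Sharing protocol exposes GET {endpoint}/shares, authenticated
    with the bearer token from the recipient's profile file.
    """
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/shares",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# Placeholder endpoint/token for illustration only
req = build_list_shares_request("https://sharing.example.com/delta-sharing", "<token>")
print(req.full_url)  # https://sharing.example.com/delta-sharing/shares
```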
## 📋 Prerequisites
- Azure Databricks Premium or Enterprise tier
- Unity Catalog enabled
- Metastore admin privileges
- External storage configured
## 🔧 Implementation
### Step 1: Enable Delta Sharing on the Metastore

Delta Sharing is enabled at the metastore level by a metastore admin, from the Databricks account console (or the Metastores API); there is no SQL statement that turns it on. From a notebook you can confirm which metastore the workspace is attached to:

```sql
-- Confirm the current metastore (Delta Sharing is enabled per metastore)
SELECT CURRENT_METASTORE();
```
### Step 2: Create a Share

```sql
-- Create a share for external partners
CREATE SHARE IF NOT EXISTS partner_sales_data
COMMENT 'Sales data shared with partners';

-- Verify share creation
SHOW SHARES;
```
### Step 3: Add Tables to the Share

```sql
-- Add a table to the share
ALTER SHARE partner_sales_data
ADD TABLE gold.sales.daily_aggregates;

-- Add with a partition filter (share only specific partitions)
ALTER SHARE partner_sales_data
ADD TABLE gold.sales.transactions
PARTITION (region = 'NA');

-- View share contents
SHOW ALL IN SHARE partner_sales_data;
```
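Conceptually, the `PARTITION (region = 'NA')` clause means the sharing server only exposes data files whose partition values match the filter. A minimal illustration of that filtering idea (the dictionaries below are hypothetical file metadata, not a real server API):

```python
def matches_partition_filter(partition_values: dict, filter_spec: dict) -> bool:
    # Keep a file only if every filtered partition column matches the shared value
    return all(partition_values.get(col) == val for col, val in filter_spec.items())


# Hypothetical file listing with partition metadata
files = [
    {"path": "part-0", "partition_values": {"region": "NA"}},
    {"path": "part-1", "partition_values": {"region": "EU"}},
]

# Only the NA partition is visible through the share
shared = [f for f in files if matches_partition_filter(f["partition_values"], {"region": "NA"})]
print([f["path"] for f in shared])  # ['part-0']
```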
### Step 4: Create Recipients

```sql
-- Create a recipient for an external organization
CREATE RECIPIENT IF NOT EXISTS partner_acme
COMMENT 'ACME Corporation - Sales Team';

-- Get the activation link (send it to the recipient out of band)
DESCRIBE RECIPIENT partner_acme;
```
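The activation link lets the recipient download a credential file, the "profile" used by every Delta Sharing client. For open sharing it is a small JSON document along these lines (all values below are placeholders):

```json
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://<host>/api/2.0/delta-sharing/metastores/<metastore-id>",
  "bearerToken": "<token>",
  "expirationTime": "2025-06-01T00:00:00.000Z"
}
```

The recipient should store this file securely; the bearer token grants read access to everything shared with them.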
### Step 5: Grant Access

```sql
-- Grant share access to the recipient
GRANT SELECT ON SHARE partner_sales_data TO RECIPIENT partner_acme;

-- Verify grants
SHOW GRANTS ON SHARE partner_sales_data;
```
## 👥 Recipient Setup
### Python Client (Recipient Side)

```python
import delta_sharing

# Load the share profile (credential file provided by the data provider)
profile_file = "partner_share_profile.json"

# List available shares
shares = delta_sharing.list_shares(profile_file)
print(f"Available shares: {shares}")

# List tables in a share
tables = delta_sharing.list_all_tables(profile_file)
for table in tables:
    print(f"Table: {table.share}.{table.schema}.{table.name}")

# Load a shared table into pandas; the path is <profile>#<share>.<schema>.<table>.
# The catalog name is dropped when a table is shared, so gold.sales.daily_aggregates
# is exposed as sales.daily_aggregates within the share.
df = delta_sharing.load_as_pandas(
    f"{profile_file}#partner_sales_data.sales.daily_aggregates"
)
print(df.head())
```
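The `profile#share.schema.table` addressing is easy to get wrong; a tiny helper (hypothetical, not part of the `delta_sharing` package) can build it consistently:

```python
def shared_table_url(profile: str, share: str, schema: str, table: str) -> str:
    """Build the <profile>#<share>.<schema>.<table> path Delta Sharing clients expect."""
    return f"{profile}#{share}.{schema}.{table}"


url = shared_table_url(
    "partner_share_profile.json", "partner_sales_data", "sales", "daily_aggregates"
)
print(url)  # partner_share_profile.json#partner_sales_data.sales.daily_aggregates
```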
### Spark Client (Recipient Side)

```python
from pyspark.sql import SparkSession

# The delta-sharing-spark package provides the "deltaSharing" read format
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "io.delta:delta-sharing-spark_2.12:1.0.0")
    .getOrCreate()
)

# Load the shared table using the same <profile>#<share>.<schema>.<table> path
shared_df = (
    spark.read
    .format("deltaSharing")
    .load("partner_share_profile.json#partner_sales_data.sales.daily_aggregates")
)
shared_df.show()
```
## 🔐 Security Configuration
### IP Access Lists

Recipient IP access lists are configured through the recipient's settings in the Databricks UI or the Recipients API rather than a dedicated SQL clause. A hedged sketch using the Databricks Python SDK (verify the Recipients API shape against your SDK version):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sharing import IpAccessList

w = WorkspaceClient()

# Restrict the recipient's token to the given CIDR ranges
w.recipients.update(
    name="partner_acme",
    ip_access_list=IpAccessList(
        allowed_ip_addresses=["10.0.0.0/8", "192.168.1.0/24"]
    ),
)
```
### Token Rotation

```python
# Rotate a recipient's authentication token with the Databricks SDK
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# existing_token_expire_in_seconds=0 invalidates the old token immediately;
# use a grace period instead for zero-downtime rotation
rotated = w.recipients.rotate_token(
    name="partner_acme",
    existing_token_expire_in_seconds=0,
)

# The response carries the recipient's token list; the newest entry
# holds the fresh activation URL to send to the recipient
print(f"New activation link: {rotated.tokens[-1].activation_url}")
```
## 📊 Monitoring and Auditing
### Access Audit

```sql
-- Query system audit logs for share access.
-- Delta Sharing events carry a 'deltaSharing' action-name prefix; the exact
-- request_params keys can vary by audit log schema version.
SELECT
  event_time,
  user_identity.email AS accessor,
  action_name,
  request_params.share_name,
  request_params.table_name
FROM system.access.audit
WHERE action_name LIKE 'deltaSharing%'
  AND event_date > current_date() - INTERVAL 7 DAYS
ORDER BY event_time DESC;
```
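Audit query results can also be post-processed outside SQL, for example to count table-level queries per shared table. The rows below are fabricated for illustration only, shaped like the query output above:

```python
from collections import Counter

# Hypothetical rows shaped like the audit query output
audit_rows = [
    {"accessor": "partner_acme", "action_name": "deltaSharingQueryTable", "table_name": "daily_aggregates"},
    {"accessor": "partner_acme", "action_name": "deltaSharingQueryTable", "table_name": "transactions"},
    {"accessor": "partner_acme", "action_name": "deltaSharingListShares", "table_name": None},
]

# Count table-level queries per table to spot heavily accessed shared data
per_table = Counter(
    r["table_name"] for r in audit_rows if r["action_name"] == "deltaSharingQueryTable"
)
print(per_table.most_common())
```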
## 📚 Related Documentation
Last Updated: January 2025