🧊 OneLake Iceberg Interoperability¶
Seamless Multi-Engine Data Access with Delta-Iceberg Virtualization
Last Updated: 2026-04-13 | Version: 1.0.0
📑 Table of Contents¶
- 🎯 Overview
- 🏗️ Architecture
- 🔄 Delta-as-Iceberg Virtualization
- ❄️ Snowflake Bidirectional
- 🔌 OneLake Table APIs
- 🎰 Casino Use Cases
- 🏛️ Federal Use Cases
- ⚙️ Configuration Guide
- 📊 Performance Considerations
- ⚠️ Limitations
- 📚 References
🎯 Overview¶
OneLake now supports Apache Iceberg format alongside Delta Lake as a Generally Available feature (February 2026). Delta tables stored in OneLake can be virtualized as Iceberg without any data movement or duplication, enabling external compute engines to read Fabric-managed data through the industry-standard Iceberg protocol. Snowflake bidirectional interoperability reached GA simultaneously, allowing Snowflake to write Iceberg tables directly into OneLake and Fabric to read Snowflake-managed Iceberg data via shortcuts.
Key Capabilities¶
| Capability | Description |
|---|---|
| Delta-as-Iceberg Virtualization | Any Delta table in OneLake is automatically readable as Iceberg -- no ETL, no data copy |
| Iceberg REST Catalog | OneLake exposes a standards-compliant Iceberg REST Catalog for external engine discovery |
| Snowflake Bidirectional | Snowflake writes Iceberg to OneLake; Fabric reads Snowflake Iceberg via shortcuts |
| OneLake Table APIs (GA) | Programmatic table management via REST -- create, read, update, delete across both formats |
| Multi-Engine Read | Spark, Trino, Presto, Databricks, DuckDB, and Snowflake all read the same physical data |
| Zero-Copy Sharing | Share data with partners and agencies without moving or duplicating files |
Why Iceberg Interoperability Matters¶
Organizations rarely operate on a single analytics engine. Gaming operators may use Fabric internally but share data with regulators running Snowflake. Federal agencies need to publish datasets that state departments consume via Spark or Trino. Iceberg interoperability eliminates the data copies and ETL pipelines that traditionally bridge these engine boundaries.
flowchart TB
subgraph OneLake["🔷 OneLake"]
DT["📊 Delta Lake Tables<br/>Native Fabric Format"]
VL["🧊 Iceberg Virtualization<br/>Auto-Generated Metadata"]
IC["🧊 Iceberg Tables<br/>Written by External Engines"]
end
subgraph Fabric["🟣 Microsoft Fabric"]
LH["🏠 Lakehouse"]
WH["🏢 Warehouse"]
NB["📓 Notebooks"]
PBI["📊 Power BI"]
end
subgraph External["🟢 External Engines"]
SF["❄️ Snowflake"]
SP["⚡ Apache Spark"]
TR["🔍 Trino / Presto"]
DB["🔶 Databricks"]
DK["🦆 DuckDB"]
end
DT --> VL
VL --> External
IC --> Fabric
LH --> DT
WH --> DT
NB --> DT
PBI --> DT
SF -->|"Writes Iceberg"| IC
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style Fabric fill:#6C3483,stroke:#4A235A,color:#fff
style External fill:#27AE60,stroke:#1E8449,color:#fff
🏗️ Architecture¶
End-to-End Data Flow¶
The Iceberg interoperability architecture introduces a virtualization layer between OneLake's physical Delta storage and external consumers. This layer generates Iceberg-compatible metadata files on the fly, translating Delta's transaction log into Iceberg manifest lists, manifest files, and snapshots.
flowchart LR
subgraph Writers["✏️ Data Writers"]
FW["Fabric Notebooks<br/>Pipelines, Dataflows"]
SW["Snowflake<br/>External Volume"]
OW["Other Engines<br/>via Iceberg REST"]
end
subgraph OneLake["🔷 OneLake Storage"]
DP["Delta Protocol<br/>Parquet + _delta_log"]
IP["Iceberg Protocol<br/>Parquet + metadata/"]
VZ["Virtualization Engine<br/>Delta → Iceberg Translation"]
end
subgraph APIs["🔌 Access Layer"]
RC["Iceberg REST Catalog<br/>(GA)"]
DL["Delta Lake APIs<br/>(GA)"]
AB["ABFS / OneLake APIs<br/>(GA)"]
end
subgraph Readers["👁️ Data Readers"]
FR["Fabric (Native Delta)"]
SR["Snowflake (Iceberg)"]
ER["Spark / Trino / Presto<br/>(Iceberg)"]
end
FW --> DP
SW --> IP
OW --> IP
DP --> VZ --> RC
DP --> DL
IP --> AB
RC --> SR & ER
DL --> FR
AB --> FR
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style APIs fill:#E67E22,stroke:#CA6F1E,color:#fff
style Readers fill:#27AE60,stroke:#1E8449,color:#fff
Component Responsibilities¶
| Component | Role | Protocol |
|---|---|---|
| OneLake Storage | Single-copy physical Parquet files for both Delta and Iceberg | ABFS, REST |
| Virtualization Engine | Translates Delta transaction log to Iceberg metadata on read | Internal |
| Iceberg REST Catalog | Exposes OneLake tables as Iceberg catalog entries per the REST spec | HTTPS / REST |
| Delta Lake APIs | Provides Delta-native access for Fabric workloads | HTTPS / REST |
| OneLake Table APIs | Unified CRUD operations for table management across formats | HTTPS / REST |
| Shortcuts | Zero-copy pointers to Iceberg tables written by external engines | ABFS |
Metadata Architecture¶
When a Delta table is accessed via the Iceberg protocol, OneLake dynamically generates the required Iceberg metadata hierarchy:
onelake://<workspace>/<lakehouse>/Tables/<table_name>/
├── _delta_log/ # Delta transaction log (source of truth)
│ ├── 00000000000000000000.json
│ ├── 00000000000000000001.json
│ └── ...
├── _iceberg_metadata/ # Auto-generated Iceberg metadata
│ ├── v1.metadata.json # Table metadata (schema, partitioning, properties)
│ ├── snap-<id>.avro # Snapshot manifest list
│ └── <manifest-id>.avro # Manifest file (file-level stats)
└── part-00000-*.parquet # Shared physical data files (no duplication)
📝 Note: The _iceberg_metadata/ directory is generated and maintained automatically by OneLake. You do not need to create or manage these files; they are refreshed each time the underlying Delta table is modified.
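If you want to confirm what OneLake has generated for a given table, a minimal sketch using the azure-storage-file-datalake SDK is shown below. The workspace, lakehouse, and table names are placeholders, and the listing is read-only -- OneLake owns these files:
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes an ADLS Gen2-compatible DFS endpoint; the workspace acts as the filesystem
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("<workspace>")

# Path format follows the layout above; adjust to your actual lakehouse and table names
metadata_path = "<lakehouse>/Tables/<table_name>/_iceberg_metadata"

# List the auto-generated Iceberg metadata files (read-only -- OneLake maintains them)
for entry in fs.get_paths(path=metadata_path):
    print(entry.name)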
🔄 Delta-as-Iceberg Virtualization¶
How It Works¶
Delta-as-Iceberg virtualization is the cornerstone of OneLake's interoperability strategy. When an external engine requests a table through the Iceberg REST Catalog, OneLake performs the following steps:
- Catalog Lookup -- The REST Catalog maps the requested table namespace and name to a OneLake path
- Delta Log Read -- OneLake reads the Delta transaction log (_delta_log/) to determine the current table state
- Metadata Translation -- The Delta schema, partitioning, and file statistics are translated into Iceberg metadata format
- Metadata Caching -- Generated Iceberg metadata is cached and invalidated when the Delta log changes
- Data File Reference -- Iceberg metadata points to the same physical Parquet files that Delta uses -- no data copy occurs
- Response -- The external engine receives standard Iceberg metadata and reads Parquet files directly
Enabling Virtualization¶
Delta-as-Iceberg is enabled by default for all Lakehouse tables in OneLake. No explicit configuration is required. To verify and manage the feature:
Lakehouse Settings → OneLake → Table Format Interoperability
├── Delta-as-Iceberg Virtualization → Enabled (default)
├── Iceberg REST Catalog Endpoint → https://<workspace>.onelake.fabric.microsoft.com/iceberg
└── Catalog Authentication → Microsoft Entra ID (OAuth 2.0)
Schema Compatibility¶
Delta and Iceberg support overlapping but not identical type systems. OneLake handles the mapping automatically:
| Delta Type | Iceberg Type | Notes |
|---|---|---|
| STRING | string | Direct mapping |
| INT / LONG | int / long | Direct mapping |
| FLOAT / DOUBLE | float / double | Direct mapping |
| BOOLEAN | boolean | Direct mapping |
| DATE | date | Direct mapping |
| TIMESTAMP | timestamptz | Converted to UTC-normalized timestamp |
| TIMESTAMP_NTZ | timestamp | Direct mapping (no timezone) |
| DECIMAL(p,s) | decimal(p,s) | Direct mapping |
| BINARY | binary | Direct mapping |
| ARRAY<T> | list<T> | Element type recursively mapped |
| MAP<K,V> | map<K,V> | Key/value types recursively mapped |
| STRUCT<...> | struct<...> | Field types recursively mapped |
⚠️ Warning: Delta's VOID type and certain complex nested types with column mapping modes may not translate cleanly to Iceberg. Test schema compatibility before exposing complex tables to external consumers.
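As a quick pre-flight check before exposing a table externally, the sketch below walks a list of column types against the mapping table above and flags anything that may not virtualize cleanly. The column list is hypothetical; in practice you would pull it from the table's schema:
# Delta -> Iceberg primitive mappings from the table above (simplified)
SUPPORTED = {
    "string", "int", "long", "float", "double", "boolean",
    "date", "timestamp", "timestamp_ntz", "binary",
}

def check_column(name: str, delta_type: str) -> str:
    """Return a short verdict on one column's virtualization compatibility."""
    t = delta_type.lower()
    if t == "void":
        return f"{name}: VOID does not translate to Iceberg -- drop or cast this column"
    if t.startswith(("decimal", "array", "map", "struct")):
        return f"{name}: {delta_type} maps recursively -- test with your external readers"
    if t in SUPPORTED:
        return f"{name}: {delta_type} maps directly"
    return f"{name}: {delta_type} is unrecognized -- verify before exposing"

# Hypothetical column list; read the real one from the table's schema
for col, typ in [("player_id", "LONG"), ("notes", "VOID"), ("tags", "ARRAY<STRING>")]:
    print(check_column(col, typ))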
Performance Characteristics¶
| Aspect | Detail |
|---|---|
| First Read Latency | 200-500ms overhead for initial metadata generation |
| Subsequent Reads | Near-zero overhead -- metadata is cached until Delta log changes |
| Cache Invalidation | Automatic when new commits appear in _delta_log/ |
| Data Scan Performance | Identical to native Iceberg -- same Parquet files, same column statistics |
| Concurrent Readers | No limit -- metadata is read-only for external engines |
| Write Path | External engines cannot write through the virtualized Iceberg view -- write native Iceberg tables to OneLake instead (e.g., via the OneLake Table APIs or a Snowflake external volume) |
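One way to observe the first-read overhead and subsequent caching from an external client is to time two consecutive loads of the same table, as in the hedged sketch below. Connection values are placeholders and absolute timings will vary by region and table size:
import time
from pyiceberg.catalog import load_catalog

# Placeholder connection values -- same shape as the PyIceberg example later on this page
catalog = load_catalog(
    "onelake",
    type="rest",
    uri="https://<workspace>.onelake.fabric.microsoft.com/iceberg",
    credential="<oauth_token>",
    warehouse="<lakehouse_id>",
)

def timed_load(label: str) -> None:
    start = time.perf_counter()
    catalog.load_table("lh_gold.gold_slot_performance")
    print(f"{label}: {(time.perf_counter() - start) * 1000:.0f} ms")

timed_load("First load (metadata generated)")   # expect the one-time overhead
timed_load("Second load (metadata cached)")     # expect near-zero metadata overhead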
❄️ Snowflake Bidirectional¶
Snowflake and Microsoft Fabric now support bidirectional data sharing through Iceberg tables stored in OneLake. This eliminates the traditional approach of copying data between platforms via staged files or ETL pipelines.
How Bidirectional Interop Works¶
flowchart LR
subgraph Snowflake["❄️ Snowflake"]
SE["Snowflake Engine"]
ST["Snowflake Tables"]
EV["External Volume<br/>(OneLake)"]
end
subgraph OneLake["🔷 OneLake"]
IT["Iceberg Tables<br/>(Written by Snowflake)"]
DT["Delta Tables<br/>(Written by Fabric)"]
VZ["Virtualization<br/>Delta → Iceberg"]
end
subgraph Fabric["🟣 Fabric"]
SC["Shortcut to<br/>Iceberg Table"]
LH["Lakehouse"]
PBI["Power BI"]
end
SE -->|"CREATE ICEBERG TABLE"| EV
EV -->|"Writes Parquet + Metadata"| IT
IT -->|"Shortcut"| SC --> LH --> PBI
DT --> VZ
VZ -->|"Iceberg REST Catalog"| SE
style Snowflake fill:#29B5E8,stroke:#1A8FBF,color:#fff
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style Fabric fill:#6C3483,stroke:#4A235A,color:#fff
Direction 1: Snowflake Writes to OneLake¶
Snowflake can write Iceberg tables directly to OneLake via external volumes. Fabric then reads these tables through shortcuts.
Step 1: Create an External Volume in Snowflake
-- In Snowflake, create an external volume pointing to OneLake
CREATE OR REPLACE EXTERNAL VOLUME onelake_volume
STORAGE_LOCATIONS = (
(
NAME = 'fabric_onelake'
STORAGE_BASE_URL = 'azure://onelake.dfs.fabric.microsoft.com/<workspace_id>/<lakehouse_id>/Tables/'
AZURE_TENANT_ID = '<tenant_id>'
)
);
-- Grant Snowflake's service principal access to OneLake
-- (Requires Microsoft Entra ID app registration and RBAC assignment)
Step 2: Create an Iceberg Table in Snowflake
-- Create a managed Iceberg table that writes to OneLake
CREATE OR REPLACE ICEBERG TABLE gaming_analytics.public.player_summary
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'onelake_volume'
BASE_LOCATION = 'player_summary/'
AS
SELECT
player_id,
SUM(total_wagered) AS lifetime_wagered,
SUM(total_won) AS lifetime_won,
COUNT(DISTINCT session_date) AS total_visits,
MAX(session_date) AS last_visit
FROM gaming_analytics.public.player_sessions
GROUP BY player_id;
Step 3: Create a Shortcut in Fabric
Lakehouse → Tables → New Shortcut → OneLake
Source: <workspace>/<lakehouse>/Tables/player_summary
Name: snowflake_player_summary
💡 Tip: The shortcut is a zero-copy pointer. When Snowflake updates the Iceberg table, Fabric sees the changes immediately without any sync process.
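If you prefer to script the shortcut rather than click through the UI, the sketch below uses the OneLake shortcuts REST API (see the OneLake Shortcuts reference at the end of this page). The payload shape is a hedged sketch and all GUIDs are placeholders -- verify against the current API documentation before relying on it:
import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default")

workspace_id = "<target_workspace_guid>"   # workspace containing your Lakehouse
lakehouse_id = "<target_lakehouse_guid>"   # Lakehouse item that will hold the shortcut

# Shortcut pointing at the Iceberg table Snowflake wrote into OneLake
payload = {
    "path": "Tables",
    "name": "snowflake_player_summary",
    "target": {
        "oneLake": {
            "workspaceId": "<source_workspace_guid>",
            "itemId": "<source_lakehouse_guid>",
            "path": "Tables/player_summary",
        }
    },
}

response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token.token}", "Content-Type": "application/json"},
    json=payload,
)
print(response.status_code, response.text)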
Direction 2: Fabric Tables Readable by Snowflake¶
Snowflake reads Fabric Delta tables through the Iceberg REST Catalog exposed by OneLake.
-- In Snowflake, create a catalog integration for OneLake
CREATE OR REPLACE CATALOG INTEGRATION onelake_catalog
CATALOG_SOURCE = REST
TABLE_FORMAT = ICEBERG
CATALOG_NAMESPACE = '<workspace_name>.<lakehouse_name>'
REST_CONFIG = (
CATALOG_URI = 'https://<workspace>.onelake.fabric.microsoft.com/iceberg'
WAREHOUSE = '<lakehouse_id>'
)
REST_AUTHENTICATION = (
TYPE = OAUTH
OAUTH_CLIENT_ID = '<client_id>'
OAUTH_CLIENT_SECRET = '<client_secret>'
OAUTH_TOKEN_URI = 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token'
);
-- Query a Fabric Delta table as if it were native Snowflake
SELECT *
FROM onelake_catalog.lh_gold.gold_slot_performance
WHERE gaming_date >= DATEADD(DAY, -7, CURRENT_DATE())
ORDER BY hold_pct ASC
LIMIT 100;
Snowflake Item in OneLake¶
The new Snowflake item in OneLake provides a managed connection to a Snowflake account, simplifying bidirectional setup:
Workspace → + New → Snowflake (Preview)
├── Snowflake Account URL: <account>.snowflakecomputing.com
├── Authentication: OAuth (Microsoft Entra ID) or Key Pair
├── Default Warehouse: COMPUTE_WH
├── Default Database: GAMING_ANALYTICS
└── Sync Mode: Bidirectional Iceberg
Authentication Setup¶
| Auth Method | Direction | Setup |
|---|---|---|
| OAuth (Entra ID) | Both | Register app in Microsoft Entra ID, configure Snowflake security integration |
| Service Principal | Snowflake → OneLake | Create SPN with Storage Blob Data Contributor on OneLake |
| Key Pair | Fabric → Snowflake | Generate RSA key pair, assign public key to Snowflake user |
| Managed Identity | Fabric → OneLake | Automatic for Fabric workloads (no configuration needed) |
📝 Note: For federal workloads, ensure the Snowflake account is deployed in a FedRAMP-authorized region and that OAuth tokens are scoped to the minimum required permissions.
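As a concrete example of the OAuth (Entra ID) row above, the sketch below acquires a client-credentials token with azure-identity and scopes it to OneLake storage access only; the app registration values are placeholders:
from azure.identity import ClientSecretCredential

# App registration created in Microsoft Entra ID (placeholder values)
credential = ClientSecretCredential(
    tenant_id="<tenant_id>",
    client_id="<client_id>",
    client_secret="<client_secret>",
)

# Scope the token to OneLake storage only (least privilege)
token = credential.get_token("https://storage.azure.com/.default")

# Pass token.token as the bearer credential to the external engine or catalog client
print(f"Token acquired, expires at epoch {token.expires_on}")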
🔌 OneLake Table APIs¶
The OneLake Table APIs (GA) provide a unified programmatic interface for managing tables across both Delta and Iceberg formats. These APIs follow the Iceberg REST Catalog specification and extend it with Delta-specific operations.
API Endpoints¶
| Operation | Method | Endpoint | Description |
|---|---|---|---|
| List Namespaces | GET | /iceberg/v1/namespaces | List all lakehouses in the workspace |
| List Tables | GET | /iceberg/v1/namespaces/{ns}/tables | List tables in a lakehouse |
| Get Table Metadata | GET | /iceberg/v1/namespaces/{ns}/tables/{table} | Retrieve full table metadata |
| Create Table | POST | /iceberg/v1/namespaces/{ns}/tables | Create a new table |
| Drop Table | DELETE | /iceberg/v1/namespaces/{ns}/tables/{table} | Delete a table |
| Load Table | POST | /iceberg/v1/namespaces/{ns}/tables/{table} | Load table for reading |
| Rename Table | POST | /iceberg/v1/tables/rename | Rename an existing table |
Python Examples¶
List Tables in a Lakehouse¶
import requests
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
token = credential.get_token("https://storage.azure.com/.default")
base_url = "https://<workspace>.onelake.fabric.microsoft.com/iceberg/v1"
headers = {
"Authorization": f"Bearer {token.token}",
"Content-Type": "application/json"
}
# List all tables in the lakehouse
response = requests.get(
f"{base_url}/namespaces/lh_gold/tables",
headers=headers
)
tables = response.json()
for table in tables["identifiers"]:
    ns = ".".join(table["namespace"])  # namespace is a list of levels in the REST spec
    print(f"Table: {ns}.{table['name']}")
Get Table Metadata (Schema, Partitioning, Statistics)¶
# Get full metadata for a specific table
response = requests.get(
f"{base_url}/namespaces/lh_gold/tables/gold_slot_performance",
headers=headers
)
metadata = response.json()
# Print schema information (resolve the current schema by its ID)
table_meta = metadata["metadata"]
current_schema_id = table_meta["current-schema-id"]
schema = next(s for s in table_meta["schemas"] if s["schema-id"] == current_schema_id)
print(f"Schema ID: {schema['schema-id']}")
for field in schema["fields"]:
    print(f"  {field['name']}: {field['type']} (required={field.get('required', False)})")
# Print partition spec
partition_spec = metadata["metadata"].get("partition-specs", [])
for spec in partition_spec:
for field in spec["fields"]:
print(f" Partition: {field['name']} ({field['transform']})")
# Print snapshot summary (row count, file count)
snapshots = metadata["metadata"].get("snapshots", [])
if snapshots:
latest = snapshots[-1]
summary = latest.get("summary", {})
print(f" Rows: {summary.get('total-records', 'N/A')}")
print(f" Files: {summary.get('total-data-files', 'N/A')}")
print(f" Size: {summary.get('total-files-size', 'N/A')} bytes")
Connect with PyIceberg¶
from pyiceberg.catalog import load_catalog
# Connect to OneLake via the Iceberg REST Catalog
catalog = load_catalog(
"onelake",
**{
"type": "rest",
"uri": "https://<workspace>.onelake.fabric.microsoft.com/iceberg",
"credential": "<oauth_token>",
"warehouse": "<lakehouse_id>",
}
)
# List namespaces (lakehouses)
namespaces = catalog.list_namespaces()
print(f"Namespaces: {namespaces}")
# Load a table
table = catalog.load_table("lh_gold.gold_slot_performance")
print(f"Schema: {table.schema()}")
print(f"Snapshots: {len(table.history())}")
# Scan the table with filters
scan = table.scan(
row_filter="gaming_date >= '2026-04-01'",
selected_fields=("machine_id", "hold_pct", "coin_in", "gaming_date")
)
# Convert to pandas DataFrame
df = scan.to_pandas()
print(f"Rows: {len(df)}")
print(df.head(10))
Connect with Apache Spark¶
# PySpark configuration for reading OneLake tables via Iceberg
from pyspark.sql import SparkSession

spark = SparkSession.builder \
.appName("OneLake-Iceberg-Reader") \
.config("spark.sql.catalog.onelake", "org.apache.iceberg.spark.SparkCatalog") \
.config("spark.sql.catalog.onelake.type", "rest") \
.config("spark.sql.catalog.onelake.uri",
"https://<workspace>.onelake.fabric.microsoft.com/iceberg") \
.config("spark.sql.catalog.onelake.credential", "<oauth_token>") \
.config("spark.sql.catalog.onelake.warehouse", "<lakehouse_id>") \
.getOrCreate()
# Query a Fabric Delta table through the Iceberg catalog
df = spark.sql("""
SELECT machine_id, gaming_date, hold_pct, coin_in
FROM onelake.lh_gold.gold_slot_performance
WHERE gaming_date >= '2026-04-01'
ORDER BY hold_pct ASC
""")
df.show(10)
🎰 Casino Use Cases¶
Iceberg interoperability unlocks several high-value scenarios for casino and gaming operations where data must cross organizational and platform boundaries.
Use Case 1: Share Analytics with External Partners¶
Casino operators often work with slot manufacturers, game developers, and consulting firms that run their own analytics platforms -- frequently Snowflake or Databricks.
flowchart LR
subgraph Casino["🎰 Casino (Fabric)"]
GS["Gold: Slot<br/>Performance"]
GP["Gold: Player<br/>Analytics"]
end
subgraph OneLake["🔷 OneLake"]
VZ["Iceberg<br/>Virtualization"]
end
subgraph Partners["🤝 Partners"]
MFG["Slot Manufacturer<br/>(Snowflake)"]
CON["Consulting Firm<br/>(Databricks)"]
REG["Gaming Regulator<br/>(Trino)"]
end
Casino --> VZ
VZ -->|"Read-Only"| Partners
style Casino fill:#6C3483,stroke:#4A235A,color:#fff
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style Partners fill:#27AE60,stroke:#1E8449,color:#fff
Benefits:
- Slot manufacturers analyze machine performance data without receiving data extracts
- Consulting firms run analytics on the same data Fabric uses -- no stale copies
- Gaming regulators query compliance data directly rather than waiting for file submissions
Use Case 2: Regulatory Reporting to Iceberg-Based Systems¶
State gaming commissions increasingly adopt modern data platforms. Iceberg interoperability allows casinos to expose compliance data (CTR, SAR, W-2G) through a standard protocol:
# Gaming regulator queries casino compliance data from their Trino cluster
# Trino configuration: catalog = onelake_casino
# Regulator runs:
trino> SELECT
filing_type,
COUNT(*) AS filing_count,
SUM(transaction_amount) AS total_amount
FROM onelake_casino.lh_gold.gold_compliance_filings
WHERE filing_date >= DATE '2026-01-01'
GROUP BY filing_type
ORDER BY filing_count DESC;
Use Case 3: Multi-Engine Analytics¶
Large gaming enterprises may use multiple analytics engines for different workloads. Iceberg interoperability ensures all engines read from the same data:
| Engine | Workload | Access Method |
|---|---|---|
| Fabric (PySpark) | Medallion ETL, data engineering | Native Delta |
| Power BI | Executive dashboards, Direct Lake | Native Delta |
| Snowflake | Ad-hoc analyst queries, partner sharing | Iceberg REST Catalog |
| Databricks | ML model training, feature engineering | Iceberg REST Catalog |
| DuckDB | Local development, data exploration | Iceberg REST Catalog |
💡 Tip: Multi-engine access works best when one engine (Fabric) is the authoritative writer and all others are readers. Avoid concurrent writes from multiple engines to the same table -- use separate tables or namespaces per writer.
🏛️ Federal Use Cases¶
Federal agencies that publish open data or share datasets across departments benefit significantly from Iceberg interoperability, as it eliminates the need to produce and distribute data extracts.
Use Case 1: Cross-Agency Data Sharing¶
flowchart TB
subgraph Federal["🏛️ Federal Agencies (Fabric)"]
USDA["🌾 USDA<br/>Crop Production"]
EPA["🌊 EPA<br/>Toxic Releases"]
NOAA["🌀 NOAA<br/>Weather Data"]
SBA["💼 SBA<br/>Loan Programs"]
DOI["🏔️ DOI<br/>Land & Geology"]
end
subgraph OneLake["🔷 OneLake Iceberg Catalog"]
CAT["Federated<br/>Catalog"]
end
subgraph Consumers["🟢 External Consumers"]
STATE["State Departments<br/>(Snowflake)"]
UNIV["Universities<br/>(Spark Clusters)"]
NGO["Research NGOs<br/>(Trino)"]
PUB["Public Data Portals<br/>(DuckDB / Parquet)"]
end
Federal --> CAT --> Consumers
style Federal fill:#6C3483,stroke:#4A235A,color:#fff
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style Consumers fill:#27AE60,stroke:#1E8449,color:#fff
Use Case 2: USDA Crop Data for State Agriculture Departments¶
State agriculture departments using Snowflake can query USDA crop production data directly from OneLake:
-- State Dept of Agriculture (Snowflake) queries USDA data
SELECT
state,
commodity,
year,
production_value,
yield_per_acre,
acres_harvested
FROM onelake_federal.lh_gold.gold_usda_crop_production
WHERE state = 'IOWA'
AND commodity IN ('CORN', 'SOYBEANS', 'WHEAT')
AND year >= 2020
ORDER BY year DESC, commodity;
Use Case 3: NOAA Weather Data for Research Systems¶
University research groups running Spark clusters can connect to NOAA weather data without downloading bulk CSV files:
# University research Spark cluster reads NOAA data via Iceberg
storm_events = spark.sql("""
SELECT event_type, state, year,
COUNT(*) AS event_count,
AVG(damage_property) AS avg_property_damage
FROM onelake_federal.lh_gold.gold_noaa_storm_summary
WHERE year >= 2015
AND event_type IN ('Hurricane', 'Tornado', 'Flood')
GROUP BY event_type, state, year
ORDER BY year DESC, event_count DESC
""")
storm_events.write.format("parquet").save("/research/storm_analysis_2026/")
Use Case 4: Interagency Analytics Without Data Movement¶
Analysts performing cross-agency research can query datasets from multiple agencies through a single Iceberg catalog without any data leaving OneLake:
-- Cross-agency query: EPA facilities near SBA-funded businesses
-- Executed from any Iceberg-compatible engine
SELECT
e.facility_name,
e.chemical_name,
e.total_releases_lbs,
s.business_name,
s.loan_amount,
s.naics_code
FROM onelake_federal.lh_silver.silver_epa_tri_releases e
JOIN onelake_federal.lh_silver.silver_sba_7a_loans s
ON e.zip_code = s.zip_code
WHERE e.year = 2025
AND e.total_releases_lbs > 1000
ORDER BY e.total_releases_lbs DESC
LIMIT 50;
📝 Note: Cross-agency queries through a single Iceberg catalog require all agencies' data to reside within the same OneLake workspace or federated catalog. For data sovereignty requirements, use workspace-level RBAC to control access.
⚙️ Configuration Guide¶
Step 1: Enable Iceberg Virtualization (Workspace Level)¶
Navigate to workspace settings to confirm Iceberg support is active:
Workspace Settings → OneLake → Interoperability
├── Delta-as-Iceberg Virtualization → Enabled (default for new workspaces)
├── Iceberg REST Catalog → Enabled
└── External Engine Access → Configure per security requirements
Step 2: Configure the Iceberg REST Catalog Endpoint¶
Retrieve the catalog endpoint for your workspace:
# Discover the Iceberg REST Catalog endpoint
import requests
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
token = credential.get_token("https://storage.azure.com/.default")
# Catalog configuration endpoint
workspace_id = "<workspace_guid>"
catalog_url = f"https://onelake.fabric.microsoft.com/{workspace_id}/iceberg/v1/config"
response = requests.get(
catalog_url,
headers={"Authorization": f"Bearer {token.token}"}
)
config = response.json()
print(f"Catalog URI: {config['defaults']['uri']}")
print(f"Warehouse: {config['defaults']['warehouse']}")
Step 3: Configure Snowflake External Volume¶
-- 1. Create storage integration (admin-level operation)
CREATE OR REPLACE STORAGE INTEGRATION onelake_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'AZURE'
ENABLED = TRUE
AZURE_TENANT_ID = '<your_tenant_id>'
STORAGE_ALLOWED_LOCATIONS = (
'azure://onelake.dfs.fabric.microsoft.com/<workspace_id>/'
);
-- 2. Retrieve the Snowflake service principal for RBAC
DESC STORAGE INTEGRATION onelake_integration;
-- Note: AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME
-- 3. Create external volume
CREATE OR REPLACE EXTERNAL VOLUME onelake_vol
STORAGE_LOCATIONS = (
(
NAME = 'fabric_lakehouse'
STORAGE_BASE_URL = 'azure://onelake.dfs.fabric.microsoft.com/<workspace_id>/<lakehouse_id>/Tables/'
AZURE_TENANT_ID = '<tenant_id>'
STORAGE_INTEGRATION = onelake_integration
)
);
Step 4: Grant OneLake RBAC to External Service Principals¶
In the Azure portal or via CLI, assign the Snowflake service principal access to OneLake:
# Assign Storage Blob Data Contributor to the Snowflake SPN
az role assignment create \
--assignee "<snowflake_service_principal_id>" \
--role "Storage Blob Data Contributor" \
--scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/onelake/blobServices/default/containers/<workspace_id>"
Step 5: Configure Catalog Federation (Multi-Workspace)¶
For organizations with multiple workspaces (e.g., one per agency), set up catalog federation:
# Federated catalog configuration
federation_config = {
"catalogs": [
{
"name": "casino_ops",
"uri": "https://ws-casino.onelake.fabric.microsoft.com/iceberg",
"warehouse": "<casino_lakehouse_id>"
},
{
"name": "federal_usda",
"uri": "https://ws-federal.onelake.fabric.microsoft.com/iceberg",
"warehouse": "<usda_lakehouse_id>"
},
{
"name": "federal_epa",
"uri": "https://ws-federal.onelake.fabric.microsoft.com/iceberg",
"warehouse": "<epa_lakehouse_id>"
}
]
}
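PyIceberg has no built-in federation object, so a client typically loads each entry in federation_config as its own REST catalog. A minimal sketch of that pattern, reusing the placeholder values above, follows:
from pyiceberg.catalog import load_catalog

# Load each federated catalog entry as its own PyIceberg REST catalog
catalogs = {}
for entry in federation_config["catalogs"]:
    catalogs[entry["name"]] = load_catalog(
        entry["name"],
        type="rest",
        uri=entry["uri"],
        warehouse=entry["warehouse"],
        credential="<oauth_token>",
    )

# Enumerate namespaces across all configured workspaces
for name, cat in catalogs.items():
    print(f"{name}: {cat.list_namespaces()}")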
Step 6: Validate Connectivity¶
# Validate that external engines can reach the catalog
from pyiceberg.catalog import load_catalog
catalog = load_catalog("onelake", type="rest",
uri="https://<workspace>.onelake.fabric.microsoft.com/iceberg",
credential="<oauth_token>",
warehouse="<lakehouse_id>"
)
# List namespaces
namespaces = catalog.list_namespaces()
print(f"Available namespaces: {namespaces}")
# List tables in each namespace
for ns in namespaces:
tables = catalog.list_tables(ns)
print(f" {ns}: {len(tables)} tables")
for table_id in tables[:5]:
table = catalog.load_table(table_id)
print(f" - {table_id}: {len(table.schema().fields)} columns, "
f"{len(table.history())} snapshots")
📊 Performance Considerations¶
Iceberg vs Delta Query Performance¶
When querying the same data through Delta (native) vs Iceberg (virtualized), expect the following characteristics:
| Metric | Native Delta (Fabric) | Virtualized Iceberg (External) | Notes |
|---|---|---|---|
| Metadata Overhead | None (native format) | 200-500ms initial, cached thereafter | One-time per session or log change |
| File Pruning | V-Order optimized | Standard Iceberg pruning | Delta V-Order benefits not available via Iceberg |
| Predicate Pushdown | Full Delta statistics | Iceberg column-level min/max stats | Both support effective pushdown |
| Partition Pruning | Delta partition filters | Iceberg partition spec filters | Equivalent effectiveness |
| Scan Throughput | Identical | Identical | Same physical Parquet files |
| Concurrent Reads | Unlimited | Unlimited | No locking on read path |
Metadata Overhead Analysis¶
flowchart LR
subgraph First["First Request (~500ms)"]
R1["External Engine<br/>Request"] --> C1["Catalog Lookup<br/>~50ms"]
C1 --> T1["Delta Log Read<br/>~100ms"]
T1 --> G1["Metadata Gen<br/>~200ms"]
G1 --> CA1["Cache Store<br/>~50ms"]
CA1 --> RS1["Response<br/>~100ms"]
end
subgraph Cached["Cached Request (~50ms)"]
R2["External Engine<br/>Request"] --> C2["Catalog Lookup<br/>~25ms"]
C2 --> CA2["Cache Hit<br/>~5ms"]
CA2 --> RS2["Response<br/>~20ms"]
end
style First fill:#E67E22,stroke:#CA6F1E,color:#fff
style Cached fill:#27AE60,stroke:#1E8449,color:#fff
Caching Strategies¶
| Strategy | Description | Best For |
|---|---|---|
| Automatic (Default) | OneLake caches Iceberg metadata until Delta log changes | Most workloads |
| Eager Refresh | Pre-generate Iceberg metadata on Delta commit | High-frequency external reads |
| Snapshot Pinning | Pin external reads to a specific snapshot version | Reproducible analytics |
| TTL Override | Set max metadata age for time-sensitive workloads | Near-real-time external access |
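For the Snapshot Pinning strategy, a hedged PyIceberg sketch is shown below; it reuses the catalog connection from the PyIceberg example earlier on this page and pins scans to a single snapshot ID for reproducible reads:
# Reuse the catalog connection from the PyIceberg example above
table = catalog.load_table("lh_gold.gold_slot_performance")

# Pin to the snapshot that is current at the time of pinning
pinned_snapshot_id = table.current_snapshot().snapshot_id

# Subsequent scans read the pinned snapshot, even if the Delta table changes afterwards
scan = table.scan(
    row_filter="gaming_date >= '2026-04-01'",
    selected_fields=("machine_id", "hold_pct", "gaming_date"),
    snapshot_id=pinned_snapshot_id,
)
df = scan.to_pandas()
print(f"Rows at pinned snapshot {pinned_snapshot_id}: {len(df)}")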
When to Use Native Delta vs Iceberg Virtualization¶
| Scenario | Recommendation | Rationale |
|---|---|---|
| Fabric notebooks and pipelines | Native Delta | Full V-Order, Z-Order, liquid clustering benefits |
| Power BI Direct Lake | Native Delta | Required -- Direct Lake only supports Delta |
| Snowflake analyst queries | Iceberg Virtualization | Only way to read from Snowflake without data copy |
| Spark clusters outside Fabric | Iceberg Virtualization | Standard Iceberg protocol works universally |
| ML training on Databricks | Iceberg Virtualization | Read training data without export |
| Data quality with Great Expectations | Native Delta | Runs inside Fabric notebooks |
| Regulatory data submission | Iceberg Virtualization | Regulators use their own tools |
💡 Tip: Use native Delta for all write paths and internal Fabric workloads. Reserve Iceberg virtualization exclusively for cross-engine read scenarios where external tools cannot read Delta natively.
⚠️ Limitations¶
Format Compatibility Gaps¶
| Area | Limitation | Impact | Workaround |
|---|---|---|---|
| Deletion Vectors | Iceberg readers cannot process Delta deletion vectors | Reads may include logically deleted rows until compaction | Run OPTIMIZE with VACUUM to compact before exposing tables |
| Column Mapping | Delta column mapping modes (name, id) may not translate cleanly | Column renames in Delta may not reflect in Iceberg metadata | Avoid column renames on tables exposed via Iceberg; recreate if needed |
| V-Order | Iceberg readers do not benefit from V-Order optimization | No performance degradation, but no V-Order speedup either | V-Order still benefits native Fabric/Power BI reads |
| Liquid Clustering | Iceberg metadata does not expose Delta liquid clustering info | External engines cannot leverage clustering for query optimization | External engines rely on Iceberg partition pruning instead |
| Change Data Feed | Delta CDF is not exposed through Iceberg | External engines cannot read incremental changes | Use timestamp-based filtering or Iceberg snapshots for incremental reads |
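To illustrate the Change Data Feed workaround in the last row of the table above, the sketch below performs a timestamp-based incremental read. It assumes the table carries an event-time column (gaming_date here), that the reader persists its own high-water mark, and it reuses the PyIceberg catalog from the earlier examples:
from datetime import date

# High-water mark the external reader persists between runs (placeholder value)
last_processed = date(2026, 4, 6)

# Reuse the PyIceberg catalog from the earlier examples
table = catalog.load_table("lh_gold.gold_slot_performance")

# Filter on an event-time column instead of a change feed
scan = table.scan(
    row_filter=f"gaming_date > '{last_processed.isoformat()}'",
    selected_fields=("machine_id", "gaming_date", "coin_in", "hold_pct"),
)
incremental_df = scan.to_pandas()
print(f"New rows since {last_processed}: {len(incremental_df)}")

# Advance the high-water mark only after successful downstream processing
if not incremental_df.empty:
    last_processed = incremental_df["gaming_date"].max()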
Schema Evolution Differences¶
| Operation | Delta Behavior | Iceberg Behavior | Compatibility |
|---|---|---|---|
| Add Column | Supported | Supported | ✅ Compatible |
| Drop Column | Supported | Supported | ✅ Compatible |
| Rename Column | Supported (with column mapping) | Supported | ⚠️ May require metadata refresh |
| Widen Type (e.g., INT → LONG) | Supported | Supported | ✅ Compatible |
| Narrow Type (e.g., LONG → INT) | Not supported | Not supported | ✅ Both reject |
| Reorder Columns | Supported | Supported | ✅ Compatible |
| Nested Schema Changes | Supported | Supported | ⚠️ Complex nesting may require testing |
Transaction Semantics Differences¶
| Aspect | Delta Lake | Apache Iceberg | Impact |
|---|---|---|---|
| ACID Transactions | Full (serializable) | Full (serializable) | No impact for reads |
| Concurrent Writers | Optimistic concurrency via Delta log | Optimistic concurrency via metadata commits | Only one engine should write to a given table |
| Time Travel | VERSION AS OF / TIMESTAMP AS OF | Snapshot-based history | Iceberg readers see snapshots, not Delta versions |
| MERGE / UPSERT | Native MERGE INTO | Copy-on-write or merge-on-read | Different physical implementations |
| Compaction | OPTIMIZE command | Rewrite manifests / data files | Must compact in the native format |
⚠️ Warning: Never configure two engines to write concurrently to the same table. Delta and Iceberg use different transaction logs, and concurrent writes will corrupt data. Designate a single authoritative writer per table and use the other engine(s) for reads only.
Feature Support Matrix¶
| Feature | Delta (Fabric) | Iceberg (Virtualized) | Iceberg (Native/Snowflake) |
|---|---|---|---|
| Read | ✅ | ✅ | ✅ |
| Write | ✅ | ❌ (read-only view) | ✅ (via external volume) |
| Schema Evolution | ✅ Full | ✅ Read-only | ✅ Full |
| Partition Pruning | ✅ | ✅ | ✅ |
| Time Travel | ✅ | ✅ (via snapshots) | ✅ |
| Predicate Pushdown | ✅ | ✅ | ✅ |
| Direct Lake (Power BI) | ✅ | ❌ | ❌ |
| Streaming Ingestion | ✅ | ❌ | ❌ |
📚 References¶
| Resource | URL |
|---|---|
| OneLake Iceberg Support Announcement | https://learn.microsoft.com/fabric/onelake/onelake-iceberg |
| OneLake Table APIs | https://learn.microsoft.com/fabric/onelake/onelake-table-apis |
| Iceberg REST Catalog Spec | https://iceberg.apache.org/spec/#iceberg-rest-catalog |
| Apache Iceberg Format Specification | https://iceberg.apache.org/spec/ |
| Snowflake Iceberg Tables | https://docs.snowflake.com/en/user-guide/tables-iceberg |
| Snowflake External Volumes (Azure) | https://docs.snowflake.com/en/sql-reference/sql/create-external-volume |
| Delta Lake to Iceberg Interop Blog | https://blog.fabric.microsoft.com/blog/delta-iceberg-interoperability |
| PyIceberg Documentation | https://py.iceberg.apache.org/ |
| OneLake Shortcuts | https://learn.microsoft.com/fabric/onelake/onelake-shortcuts |
| Fabric Lakehouse Architecture | https://learn.microsoft.com/fabric/data-engineering/lakehouse-overview |
🔗 Related Documents¶
- Real-Time Intelligence -- RTI streaming data accessible via Iceberg
- Fabric IQ -- Natural language queries across Delta and Iceberg tables
- Data Mesh Enterprise Patterns -- Cross-domain data sharing strategies
- Mirroring -- Database mirroring as alternative to Iceberg for cross-platform data access
- Data Sharing & Federation -- Shortcut patterns and external federation
- Direct Lake -- Direct Lake semantic models over Iceberg/Delta tables
- AI Copilot Configuration -- Copilot for multi-format data exploration
- Architecture -- System architecture overview
📝 Document Metadata - Author: Documentation Team - Reviewers: Data Engineering, Platform Team, Security - Classification: Internal - Next Review: 2026-06-13