🧊 OneLake Iceberg Interoperability¶
Seamless Multi-Engine Data Access with Delta-Iceberg Virtualization
Last Updated: 2026-04-13 | Version: 1.0.0
📑 Table of Contents¶
- 🎯 Overview
- 🏗️ Architecture
- 🔄 Delta-as-Iceberg Virtualization
- ❄️ Snowflake Bidirectional
- 🔌 OneLake Table APIs
- 🎰 Casino Use Cases
- 🏛️ Federal Use Cases
- ⚙️ Configuration Guide
- 📊 Performance Considerations
- ⚠️ Limitations
- 📚 References
🎯 Overview¶
OneLake now supports Apache Iceberg format alongside Delta Lake as a Generally Available feature (February 2026). Delta tables stored in OneLake can be virtualized as Iceberg without any data movement or duplication, enabling external compute engines to read Fabric-managed data through the industry-standard Iceberg protocol. Snowflake bidirectional interoperability reached GA simultaneously, allowing Snowflake to write Iceberg tables directly into OneLake and Fabric to read Snowflake-managed Iceberg data via shortcuts.
Key Capabilities¶
| Capability | Description |
|---|---|
| Delta-as-Iceberg Virtualization | Any Delta table in OneLake is automatically readable as Iceberg -- no ETL, no data copy |
| Iceberg REST Catalog | OneLake exposes a standards-compliant Iceberg REST Catalog for external engine discovery |
| Snowflake Bidirectional | Snowflake writes Iceberg to OneLake; Fabric reads Snowflake Iceberg via shortcuts |
| OneLake Table APIs (GA) | Programmatic table management via REST -- create, read, update, delete across both formats |
| Multi-Engine Read | Spark, Trino, Presto, Databricks, DuckDB, and Snowflake all read the same physical data |
| Zero-Copy Sharing | Share data with partners and agencies without moving or duplicating files |
Why Iceberg Interoperability Matters¶
Organizations rarely operate on a single analytics engine. Gaming operators may use Fabric internally but share data with regulators running Snowflake. Federal agencies need to publish datasets that state departments consume via Spark or Trino. Iceberg interoperability eliminates the data copies and ETL pipelines that traditionally bridge these engine boundaries.
flowchart TB
subgraph OneLake["🔷 OneLake"]
DT["📊 Delta Lake Tables<br/>Native Fabric Format"]
VL["🧊 Iceberg Virtualization<br/>Auto-Generated Metadata"]
IC["🧊 Iceberg Tables<br/>Written by External Engines"]
end
subgraph Fabric["🟣 Microsoft Fabric"]
LH["🏠 Lakehouse"]
WH["🏢 Warehouse"]
NB["📓 Notebooks"]
PBI["📊 Power BI"]
end
subgraph External["🟢 External Engines"]
SF["❄️ Snowflake"]
SP["⚡ Apache Spark"]
TR["🔍 Trino / Presto"]
DB["🔶 Databricks"]
DK["🦆 DuckDB"]
end
DT --> VL
VL --> External
IC --> Fabric
LH --> DT
WH --> DT
NB --> DT
PBI --> DT
SF -->|"Writes Iceberg"| IC
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style Fabric fill:#6C3483,stroke:#4A235A,color:#fff
style External fill:#27AE60,stroke:#1E8449,color:#fff
🏗️ Architecture¶
End-to-End Data Flow¶
The Iceberg interoperability architecture introduces a virtualization layer between OneLake's physical Delta storage and external consumers. This layer generates Iceberg-compatible metadata files on the fly, translating Delta's transaction log into Iceberg manifest lists, manifest files, and snapshots.
flowchart LR
subgraph Writers["✏️ Data Writers"]
FW["Fabric Notebooks<br/>Pipelines, Dataflows"]
SW["Snowflake<br/>External Volume"]
OW["Other Engines<br/>via Iceberg REST"]
end
subgraph OneLake["🔷 OneLake Storage"]
DP["Delta Protocol<br/>Parquet + _delta_log"]
IP["Iceberg Protocol<br/>Parquet + metadata/"]
VZ["Virtualization Engine<br/>Delta → Iceberg Translation"]
end
subgraph APIs["🔌 Access Layer"]
RC["Iceberg REST Catalog<br/>(GA)"]
DL["Delta Lake APIs<br/>(GA)"]
AB["ABFS / OneLake APIs<br/>(GA)"]
end
subgraph Readers["👁️ Data Readers"]
FR["Fabric (Native Delta)"]
SR["Snowflake (Iceberg)"]
ER["Spark / Trino / Presto<br/>(Iceberg)"]
end
FW --> DP
SW --> IP
OW --> IP
DP --> VZ --> RC
DP --> DL
IP --> AB
RC --> SR & ER
DL --> FR
AB --> FR
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style APIs fill:#E67E22,stroke:#CA6F1E,color:#fff
style Readers fill:#27AE60,stroke:#1E8449,color:#fff
Component Responsibilities¶
| Component | Role | Protocol |
|---|---|---|
| OneLake Storage | Single-copy physical Parquet files for both Delta and Iceberg | ABFS, REST |
| Virtualization Engine | Translates Delta transaction log to Iceberg metadata on read | Internal |
| Iceberg REST Catalog | Exposes OneLake tables as Iceberg catalog entries per the REST spec | HTTPS / REST |
| Delta Lake APIs | Provides Delta-native access for Fabric workloads | HTTPS / REST |
| OneLake Table APIs | Unified CRUD operations for table management across formats | HTTPS / REST |
| Shortcuts | Zero-copy pointers to Iceberg tables written by external engines | ABFS |
Metadata Architecture¶
When a Delta table is accessed via the Iceberg protocol, OneLake dynamically generates the required Iceberg metadata hierarchy:
onelake://<workspace>/<lakehouse>/Tables/<table_name>/
├── _delta_log/ # Delta transaction log (source of truth)
│ ├── 00000000000000000000.json
│ ├── 00000000000000000001.json
│ └── ...
├── _iceberg_metadata/ # Auto-generated Iceberg metadata
│ ├── v1.metadata.json # Table metadata (schema, partitioning, properties)
│ ├── snap-<id>.avro # Snapshot manifest list
│ └── <manifest-id>.avro # Manifest file (file-level stats)
└── part-00000-*.parquet # Shared physical data files (no duplication)
📝 Note: The _iceberg_metadata/ directory is generated and maintained automatically by OneLake. You do not need to create or manage these files; they are refreshed each time the underlying Delta table is modified.
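If you want to confirm what OneLake has generated for a given table, a minimal sketch using the azure-storage-file-datalake SDK is shown below. The workspace, lakehouse, and table names are placeholders, and the listing is read-only -- OneLake owns these files:
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes an ADLS Gen2-compatible DFS endpoint; the workspace acts as the filesystem
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("<workspace>")

# Path format follows the layout above; adjust to your actual lakehouse and table names
metadata_path = "<lakehouse>/Tables/<table_name>/_iceberg_metadata"

# List the auto-generated Iceberg metadata files (read-only -- OneLake maintains them)
for entry in fs.get_paths(path=metadata_path):
    print(entry.name)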
🔄 Delta-as-Iceberg Virtualization¶
How It Works¶
Delta-as-Iceberg virtualization is the cornerstone of OneLake's interoperability strategy. When an external engine requests a table through the Iceberg REST Catalog, OneLake performs the following steps:
- Catalog Lookup -- The REST Catalog maps the requested table namespace and name to a OneLake path
- Delta Log Read -- OneLake reads the Delta transaction log (_delta_log/) to determine the current table state
- Metadata Translation -- The Delta schema, partitioning, and file statistics are translated into Iceberg metadata format
- Metadata Caching -- Generated Iceberg metadata is cached and invalidated when the Delta log changes
- Data File Reference -- Iceberg metadata points to the same physical Parquet files that Delta uses -- no data copy occurs
- Response -- The external engine receives standard Iceberg metadata and reads Parquet files directly
Enabling Virtualization¶
Delta-as-Iceberg is enabled by default for all Lakehouse tables in OneLake. No explicit configuration is required. To verify and manage the feature:
Lakehouse Settings → OneLake → Table Format Interoperability
├── Delta-as-Iceberg Virtualization → Enabled (default)
├── Iceberg REST Catalog Endpoint → https://<workspace>.onelake.fabric.microsoft.com/iceberg
└── Catalog Authentication → Microsoft Entra ID (OAuth 2.0)
Schema Compatibility¶
Delta and Iceberg support overlapping but not identical type systems. OneLake handles the mapping automatically:
| Delta Type | Iceberg Type | Notes |
|---|---|---|
| STRING | string | Direct mapping |
| INT / LONG | int / long | Direct mapping |
| FLOAT / DOUBLE | float / double | Direct mapping |
| BOOLEAN | boolean | Direct mapping |
| DATE | date | Direct mapping |
| TIMESTAMP | timestamptz | Converted to UTC-normalized timestamp |
| TIMESTAMP_NTZ | timestamp | Direct mapping (no timezone) |
| DECIMAL(p,s) | decimal(p,s) | Direct mapping |
| BINARY | binary | Direct mapping |
| ARRAY<T> | list<T> | Element type recursively mapped |
| MAP<K,V> | map<K,V> | Key/value types recursively mapped |
| STRUCT<...> | struct<...> | Field types recursively mapped |
⚠️ Warning: Delta's VOID type and certain complex nested types with column mapping modes may not translate cleanly to Iceberg. Test schema compatibility before exposing complex tables to external consumers.
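As a quick pre-flight check before exposing a table externally, the sketch below walks a list of column types against the mapping table above and flags anything that may not virtualize cleanly. The column list is hypothetical; in practice you would pull it from the table's schema:
# Delta -> Iceberg primitive mappings from the table above (simplified)
SUPPORTED = {
    "string", "int", "long", "float", "double", "boolean",
    "date", "timestamp", "timestamp_ntz", "binary",
}

def check_column(name: str, delta_type: str) -> str:
    """Return a short verdict on one column's virtualization compatibility."""
    t = delta_type.lower()
    if t == "void":
        return f"{name}: VOID does not translate to Iceberg -- drop or cast this column"
    if t.startswith(("decimal", "array", "map", "struct")):
        return f"{name}: {delta_type} maps recursively -- test with your external readers"
    if t in SUPPORTED:
        return f"{name}: {delta_type} maps directly"
    return f"{name}: {delta_type} is unrecognized -- verify before exposing"

# Hypothetical column list; read the real one from the table's schema
for col, typ in [("player_id", "LONG"), ("notes", "VOID"), ("tags", "ARRAY<STRING>")]:
    print(check_column(col, typ))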
Performance Characteristics¶
| Aspect | Detail |
|---|---|
| First Read Latency | 200-500ms overhead for initial metadata generation |
| Subsequent Reads | Near-zero overhead -- metadata is cached until Delta log changes |
| Cache Invalidation | Automatic when new commits appear in _delta_log/ |
| Data Scan Performance | Identical to native Iceberg -- same Parquet files, same column statistics |
| Concurrent Readers | No limit -- metadata is read-only for external engines |
| Write Path | External engines cannot write through the virtualized Iceberg view -- write native Iceberg tables to OneLake instead (e.g., via the OneLake Table APIs or a Snowflake external volume) |
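One way to observe the first-read overhead and subsequent caching from an external client is to time two consecutive loads of the same table, as in the hedged sketch below. Connection values are placeholders and absolute timings will vary by region and table size:
import time
from pyiceberg.catalog import load_catalog

# Placeholder connection values -- same shape as the PyIceberg example later on this page
catalog = load_catalog(
    "onelake",
    type="rest",
    uri="https://<workspace>.onelake.fabric.microsoft.com/iceberg",
    credential="<oauth_token>",
    warehouse="<lakehouse_id>",
)

def timed_load(label: str) -> None:
    start = time.perf_counter()
    catalog.load_table("lh_gold.gold_slot_performance")
    print(f"{label}: {(time.perf_counter() - start) * 1000:.0f} ms")

timed_load("First load (metadata generated)")   # expect the one-time overhead
timed_load("Second load (metadata cached)")     # expect near-zero metadata overhead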
❄️ Snowflake Bidirectional¶
Snowflake and Microsoft Fabric now support bidirectional data sharing through Iceberg tables stored in OneLake. This eliminates the traditional approach of copying data between platforms via staged files or ETL pipelines.
How Bidirectional Interop Works¶
flowchart LR
subgraph Snowflake["❄️ Snowflake"]
SE["Snowflake Engine"]
ST["Snowflake Tables"]
EV["External Volume<br/>(OneLake)"]
end
subgraph OneLake["🔷 OneLake"]
IT["Iceberg Tables<br/>(Written by Snowflake)"]
DT["Delta Tables<br/>(Written by Fabric)"]
VZ["Virtualization<br/>Delta → Iceberg"]
end
subgraph Fabric["🟣 Fabric"]
SC["Shortcut to<br/>Iceberg Table"]
LH["Lakehouse"]
PBI["Power BI"]
end
SE -->|"CREATE ICEBERG TABLE"| EV
EV -->|"Writes Parquet + Metadata"| IT
IT -->|"Shortcut"| SC --> LH --> PBI
DT --> VZ
VZ -->|"Iceberg REST Catalog"| SE
style Snowflake fill:#29B5E8,stroke:#1A8FBF,color:#fff
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style Fabric fill:#6C3483,stroke:#4A235A,color:#fff
Direction 1: Snowflake Writes to OneLake¶
Snowflake can write Iceberg tables directly to OneLake via external volumes. Fabric then reads these tables through shortcuts.
Step 1: Create an External Volume in Snowflake
-- In Snowflake, create an external volume pointing to OneLake
CREATE OR REPLACE EXTERNAL VOLUME onelake_volume
STORAGE_LOCATIONS = (
(
NAME = 'fabric_onelake'
STORAGE_BASE_URL = 'azure://onelake.dfs.fabric.microsoft.com/<workspace_id>/<lakehouse_id>/Tables/'
AZURE_TENANT_ID = '<tenant_id>'
)
);
-- Grant Snowflake's service principal access to OneLake
-- (Requires Microsoft Entra ID app registration and RBAC assignment)
Step 2: Create an Iceberg Table in Snowflake
-- Create a managed Iceberg table that writes to OneLake
CREATE OR REPLACE ICEBERG TABLE gaming_analytics.public.player_summary
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'onelake_volume'
BASE_LOCATION = 'player_summary/'
AS
SELECT
player_id,
SUM(total_wagered) AS lifetime_wagered,
SUM(total_won) AS lifetime_won,
COUNT(DISTINCT session_date) AS total_visits,
MAX(session_date) AS last_visit
FROM gaming_analytics.public.player_sessions
GROUP BY player_id;
Step 3: Create a Shortcut in Fabric
Lakehouse → Tables → New Shortcut → OneLake
Source: <workspace>/<lakehouse>/Tables/player_summary
Name: snowflake_player_summary
💡 Tip: The shortcut is a zero-copy pointer. When Snowflake updates the Iceberg table, Fabric sees the changes immediately without any sync process.
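If you prefer to script the shortcut rather than click through the UI, the sketch below uses the OneLake shortcuts REST API (see the OneLake Shortcuts reference at the end of this page). The payload shape is a hedged sketch and all GUIDs are placeholders -- verify against the current API documentation before relying on it:
import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default")

workspace_id = "<target_workspace_guid>"   # workspace containing your Lakehouse
lakehouse_id = "<target_lakehouse_guid>"   # Lakehouse item that will hold the shortcut

# Shortcut pointing at the Iceberg table Snowflake wrote into OneLake
payload = {
    "path": "Tables",
    "name": "snowflake_player_summary",
    "target": {
        "oneLake": {
            "workspaceId": "<source_workspace_guid>",
            "itemId": "<source_lakehouse_guid>",
            "path": "Tables/player_summary",
        }
    },
}

response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token.token}", "Content-Type": "application/json"},
    json=payload,
)
print(response.status_code, response.text)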
Direction 2: Fabric Tables Readable by Snowflake¶
Snowflake reads Fabric Delta tables through the Iceberg REST Catalog exposed by OneLake.
-- In Snowflake, create a catalog integration for OneLake
CREATE OR REPLACE CATALOG INTEGRATION onelake_catalog
CATALOG_SOURCE = REST
TABLE_FORMAT = ICEBERG
CATALOG_NAMESPACE = '<workspace_name>.<lakehouse_name>'
REST_CONFIG = (
CATALOG_URI = 'https://<workspace>.onelake.fabric.microsoft.com/iceberg'
WAREHOUSE = '<lakehouse_id>'
)
REST_AUTHENTICATION = (
TYPE = OAUTH
OAUTH_CLIENT_ID = '<client_id>'
OAUTH_CLIENT_SECRET = '<client_secret>'
OAUTH_TOKEN_URI = 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token'
);
-- Query a Fabric Delta table as if it were native Snowflake
SELECT *
FROM onelake_catalog.lh_gold.gold_slot_performance
WHERE gaming_date >= DATEADD(DAY, -7, CURRENT_DATE())
ORDER BY hold_pct ASC
LIMIT 100;
Snowflake Item in OneLake¶
The new Snowflake item in OneLake provides a managed connection to a Snowflake account, simplifying bidirectional setup:
Workspace → + New → Snowflake (Preview)
├── Snowflake Account URL: <account>.snowflakecomputing.com
├── Authentication: OAuth (Microsoft Entra ID) or Key Pair
├── Default Warehouse: COMPUTE_WH
├── Default Database: GAMING_ANALYTICS
└── Sync Mode: Bidirectional Iceberg
Authentication Setup¶
| Auth Method | Direction | Setup |
|---|---|---|
| OAuth (Entra ID) | Both | Register app in Microsoft Entra ID, configure Snowflake security integration |
| Service Principal | Snowflake → OneLake | Create SPN with Storage Blob Data Contributor on OneLake |
| Key Pair | Fabric → Snowflake | Generate RSA key pair, assign public key to Snowflake user |
| Managed Identity | Fabric → OneLake | Automatic for Fabric workloads (no configuration needed) |
📝 Note: For federal workloads, ensure the Snowflake account is deployed in a FedRAMP-authorized region and that OAuth tokens are scoped to the minimum required permissions.
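As a concrete example of the OAuth (Entra ID) row above, the sketch below acquires a client-credentials token with azure-identity and scopes it to OneLake storage access only; the app registration values are placeholders:
from azure.identity import ClientSecretCredential

# App registration created in Microsoft Entra ID (placeholder values)
credential = ClientSecretCredential(
    tenant_id="<tenant_id>",
    client_id="<client_id>",
    client_secret="<client_secret>",
)

# Scope the token to OneLake storage only (least privilege)
token = credential.get_token("https://storage.azure.com/.default")

# Pass token.token as the bearer credential to the external engine or catalog client
print(f"Token acquired, expires at epoch {token.expires_on}")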
🔌 OneLake Table APIs¶
The OneLake Table APIs (GA) provide a unified programmatic interface for managing tables across both Delta and Iceberg formats. These APIs follow the Iceberg REST Catalog specification and extend it with Delta-specific operations.
API Endpoints¶
| Operation | Method | Endpoint | Description |
|---|---|---|---|
| List Namespaces | GET | /iceberg/v1/namespaces | List all lakehouses in the workspace |
| List Tables | GET | /iceberg/v1/namespaces/{ns}/tables | List tables in a lakehouse |
| Get Table Metadata | GET | /iceberg/v1/namespaces/{ns}/tables/{table} | Retrieve full table metadata |
| Create Table | POST | /iceberg/v1/namespaces/{ns}/tables | Create a new table |
| Drop Table | DELETE | /iceberg/v1/namespaces/{ns}/tables/{table} | Delete a table |
| Load Table | POST | /iceberg/v1/namespaces/{ns}/tables/{table} | Load table for reading |
| Rename Table | POST | /iceberg/v1/tables/rename | Rename an existing table |
Python Examples¶
List Tables in a Lakehouse¶
import requests
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
token = credential.get_token("https://storage.azure.com/.default")
base_url = "https://<workspace>.onelake.fabric.microsoft.com/iceberg/v1"
headers = {
"Authorization": f"Bearer {token.token}",
"Content-Type": "application/json"
}
# List all tables in the lakehouse
response = requests.get(
f"{base_url}/namespaces/lh_gold/tables",
headers=headers
)
tables = response.json()
for table in tables["identifiers"]:
    ns = ".".join(table["namespace"])  # namespace is a list of levels in the REST spec
    print(f"Table: {ns}.{table['name']}")
Get Table Metadata (Schema, Partitioning, Statistics)¶
# Get full metadata for a specific table
response = requests.get(
f"{base_url}/namespaces/lh_gold/tables/gold_slot_performance",
headers=headers
)
metadata = response.json()
# Print schema information (resolve the current schema by its ID)
table_meta = metadata["metadata"]
current_schema_id = table_meta["current-schema-id"]
schema = next(s for s in table_meta["schemas"] if s["schema-id"] == current_schema_id)
print(f"Schema ID: {schema['schema-id']}")
for field in schema["fields"]:
    print(f"  {field['name']}: {field['type']} (required={field.get('required', False)})")
# Print partition spec
partition_spec = metadata["metadata"].get("partition-specs", [])
for spec in partition_spec:
for field in spec["fields"]:
print(f" Partition: {field['name']} ({field['transform']})")
# Print snapshot summary (row count, file count)
snapshots = metadata["metadata"].get("snapshots", [])
if snapshots:
latest = snapshots[-1]
summary = latest.get("summary", {})
print(f" Rows: {summary.get('total-records', 'N/A')}")
print(f" Files: {summary.get('total-data-files', 'N/A')}")
print(f" Size: {summary.get('total-files-size', 'N/A')} bytes")
Connect with PyIceberg¶
from pyiceberg.catalog import load_catalog
# Connect to OneLake via the Iceberg REST Catalog
catalog = load_catalog(
"onelake",
**{
"type": "rest",
"uri": "https://<workspace>.onelake.fabric.microsoft.com/iceberg",
"credential": "<oauth_token>",
"warehouse": "<lakehouse_id>",
}
)
# List namespaces (lakehouses)
namespaces = catalog.list_namespaces()
print(f"Namespaces: {namespaces}")
# Load a table
table = catalog.load_table("lh_gold.gold_slot_performance")
print(f"Schema: {table.schema()}")
print(f"Snapshots: {len(table.history())}")
# Scan the table with filters
scan = table.scan(
row_filter="gaming_date >= '2026-04-01'",
selected_fields=("machine_id", "hold_pct", "coin_in", "gaming_date")
)
# Convert to pandas DataFrame
df = scan.to_pandas()
print(f"Rows: {len(df)}")
print(df.head(10))
Connect with Apache Spark¶
# PySpark configuration for reading OneLake tables via Iceberg
from pyspark.sql import SparkSession

spark = SparkSession.builder \
.appName("OneLake-Iceberg-Reader") \
.config("spark.sql.catalog.onelake", "org.apache.iceberg.spark.SparkCatalog") \
.config("spark.sql.catalog.onelake.type", "rest") \
.config("spark.sql.catalog.onelake.uri",
"https://<workspace>.onelake.fabric.microsoft.com/iceberg") \
.config("spark.sql.catalog.onelake.credential", "<oauth_token>") \
.config("spark.sql.catalog.onelake.warehouse", "<lakehouse_id>") \
.getOrCreate()
# Query a Fabric Delta table through the Iceberg catalog
df = spark.sql("""
SELECT machine_id, gaming_date, hold_pct, coin_in
FROM onelake.lh_gold.gold_slot_performance
WHERE gaming_date >= '2026-04-01'
ORDER BY hold_pct ASC
""")
df.show(10)
🎰 Casino Use Cases¶
Iceberg interoperability unlocks several high-value scenarios for casino and gaming operations where data must cross organizational and platform boundaries.
Use Case 1: Share Analytics with External Partners¶
Casino operators often work with slot manufacturers, game developers, and consulting firms that run their own analytics platforms -- frequently Snowflake or Databricks.
flowchart LR
subgraph Casino["🎰 Casino (Fabric)"]
GS["Gold: Slot<br/>Performance"]
GP["Gold: Player<br/>Analytics"]
end
subgraph OneLake["🔷 OneLake"]
VZ["Iceberg<br/>Virtualization"]
end
subgraph Partners["🤝 Partners"]
MFG["Slot Manufacturer<br/>(Snowflake)"]
CON["Consulting Firm<br/>(Databricks)"]
REG["Gaming Regulator<br/>(Trino)"]
end
Casino --> VZ
VZ -->|"Read-Only"| Partners
style Casino fill:#6C3483,stroke:#4A235A,color:#fff
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style Partners fill:#27AE60,stroke:#1E8449,color:#fff
Benefits:
- Slot manufacturers analyze machine performance data without receiving data extracts
- Consulting firms run analytics on the same data Fabric uses -- no stale copies
- Gaming regulators query compliance data directly rather than waiting for file submissions
Use Case 2: Regulatory Reporting to Iceberg-Based Systems¶
State gaming commissions increasingly adopt modern data platforms. Iceberg interoperability allows casinos to expose compliance data (CTR, SAR, W-2G) through a standard protocol:
# Gaming regulator queries casino compliance data from their Trino cluster
# Trino configuration: catalog = onelake_casino
# Regulator runs:
trino> SELECT
filing_type,
COUNT(*) AS filing_count,
SUM(transaction_amount) AS total_amount
FROM onelake_casino.lh_gold.gold_compliance_filings
WHERE filing_date >= DATE '2026-01-01'
GROUP BY filing_type
ORDER BY filing_count DESC;
Use Case 3: Multi-Engine Analytics¶
Large gaming enterprises may use multiple analytics engines for different workloads. Iceberg interoperability ensures all engines read from the same data:
| Engine | Workload | Access Method |
|---|---|---|
| Fabric (PySpark) | Medallion ETL, data engineering | Native Delta |
| Power BI | Executive dashboards, Direct Lake | Native Delta |
| Snowflake | Ad-hoc analyst queries, partner sharing | Iceberg REST Catalog |
| Databricks | ML model training, feature engineering | Iceberg REST Catalog |
| DuckDB | Local development, data exploration | Iceberg REST Catalog |
💡 Tip: Multi-engine access works best when one engine (Fabric) is the authoritative writer and all others are readers. Avoid concurrent writes from multiple engines to the same table -- use separate tables or namespaces per writer.
🏛️ Federal Use Cases¶
Federal agencies that publish open data or share datasets across departments benefit significantly from Iceberg interoperability, as it eliminates the need to produce and distribute data extracts.
Use Case 1: Cross-Agency Data Sharing¶
flowchart TB
subgraph Federal["🏛️ Federal Agencies (Fabric)"]
USDA["🌾 USDA<br/>Crop Production"]
EPA["🌊 EPA<br/>Toxic Releases"]
NOAA["🌀 NOAA<br/>Weather Data"]
SBA["💼 SBA<br/>Loan Programs"]
DOI["🏔️ DOI<br/>Land & Geology"]
end
subgraph OneLake["🔷 OneLake Iceberg Catalog"]
CAT["Federated<br/>Catalog"]
end
subgraph Consumers["🟢 External Consumers"]
STATE["State Departments<br/>(Snowflake)"]
UNIV["Universities<br/>(Spark Clusters)"]
NGO["Research NGOs<br/>(Trino)"]
PUB["Public Data Portals<br/>(DuckDB / Parquet)"]
end
Federal --> CAT --> Consumers
style Federal fill:#6C3483,stroke:#4A235A,color:#fff
style OneLake fill:#2471A3,stroke:#1A5276,color:#fff
style Consumers fill:#27AE60,stroke:#1E8449,color:#fff
Use Case 2: USDA Crop Data for State Agriculture Departments¶
State agriculture departments using Snowflake can query USDA crop production data directly from OneLake:
-- State Dept of Agriculture (Snowflake) queries USDA data
SELECT
state,
commodity,
year,
production_value,
yield_per_acre,
acres_harvested
FROM onelake_federal.lh_gold.gold_usda_crop_production
WHERE state = 'IOWA'
AND commodity IN ('CORN', 'SOYBEANS', 'WHEAT')
AND year >= 2020
ORDER BY year DESC, commodity;
Use Case 3: NOAA Weather Data for Research Systems¶
University research groups running Spark clusters can connect to NOAA weather data without downloading bulk CSV files:
# University research Spark cluster reads NOAA data via Iceberg
storm_events = spark.sql("""
SELECT event_type, state, year,
COUNT(*) AS event_count,
AVG(damage_property) AS avg_property_damage
FROM onelake_federal.lh_gold.gold_noaa_storm_summary
WHERE year >= 2015
AND event_type IN ('Hurricane', 'Tornado', 'Flood')
GROUP BY event_type, state, year
ORDER BY year DESC, event_count DESC
""")
storm_events.write.format("parquet").save("/research/storm_analysis_2026/")
Use Case 4: Interagency Analytics Without Data Movement¶
Analysts performing cross-agency research can query datasets from multiple agencies through a single Iceberg catalog without any data leaving OneLake:
-- Cross-agency query: EPA facilities near SBA-funded businesses
-- Executed from any Iceberg-compatible engine
SELECT
e.facility_name,
e.chemical_name,
e.total_releases_lbs,
s.business_name,
s.loan_amount,
s.naics_code
FROM onelake_federal.lh_silver.silver_epa_tri_releases e
JOIN onelake_federal.lh_silver.silver_sba_7a_loans s
ON e.zip_code = s.zip_code
WHERE e.year = 2025
AND e.total_releases_lbs > 1000
ORDER BY e.total_releases_lbs DESC
LIMIT 50;
📝 Note: Cross-agency queries through a single Iceberg catalog require all agencies' data to reside within the same OneLake workspace or federated catalog. For data sovereignty requirements, use workspace-level RBAC to control access.
⚙️ Configuration Guide¶
Step 1: Enable Iceberg Virtualization (Workspace Level)¶
Navigate to workspace settings to confirm Iceberg support is active:
Workspace Settings → OneLake → Interoperability
├── Delta-as-Iceberg Virtualization → Enabled (default for new workspaces)
├── Iceberg REST Catalog → Enabled
└── External Engine Access → Configure per security requirements
Step 2: Configure the Iceberg REST Catalog Endpoint¶
Retrieve the catalog endpoint for your workspace:
# Discover the Iceberg REST Catalog endpoint
import requests
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
token = credential.get_token("https://storage.azure.com/.default")
# Catalog configuration endpoint
workspace_id = "<workspace_guid>"
catalog_url = f"https://onelake.fabric.microsoft.com/{workspace_id}/iceberg/v1/config"
response = requests.get(
catalog_url,
headers={"Authorization": f"Bearer {token.token}"}
)
config = response.json()
print(f"Catalog URI: {config['defaults']['uri']}")
print(f"Warehouse: {config['defaults']['warehouse']}")
Step 3: Configure Snowflake External Volume¶
-- 1. Create storage integration (admin-level operation)
CREATE OR REPLACE STORAGE INTEGRATION onelake_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'AZURE'
ENABLED = TRUE
AZURE_TENANT_ID = '<your_tenant_id>'
STORAGE_ALLOWED_LOCATIONS = (
'azure://onelake.dfs.fabric.microsoft.com/<workspace_id>/'
);
-- 2. Retrieve the Snowflake service principal for RBAC
DESC STORAGE INTEGRATION onelake_integration;
-- Note: AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME
-- 3. Create external volume
CREATE OR REPLACE EXTERNAL VOLUME onelake_vol
STORAGE_LOCATIONS = (
(
NAME = 'fabric_lakehouse'
STORAGE_BASE_URL = 'azure://onelake.dfs.fabric.microsoft.com/<workspace_id>/<lakehouse_id>/Tables/'
AZURE_TENANT_ID = '<tenant_id>'
STORAGE_INTEGRATION = onelake_integration
)
);
Step 4: Grant OneLake RBAC to External Service Principals¶
In the Azure portal or via CLI, assign the Snowflake service principal access to OneLake:
# Assign Storage Blob Data Contributor to the Snowflake SPN
az role assignment create \
--assignee "<snowflake_service_principal_id>" \
--role "Storage Blob Data Contributor" \
--scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/onelake/blobServices/default/containers/<workspace_id>"
Step 5: Configure Catalog Federation (Multi-Workspace)¶
For organizations with multiple workspaces (e.g., one per agency), set up catalog federation:
# Federated catalog configuration
federation_config = {
"catalogs": [
{
"name": "casino_ops",
"uri": "https://ws-casino.onelake.fabric.microsoft.com/iceberg",
"warehouse": "<casino_lakehouse_id>"
},
{
"name": "federal_usda",
"uri": "https://ws-federal.onelake.fabric.microsoft.com/iceberg",
"warehouse": "<usda_lakehouse_id>"
},
{
"name": "federal_epa",
"uri": "https://ws-federal.onelake.fabric.microsoft.com/iceberg",
"warehouse": "<epa_lakehouse_id>"
}
]
}
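PyIceberg has no built-in federation object, so a client typically loads each entry in federation_config as its own REST catalog. A minimal sketch of that pattern, reusing the placeholder values above, follows:
from pyiceberg.catalog import load_catalog

# Load each federated catalog entry as its own PyIceberg REST catalog
catalogs = {}
for entry in federation_config["catalogs"]:
    catalogs[entry["name"]] = load_catalog(
        entry["name"],
        type="rest",
        uri=entry["uri"],
        warehouse=entry["warehouse"],
        credential="<oauth_token>",
    )

# Enumerate namespaces across all configured workspaces
for name, cat in catalogs.items():
    print(f"{name}: {cat.list_namespaces()}")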
Step 6: Validate Connectivity¶
# Validate that external engines can reach the catalog
from pyiceberg.catalog import load_catalog
catalog = load_catalog("onelake", type="rest",
uri="https://<workspace>.onelake.fabric.microsoft.com/iceberg",
credential="<oauth_token>",
warehouse="<lakehouse_id>"
)
# List namespaces
namespaces = catalog.list_namespaces()
print(f"Available namespaces: {namespaces}")
# List tables in each namespace
for ns in namespaces:
tables = catalog.list_tables(ns)
print(f" {ns}: {len(tables)} tables")
for table_id in tables[:5]:
table = catalog.load_table(table_id)
print(f" - {table_id}: {len(table.schema().fields)} columns, "
f"{len(table.history())} snapshots")
📊 Performance Considerations¶
Iceberg vs Delta Query Performance¶
When querying the same data through Delta (native) vs Iceberg (virtualized), expect the following characteristics:
| Metric | Native Delta (Fabric) | Virtualized Iceberg (External) | Notes |
|---|---|---|---|
| Metadata Overhead | None (native format) | 200-500ms initial, cached thereafter | One-time per session or log change |
| File Pruning | V-Order optimized | Standard Iceberg pruning | Delta V-Order benefits not available via Iceberg |
| Predicate Pushdown | Full Delta statistics | Iceberg column-level min/max stats | Both support effective pushdown |
| Partition Pruning | Delta partition filters | Iceberg partition spec filters | Equivalent effectiveness |
| Scan Throughput | Identical | Identical | Same physical Parquet files |
| Concurrent Reads | Unlimited | Unlimited | No locking on read path |
Metadata Overhead Analysis¶
flowchart LR
subgraph First["First Request (~500ms)"]
R1["External Engine<br/>Request"] --> C1["Catalog Lookup<br/>~50ms"]
C1 --> T1["Delta Log Read<br/>~100ms"]
T1 --> G1["Metadata Gen<br/>~200ms"]
G1 --> CA1["Cache Store<br/>~50ms"]
CA1 --> RS1["Response<br/>~100ms"]
end
subgraph Cached["Cached Request (~50ms)"]
R2["External Engine<br/>Request"] --> C2["Catalog Lookup<br/>~25ms"]
C2 --> CA2["Cache Hit<br/>~5ms"]
CA2 --> RS2["Response<br/>~20ms"]
end
style First fill:#E67E22,stroke:#CA6F1E,color:#fff
style Cached fill:#27AE60,stroke:#1E8449,color:#fff
Caching Strategies¶
| Strategy | Description | Best For |
|---|---|---|
| Automatic (Default) | OneLake caches Iceberg metadata until Delta log changes | Most workloads |
| Eager Refresh | Pre-generate Iceberg metadata on Delta commit | High-frequency external reads |
| Snapshot Pinning | Pin external reads to a specific snapshot version | Reproducible analytics |
| TTL Override | Set max metadata age for time-sensitive workloads | Near-real-time external access |
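For the Snapshot Pinning strategy, a hedged PyIceberg sketch is shown below; it reuses the catalog connection from the PyIceberg example earlier on this page and pins scans to a single snapshot ID for reproducible reads:
# Reuse the catalog connection from the PyIceberg example above
table = catalog.load_table("lh_gold.gold_slot_performance")

# Pin to the snapshot that is current at the time of pinning
pinned_snapshot_id = table.current_snapshot().snapshot_id

# Subsequent scans read the pinned snapshot, even if the Delta table changes afterwards
scan = table.scan(
    row_filter="gaming_date >= '2026-04-01'",
    selected_fields=("machine_id", "hold_pct", "gaming_date"),
    snapshot_id=pinned_snapshot_id,
)
df = scan.to_pandas()
print(f"Rows at pinned snapshot {pinned_snapshot_id}: {len(df)}")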
When to Use Native Delta vs Iceberg Virtualization¶
| Scenario | Recommendation | Rationale |
|---|---|---|
| Fabric notebooks and pipelines | Native Delta | Full V-Order, Z-Order, liquid clustering benefits |
| Power BI Direct Lake | Native Delta | Required -- Direct Lake only supports Delta |
| Snowflake analyst queries | Iceberg Virtualization | Only way to read from Snowflake without data copy |
| Spark clusters outside Fabric | Iceberg Virtualization | Standard Iceberg protocol works universally |
| ML training on Databricks | Iceberg Virtualization | Read training data without export |
| Data quality with Great Expectations | Native Delta | Runs inside Fabric notebooks |
| Regulatory data submission | Iceberg Virtualization | Regulators use their own tools |
💡 Tip: Use native Delta for all write paths and internal Fabric workloads. Reserve Iceberg virtualization exclusively for cross-engine read scenarios where external tools cannot read Delta natively.
⚠️ Limitations¶
Format Compatibility Gaps¶
| Area | Limitation | Impact | Workaround |
|---|---|---|---|
| Deletion Vectors | Iceberg readers cannot process Delta deletion vectors | Reads may include logically deleted rows until compaction | Run OPTIMIZE with VACUUM to compact before exposing tables |
| Column Mapping | Delta column mapping modes (name, id) may not translate cleanly | Column renames in Delta may not reflect in Iceberg metadata | Avoid column renames on tables exposed via Iceberg; recreate if needed |
| V-Order | Iceberg readers do not benefit from V-Order optimization | No performance degradation, but no V-Order speedup either | V-Order still benefits native Fabric/Power BI reads |
| Liquid Clustering | Iceberg metadata does not expose Delta liquid clustering info | External engines cannot leverage clustering for query optimization | External engines rely on Iceberg partition pruning instead |
| Change Data Feed | Delta CDF is not exposed through Iceberg | External engines cannot read incremental changes | Use timestamp-based filtering or Iceberg snapshots for incremental reads |
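To illustrate the Change Data Feed workaround in the last row of the table above, the sketch below performs a timestamp-based incremental read. It assumes the table carries an event-time column (gaming_date here), that the reader persists its own high-water mark, and it reuses the PyIceberg catalog from the earlier examples:
from datetime import date

# High-water mark the external reader persists between runs (placeholder value)
last_processed = date(2026, 4, 6)

# Reuse the PyIceberg catalog from the earlier examples
table = catalog.load_table("lh_gold.gold_slot_performance")

# Filter on an event-time column instead of a change feed
scan = table.scan(
    row_filter=f"gaming_date > '{last_processed.isoformat()}'",
    selected_fields=("machine_id", "gaming_date", "coin_in", "hold_pct"),
)
incremental_df = scan.to_pandas()
print(f"New rows since {last_processed}: {len(incremental_df)}")

# Advance the high-water mark only after successful downstream processing
if not incremental_df.empty:
    last_processed = incremental_df["gaming_date"].max()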
Schema Evolution Differences¶
| Operation | Delta Behavior | Iceberg Behavior | Compatibility |
|---|---|---|---|
| Add Column | Supported | Supported | ✅ Compatible |
| Drop Column | Supported | Supported | ✅ Compatible |
| Rename Column | Supported (with column mapping) | Supported | ⚠️ May require metadata refresh |
| Widen Type (e.g., INT → LONG) | Supported | Supported | ✅ Compatible |
| Narrow Type (e.g., LONG → INT) | Not supported | Not supported | ✅ Both reject |
| Reorder Columns | Supported | Supported | ✅ Compatible |
| Nested Schema Changes | Supported | Supported | ⚠️ Complex nesting may require testing |
Transaction Semantics Differences¶
| Aspect | Delta Lake | Apache Iceberg | Impact |
|---|---|---|---|
| ACID Transactions | Full (serializable) | Full (serializable) | No impact for reads |
| Concurrent Writers | Optimistic concurrency via Delta log | Optimistic concurrency via metadata commits | Only one engine should write to a given table |
| Time Travel | VERSION AS OF / TIMESTAMP AS OF | Snapshot-based history | Iceberg readers see snapshots, not Delta versions |
| MERGE / UPSERT | Native MERGE INTO | Copy-on-write or merge-on-read | Different physical implementations |
| Compaction | OPTIMIZE command | Rewrite manifests / data files | Must compact in the native format |
⚠️ Warning: Never configure two engines to write concurrently to the same table. Delta and Iceberg use different transaction logs, and concurrent writes will corrupt data. Designate a single authoritative writer per table and use the other engine(s) for reads only.
Feature Support Matrix¶
| Feature | Delta (Fabric) | Iceberg (Virtualized) | Iceberg (Native/Snowflake) |
|---|---|---|---|
| Read | ✅ | ✅ | ✅ |
| Write | ✅ | ❌ (read-only view) | ✅ (via external volume) |
| Schema Evolution | ✅ Full | ✅ Read-only | ✅ Full |
| Partition Pruning | ✅ | ✅ | ✅ |
| Time Travel | ✅ | ✅ (via snapshots) | ✅ |
| Predicate Pushdown | ✅ | ✅ | ✅ |
| Direct Lake (Power BI) | ✅ | ❌ | ❌ |
| Streaming Ingestion | ✅ | ❌ | ❌ |
📚 References¶
| Resource | URL |
|---|---|
| OneLake Iceberg Support Announcement | https://learn.microsoft.com/fabric/onelake/onelake-iceberg |
| OneLake Table APIs | https://learn.microsoft.com/fabric/onelake/onelake-table-apis |
| Iceberg REST Catalog Spec | https://iceberg.apache.org/spec/#iceberg-rest-catalog |
| Apache Iceberg Format Specification | https://iceberg.apache.org/spec/ |
| Snowflake Iceberg Tables | https://docs.snowflake.com/en/user-guide/tables-iceberg |
| Snowflake External Volumes (Azure) | https://docs.snowflake.com/en/sql-reference/sql/create-external-volume |
| Delta Lake to Iceberg Interop Blog | https://blog.fabric.microsoft.com/blog/delta-iceberg-interoperability |
| PyIceberg Documentation | https://py.iceberg.apache.org/ |
| OneLake Shortcuts | https://learn.microsoft.com/fabric/onelake/onelake-shortcuts |
| Fabric Lakehouse Architecture | https://learn.microsoft.com/fabric/data-engineering/lakehouse-overview |
🔗 Related Documents¶
- Real-Time Intelligence -- RTI streaming data accessible via Iceberg
- Fabric IQ -- Natural language queries across Delta and Iceberg tables
- Data Mesh Enterprise Patterns -- Cross-domain data sharing strategies
- Mirroring -- Database mirroring as alternative to Iceberg for cross-platform data access
- Data Sharing & Federation -- Shortcut patterns and external federation
- Direct Lake -- Direct Lake semantic models over Iceberg/Delta tables
- AI Copilot Configuration -- Copilot for multi-format data exploration
- Architecture -- System architecture overview
📝 Document Metadata - Author: Documentation Team - Reviewers: Data Engineering, Platform Team, Security - Classification: Internal - Next Review: 2026-06-13