
🔗 OneLake Shortcuts - Multi-Cloud Data Federation

Access S3, GCS, Dataverse, and ADLS Data Without Copying

Last Updated: 2026-04-27 | Version: 1.0.0


Overview

OneLake shortcuts are symbolic links that make data stored in external locations appear as if it lives natively in a Fabric Lakehouse. They enable data federation across cloud providers and storage systems without physically copying data, reducing storage costs and eliminating ETL latency for read-heavy scenarios.

A shortcut is a metadata pointer -- when a Spark notebook, SQL query, or Power BI report reads from a shortcut path, OneLake transparently routes the request to the underlying storage, handles authentication, and returns the data as if it were local.
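
As a minimal illustration (the workspace and table names are placeholders taken from the architecture diagram below), reading through a shortcut uses exactly the same Spark API as reading a native table:

# The first path is a native Delta table, the second a shortcut to S3;
# the reader cannot tell the difference
native_df = spark.read.format("delta").load(
    "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables/slot_telemetry"
)
shortcut_df = spark.read.format("delta").load(
    "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables/s3_iot_data"
)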

Supported Sources

| Source Type | GA Status | Authentication | Use Case |
|---|---|---|---|
| OneLake (same tenant) | GA | Workspace Identity | Cross-workspace federation |
| ADLS Gen2 | GA | Service Principal, Key, SAS | Azure data lake federation |
| Amazon S3 | GA | IAM Role, Access Key | Multi-cloud federation |
| Google Cloud Storage | GA | Service Account Key | Multi-cloud federation |
| Dataverse | GA | Entra ID (automatic) | Dynamics 365 / Power Platform data |
| On-premises (gateway) | Preview | Data Gateway | Hybrid cloud scenarios |

Architecture

graph TB
    subgraph "OneLake Lakehouse"
        LH[lh_bronze]
        LH --> T1[Tables/slot_telemetry - Native Delta]
        LH --> S1["Tables/s3_iot_data ↗ Shortcut to S3"]
        LH --> S2["Tables/gcs_weather ↗ Shortcut to GCS"]
        LH --> S3["Tables/adls_reference ↗ Shortcut to ADLS"]
        LH --> S4["Tables/dynamics_customers ↗ Shortcut to Dataverse"]
    end

    subgraph "External Sources"
        AWS[Amazon S3 Bucket]
        GCP[Google Cloud Storage]
        ADLS[ADLS Gen2]
        DV[Dataverse / Dynamics 365]
    end

    S1 -.->|IAM Role| AWS
    S2 -.->|Service Account| GCP
    S3 -.->|Service Principal| ADLS
    S4 -.->|Entra ID| DV

    subgraph "Consumers (Transparent Access)"
        NB[Spark Notebooks]
        SQL[SQL Endpoint]
        PBI[Power BI / Direct Lake]
    end

    T1 --> NB
    S1 --> NB
    S2 --> SQL
    S3 --> PBI
    S4 --> NB

How Shortcuts Work

  1. Create a shortcut via UI or REST API, pointing to an external path
  2. Metadata registration -- OneLake records the mapping (no data movement)
  3. Read access -- queries against the shortcut path are transparently routed to the source (see the sketch below)
  4. Authentication -- credentials (IAM role, SAS, service principal) are resolved at read time
  5. Caching -- OneLake caches metadata (file listing) and optionally data blocks for performance
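
Steps 3-5 are invisible to callers: a shortcut folder lists and reads like any local path. A minimal sketch (the path is a placeholder from the architecture above):

# Listing a shortcut folder behaves exactly like listing local files;
# OneLake resolves credentials and routes the call to S3 behind the scenes
from notebookutils import mssparkutils

files = mssparkutils.fs.ls(
    "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables/s3_iot_data"
)
for f in files:
    print(f.name, f.size)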

Shortcut Types

OneLake (Cross-Workspace)

Source: Another Lakehouse or Warehouse within the same Fabric tenant
Auth: Workspace Identity (automatic)
Format: Delta, Parquet, CSV
Use case: Shared reference data, cross-domain federation

ADLS Gen2

Source: Azure Data Lake Storage Gen2 (any subscription/tenant)
Auth: Service Principal, Account Key, SAS Token
Format: Delta, Parquet, CSV, JSON
Use case: Existing Azure data lakes, partner data sharing

Amazon S3

Source: S3 bucket in any AWS account
Auth: IAM Role (cross-account), Access Key + Secret
Format: Delta, Parquet, CSV, JSON
Use case: Multi-cloud analytics, AWS data federation

Google Cloud Storage

Source: GCS bucket in any GCP project
Auth: Service Account Key (JSON)
Format: Delta, Parquet, CSV, JSON
Use case: Multi-cloud analytics, GCP data federation

Dataverse

Source: Microsoft Dataverse environment (Dynamics 365, Power Platform)
Auth: Entra ID (automatic via org identity)
Format: Dataverse tables (auto-converted to Delta)
Use case: Dynamics 365 analytics, CRM data in Fabric

Multi-Cloud Federation Patterns

Pattern 1: Multi-Cloud Data Lake

graph LR
    subgraph "AWS"
        S3[S3: IoT Sensor Data]
    end

    subgraph "GCP"
        GCS[GCS: Weather Data]
    end

    subgraph "Azure"
        ADLS[ADLS: Reference Data]
    end

    subgraph "Fabric OneLake"
        LH[Unified Lakehouse]
        LH --> JOIN[Spark: Join All Sources]
        JOIN --> GOLD[Gold: Unified Analytics]
    end

    S3 -.->|Shortcut| LH
    GCS -.->|Shortcut| LH
    ADLS -.->|Shortcut| LH

Pattern 2: Dataverse + External Enrichment

# Notebook: Join Dynamics 365 customer data with external analytics

# Dataverse shortcut provides customer records
customers = spark.read.format("delta").load(
    "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables/dynamics_customers"
)

# S3 shortcut provides external behavioral data
behavior = spark.read.format("delta").load(
    "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables/s3_customer_behavior"
)

# Join without any data copying
enriched = customers.join(behavior, "customer_id", "left")
enriched.write.format("delta").mode("overwrite").save(
    "abfss://silver@onelake.dfs.fabric.microsoft.com/lh_silver.Lakehouse/Tables/enriched_customers"
)

Pattern 3: Hub-and-Spoke Data Mesh

graph TB
    subgraph "Central Platform (Hub)"
        REF[Reference Data Lakehouse]
        GOV[Governance & Catalog]
    end

    subgraph "Casino Domain (Spoke)"
        CAS_LH[Casino Lakehouse]
        CAS_LH --> REF_SHORT["zones ↗ Shortcut to Hub"]
    end

    subgraph "Federal Domain (Spoke)"
        FED_LH[Federal Lakehouse]
        FED_LH --> REF_SHORT2["state_codes ↗ Shortcut to Hub"]
    end

    REF -.-> REF_SHORT
    REF -.-> REF_SHORT2

Authentication per Source

S3: IAM Role

{
    "shortcutType": "AmazonS3",
    "source": {
        "location": "https://my-bucket.s3.us-east-1.amazonaws.com",
        "subpath": "/iot-data/2026/",
        "connection": {
            "connectionId": "s3-cross-account-connection"
        }
    }
}

IAM Trust Policy (in AWS):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::FABRIC_AWS_ACCOUNT:role/OneLakeAccess"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "your-fabric-tenant-id"
                }
            }
        }
    ]
}
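
The trust policy only governs who may assume the role; the role also needs a permissions policy that actually grants read access to the bucket. A minimal example (the bucket name is a placeholder):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}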

GCS: Service Account

{
    "shortcutType": "GoogleCloudStorage",
    "source": {
        "location": "https://storage.googleapis.com/my-bucket",
        "subpath": "/weather-data/",
        "connection": {
            "connectionId": "gcs-weather-connection"
        }
    }
}

ADLS Gen2: Service Principal

{
    "shortcutType": "AdlsGen2",
    "source": {
        "location": "https://mystorageaccount.dfs.core.windows.net",
        "subpath": "/container/reference-data/",
        "connection": {
            "connectionId": "adls-reference-connection"
        }
    }
}

Dataverse: Automatic

{
    "shortcutType": "Dataverse",
    "source": {
        "environmentDomain": "org12345.crm.dynamics.com",
        "tableName": "account",
        "deltaTimeTravel": true
    }
}

Refresh Behavior and Caching

Metadata vs Data

| Aspect | Behavior |
|---|---|
| File listing (metadata) | Cached for ~1 hour, refreshable |
| File content | Read-through on every query (no data caching by default) |
| Delta log | Cached, refreshed on query or manual refresh |
| Schema | Cached, refreshed when shortcut is updated |
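
One way to confirm that a shortcut reflects the source's latest Delta commit is to read the Delta log through the shortcut path; a sketch (the path is a placeholder):

# DESCRIBE HISTORY is served from the (cached) Delta log; compare the
# latest version here against the source to spot staleness
path = "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables/s3_iot_data"
spark.sql(f"DESCRIBE HISTORY delta.`{path}`") \
    .select("version", "timestamp", "operation").show(5)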

Refresh Triggers

# Force a metadata refresh programmatically.
# Note: this helper is illustrative -- verify the exact notebookutils
# surface for your Fabric runtime; shortcuts can also be refreshed
# from the Lakehouse UI.
from notebookutils import mssparkutils

mssparkutils.lakehouse.refreshShortcut(
    lakehouse="lh_bronze",
    shortcut_name="s3_iot_data"
)

Staleness Considerations

| Source | Typical Freshness | Notes |
|---|---|---|
| OneLake shortcut | Near real-time | Same platform, no cross-network hop |
| ADLS Gen2 | ~1-5 minutes | Delta log polling |
| S3 | ~5-15 minutes | Cross-cloud metadata sync |
| GCS | ~5-15 minutes | Cross-cloud metadata sync |
| Dataverse | ~15-60 minutes | Dataverse sync cadence |

Cost Implications

Cross-Cloud Egress

| Scenario | Egress Cost | Mitigation |
|---|---|---|
| S3 → Fabric (Azure) | AWS egress: ~$0.09/GB | Cross-cloud egress always applies; copy once for heavy reads |
| GCS → Fabric (Azure) | GCP egress: ~$0.12/GB | Consider a data copy for heavy reads |
| ADLS → Fabric (same region) | Free | Best-case scenario |
| ADLS → Fabric (cross-region) | ~$0.02/GB | Co-locate resources |
| OneLake → OneLake | Free | Always free within tenant |

Cost Decision Framework

flowchart TD
    A[External data in S3/GCS?] --> B{Read frequency?}
    B -->|Daily or less| C{Data size per read?}
    C -->|< 10 GB| D[Use Shortcut ✓]
    C -->|> 10 GB| E{Budget for egress?}
    E -->|Yes| D
    E -->|No| F[Copy to OneLake ✓]
    B -->|Hourly+| G{Data changes frequently?}
    G -->|Yes| D
    G -->|No| F

Monthly Cost Estimator

def estimate_shortcut_cost(
    read_gb_per_day: float,
    source: str,
    days: int = 30
) -> dict:
    """Estimate monthly cost of shortcut vs copy."""
    egress_rates = {
        "s3": 0.09,      # USD per GB
        "gcs": 0.12,
        "adls_cross_region": 0.02,
        "adls_same_region": 0.0,
        "onelake": 0.0,
    }

    storage_rate = 0.023  # OneLake per GB/month (if copied)

    total_gb = read_gb_per_day * days
    egress_cost = total_gb * egress_rates.get(source, 0)
    copy_storage_cost = (read_gb_per_day * 1.5) * storage_rate  # Assume 1.5x for Delta overhead
    # Simplification: the copy path also pays egress once per refresh;
    # this rough model ignores that one-time cost

    return {
        "shortcut_monthly_egress": round(egress_cost, 2),
        "copy_monthly_storage": round(copy_storage_cost, 2),
        "recommendation": "shortcut" if egress_cost < copy_storage_cost else "copy",
        "total_gb_read": total_gb
    }

# Example: 5 GB/day from S3
print(estimate_shortcut_cost(5, "s3"))
# {'shortcut_monthly_egress': 13.5, 'copy_monthly_storage': 0.17, 'recommendation': 'copy', 'total_gb_read': 150}

# Example: 0.5 GB/day from S3
print(estimate_shortcut_cost(0.5, "s3"))
# {'shortcut_monthly_egress': 1.35, 'copy_monthly_storage': 0.02, 'recommendation': 'copy', 'total_gb_read': 15.0}

# Example: 5 GB/day from ADLS same region
print(estimate_shortcut_cost(5, "adls_same_region"))
# {'shortcut_monthly_egress': 0.0, 'copy_monthly_storage': 0.17, 'recommendation': 'shortcut', 'total_gb_read': 150}

Security

Row-Level Security Through Shortcuts

RLS defined on the Lakehouse or semantic model applies to shortcut data just as it does to native data. The key consideration: ensure the shortcut's authentication identity has sufficient access, while Fabric-level RLS restricts what end users see.

# RLS is transparent to shortcuts
# If a Direct Lake model has RLS on zone_id,
# queries to shortcut tables are filtered the same way

Credential Management

| Credential Type | Storage | Rotation |
|---|---|---|
| S3 Access Key | Fabric Connection | Manual (rotate every 90 days) |
| S3 IAM Role | AWS IAM | Automatic (STS tokens) |
| GCS Service Account | Fabric Connection | Manual (rotate key) |
| ADLS SAS Token | Fabric Connection | Set expiry, auto-renew |
| ADLS Service Principal | Entra ID | Certificate rotation |
| Dataverse | Entra ID (automatic) | Managed by platform |

Governance

flowchart LR
    subgraph "Source (S3)"
        BUCKET[S3 Bucket Policy]
    end

    subgraph "Fabric"
        CONN[Connection Credential]
        LHSEC[Lakehouse RBAC]
        RLS[Row-Level Security]
        PURVIEW[Purview Lineage]
    end

    subgraph "Consumer"
        USER[End User]
    end

    BUCKET --> CONN --> LHSEC --> RLS --> USER
    CONN --> PURVIEW

Performance

Query Pushdown

| Source | Predicate Pushdown | Column Pruning | Partition Pruning |
|---|---|---|---|
| OneLake | Full | Full | Full |
| ADLS Gen2 (Delta) | Full | Full | Full |
| ADLS Gen2 (Parquet) | File-level | Full | Directory-based |
| S3 (Delta) | Full | Full | Full |
| S3 (Parquet) | File-level | Full | Directory-based |
| GCS (Delta) | Full | Full | Full |
| Dataverse | Limited | Full | No |
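
To check whether a predicate was actually pushed down for a given source, inspect the physical plan; "PushedFilters" on the scan node confirms it. A sketch (shortcut_path is a placeholder):

# Look for "PushedFilters: [...]" in the FileScan node of the output
from pyspark.sql import functions as F

df = spark.read.format("delta").load(shortcut_path)
df.filter(F.col("bet_amount") > 100).explain()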

Optimization Tips

# Imports assumed below (the spark session is provided by the notebook)
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

# shortcut_path / local_path are placeholder abfss:// paths

# 1. Always filter on partition columns first
df = spark.read.format("delta").load(shortcut_path) \
    .filter(F.col("date") == "2026-04-27")  # Partition pruning

# 2. Select only needed columns
df = df.select("machine_id", "bet_amount", "win_amount")  # Column pruning

# 3. For cross-cloud joins, broadcast the smaller table
local_lookup = spark.read.format("delta").load(local_path)
result = df.join(broadcast(local_lookup), "lookup_key")

# 4. Cache shortcut data if reading it multiple times
shortcut_df = spark.read.format("delta").load(shortcut_path).cache()
analysis_1 = shortcut_df.groupBy("zone").agg(F.sum("amount"))
analysis_2 = shortcut_df.groupBy("player").agg(F.avg("amount"))
shortcut_df.unpersist()

Latency Benchmarks

| Source | First Read (Cold) | Subsequent Reads | 1 GB Scan |
|---|---|---|---|
| OneLake native | ~2s | ~1s | ~5s |
| OneLake shortcut (same region) | ~3s | ~1.5s | ~6s |
| ADLS Gen2 shortcut | ~4s | ~2s | ~8s |
| S3 shortcut | ~6s | ~3s | ~15s |
| GCS shortcut | ~7s | ~4s | ~18s |
| Dataverse shortcut | ~8s | ~5s | ~25s |

Decision Tree: Shortcut vs Copy

flowchart TD
    A[External data source?] --> B{Data ownership?}
    B -->|You own it| C{In Azure same region?}
    C -->|Yes| D[Shortcut ✓ - Zero cost]
    C -->|No| E{Read frequency?}
    E -->|< 1x/day| F[Shortcut ✓ - Low egress]
    E -->|Multiple/day| G{Size per read?}
    G -->|< 1 GB| F
    G -->|> 1 GB| H[Copy ✓ - Amortize egress]
    B -->|Partner/vendor owns it| I{SLA for freshness?}
    I -->|Real-time| J[Shortcut ✓ - Always fresh]
    I -->|Daily OK| K{Egress budget?}
    K -->|Unlimited| J
    K -->|Constrained| L[Copy on schedule ✓]

| Factor | Favor Shortcut | Favor Copy |
|---|---|---|
| Freshness | Need latest data | Daily/weekly OK |
| Read frequency | Low (< 5x/day) | High (100x+/day) |
| Data size | Small (< 1 GB per read) | Large (> 10 GB) |
| Egress cost | Same cloud/region (free) | Cross-cloud (expensive) |
| Write access | Read-only is fine | Need to modify data |
| Governance | Source manages lifecycle | Need full control |
| Performance | Acceptable latency | Need sub-second |

REST API for Shortcuts

Create a Shortcut

import requests

base_url = "https://api.fabric.microsoft.com/v1"
workspace_id = "your-workspace-id"
lakehouse_id = "your-lakehouse-id"
token = "your-bearer-token"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Create an S3 shortcut
payload = {
    "name": "s3_iot_sensor_data",
    "path": "Tables",
    "target": {
        "amazonS3": {
            "location": "https://my-iot-bucket.s3.us-east-1.amazonaws.com",
            "subpath": "/sensor-data/2026/",
            "connectionId": "s3-connection-id"
        }
    }
}

response = requests.post(
    f"{base_url}/workspaces/{workspace_id}/lakehouses/{lakehouse_id}/shortcuts",
    headers=headers,
    json=payload
)
print(response.status_code, response.json())

Create ADLS Gen2 Shortcut

payload = {
    "name": "adls_reference_data",
    "path": "Tables",
    "target": {
        "adlsGen2": {
            "location": "https://refdata.dfs.core.windows.net",
            "subpath": "/reference/casino-zones/",
            "connectionId": "adls-connection-id"
        }
    }
}

Create GCS Shortcut

payload = {
    "name": "gcs_weather_observations",
    "path": "Tables",
    "target": {
        "googleCloudStorage": {
            "location": "https://storage.googleapis.com/noaa-weather-public",
            "subpath": "/observations/2026/",
            "connectionId": "gcs-connection-id"
        }
    }
}

Create Dataverse Shortcut

payload = {
    "name": "dynamics_customers",
    "path": "Tables",
    "target": {
        "dataverse": {
            "environmentDomain": "org12345.crm.dynamics.com",
            "tableName": "account"
        }
    }
}
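
Create OneLake Shortcut

Cross-workspace OneLake shortcuts use a oneLake target in the same payload shape; the IDs below are placeholders:

payload = {
    "name": "ref_casino_zones",
    "path": "Tables",
    "target": {
        "oneLake": {
            "workspaceId": "ref-workspace-id",
            "itemId": "ref-lakehouse-id",
            "path": "Tables/casino_zones"
        }
    }
}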

List All Shortcuts

response = requests.get(
    f"{base_url}/workspaces/{workspace_id}/lakehouses/{lakehouse_id}/shortcuts",
    headers=headers
)

for shortcut in response.json()["value"]:
    print(f"  {shortcut['name']} → {shortcut['target']}")

Delete a Shortcut

shortcut_name = "s3_iot_sensor_data"

# Depending on API version, the shortcut's parent path (e.g. Tables/)
# may need to be included before the name in the URL
response = requests.delete(
    f"{base_url}/workspaces/{workspace_id}/lakehouses/{lakehouse_id}/shortcuts/{shortcut_name}",
    headers=headers
)

Casino Implementation

Casino Shortcut Architecture

# Shortcuts for the casino POC:

# 1. Cross-workspace reference data
#    Source: Central reference workspace → Casino workspace
shortcuts = [
    {
        "name": "ref_casino_zones",
        "source": "onelake://ref-workspace/ref-lakehouse/Tables/casino_zones"
    },
    {
        "name": "ref_game_types",
        "source": "onelake://ref-workspace/ref-lakehouse/Tables/game_type_mappings"
    },
]

# 2. Reading shortcut data in notebooks
zones = spark.read.format("delta").load(
    "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables/ref_casino_zones"
)
# Works identically to native tables

Federal Agency Implementation

Multi-Source Federal Shortcuts

# NOAA weather data from GCS (public dataset)
noaa_shortcut = {
    "name": "noaa_ghcn_daily",
    "target": {
        "googleCloudStorage": {
            "location": "https://storage.googleapis.com/gcp-public-data-noaa",
            "subpath": "/ghcn-d/",
            "connectionId": "gcs-noaa-public"
        }
    }
}

# USDA data from S3 (USDA open data)
usda_shortcut = {
    "name": "usda_crop_data",
    "target": {
        "amazonS3": {
            "location": "https://usda-open-data.s3.amazonaws.com",
            "subpath": "/nass/crop-production/",
            "connectionId": "s3-usda-public"
        }
    }
}

# EPA AQS data from ADLS (Azure Open Datasets)
epa_shortcut = {
    "name": "epa_aqs_daily",
    "target": {
        "adlsGen2": {
            "location": "https://azureopendatastorage.dfs.core.windows.net",
            "subpath": "/epaaqsdaily/",
            "connectionId": "adls-azure-open"
        }
    }
}

Reading Federal Shortcuts in Notebooks

# In a federal bronze notebook, shortcut paths are transparent:
from pyspark.sql import functions as F

noaa_df = spark.read.format("parquet").load(
    "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables/noaa_ghcn_daily"
)

# Filter and process as if data were local
east_coast = noaa_df.filter(
    (F.col("longitude") > -85) & (F.col("longitude") < -65)
)

print(f"East Coast weather observations: {east_coast.count()}")

Limitations

| Limitation | Details | Workaround |
|---|---|---|
| Write-through | Cannot write to shortcuts (read-only) | Write to native tables, use shortcuts for reads |
| File format | Source must be Delta, Parquet, CSV, or JSON | Convert source data before creating shortcut |
| Cross-tenant | Cannot shortcut to another Entra tenant's OneLake | Use ADLS shortcut with service principal |
| Max shortcuts | ~1000 shortcuts per Lakehouse | Organize into multiple Lakehouses |
| Nested shortcuts | Cannot create a shortcut to a shortcut | Shortcut directly to the original source |
| Delta time travel | Limited to source's Delta log retention | Set appropriate log retention on source |
| Direct Lake | Shortcuts to non-Delta formats may cause Direct Lake fallback | Ensure source data is Delta format |
| Latency | Cross-cloud reads add 3-15s latency | Use copy for latency-sensitive workloads |
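
Because non-Delta sources can push Direct Lake into fallback mode, it is worth sanity-checking a source before wiring it into a semantic model. A minimal sketch (the path and helper are illustrative, not a documented API):

# A Delta table always carries a _delta_log/ directory; its absence
# means Direct Lake will fall back (or the Delta read will fail outright)
from notebookutils import mssparkutils

def is_delta(path: str) -> bool:
    try:
        entries = mssparkutils.fs.ls(path)
    except Exception:
        return False
    return any(e.name.rstrip("/") == "_delta_log" for e in entries)

print(is_delta(
    "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables/s3_iot_data"
))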
