
📦 Notebook Resources & Environments - Dependency Management in Fabric

Manage Libraries, Resource Files, and Shared Environments Across Workspaces

Last Updated: 2026-04-27 | Version: 1.0.0


Overview

Microsoft Fabric notebooks support two complementary mechanisms for managing dependencies and shared code:

  1. Notebook Resource Files -- data files, configuration files, images, and small Python modules attached directly to a notebook
  2. Fabric Environments -- workspace-level or shared compute configurations that define Python/R libraries, Spark properties, and runtime settings

Together, these enable reproducible, maintainable notebook workflows where dependencies are explicit, versioned, and consistent across team members.

Key Concepts

| Concept | Scope | Purpose |
|---|---|---|
| Resource files | Per-notebook | Attach data, configs, utility scripts to a notebook |
| %run magic | Cross-notebook | Chain notebooks, share functions and variables |
| Environment | Workspace or shared | Define libraries, Spark config, runtime version |
| Inline %pip | Per-session | Install packages ad hoc (not recommended for production) |

Architecture

graph TB
    subgraph "Fabric Environment"
        ENV[Environment Definition]
        ENV --> PY[PyPI Libraries]
        ENV --> CONDA[Conda Packages]
        ENV --> WHL[Custom Wheels]
        ENV --> SPARK[Spark Properties]
        ENV --> RT[Runtime Version]
    end

    subgraph "Notebook A"
        NBA[Notebook Code]
        RES_A[Resource Files]
        RES_A --> CFG[config.json]
        RES_A --> UTIL[utils.py]
        RES_A --> CSV[lookup.csv]
    end

    subgraph "Notebook B"
        NBB[Notebook Code]
        NBB -->|"%run Notebook_A"| NBA
    end

    ENV --> NBA
    ENV --> NBB

    subgraph "Workspace"
        WS[Workspace Settings]
        WS --> ENV
    end

Notebook Resource Files

What Are Resource Files?

Resource files are files attached to a notebook that become available on the Spark driver during execution. They support any file type: Python scripts, JSON configs, CSV lookups, images, certificates, and more.

Adding Resource Files

Via the Fabric Portal:

  1. Open a notebook
  2. Click the Explorer panel (left sidebar)
  3. Select Resources
  4. Click + Add to upload files
  5. Files appear under builtin/ in the resource tree

Via the Notebook File System:

# Resource files are available on the Spark driver under the notebook's
# built-in folder, addressable with the relative path "builtin/"
import os

# List all resource files attached to this notebook
resource_path = "builtin"
for f in os.listdir(resource_path):
    print(f)

Reading Resource Files

# Read a JSON configuration file
import json

with open("builtin/config.json") as f:
    config = json.load(f)

print(f"Environment: {config['env']}")
print(f"CTR Threshold: {config['ctr_threshold']}")
# Read a CSV lookup table
import pandas as pd

zones_df = pd.read_csv("builtin/casino_zones.csv")
# Convert to Spark DataFrame for joins
zones_spark = spark.createDataFrame(zones_df)

# Import a Python utility module
import importlib.util

spec = importlib.util.spec_from_file_location("utils", "builtin/utils.py")
utils = importlib.util.module_from_spec(spec)
spec.loader.exec_module(utils)

# Now use functions from the module
result = utils.hash_pii("123-45-6789", salt=config["hash_salt"])
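
If the utility module only needs a plain import, a simpler pattern is to put the builtin/ folder on sys.path and import it like any other module. A minimal sketch, assuming the same utils.py resource as above and that the session's working directory is the notebook's default:

import sys

# Make the notebook's built-in resource folder importable
sys.path.insert(0, "builtin")

import utils  # resolves to builtin/utils.py

result = utils.hash_pii("123-45-6789", salt=config["hash_salt"])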

Resource File Types and Use Cases

| File Type | Use Case | Size Guidance |
|---|---|---|
| .py | Shared utility functions | < 1 MB |
| .json | Configuration, schemas | < 1 MB |
| .yaml | Environment config, DAG definitions | < 1 MB |
| .csv | Small lookup/reference tables | < 50 MB |
| .parquet | Compact reference data | < 100 MB |
| .pem / .cer | TLS certificates for external connections | < 1 MB |
| .png / .jpg | Documentation images, logos | < 10 MB |
| .whl | Custom Python packages | < 100 MB |

Size Limits

| Limit | Value |
|---|---|
| Single file max | 500 MB |
| Total resources per notebook | 1 GB |
| Recommended max for performance | < 200 MB total |
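
To stay within these limits, you can measure the attached resources from inside the notebook. A minimal sketch using only the standard library:

import os

# Sum the size of everything under the notebook's builtin/ resource folder
total_bytes = 0
for root, _dirs, files in os.walk("builtin"):
    for name in files:
        total_bytes += os.path.getsize(os.path.join(root, name))

total_mb = total_bytes / (1024 * 1024)
print(f"Total resource size: {total_mb:.1f} MB")
if total_mb > 200:
    print("WARNING: above the recommended 200 MB total for notebook resources")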

Notebook Chaining with %run

Basic %run Usage

The %run magic command executes another notebook inline, making all its defined functions, variables, and imports available in the calling notebook.

# In Notebook: 01_bronze_slot_telemetry
# =====================================

# Run the shared utilities notebook first
%run bronze_utils

# Now functions from bronze_utils are available
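# processing_date is assumed to be defined earlier, e.g., in a parameter cell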
df = read_source_data(spark, "slot_telemetry", processing_date)
df = add_metadata_columns(df)
validate_bronze(df)
write_bronze(df, "slot_telemetry")

The Shared Utilities Pattern

# Notebook: bronze_utils
# =======================
# This notebook is %run'd by all bronze layer notebooks.
# It defines shared functions and constants.

from pyspark.sql import functions as F
from datetime import datetime

# ---- CONSTANTS ----
BRONZE_PATH = "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Tables"
CHECKPOINT_BASE = "abfss://bronze@onelake.dfs.fabric.microsoft.com/lh_bronze.Lakehouse/Files/_checkpoints"

# ---- SHARED FUNCTIONS ----

def read_source_data(spark, table_name, date=None):
    """Read source data from the landing zone."""
    path = f"{BRONZE_PATH}/_landing/{table_name}"
    df = spark.read.format("parquet").load(path)
    if date:
        df = df.filter(F.col("date") == date)
    return df

def add_metadata_columns(df):
    """Add standard metadata columns to bronze DataFrames."""
    return df.withColumns({
        "_ingestion_timestamp": F.current_timestamp(),
        "_source_file": F.input_file_name(),
        "_record_hash": F.sha2(F.concat_ws("|", *df.columns), 256),
    })

def validate_bronze(df, min_rows=1, max_null_rate=0.05):
    """Run standard bronze validations."""
    total = df.count()
    if total < min_rows:
        raise ValueError(f"Expected >= {min_rows} rows, got {total}")

    for col_name in df.columns:
        if not col_name.startswith("_"):
            null_count = df.filter(F.col(col_name).isNull()).count()
            if null_count / total > max_null_rate:
                print(f"WARNING: {col_name} has {null_count/total:.1%} nulls")

def write_bronze(df, table_name, mode="append"):
    """Write DataFrame to bronze Delta table."""
    path = f"{BRONZE_PATH}/{table_name}"
    df.write.format("delta").mode(mode).save(path)
    print(f"Wrote {df.count()} records to {table_name}")

%run with Parameters

# Pass parameters to the called notebook
%run shared_config {"env": "prod", "agency": "USDA"}

# In the shared_config notebook, declare the parameters as ordinary variables
# with default values; values passed by %run replace variables of the same name:
env = "dev"
agency = ""

%run Dependency Graph

graph TD
    UTILS[bronze_utils] --> SLOT[01_bronze_slot_telemetry]
    UTILS --> TABLE[02_bronze_table_games]
    UTILS --> PLAYER[03_bronze_player_tracking]

    SILVER_UTILS[silver_utils] --> S_SLOT[01_silver_slot_cleansed]
    SILVER_UTILS --> S_TABLE[02_silver_table_games_cleansed]

    GOLD_UTILS[gold_utils] --> G_PERF[01_gold_slot_performance]
    GOLD_UTILS --> G_COMP[05_gold_compliance_monitoring]

    CONFIG[shared_config] --> UTILS
    CONFIG --> SILVER_UTILS
    CONFIG --> GOLD_UTILS

%run Best Practices

| Do | Do Not |
|---|---|
| Use %run for shared utility functions | Use %run for orchestration (use Pipelines/Airflow instead) |
| Keep %run targets small and focused | Create chains > 3 levels deep |
| Document what %run provides | Rely on side effects from %run notebooks |
| Place %run at the top of the notebook | Mix %run and data processing in the same cell |
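
For orchestration-style calls, the built-in notebook utilities expose a run API that executes a child notebook as an isolated job instead of merging its state into the caller. A minimal sketch, assuming the notebookutils module that Fabric notebooks provide (the parameter name is illustrative):

# Run a child notebook in its own session; unlike %run, its variables are not
# merged into this notebook. Returns the child's exit value, if it sets one.
result = notebookutils.notebook.run(
    "01_bronze_slot_telemetry",          # notebook in the same workspace
    600,                                 # timeout in seconds
    {"processing_date": "2025-01-31"},   # illustrative parameter
)
print(result)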

Fabric Environments

What Is a Fabric Environment?

A Fabric Environment is a workspace-level item that defines the compute configuration for notebooks and Spark Job Definitions:

  • Python and R library versions
  • Custom packages (wheels, tarballs)
  • Spark configuration properties
  • Runtime version (Spark 3.4, 3.5, etc.)

Creating an Environment

Via the Portal:

  1. Navigate to workspace > + New > Environment
  2. Name it (e.g., casino-poc-env)
  3. Configure libraries, Spark properties, and runtime

Via YAML Definition:

# environment.yml
name: casino-poc-env
description: Casino POC compute environment for all medallion notebooks
runtime: 
  spark_version: "3.5"
  python_version: "3.11"

libraries:
  pypi:
    - great-expectations==0.18.0
    - delta-spark==3.1.0
    - pydantic==2.5.0
    - requests==2.31.0
    - tenacity==8.2.0
    - python-dateutil==2.8.2

  conda:
    - numpy=1.26.0
    - pandas=2.1.0
    - pyarrow=14.0.0

  custom_wheels:
    - path: libs/casino_utils-1.0.0-py3-none-any.whl
    - path: libs/fabric_helpers-0.5.0-py3-none-any.whl

spark_properties:
  spark.sql.adaptive.enabled: "true"
  spark.sql.adaptive.coalescePartitions.enabled: "true"
  spark.sql.shuffle.partitions: "200"
  spark.serializer: "org.apache.spark.serializer.KryoSerializer"
  spark.sql.parquet.compression.codec: "snappy"
  spark.fabric.lakehouse.default: "lh_bronze"

Attaching an Environment to a Notebook

# In notebook settings (gear icon):
# Environment: casino-poc-env

# Or programmatically via metadata:
# The notebook JSON includes:
{
    "environment": {
        "environmentId": "env-id-here",
        "workspaceId": "workspace-id"
    }
}

Library Management

Installation Methods Comparison

| Method | Scope | Persistence | Best For |
|---|---|---|---|
| Environment (PyPI) | All notebooks using env | Permanent | Production dependencies |
| Environment (conda) | All notebooks using env | Permanent | Scientific packages |
| Environment (wheel) | All notebooks using env | Permanent | Internal packages |
| %pip install | Current session only | Ephemeral | Quick testing |
| Resource file (.whl) | Single notebook | Per-notebook | Notebook-specific libs |

%pip install (Development Only)

# Only use for quick testing -- NOT for production
%pip install great-expectations==0.18.0

# For production, add to the Fabric Environment instead

Custom Wheel Deployment

# Build your custom library (requires the 'build' package: pip install build)
cd casino_utils/
python -m build --wheel
# Produces: dist/casino_utils-1.0.0-py3-none-any.whl

# Upload to Fabric Environment:
# 1. Open Environment > Libraries > Custom Libraries
# 2. Upload casino_utils-1.0.0-py3-none-any.whl
# 3. Publish the environment
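
If the wheel is only needed by a single notebook (the "Resource file (.whl)" row in the comparison table above), an alternative sketch is to upload the wheel as a notebook resource and install it for the current session, assuming the relative builtin/ path resolves on the driver:

# Session-scoped install of a wheel attached as a notebook resource
%pip install builtin/casino_utils-1.0.0-py3-none-any.whl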

Dependency Conflict Resolution

# If two packages conflict, pin explicitly in environment.yml
libraries:
  pypi:
    - package-a==1.0.0
    - package-b==2.0.0
    # Pin shared dependency to compatible version
    - shared-dep==3.5.2
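
To confirm which versions actually resolved in the running session (useful when diagnosing a suspected conflict), the standard library can report installed distributions. A minimal sketch:

from importlib.metadata import version, PackageNotFoundError

# Print the versions actually installed in this Spark session
for pkg in ["great-expectations", "delta-spark", "pydantic"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed")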

Environment Pinning and Versioning

Version Strategy

| Strategy | Example | When to Use |
|---|---|---|
| Exact pin | great-expectations==0.18.0 | Production environments |
| Compatible release | great-expectations~=0.18.0 | Staging (allow patches) |
| Range | great-expectations>=0.17,<0.19 | Development (more flexibility) |
| Unpinned | great-expectations | Never in production |

Environment Promotion

graph LR
    DEV[Dev Environment] -->|Test| STG[Staging Environment]
    STG -->|Approve| PROD[Production Environment]

    DEV --> |"~= pins"| DEV
    STG --> |"== pins"| STG
    PROD --> |"== pins + hash"| PROD

Locking Dependencies

# Generate a lock file from your environment
# (Run in your dev environment)
pip freeze > requirements-lock.txt

# Use the lock file for production environment
# This ensures exact reproducibility
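
The same lock-style listing can be produced from inside a notebook session using only the standard library, which helps when you cannot shell out to pip. A minimal sketch:

from importlib.metadata import distributions

# Emit a requirements-lock style listing of everything installed in this session
lines = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in distributions()
    if dist.metadata["Name"]
)
print("\n".join(lines))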

Shared Environments Across Workspaces

Cross-Workspace Sharing

  1. Create an Environment in a central "Platform" workspace
  2. Share the Environment with target workspaces via RBAC
  3. Notebooks in target workspaces reference the shared Environment

When to Share vs Duplicate

| Scenario | Approach |
|---|---|
| All workspaces need same libraries | Share from central workspace |
| Different teams need different versions | Duplicate and customize |
| Dev/staging/prod isolation | Separate environments per workspace |
| Central governance required | Share from governed workspace |

Casino Implementation

Casino Environment Configuration

# casino-poc-env.yml
name: casino-poc-env
runtime:
  spark_version: "3.5"
  python_version: "3.11"

libraries:
  pypi:
    - great-expectations==0.18.0
    - delta-spark==3.1.0
    - pydantic==2.5.0
    - cryptography==41.0.0  # For PII hashing

spark_properties:
  spark.sql.adaptive.enabled: "true"
  spark.fabric.lakehouse.default: "lh_bronze"
  spark.sql.shuffle.partitions: "200"

Casino %run Hierarchy

# All casino bronze notebooks start with:
%run bronze_utils

# bronze_utils provides:
# - read_source_data(spark, table, date)
# - add_metadata_columns(df)
# - validate_bronze(df)
# - write_bronze(df, table)
# - hash_pii(value, salt)  # For SSN, card numbers

# Casino-specific resource files:
# - builtin/compliance_thresholds.json
# - builtin/casino_zones.csv
# - builtin/game_type_mappings.json
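
As a minimal sketch of how these pieces combine in a casino bronze notebook (the keys inside compliance_thresholds.json are illustrative):

%run bronze_utils

import json

# Load compliance thresholds attached as a notebook resource
with open("builtin/compliance_thresholds.json") as f:
    thresholds = json.load(f)

processing_date = "2025-01-31"  # typically supplied by a parameter cell

df = read_source_data(spark, "slot_telemetry", processing_date)
df = add_metadata_columns(df)
validate_bronze(df, max_null_rate=thresholds.get("max_null_rate", 0.05))
write_bronze(df, "slot_telemetry")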

Federal Agency Implementation

Federal Environment Configuration

# federal-poc-env.yml
name: federal-poc-env
runtime:
  spark_version: "3.5"
  python_version: "3.11"

libraries:
  pypi:
    - great-expectations==0.18.0
    - delta-spark==3.1.0
    - requests==2.31.0    # For API calls to federal data sources
    - sodapy==2.2.0       # For Socrata API (open data portals)
    - geopandas==0.14.0   # For DOI geospatial data

spark_properties:
  spark.sql.adaptive.enabled: "true"
  spark.fabric.lakehouse.default: "lh_bronze"

Per-Agency Resource Files

# Each federal agency notebook includes agency-specific config:

# USDA notebook resources:
# - builtin/usda_api_config.json
# - builtin/crop_categories.csv
# - builtin/state_fips_codes.csv

# NOAA notebook resources:
# - builtin/noaa_station_list.csv
# - builtin/weather_variable_codes.json

# EPA notebook resources:
# - builtin/aqi_breakpoints.json
# - builtin/pollutant_standards.csv
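
A minimal sketch of loading the USDA resources listed above in a bronze notebook (the file contents are illustrative):

import json
import pandas as pd

# Agency-specific API configuration attached as a notebook resource
with open("builtin/usda_api_config.json") as f:
    usda_api_config = json.load(f)

# Small reference tables shipped with the notebook
crop_categories = pd.read_csv("builtin/crop_categories.csv")
state_fips = pd.read_csv("builtin/state_fips_codes.csv")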

Best Practices

Resource Files

| Practice | Reason |
|---|---|
| Keep resources < 50 MB each | Large files slow notebook startup |
| Use .json or .yaml for config | Human-readable, Git-friendly |
| Never store secrets in resource files | Use Key Vault or Variable Libraries |
| Version config files alongside notebooks | Ensures reproducibility |
| Prefer Delta tables over CSV resources for large lookups | Better performance at scale |
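
For the "never store secrets in resource files" rule, secrets can be pulled from Azure Key Vault at runtime through the built-in notebook utilities. A minimal sketch, assuming the notebookutils module that Fabric notebooks provide (the vault URL and secret name are placeholders):

# Retrieve a secret from Key Vault instead of embedding it in a resource file
api_key = notebookutils.credentials.getSecret(
    "https://my-keyvault.vault.azure.net/",  # placeholder vault URL
    "external-api-key",                      # placeholder secret name
)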

Environments

| Practice | Reason |
|---|---|
| Pin all versions in production | Avoid surprise breakages |
| Test environment changes in dev first | Catch conflicts early |
| Use separate environments per domain | Avoid dependency bloat |
| Document environment purpose in description | Team discoverability |
| Publish environment changes during low-traffic windows | Avoid disrupting running notebooks |

%run

| Practice | Reason |
|---|---|
| Limit %run depth to 2 levels | Deeper chains are hard to debug |
| Use %run only for shared functions | Not for orchestration |
| Test %run targets independently | Ensure they work standalone |
| Document what each %run notebook provides | Help new team members |

Limitations

| Limitation | Details | Workaround |
|---|---|---|
| Resource file count | Max 100 files per notebook | Combine small files into archives |
| %run cross-workspace | Cannot %run notebooks in other workspaces | Copy shared notebooks or use Environments |
| Environment publish time | 5-15 minutes for library installation | Plan changes in advance |
| No conda + pip mixing | Some packages only available in one channel | Prefer pip; use conda only when needed |
| Environment rollback | No built-in version history | Store environment YAML in Git |
| Session restart required | Library changes need session restart | Restart session after environment publish |

References