CSA Loom — the Microsoft Fabric experience for Azure tenants where Fabric isn't yet available: lakehouses, warehouses, notebooks, semantic models, Activator rules, Data Agents, across Commercial, GCC, GCC-High, and DoD IL5

Loom Dataset Editor — Foundry-parity spec

Captured 2026-05-26 by catalog agent foundry-parity-2026-05-26. Sources: Microsoft Learn — Create and manage data assets, Data concepts in Azure Machine Learning, Working with tables in Azure Machine Learning, How Azure Machine Learning works: resources and assets, CLI (v2) data YAML schema, Share data across workspaces with registries. Cross-checked against existing Loom editor at apps/fiab-console/lib/editors/foundry-sub-editors.tsx::DatasetEditor and the foundry client at apps/fiab-console/lib/azure/foundry-client.ts::listDataAssets/getDataAsset/createDataAsset.

What it is

A Foundry / Azure Machine Learning dataset (officially "data asset" in SDK/CLI v2) is a versioned, named reference to data living in a datastore. The asset is a container; under it sit one or more versions, each pointing at an immutable storage URI. Three types exist:

  • uri_file — single file in storage (CSV, Parquet, image, audio, JSON, etc.). Mapped one-to-one onto the compute filesystem at run time
  • uri_folder — a folder of files. Mapped one-to-one (recursive) onto the compute filesystem. The canonical type for training images, Parquet shards, raw text
  • mltable — a tabular abstraction with a serialized MLTable artifact that captures path globs + transformations (read_delimited, drop_columns, filter, sample). Used for AutoML, parallel jobs, complex/changing schemas, multi-location tabular data

Datasets are first-class citizens for: training inputs, evaluation/grounding inputs, AutoML inputs, prompt-flow inputs, fine-tuning corpora, vector-index source data, agent knowledge sources. A fourth informal type — Hugging Face dataset — surfaces via connections (the asset itself is still uri_folder or mltable pointing at HF storage). Datasets can be registered locally in a hub/project workspace, or shared via Azure ML registries across many workspaces.

UI components

Page chrome

  • Title bar: workspace name (hub or project) + breadcrumb
  • Right-side actions: Refresh, + Create
  • Top tab strip: All assets, Local datasets (this workspace), Registry datasets (shared registry assets), Lineage view, Hugging Face

All assets grid

  • Columns: Name, Latest version, Type (uri_file / uri_folder / mltable), Tags, Description, Modified by, Modified on
  • Filters: Type (all/uri_file/uri_folder/mltable), Tag (key=value), Author, Modified date range
  • Search box (name contains)
  • Per-row actions: Open, New version, Archive, Delete, Copy URI
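The grid filters above could be applied client-side roughly as follows. This is a minimal sketch, not the Foundry portal's implementation: the AssetRow and GridFilter shapes and the filterAssets name are hypothetical, and a malformed Tag filter is assumed to be ignored rather than rejected.

```typescript
type DataAssetType = "uri_file" | "uri_folder" | "mltable";

// Hypothetical row shape for the All assets grid.
interface AssetRow {
  name: string;
  latestVersion: string;
  type: DataAssetType;
  tags: Record<string, string>;
}

// Hypothetical filter state; undefined fields mean "no filter".
interface GridFilter {
  type?: DataAssetType;
  tag?: string; // raw "key=value" text from the Tag filter
  nameContains?: string; // search box text
}

// Split "key=value" at the first "="; return null when there is no usable key.
function parseTagFilter(raw: string): { key: string; value: string } | null {
  const idx = raw.indexOf("=");
  if (idx <= 0) return null;
  return { key: raw.slice(0, idx).trim(), value: raw.slice(idx + 1).trim() };
}

function filterAssets(rows: AssetRow[], filter: GridFilter): AssetRow[] {
  const needle = (filter.nameContains ?? "").trim().toLowerCase();
  // Assumption: a Tag filter without "=" is ignored instead of matching nothing.
  const tag = filter.tag ? parseTagFilter(filter.tag) : null;
  return rows.filter((row) => {
    if (filter.type && row.type !== filter.type) return false;
    if (tag && row.tags[tag.key] !== tag.value) return false;
    if (needle && row.name.toLowerCase().indexOf(needle) === -1) return false;
    return true;
  });
}
```

All three filters combine with AND, so selecting Type = uri_folder and typing "train" in the search box narrows the grid to folder assets whose name contains "train".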

Create asset wizard

  • Type: radio (File / Folder / Table)
  • Source: tabs
      • From local files — drag-drop, browse, multi-file
      • From a datastore — pick datastore + browse the container tree
      • From web files — URL list (HTTPS public)
      • From an Azure Storage URL — paste wasbs:// / abfss:// / azureml:// / adl:// / https://
      • From OneLake — pick Fabric workspace + lakehouse + table/file
      • From an existing data asset — pick asset + version
  • Storage settings (when type=table): MLTable file source (existing on disk vs auto-generated). For auto-generated, pattern paths + read_delimited config (delimiter, encoding, header rule, na_values, multiline support)
  • Name (must be DNS-safe, lowercase recommended)
  • Version (string; if omitted, autogenerated)
  • Description
  • Tags key-value editor
  • Auto-create new version toggle (when re-uploading to an existing name)
  • Review + Create
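The Name rule in the wizard could be checked before submission with something like the sketch below. It assumes "DNS-safe" means an RFC 1123 label (lowercase letters, digits, and hyphens, starting and ending with a letter or digit, at most 63 characters); the source does not define the term, so the exact rule and the checkAssetName helper are illustrative only.

```typescript
interface NameCheck {
  ok: boolean;
  reason?: string;
}

// Assumption: "DNS-safe" is interpreted as an RFC 1123 label.
const DNS_LABEL = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/;

function checkAssetName(name: string): NameCheck {
  if (name.length === 0) {
    return { ok: false, reason: "Name is required" };
  }
  if (name.length > 63) {
    return { ok: false, reason: "Name must be 63 characters or fewer" };
  }
  if (!DNS_LABEL.test(name)) {
    return {
      ok: false,
      reason: "Use lowercase letters, digits, and hyphens; start and end with a letter or digit",
    };
  }
  return { ok: true };
}
```

Returning a reason string rather than a bare boolean lets the wizard show the message inline next to the Name field.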

Asset detail page

  • Header: name, latest version pill, type badge, description, tags row
  • Tabs:
      • Overview — type, datastore, total file count / size, recent versions count, sample preview (text + table + image thumbnails depending on type)
      • Versions — full version list. Columns: Version, Type, Data URI, Created by, Created on, Description, Tags, Used by count
      • Explore — file tree browser (folder/file structure under the URI). Each leaf has a Preview button (text/JSON/CSV in pane; image thumbnail; Parquet → table)
      • Profile (mltable only) — column inferred types, basic stats (count, mean, std, min, max, null %, distinct %)
      • Sample — quick "view first 100 rows" / "view first 10 files" for the URI
      • Lineage — graph of jobs that consumed this asset and assets/models produced from it
      • Used by — list of: jobs (/jobs?inputs.<name>=...), online deployments (data_collector ref), components (input ref), pipelines (input ref), evaluations, agents (knowledge ref)
      • MLTable definition (mltable only) — full YAML view (paths, transformations)
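To make the MLTable definition view concrete, the sketch below renders a small MLTable YAML document from form state (path patterns plus a read_delimited step). The MLTableDraft shape and renderMLTableYaml name are hypothetical; the YAML keys (paths, pattern, transformations, read_delimited, delimiter, encoding, header, support_multi_line) follow the MLTable file format as documented on Microsoft Learn, and should be checked against the current schema before shipping.

```typescript
type HeaderRule =
  | "all_files_same_headers"
  | "from_first_file"
  | "no_header"
  | "all_files_different_headers";

// Hypothetical form state for the auto-generated MLTable option.
interface ReadDelimitedConfig {
  delimiter: string;
  encoding: string; // e.g. "utf8"
  header: HeaderRule;
  supportMultiLine: boolean;
}

interface MLTableDraft {
  patterns: string[]; // glob patterns relative to the MLTable file
  readDelimited: ReadDelimitedConfig;
}

function renderMLTableYaml(draft: MLTableDraft): string {
  if (draft.patterns.length === 0) {
    throw new Error("At least one path pattern is required");
  }
  // JSON string literals are valid YAML double-quoted scalars, so this quotes safely.
  const quote = (value: string): string => JSON.stringify(value);
  const lines: string[] = ["type: mltable", "paths:"];
  for (const pattern of draft.patterns) {
    lines.push("  - pattern: " + quote(pattern));
  }
  const rd = draft.readDelimited;
  lines.push(
    "transformations:",
    "  - read_delimited:",
    "      delimiter: " + quote(rd.delimiter),
    "      encoding: " + quote(rd.encoding),
    "      header: " + rd.header,
    "      support_multi_line: " + String(rd.supportMultiLine),
  );
  return lines.join("\n") + "\n";
}
```

Emitting the YAML by hand keeps the sketch dependency-free; a real editor would more likely use a YAML library so that further transformations (drop_columns, filter, sample) can be round-tripped.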

Version creation flow

  • From the Versions tab → + New version: reuses the Create wizard but pre-pins Name and increments Version
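One way the wizard might pre-fill the incremented Version is sketched below. Versions are strings, so the sketch assumes only purely numeric versions take part in the increment and anything else (for example "v1" or a date stamp) is ignored; nextVersion is a hypothetical helper, not an existing Loom or Foundry function.

```typescript
// Suggest the next version: one more than the highest purely numeric version.
// Assumption: non-numeric version strings are ignored rather than parsed.
function nextVersion(existing: string[]): string {
  let max = 0;
  for (const version of existing) {
    if (/^\d+$/.test(version)) {
      max = Math.max(max, Number(version));
    }
  }
  return String(max + 1);
}
```

Comparing numerically matters here: a plain string sort would rank "10" below "2" and suggest a version that already exists.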

Registry view

  • Cross-workspace shared assets via Azure ML registries
  • Read-only grid: registry name, asset name, version, type, publisher
  • Pull into workspace action (creates a local asset that points at the registry blob)

Lineage view (cross-asset)

  • DAG: nodes = data assets + jobs + models; edges = inputs/outputs
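The DAG described above can be assembled from job records along these lines. The JobRecord shape is a hypothetical simplification (real job payloads carry far more detail), and buildLineage and downstream are illustrative names only.

```typescript
type NodeKind = "data" | "job" | "model";

interface LineageNode {
  id: string;
  kind: NodeKind;
}

interface LineageEdge {
  from: string;
  to: string;
}

// Hypothetical, simplified job record: input asset ids and typed outputs.
interface JobRecord {
  id: string;
  inputs: string[];
  outputs: { id: string; kind: "data" | "model" }[];
}

function buildLineage(jobs: JobRecord[]): { nodes: LineageNode[]; edges: LineageEdge[] } {
  const nodes = new Map<string, LineageNode>();
  const edges: LineageEdge[] = [];
  for (const job of jobs) {
    nodes.set(job.id, { id: job.id, kind: "job" });
    for (const input of job.inputs) {
      // Inputs are data assets; keep an existing node if one was already recorded.
      if (!nodes.has(input)) nodes.set(input, { id: input, kind: "data" });
      edges.push({ from: input, to: job.id });
    }
    for (const output of job.outputs) {
      nodes.set(output.id, { id: output.id, kind: output.kind });
      edges.push({ from: job.id, to: output.id });
    }
  }
  return { nodes: Array.from(nodes.values()), edges };
}

// Breadth-first walk over outgoing edges: everything derived from `start`.
function downstream(edges: LineageEdge[], start: string): string[] {
  const seen = new Set<string>();
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const edge of edges) {
      if (edge.from === current && !seen.has(edge.to)) {
        seen.add(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return Array.from(seen);
}
```

The same edge list serves both the per-asset Lineage tab (walk from one asset) and the cross-asset view (render every node and edge).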

Hugging Face tab

  • Search HF Hub by name/task, surface dataset cards; Import creates a uri_folder asset pointing at HF storage via a HF connection

What Loom has

The current Loom DatasetEditor (apps/fiab-console/lib/editors/foundry-sub-editors.tsx lines 660–778) is wired live to the workspace data-assets ARM API through three routes:

  • GET /api/items/dataset?project=... → listDataAssets(workspaceName): paged list of asset containers
  • GET /api/items/dataset/[name]?project=... → getDataAsset(name): returns container + versions array
  • POST /api/items/dataset → createDataAsset(name, {dataType, dataUri, version, description, workspaceName}): creates a new version under a name

The current UI is a list-and-detail surface with one creation form:

  • New-mode toolbar: Scope dropdown (Hub / project picker), Type filter (all / uri_file / uri_folder / mltable)
  • Create-asset card: Name, Type (dropdown), URI (text — placeholder shows azureml:// or abfss://...), Version (default 1), Description, Create button
  • List table: Name, Latest version, Type, URI
  • Detail mode: shows container metadata + versions table (Version, Type, URI, Created)
  • No upload-from-local, no datastore browse, no web-files import, no OneLake import, no HF import, no MLTable YAML editor, no Explore (file tree), no Preview, no Profile, no Sample, no Lineage, no Used-by, no registry tab, no tags editor, no archive/delete, no per-version description

Gaps for parity

  1. Source types — current create form only accepts a URI string. Foundry portal supports six sources: local upload, datastore browse, web files, storage URL, OneLake, existing-asset. Missing five
  2. Datastore browse — no UI to pick a datastore and walk its tree. The foundry-client has listDatastores but no listDatastoreContents
  3. Local upload — needs a multi-file uploader that POSTs to the workspace's default storage via SAS / hub identity, then registers the resulting uri_folder asset
  4. MLTable YAML editor — for mltable type, no UI to author the MLTable file (paths, read_delimited, transformations). Today users must hand-craft it in a notebook
  5. Tags editor — list view doesn't show tags; create form has no tag input; detail view doesn't render or edit tags
  6. Explore / file-tree browser — clicking an asset should reveal the folder/file structure under the URI. Missing
  7. Preview — no text/CSV/Parquet/image preview pane
  8. Profile — for mltable, no column-stat panel
  9. Sample — no "first N rows" or "first M files" quick preview
  10. Lineage view — no DAG of jobs producing/consuming the asset
  11. Used-by — no list of jobs / deployments / components / evaluations / agents referencing this asset
  12. Archive / delete — list has neither archive nor soft-delete; only create. Asset deletion (DELETE {workspace}/data/{name}/versions/{ver}) is missing
  13. New version flow — Foundry portal makes "+ New version" a clear button on the Versions tab; Loom requires users to know they should re-POST to the create endpoint with a bumped version
  14. Registry tab — no cross-workspace shared-asset browsing
  15. Hugging Face import — no UI, even though it's a common dataset source for fine-tuning
  16. Per-version description / tags — Loom keeps a single description per container today and, per gap 5, exposes no tags at all; Foundry stores both per version
  17. Auto-create new version toggle — Foundry behaviour when re-uploading to an existing name; missing
  18. Storage path validation — current form accepts any URI string. Foundry validates that the URI matches the type (file vs folder) and that the caller has read on the storage account
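For gap 18, the client-side half of the check (does the URI shape match the type?) might look like the sketch below. The allowed schemes come from the Create wizard's storage-URL option; the trailing-slash and file-extension heuristics are assumptions for illustration and are not Foundry's actual rules. The read-permission half needs a backend call and is not covered.

```typescript
type DataAssetType = "uri_file" | "uri_folder" | "mltable";

// Schemes listed for the "From an Azure Storage URL" source.
const ALLOWED_SCHEMES = ["wasbs", "abfss", "azureml", "adl", "https"];

// Returns a list of problems; an empty list means the URI looks plausible.
function validateDataUri(type: DataAssetType, uri: string): string[] {
  const match = /^([a-z][a-z0-9+.-]*):\/\/(.+)$/i.exec(uri.trim());
  if (!match) return ["URI must have the form scheme://path"];

  const errors: string[] = [];
  const scheme = (match[1] ?? "").toLowerCase();
  const rest = match[2] ?? "";
  if (ALLOWED_SCHEMES.indexOf(scheme) === -1) {
    errors.push("Unsupported scheme: " + scheme);
  }

  const segments = rest.split("/").filter((s) => s.length > 0);
  // Skip the first segment (host or datastore root) when guessing file vs folder.
  const last = segments.length > 1 ? segments[segments.length - 1] ?? "" : "";
  const looksLikeFile = !rest.endsWith("/") && /\.[A-Za-z0-9]{1,8}$/.test(last);

  // Heuristic (assumption): a trailing slash means folder, a short extension means file.
  if (type === "uri_file" && rest.endsWith("/")) {
    errors.push("uri_file must point at a single file, not a folder path");
  }
  if (type !== "uri_file" && looksLikeFile) {
    errors.push(type + " must point at a folder, but the path looks like a file");
  }
  return errors;
}
```

Because folder names can legitimately contain dots, a production validator would probably downgrade the extension heuristic to a warning and let the backend make the final call.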

Backend mapping

All asset lifecycle is ARM under the workspace. Containers and versions are separate resources.

Loom surface | Backend call
List asset containers | GET {workspace}/data?api-version=2024-10-01 (already wired)
Get container + versions | GET {workspace}/data/{name} + GET {workspace}/data/{name}/versions (already wired)
Create / update version | PUT {workspace}/data/{name}/versions/{ver} with {properties: {dataType, dataUri, description, tags, isAnonymous, isArchived}} (already wired for the basic shape)
Delete version | DELETE {workspace}/data/{name}/versions/{ver}
Delete container | DELETE {workspace}/data/{name}
Datastore browse | POST {workspace}/datastores/{ds}/listSecrets for credentials → then Storage Blob REST (GET {account}.blob.core.windows.net/{container}?restype=container&comp=list&prefix=...) using ARM-issued SAS / OAuth
Upload local files | Storage Blob PUT {blob} chunked upload, then PUT {workspace}/data/{name}/versions/{ver} referencing the uploaded azureml://datastores/{ds}/paths/{path} URI
MLTable preview / profile | POST {region}.api.azureml.ms/data/v1.0/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.MachineLearningServices/workspaces/{ws}/datasets/preview (data-plane preview endpoint)
Lineage | GET {workspace}/jobs filtered by inputs.<name>=azureml:{name}:{ver} + outputs traversal
Used-by | same /jobs filter + cross-checks against /onlineDeployments, /components, /pipelineJobs, /evaluations
Tags update | PATCH {workspace}/data/{name}/versions/{ver} with merged tags
Archive | PATCH ... with properties.isArchived=true
Registry assets | GET https://management.azure.com/subscriptions/{sub}/providers/Microsoft.MachineLearningServices/registries/{reg}/data/{name}/versions/{ver}
HF import | client-side: hit HF API for dataset metadata, then PUT an asset with dataUri="https://huggingface.co/datasets/{org}/{name}/resolve/main" and a HF connection
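As an illustration of the Create / update version row, the sketch below builds the ARM URL and request body without sending anything. It assumes {workspace} expands to the standard ARM resource ID for a Machine Learning workspace and that the ARM endpoint is passed in (it differs between Commercial and sovereign clouds); WorkspaceRef, VersionProps, and buildCreateVersionRequest are hypothetical names, not the existing foundry-client API.

```typescript
type DataAssetType = "uri_file" | "uri_folder" | "mltable";

interface WorkspaceRef {
  subscriptionId: string;
  resourceGroup: string;
  workspaceName: string;
}

interface VersionProps {
  dataType: DataAssetType;
  dataUri: string;
  description?: string;
  tags?: Record<string, string>;
  isArchived?: boolean;
}

const API_VERSION = "2024-10-01"; // api-version quoted in the mapping above

// Assumption: {workspace} is the standard ARM resource ID under the given endpoint.
function workspaceBase(ws: WorkspaceRef, armEndpoint: string): string {
  return (
    armEndpoint +
    "/subscriptions/" + encodeURIComponent(ws.subscriptionId) +
    "/resourceGroups/" + encodeURIComponent(ws.resourceGroup) +
    "/providers/Microsoft.MachineLearningServices/workspaces/" +
    encodeURIComponent(ws.workspaceName)
  );
}

function buildCreateVersionRequest(
  ws: WorkspaceRef,
  name: string,
  version: string,
  props: VersionProps,
  armEndpoint: string = "https://management.azure.com",
): { method: "PUT"; url: string; body: string } {
  const url =
    workspaceBase(ws, armEndpoint) +
    "/data/" + encodeURIComponent(name) +
    "/versions/" + encodeURIComponent(version) +
    "?api-version=" + API_VERSION;
  const body = JSON.stringify({
    properties: {
      dataType: props.dataType,
      dataUri: props.dataUri,
      description: props.description ?? "",
      tags: props.tags ?? {},
      isArchived: props.isArchived ?? false,
    },
  });
  return { method: "PUT", url, body };
}
```

Keeping request construction separate from the fetch call makes the URL and body easy to unit test, and leaves token acquisition to whatever credential the BFF already uses.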

The existing client implements listDataAssets, getDataAsset, createDataAsset. New helpers required: deleteDataAssetVersion, deleteDataAsset, listDatastoreContents, uploadToDatastore, previewDataAsset, getDataAssetLineage, archiveDataAssetVersion, setDataAssetTags, listRegistryAssets.
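Of the new helpers, setDataAssetTags depends on a tag-merge step before the update is sent. A minimal sketch of that merge follows; treating a null value as "remove this tag" is an assumption made for illustration, and mergeTags is a hypothetical name.

```typescript
type Tags = Record<string, string>;

// Merge tag edits into the existing tag set without mutating the input.
// Assumption: a null value in `changes` means "remove this tag".
function mergeTags(existing: Tags, changes: Record<string, string | null>): Tags {
  const merged: Tags = { ...existing };
  for (const key of Object.keys(changes)) {
    const value = changes[key];
    if (value === null || value === undefined) {
      delete merged[key];
    } else {
      merged[key] = value;
    }
  }
  return merged;
}
```

Sending the full merged set matters because a version update that carries only the edited keys could drop tags the editor never displayed.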

Required Azure resources

  • Hub or project workspace with a default datastore (auto-provisioned with the workspace) — already in place
  • Storage account (workspace-attached) with Storage Blob Data Contributor on LOOM_UAMI_CLIENT_ID so the BFF can issue SAS-less upload via OAuth (already in place at the workspace level; verify on the storage account itself)
  • (Optional) Azure ML registry — for cross-workspace shared assets. Out of scope for v2.5 first cut; surface a MessageBar intent="warning" on the Registry tab when no registry is bound
  • (Optional) Hugging Face connection — when the editor offers HF import. Surface honestly when LOOM_HF_CONNECTION_NAME is unset
  • (Optional) OneLake / Fabric workspace — when offering Fabric import. Already gated honestly elsewhere in Loom under LOOM_FABRIC_WORKSPACE_ID
  • Bicep — extend platform/fiab/bicep/modules/foundry/ to optionally provision an Azure ML registry and (for the HF flow) a HF connection seed via a deploymentScript

Estimated effort

Three focused sessions to reach grade B (production-grade — works, looks good, real data, real backend):

  • Session N+1 (~2.5 hrs): Per-asset Tabs (Overview / Versions / Explore / Used-by), Tags editor on container and versions, + New version button, Delete + Archive on versions, datastore browse picker (in Create wizard) wired to listDatastoreContents
  • Session N+2 (~3 hrs): Local upload (chunked → datastore → asset), MLTable YAML editor with paths+transformations + a live previewDataAsset panel, Sample + Profile tabs for mltable, Preview pane for text/CSV/Parquet/image
  • Session N+3 (~2.5 hrs): Lineage DAG view, Used-by cross-search across jobs/deployments/components/evaluations/agents, Registry tab (read-only), Hugging Face import flow gated by LOOM_HF_CONNECTION_NAME

A fourth session lands grade A+ (tests + bicep): Vitest unit tests on the URI-vs-type validator and the MLTable YAML shape, a Playwright walk covering create-from-upload → preview → new-version → archive, and bicep extensions covering an optional Azure ML registry + Hugging Face connection seed.