Loom Dataset Editor: Foundry-parity spec
Captured 2026-05-26 by catalog agent `foundry-parity-2026-05-26`. Sources: Microsoft Learn (Create and manage data assets; Data concepts in Azure Machine Learning; Working with tables in Azure Machine Learning; How Azure Machine Learning works: resources and assets; CLI (v2) data YAML schema; Share data across workspaces with registries). Cross-checked against the existing Loom editor at `apps/fiab-console/lib/editors/foundry-sub-editors.tsx::DatasetEditor` and the foundry client at `apps/fiab-console/lib/azure/foundry-client.ts::listDataAssets/getDataAsset/createDataAsset`.
What it is
A Foundry / Azure Machine Learning dataset (officially "data asset" in SDK/CLI v2) is a versioned, named reference to data living in a datastore. The asset is a container; under it sit one or more versions, each pointing at an immutable storage URI. Three types exist:
- `uri_file`: single file in storage (CSV, Parquet, image, audio, JSON, etc.). Mapped one-to-one onto the compute filesystem at run time
- `uri_folder`: a folder of files. Mapped one-to-one (recursive) onto the compute filesystem. The canonical type for training images, Parquet shards, raw text
- `mltable`: a tabular abstraction with a serialized `MLTable` artifact that captures path globs + transformations (read_delimited, drop_columns, filter, sample). Used for AutoML, parallel jobs, complex/changing schemas, multi-location tabular data
Datasets are first-class citizens for: training inputs, evaluation/grounding inputs, AutoML inputs, prompt-flow inputs, fine-tuning corpora, vector-index source data, agent knowledge sources. A fourth informal type — Hugging Face dataset — surfaces via connections (the asset itself is still uri_folder or mltable pointing at HF storage). Datasets can be registered locally in a hub/project workspace, or shared via Azure ML registries across many workspaces.
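The container/version relationship described above can be modeled roughly as follows. This is a minimal sketch: the type and function names (`DataAssetContainer`, `DataAssetVersion`, `findVersion`) are illustrative assumptions and are not taken from `foundry-client.ts`.

```typescript
// Hypothetical shapes for illustration; not the actual foundry-client.ts types.
type DataAssetType = "uri_file" | "uri_folder" | "mltable";

interface DataAssetVersion {
  version: string; // versions are strings, e.g. "1" or "2026-05-26"
  dataType: DataAssetType;
  dataUri: string; // storage URI this version points at
  description?: string;
  tags?: Record<string, string>;
}

interface DataAssetContainer {
  name: string;
  versions: DataAssetVersion[];
}

// Look up one version under a container; returns undefined when absent.
function findVersion(
  asset: DataAssetContainer,
  version: string
): DataAssetVersion | undefined {
  return asset.versions.find((v) => v.version === version);
}

const exampleAsset: DataAssetContainer = {
  name: "training-images",
  versions: [
    { version: "1", dataType: "uri_file", dataUri: "azureml://datastores/ds/paths/a.csv" },
    { version: "2", dataType: "uri_folder", dataUri: "azureml://datastores/ds/paths/images/" },
  ],
};
```

Keeping description and tags on the version rather than the container matches how Foundry stores them per version.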
UI components
Page chrome
- Title bar: workspace name (hub or project) + breadcrumb
- Right-side actions: Refresh, + Create
- Top tab strip: All assets, Local datasets (this workspace), Registry datasets (shared registry assets), Lineage view, Hugging Face
All assets grid
- Columns: Name, Latest version, Type (uri_file / uri_folder / mltable), Tags, Description, Modified by, Modified on
- Filters: Type (all/uri_file/uri_folder/mltable), Tag (key=value), Author, Modified date range
- Search box (name contains)
- Per-row actions: Open, New version, Archive, Delete, Copy URI
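The grid filters above (Type, Tag as key=value, name-contains search) could be combined into one client-side predicate. A minimal sketch, assuming hypothetical `AssetRow` and `GridFilter` shapes, a case-insensitive name search, and that a tag filter without `=` matches on key presence alone; Author and date-range filters are omitted.

```typescript
// Hypothetical row/filter shapes for illustration only.
interface AssetRow {
  name: string;
  type: "uri_file" | "uri_folder" | "mltable";
  tags: Record<string, string>;
}

interface GridFilter {
  type?: AssetRow["type"]; // undefined means "all"
  tag?: string;            // "key=value" or just "key"
  search?: string;         // name contains
}

function matchesFilter(row: AssetRow, f: GridFilter): boolean {
  if (f.type && row.type !== f.type) return false;
  if (f.tag) {
    const eq = f.tag.indexOf("=");
    const key = eq === -1 ? f.tag : f.tag.slice(0, eq);
    const value = eq === -1 ? undefined : f.tag.slice(eq + 1);
    if (!Object.prototype.hasOwnProperty.call(row.tags, key)) return false;
    if (value !== undefined && row.tags[key] !== value) return false;
  }
  if (f.search && !row.name.toLowerCase().includes(f.search.toLowerCase())) {
    return false;
  }
  return true;
}
```

Usage: `rows.filter((r) => matchesFilter(r, { type: "mltable", tag: "env=prod" }))`.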
Create asset wizard
- Type: radio (File / Folder / Table)
- Source: tabs
  - From local files: drag-drop, browse, multi-file
  - From a datastore: pick datastore + browse the container tree
  - From web files: URL list (HTTPS public)
  - From an Azure Storage URL: paste `wasbs://` / `abfss://` / `azureml://` / `adl://` / `https://`
  - From OneLake: pick Fabric workspace + lakehouse + table/file
  - From an existing data asset: pick asset + version
- Storage settings (when type=table): MLTable file source (existing on disk vs auto-generated). For auto-generated, pattern paths + read_delimited config (delimiter, encoding, header rule, na_values, multiline support)
- Name (must be DNS-safe, lowercase recommended)
- Version (string; if omitted, autogenerated)
- Description
- Tags key-value editor
- Auto-create new version toggle (when re-uploading to an existing name)
- Review + Create
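For the auto-generated MLTable option in the wizard, the pattern paths and read_delimited settings could be serialized roughly as below. A minimal sketch: `ReadDelimitedConfig` and `renderMLTable` are hypothetical names, only three of the listed read_delimited options are shown, and the emitted key names should be checked against the CLI (v2) MLTable schema before use.

```typescript
// Hypothetical wizard state; field names are assumptions for illustration.
interface ReadDelimitedConfig {
  delimiter: string;
  encoding: string; // e.g. "utf8"
  header: string;   // e.g. "all_files_same_headers"
}

// A JSON string literal is also a valid YAML double-quoted scalar for the
// plain ASCII values used here, which keeps quoting simple.
const quote = (s: string): string => JSON.stringify(s);

function renderMLTable(patterns: string[], cfg: ReadDelimitedConfig): string {
  const lines: string[] = ["paths:"];
  for (const p of patterns) {
    lines.push(`  - pattern: ${quote(p)}`);
  }
  lines.push("transformations:");
  lines.push("  - read_delimited:");
  lines.push(`      delimiter: ${quote(cfg.delimiter)}`);
  lines.push(`      encoding: ${quote(cfg.encoding)}`);
  lines.push(`      header: ${quote(cfg.header)}`);
  return lines.join("\n") + "\n";
}

const mltableYaml = renderMLTable(["./*.csv"], {
  delimiter: ",",
  encoding: "utf8",
  header: "all_files_same_headers",
});
```

A production editor would more likely use a YAML library; hand-rendering is used here only to keep the sketch dependency-free.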
Asset detail page
- Header: name, latest version pill, type badge, description, tags row
- Tabs:
  - Overview: type, datastore, total file count / size, recent versions count, sample preview (text + table + image thumbnails depending on type)
  - Versions: full version list. Columns: Version, Type, Data URI, Created by, Created on, Description, Tags, Used by count
  - Explore: file tree browser (folder/file structure under the URI). Each leaf has a Preview button (text/JSON/CSV in pane; image thumbnail; Parquet → table)
  - Profile (mltable only): column inferred types, basic stats (count, mean, std, min, max, null %, distinct %)
  - Sample: quick "view first 100 rows" / "view first 10 files" for the URI
  - Lineage: graph of jobs that consumed this asset and assets/models produced from it
  - Used by: list of jobs (`/jobs?inputs.<name>=...`), online deployments (data_collector ref), components (input ref), pipelines (input ref), evaluations, agents (knowledge ref)
  - MLTable definition (mltable only): full YAML view (paths, transformations)
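The Profile tab statistics (count, mean, std, min, max, null %, distinct %) can be illustrated with a small per-column function. This is a sketch under stated assumptions: `profileColumn` is a hypothetical helper, std is the population standard deviation, and distinct % is measured over non-null values. How Foundry computes these figures is not specified in this document.

```typescript
interface ColumnProfile {
  count: number;       // non-null values
  nullPct: number;     // percentage of all rows that are null
  distinctPct: number; // distinct non-null values / non-null values
  mean?: number;       // numeric columns only
  std?: number;
  min?: number;
  max?: number;
}

function profileColumn(values: Array<number | string | null>): ColumnProfile {
  const total = values.length;
  const present = values.filter((v): v is number | string => v !== null);
  const distinct = new Set(present).size;
  const profile: ColumnProfile = {
    count: present.length,
    nullPct: total === 0 ? 0 : (100 * (total - present.length)) / total,
    distinctPct: present.length === 0 ? 0 : (100 * distinct) / present.length,
  };
  const nums = present.filter((v): v is number => typeof v === "number");
  // Numeric stats only when every non-null value is a number.
  if (nums.length > 0 && nums.length === present.length) {
    const mean = nums.reduce((a, b) => a + b, 0) / nums.length;
    const variance =
      nums.reduce((a, b) => a + (b - mean) ** 2, 0) / nums.length; // population
    profile.mean = mean;
    profile.std = Math.sqrt(variance);
    profile.min = Math.min(...nums);
    profile.max = Math.max(...nums);
  }
  return profile;
}
```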
Version creation flow
- From the Versions tab → + New version: reuses the Create wizard but pre-pins Name and increments Version
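One way the wizard could propose the incremented version is sketched below. `nextVersion` is a hypothetical helper. Versions are strings, so this sketch only considers purely numeric ones and falls back to "1" otherwise; the exact rule Foundry applies is not stated here.

```typescript
// Suggest the next version string from the versions already registered.
function nextVersion(existing: string[]): string {
  const numeric = existing
    .filter((v) => /^\d+$/.test(v))
    .map((v) => parseInt(v, 10));
  return numeric.length === 0 ? "1" : String(Math.max(...numeric) + 1);
}
```

Comparing as integers rather than strings avoids suggesting "3" when the existing versions are "2" and "10".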
Registry view
- Cross-workspace shared assets via Azure ML registries
- Read-only grid: registry name, asset name, version, type, publisher
- Pull into workspace action (creates a local asset that points at the registry blob)
Lineage view (cross-asset)
- DAG: nodes = data assets + jobs + models; edges = inputs/outputs
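The DAG described above can be derived from job records: each input yields an edge into the job, each output an edge out of it. A minimal sketch with hypothetical `JobRecord` and `LineageEdge` shapes; the node identifiers are plain strings chosen for illustration.

```typescript
interface JobRecord {
  id: string;
  inputs: string[];  // asset references, e.g. "azureml:images:1"
  outputs: string[]; // assets or models produced by the job
}

interface LineageEdge {
  from: string;
  to: string;
}

function buildLineageEdges(jobs: JobRecord[]): LineageEdge[] {
  const edges: LineageEdge[] = [];
  for (const job of jobs) {
    for (const input of job.inputs) edges.push({ from: input, to: job.id });
    for (const output of job.outputs) edges.push({ from: job.id, to: output });
  }
  return edges;
}

// Breadth-first walk: every job, asset, or model reachable from `start`.
function downstream(start: string, edges: LineageEdge[]): string[] {
  const seen = new Set<string>();
  const queue: string[] = [start];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    for (const e of edges) {
      if (e.from === node && !seen.has(e.to)) {
        seen.add(e.to);
        queue.push(e.to);
      }
    }
  }
  return Array.from(seen);
}
```

The same edge list serves the per-asset Lineage tab (walk from one asset) and the cross-asset view (render all edges).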
Hugging Face tab
- Search HF Hub by name/task, surface dataset cards; Import creates a `uri_folder` asset pointing at HF storage via a HF connection
What Loom has
The current Loom DatasetEditor (`apps/fiab-console/lib/editors/foundry-sub-editors.tsx`, lines 660–778) is wired live to the workspace data-assets ARM API via:
- `GET /api/items/dataset?project=...` → `listDataAssets(workspaceName)`: paged list of asset containers
- `GET /api/items/dataset/[name]?project=...` → `getDataAsset(name)`: returns container + versions array
- `POST /api/items/dataset` → `createDataAsset(name, {dataType, dataUri, version, description, workspaceName})`: creates a new version under a name

The current UI is a list-and-detail surface with one creation form:
- New-mode toolbar: Scope dropdown (Hub / project picker), Type filter (all / uri_file / uri_folder / mltable)
- Create-asset card: Name, Type (dropdown), URI (text; placeholder shows azureml:// or abfss://...), Version (default 1), Description, Create button
- List table: Name, Latest version, Type, URI
- Detail mode: shows container metadata + versions table (Version, Type, URI, Created)
- No upload-from-local, no datastore browse, no web-files import, no OneLake import, no HF import, no MLTable YAML editor, no Explore (file tree), no Preview, no Profile, no Sample, no Lineage, no Used-by, no registry tab, no tags editor, no archive/delete, no per-version description
Gaps for parity
- Source types — current create form only accepts a URI string. Foundry portal supports six sources: local upload, datastore browse, web files, storage URL, OneLake, existing-asset. Missing five
- Datastore browse: no UI to pick a datastore and walk its tree. The foundry-client has `listDatastores` but no `listDatastoreContents`
- Local upload: needs a multi-file uploader that POSTs to the workspace's default storage via SAS / hub identity, then registers the resulting `uri_folder` asset
- MLTable YAML editor: for `mltable` type, no UI to author the `MLTable` file (paths, read_delimited, transformations). Today users must hand-craft it in a notebook
- Tags editor: list view doesn't show tags; create form has no tag input; detail view doesn't render or edit tags
- Explore / file-tree browser — clicking an asset should reveal the folder/file structure under the URI. Missing
- Preview — no text/CSV/Parquet/image preview pane
- Profile — for mltable, no column-stat panel
- Sample — no "first N rows" or "first M files" quick preview
- Lineage view — no DAG of jobs producing/consuming the asset
- Used-by — no list of jobs / deployments / components / evaluations / agents referencing this asset
- Archive / delete: list has neither archive nor soft-delete; only create. Asset deletion (`DELETE {workspace}/data/{name}/versions/{ver}`) is missing
- New version flow: Foundry portal makes "+ New version" a clear button on the Versions tab; Loom requires users to know they should re-POST to the create endpoint with a bumped version
- Registry tab — no cross-workspace shared-asset browsing
- Hugging Face import — no UI, even though it's a common dataset source for fine-tuning
- Per-version description / tags — only one description+tags pair per container today; Foundry stores them per version
- Auto-create new version toggle — Foundry behaviour when re-uploading to an existing name; missing
- Storage path validation — current form accepts any URI string. Foundry validates that the URI matches the type (file vs folder) and that the caller has read on the storage account
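The syntactic half of the storage path validation gap could start from a check like the one below. This is a sketch, not Foundry's actual rule set: `validateDataUri` is a hypothetical helper, the allowed schemes are the ones listed in the Create wizard, and the trailing-slash test is a heuristic assumption. The read-permission check needs a storage call and is out of scope here.

```typescript
type DataAssetType = "uri_file" | "uri_folder" | "mltable";

// Schemes taken from the "From an Azure Storage URL" source in the wizard.
const ALLOWED_SCHEMES = ["wasbs", "abfss", "azureml", "adl", "https"];

// Returns an error message, or null when the URI looks plausible for the type.
function validateDataUri(type: DataAssetType, uri: string): string | null {
  const match = /^([a-z][a-z0-9+.-]*):\/\//i.exec(uri);
  if (!match) {
    return "URI must start with a scheme such as abfss:// or azureml://";
  }
  const scheme = match[1].toLowerCase();
  if (!ALLOWED_SCHEMES.includes(scheme)) {
    return `Unsupported scheme: ${scheme}`;
  }
  // Heuristic: a path ending in "/" names a folder, so it cannot be a uri_file.
  if (type === "uri_file" && uri.endsWith("/")) {
    return "uri_file must point at a single file, not a folder path";
  }
  return null;
}
```

Returning a message rather than throwing lets the form render the error inline next to the URI field.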
Backend mapping
All asset lifecycle is ARM under the workspace. Containers and versions are separate resources.
| Loom surface | Backend call |
|---|---|
| List asset containers | GET {workspace}/data?api-version=2024-10-01 (already wired) |
| Get container + versions | GET {workspace}/data/{name} + GET {workspace}/data/{name}/versions (already wired) |
| Create / update version | PUT {workspace}/data/{name}/versions/{ver} with {properties: {dataType, dataUri, description, tags, isAnonymous, isArchived}} (already wired for the basic shape) |
| Delete version | DELETE {workspace}/data/{name}/versions/{ver} |
| Delete container | DELETE {workspace}/data/{name} |
| Datastore browse | POST {workspace}/datastores/{ds}/listSecrets for credentials → then Storage Blob REST (GET {account}.blob.core.windows.net/{container}?restype=container&comp=list&prefix=...) using ARM-issued SAS / OAuth |
| Upload local files | Storage Blob PUT {blob} chunked upload, then PUT {workspace}/data/{name}/versions/{ver} referencing the uploaded azureml://datastores/{ds}/paths/{path} URI |
| MLTable preview / profile | POST {region}.api.azureml.ms/data/v1.0/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.MachineLearningServices/workspaces/{ws}/datasets/preview (data-plane preview endpoint) |
| Lineage | GET {workspace}/jobs filtered by inputs.<name>=azureml:{name}:{ver} + outputs traversal |
| Used-by | same /jobs filter + cross-checks against /onlineDeployments, /components, /pipelineJobs, /evaluations |
| Tags update | PATCH {workspace}/data/{name}/versions/{ver} with merged tags |
| Archive | PATCH ... with properties.isArchived=true |
| Registry assets | GET https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.MachineLearningServices/registries/{reg}/data/{name}/versions/{ver} |
| HF import | client-side: hit HF API for dataset metadata, then PUT an asset with dataUri="https://huggingface.co/datasets/{org}/{name}/resolve/main" and a HF connection |
The existing client implements listDataAssets, getDataAsset, createDataAsset. New helpers required: deleteDataAssetVersion, deleteDataAsset, listDatastoreContents, uploadToDatastore, previewDataAsset, getDataAssetLineage, archiveDataAssetVersion, setDataAssetTags, listRegistryAssets.
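As an illustration of the shape the new helpers might take, the version-level PUT and DELETE rows from the table can be expressed as pure request builders. A sketch only: `ArmRequest`, `putVersionRequest`, and `deleteVersionRequest` are hypothetical names, `workspaceBase` stands for the already-resolved `{workspace}` ARM prefix, and reusing the api-version from the list row for these calls is an assumption. Authentication and the actual `fetch` are left out.

```typescript
interface VersionProperties {
  dataType: "uri_file" | "uri_folder" | "mltable";
  dataUri: string;
  description?: string;
  tags?: Record<string, string>;
  isArchived?: boolean;
}

interface ArmRequest {
  method: "PUT" | "DELETE";
  url: string;
  body?: string;
}

const API_VERSION = "2024-10-01"; // assumed to match the list call in the table

function versionUrl(workspaceBase: string, name: string, version: string): string {
  const n = encodeURIComponent(name);
  const v = encodeURIComponent(version);
  return `${workspaceBase}/data/${n}/versions/${v}?api-version=${API_VERSION}`;
}

function putVersionRequest(
  workspaceBase: string,
  name: string,
  version: string,
  properties: VersionProperties
): ArmRequest {
  return {
    method: "PUT",
    url: versionUrl(workspaceBase, name, version),
    body: JSON.stringify({ properties }),
  };
}

function deleteVersionRequest(
  workspaceBase: string,
  name: string,
  version: string
): ArmRequest {
  return { method: "DELETE", url: versionUrl(workspaceBase, name, version) };
}
```

Separating request construction from transport keeps the builders unit-testable without a live workspace.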
Required Azure resources
- Hub or project workspace with a default datastore (auto-provisioned with the workspace) — already in place
- Storage account (workspace-attached) with Storage Blob Data Contributor granted to `LOOM_UAMI_CLIENT_ID` so the BFF can issue SAS-less upload via OAuth (already in place at the workspace level; verify on the storage account itself)
- (Optional) Azure ML registry: for cross-workspace shared assets. Out of scope for v2.5 first cut; surface a `MessageBar intent="warning"` on the Registry tab when no registry is bound
- (Optional) Hugging Face connection: when the editor offers HF import. Surface honestly when `LOOM_HF_CONNECTION_NAME` is unset
- (Optional) OneLake / Fabric workspace: when offering Fabric import. Already gated honestly elsewhere in Loom under `LOOM_FABRIC_WORKSPACE_ID`
- Bicep: extend `platform/fiab/bicep/modules/foundry/` to optionally provision an Azure ML registry and (for the HF flow) a HF connection seed via a deploymentScript
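The chunked Storage Blob PUT that this storage role enables needs a block plan before any bytes are sent. The sketch below assumes the upload uses the Blob service's block-based flow (upload each block under an ID, then commit the ID list), where block IDs are base64-encoded and of equal length within a blob. `planBlocks` and the `block-NNNNNN` ID scheme are hypothetical.

```typescript
interface BlockPlan {
  blockId: string; // base64-encoded, same length for every block in the blob
  start: number;   // inclusive byte offset
  end: number;     // exclusive byte offset
}

function planBlocks(fileSize: number, blockSize: number): BlockPlan[] {
  if (blockSize <= 0) throw new Error("blockSize must be positive");
  const plans: BlockPlan[] = [];
  for (let start = 0, i = 0; start < fileSize; start += blockSize, i++) {
    // Zero-padded counter keeps every raw ID (and its base64 form) equal length.
    const rawId = `block-${String(i).padStart(6, "0")}`;
    plans.push({
      blockId: btoa(rawId),
      start,
      end: Math.min(start + blockSize, fileSize),
    });
  }
  return plans;
}
```

Each plan entry maps to one `file.slice(start, end)` upload; after all blocks succeed, the ordered `blockId` list is committed and the asset version is registered against the resulting datastore path.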
Estimated effort
3 focused sessions to reach grade B (production-grade — works, looks good, real data, real backend):
- Session N+1 (~2.5 hrs): Per-asset Tabs (Overview / Versions / Explore / Used-by), Tags editor on container and versions, + New version button, Delete + Archive on versions, datastore browse picker (in Create wizard) wired to `listDatastoreContents`
- Session N+2 (~3 hrs): Local upload (chunked → datastore → asset), MLTable YAML editor with paths+transformations + a live `previewDataAsset` panel, Sample + Profile tabs for mltable, Preview pane for text/CSV/Parquet/image
- Session N+3 (~2.5 hrs): Lineage DAG view, Used-by cross-search across jobs/deployments/components/evaluations/agents, Registry tab (read-only), Hugging Face import flow gated by `LOOM_HF_CONNECTION_NAME`
A fourth session lands grade A+ (tests + bicep): Vitest unit tests on the URI-vs-type validator and the MLTable YAML shape, a Playwright walk covering create-from-upload → preview → new-version → archive, and bicep extensions covering an optional Azure ML registry + Hugging Face connection seed.