Skip to content
CSA Loom — the Microsoft Fabric experience for Azure tenants where Fabric isn't yet available: lakehouses, warehouses, notebooks, semantic models, Activator rules, Data Agents, across Commercial, GCC, GCC-High, and DoD IL5

evaluation — parity gap (validator v2, 2026-05-26)

Loom URL: /items/evaluation/new Fabric reference: ai.azure.com — Evaluations (Quality / RAG / Safety / Custom metric grid) Loom screenshot: temp/parity/evaluation-loom.png

Phase 4

Route Status Notes
GET /api/items/evaluation?project=loom-project-default 403 Same Foundry data-plane permission gap as prompt-flow
POST /api/items/evaluation wired but list 403 prevents reading any history

Renders Project picker · New evaluation form with Display name / Dataset ID / Model deployment / Evaluators (comma-separated string).

Phase 3 — Fabric vs Loom

Fabric element Loom present? Severity
Evaluator picker categorized as Quality · RAG · Safety · Custom with toggleable per-category metrics NO — single comma-separated text field BLOCKER
Run a new evaluation with parameter overrides partial (single form) MAJOR
Metric grid with category groupings (Groundedness · Relevance · Fluency under Quality; Coherence under Quality; Hate/Violence/Sexual/SelfHarm under Safety; etc.) NO — Loom renders flat key/value table when an evaluation is opened BLOCKER
Side-by-side run compare NO MAJOR
Per-row sample drilldown (input → prediction → expected → individual eval scores) NO BLOCKER
Aggregate score sparkline NO MAJOR
Dataset preview button NO MAJOR
Re-run / Cancel run NO MAJOR

Functional

  • Project picker wires to BFF (BFF returns project list OK; subsequent eval list 403)
  • New evaluation form posts to BFF but cannot list to verify
  • Detail render only shows a flat metric table

Grade — F

Same diagnosis as prompt-flow: 403 on data-plane + flat-table UI vs Fabric's categorized metric grid + sample drilldown. The editor is functionally a "Submit eval job" form. Without honest MessageBar gating, this is vaporware. Grade F.