evaluation: parity gap (validator v2, 2026-05-26)

Loom URL: `/items/evaluation/new`
Fabric reference: ai.azure.com, Evaluations (Quality / RAG / Safety / Custom metric grid)
Loom screenshot: `temp/parity/evaluation-loom.png`

Phase 4

| Route | Status | Notes |
|---|---|---|
| `GET /api/items/evaluation?project=loom-project-default` | 403 | Same Foundry data-plane permission gap as prompt-flow |
| `POST /api/items/evaluation` | Wired | Unverified: the list 403 prevents reading back any run history |

The editor renders a Project picker and a New evaluation form with Display name, Dataset ID, Model deployment, and Evaluators fields. Evaluators is a single comma-separated string.
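A categorized picker would still have to produce the value this form sends today. The following is a minimal TypeScript sketch of that contract. It assumes the BFF accepts evaluator names as one comma-separated string, which is inferred from the form and not confirmed. `parseEvaluators` and `serializeEvaluators` are hypothetical helpers.

```typescript
// Hypothetical helpers around the current free-text "Evaluators" field.
// Assumption: the BFF expects evaluator names as one comma-separated string.

// Split, trim, drop empty entries, and de-duplicate case-insensitively,
// keeping the first spelling the user typed.
function parseEvaluators(raw: string): string[] {
  const seen = new Set<string>();
  const names: string[] = [];
  for (const part of raw.split(",")) {
    const name = part.trim();
    if (name === "") continue; // stray or trailing commas
    const key = name.toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    names.push(name);
  }
  return names;
}

// Inverse direction: turn picker selections back into the string the form posts.
function serializeEvaluators(names: string[]): string {
  return names.join(",");
}

// Usage with a messy hand-typed value.
const picked = parseEvaluators(" groundedness, fluency,,Groundedness ");
console.log(serializeEvaluators(picked)); // groundedness,fluency
```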

Phase 3: Fabric vs Loom

| Fabric element | Loom present? | Severity |
|---|---|---|
| Evaluator picker categorized as Quality · RAG · Safety · Custom, with toggleable per-category metrics | NO: single comma-separated text field | BLOCKER |
| Run a new evaluation with parameter overrides | Partial (single form) | MAJOR |
| Metric grid with category groupings (Groundedness · Relevance · Fluency · Coherence under Quality; Hate / Violence / Sexual / SelfHarm under Safety; etc.) | NO: Loom renders a flat key/value table when an evaluation is opened | BLOCKER |
| Side-by-side run compare | NO | MAJOR |
| Per-row sample drilldown (input → prediction → expected → individual eval scores) | NO | BLOCKER |
| Aggregate score sparkline | NO | MAJOR |
| Dataset preview button | NO | MAJOR |
| Re-run / Cancel run | NO | MAJOR |
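The category grouping that the metric grid needs can be derived from the flat key/value metrics Loom already renders. The sketch below assumes metrics arrive as a name-to-number record.

- The `CATEGORY_OF` lookup is hypothetical.
- It covers only the metric names the table lists.
- RAG stays empty because no RAG metrics are named.
- Anything unrecognized lands under Custom instead of being dropped.

```typescript
type Category = "Quality" | "RAG" | "Safety" | "Custom";

// Hypothetical lookup built only from the metric names in the Phase 3 table.
// Keys are normalized (lowercase, letters and digits only) so "SelfHarm",
// "self_harm" and "Self-Harm" all match. Real BFF metric keys may differ.
const CATEGORY_OF = new Map<string, Category>([
  ["groundedness", "Quality"],
  ["relevance", "Quality"],
  ["fluency", "Quality"],
  ["coherence", "Quality"],
  ["hate", "Safety"],
  ["violence", "Safety"],
  ["sexual", "Safety"],
  ["selfharm", "Safety"],
]);

const ORDER: Category[] = ["Quality", "RAG", "Safety", "Custom"];

function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Group a flat metric record into ordered category buckets for a grid.
// Unrecognized metrics go to "Custom"; categories left empty are omitted.
function groupMetrics(flat: Record<string, number>): Map<Category, Array<[string, number]>> {
  const grid = new Map<Category, Array<[string, number]>>();
  for (const category of ORDER) grid.set(category, []);
  for (const [name, value] of Object.entries(flat)) {
    const category = CATEGORY_OF.get(normalize(name)) ?? "Custom";
    grid.get(category)!.push([name, value]);
  }
  for (const category of ORDER) {
    if (grid.get(category)!.length === 0) grid.delete(category);
  }
  return grid;
}

const example = groupMetrics({ Groundedness: 4, Fluency: 5, "Self-Harm": 0, my_metric: 0.7 });
console.log(Array.from(example.keys()).join(", ")); // Quality, Safety, Custom
```

A `Map` is used for the lookup so that metric names such as `constructor` cannot accidentally resolve to inherited object properties.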

Functional

Project picker is wired to the BFF: the project list loads, but the follow-up evaluation list call returns 403.
New evaluation form posts to the BFF, but the submitted run cannot be verified because the list call fails.
Detail view renders only a flat metric table.
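A side-by-side compare could be built from the same flat metric tables by joining two runs on metric name. Below is a minimal sketch. `compareRuns` and `MetricDiff` are hypothetical names. A metric present in only one run gets a null delta rather than a fabricated zero.

```typescript
// Hypothetical shape for one row of a side-by-side compare.
interface MetricDiff {
  name: string;
  baseline: number | null; // null when the metric is absent from the baseline run
  candidate: number | null; // null when the metric is absent from the candidate run
  delta: number | null; // candidate - baseline, only when both values exist
}

// Join two runs' flat metric tables on metric name, sorted by name.
function compareRuns(
  baseline: Record<string, number>,
  candidate: Record<string, number>,
): MetricDiff[] {
  const base = new Map(Object.entries(baseline));
  const cand = new Map(Object.entries(candidate));
  const names = Array.from(new Set(Object.keys(baseline).concat(Object.keys(candidate)))).sort();
  return names.map((name) => {
    const b = base.get(name) ?? null;
    const c = cand.get(name) ?? null;
    return { name, baseline: b, candidate: c, delta: b !== null && c !== null ? c - b : null };
  });
}

const rows = compareRuns({ fluency: 3, coherence: 4 }, { fluency: 4, relevance: 5 });
for (const row of rows) console.log(row.name, row.delta);
// coherence null
// fluency 1
// relevance null
```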

Grade: F

Same diagnosis as prompt-flow. The data plane returns 403, and where Fabric offers a categorized metric grid plus per-sample drilldown, the UI offers only a flat table. Functionally, the editor is a "Submit eval job" form. Without honest MessageBar gating that tells the user why nothing can be listed, this is vaporware. Grade F.
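The MessageBar gating can be as small as keeping a 403 distinct from an empty list. This is a sketch under stated assumptions:

- The `ListState` and `Banner` shapes are hypothetical.
- The banner wording is illustrative.
- The `intent` values are modeled on Fluent UI's MessageBar, which is assumed to be the component meant here. Nothing is imported from it.

```typescript
// Hypothetical view state for the evaluation list; "forbidden" is kept
// separate from an empty "ok" so a 403 never renders as "no evaluations yet".
type ListState =
  | { kind: "ok"; count: number }
  | { kind: "forbidden" }
  | { kind: "error"; status: number };

function classifyListStatus(status: number, count = 0): ListState {
  if (status === 403) return { kind: "forbidden" };
  if (status >= 200 && status < 300) return { kind: "ok", count };
  return { kind: "error", status };
}

// Props for a MessageBar-style banner; intent names modeled on Fluent UI.
interface Banner {
  intent: "info" | "warning" | "error";
  text: string;
}

function bannerFor(state: ListState): Banner | null {
  if (state.kind === "forbidden") {
    return {
      intent: "warning",
      text: "Evaluation history is unavailable: the Foundry data plane returned 403 for this project. Submitted runs cannot be listed here.",
    };
  }
  if (state.kind === "error") {
    return { intent: "error", text: `Loading evaluations failed (HTTP ${state.status}).` };
  }
  // Only a successful, genuinely empty list gets the "nothing yet" banner.
  return state.count === 0 ? { intent: "info", text: "No evaluations yet." } : null;
}

console.log(bannerFor(classifyListStatus(403))?.intent); // warning
```

The key property is that `forbidden` never collapses into `ok` with zero items. That collapse is what currently makes the editor look functional when it is not.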