Loom Tracing Editor: AI Foundry parity spec
Captured 2026-05-26 by catalog agent `fabric-parity-loop`. Sources (Microsoft Learn): Observability in generative AI, Trace and observe AI agents (classic), Configure tracing for AI agent frameworks, Add client-side tracing to Foundry agents, Enable Observability (Agent Framework). Cross-checked against `apps/fiab-console/lib/editors/foundry-sub-editors.tsx::TracingEditor` (lines 442–482) and BFF route `app/api/items/tracing/route.ts`.
What it is
AI Foundry Tracing is the observability surface for LLM, tool, and agent execution. It is built on OpenTelemetry semantic conventions for generative AI — each request emits a tree of spans (one root, nested children for model calls, tool invocations, RAG retrieval, evaluation steps, agent decisions) — and exports them to Azure Monitor Application Insights via the OTLP exporter or the azure-monitor-opentelemetry package.
Foundry stitches three sources into one Tracing view:
- Server-side traces (auto-captured) for agents and flows running inside the Foundry portal
- Client-side traces (opt-in) from user code instrumented with `langchain-azure-ai`, `opentelemetry-instrumentation-openai-agents`, or the Microsoft Agent Framework
- Evaluation traces (auto) for runs of built-in / custom evaluators
The portal surface is gated by an Application Insights resource bound to the Foundry hub / project.
UI components
Page chrome
- Title bar: project name, App Insights connection status, time window picker
- Right-side actions: Refresh, Settings (sampling rate, content recording), Export (CSV, JSON, KQL query), Open in App Insights
Filter bar (always-on)
- Time window: 5m / 1h / 24h / 7d / custom (start/end)
- Free-text search across span names, operation names, attributes, message
- Faceted filters: Operation name, Span kind (server / client / internal / consumer / producer), Status (OK / ERROR / UNSET), Agent ID / Agent reference name, Model (`gen_ai.request.model`), User (`enduser.id`), Session ID, Has exception, Min duration
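The filter bar ultimately has to be translated into KQL `where` clauses. The sketch below shows one way that translation could look for the time presets, the operation name, and the model facet. `TraceFilters`, `kqlString`, and `buildFilterClauses` are hypothetical names, and exact-match semantics for the operation name are an assumption (the existing free-text input may use a different operator).

```typescript
// Hypothetical filter shape; only a subset of the facets listed above.
type TimePreset = "5m" | "1h" | "24h" | "7d";

interface TraceFilters {
  window: TimePreset;
  operationName?: string;
  model?: string;
}

// Quote a value as a KQL string literal, escaping backslashes and double quotes.
function kqlString(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

// Build one `| where ...` clause per active filter.
function buildFilterClauses(filters: TraceFilters): string[] {
  // The preset strings are valid KQL timespan literals, so they can be used directly.
  const clauses = [`| where timestamp > ago(${filters.window})`];
  if (filters.operationName) {
    clauses.push(`| where operation_Name == ${kqlString(filters.operationName)}`);
  }
  if (filters.model) {
    clauses.push(
      `| where tostring(customDimensions["gen_ai.request.model"]) == ${kqlString(filters.model)}`
    );
  }
  return clauses;
}
```

The clauses can be appended to the base `union` query with newlines. Escaping user input before it reaches the query string matters here because the filter values come straight from the UI.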
Traces list (default)
- Tabular grid: one row per root span / trace
- Columns: Timestamp, Operation name, Root span name, Duration, Status, # spans, Tokens (prompt / completion / total), Cost (when token-pricing is configured), Error
- Row click opens Trace detail in a side pane (or full page)
Trace detail: Spans tree view
- Gantt strip along the top: each span as a horizontal bar, color-coded by kind (LLM = blue, Tool = orange, Retrieval = green, Agent = purple, Internal = grey); hover for name + duration tooltip
- Tree on the left: nested span list with name, kind icon, duration, status badge
- Detail pane on the right (selected span):
  - Attributes tab: OTel semantic-convention keys (`gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.prompt_tokens`, `gen_ai.usage.completion_tokens`, `gen_ai.response.finish_reason`, `tool.name`, `tool.input`, `tool.output`, `enduser.id`, `agent.id`, plus custom)
  - Events tab: timestamped events on the span (e.g. `gen_ai.choice`, `tool.start`, `tool.end`, exception events)
  - Input / Output tab: rendered chat messages (system / user / assistant) when `AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED=true`; tool I/O JSON
  - Exceptions tab: stack trace + message when status=ERROR
  - Linked spans tab: cross-trace links (e.g. evaluation run → original flow trace)
Latency breakdown panel
- Stacked horizontal bar: time spent in LLM calls vs Tool calls vs Retrieval vs Agent logic vs Other
- "Slowest span" callout with link
- Token usage breakdown (prompt vs completion vs total) per LLM call
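How Foundry computes the stacked bar is not documented in the sources above. One plausible approach, sketched here, is to attribute each span's self time (its duration minus the duration of its direct children) to the span's category, so nested spans are not counted twice. `TimedSpan`, `SpanCategory`, and `latencyBreakdown` are hypothetical names, and the sketch assumes each span has already been classified.

```typescript
type SpanCategory = "llm" | "tool" | "retrieval" | "agent" | "other";

// Hypothetical minimal span shape for the breakdown.
interface TimedSpan {
  id: string;
  parentId: string | null;
  category: SpanCategory;
  durationMs: number;
}

function latencyBreakdown(spans: TimedSpan[]): Record<SpanCategory, number> {
  // Sum the durations of each span's direct children.
  const childTotal = new Map<string, number>();
  for (const span of spans) {
    if (span.parentId) {
      childTotal.set(span.parentId, (childTotal.get(span.parentId) ?? 0) + span.durationMs);
    }
  }

  const totals: Record<SpanCategory, number> = { llm: 0, tool: 0, retrieval: 0, agent: 0, other: 0 };
  for (const span of spans) {
    // Self time; clamped at zero because parallel children can sum to more than the parent.
    const selfMs = Math.max(0, span.durationMs - (childTotal.get(span.id) ?? 0));
    totals[span.category] += selfMs;
  }
  return totals;
}
```

With sequential children the category totals add up to the root span's duration, which is what a stacked bar needs. Traces with heavily parallel tool calls would need a proper interval-union calculation instead of simple subtraction.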
Search / advanced filter
- Free-text → KQL preview drawer that shows the underlying Log Analytics query against the App Insights `traces`, `dependencies`, `customEvents` tables
- Saved searches
- "Find similar" on a span: queries the same operation + same model
Comparison view (multi-select traces)
- Side-by-side metric strip: duration, total tokens, # LLM calls, # tool calls, status
- Diff view of root prompt / response when both have content recording on
Live tail
- Toggle to stream new traces as they arrive (refresh every 10s, max 100 rows)
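The Backend mapping table proposes a 15 s query window polled every 10 s, so consecutive polls will return overlapping rows. A minimal sketch of the client-side merge step, assuming each row carries a unique `id` and an ISO 8601 `timestamp` (`TailRow` and `mergeLiveTail` are hypothetical names):

```typescript
// Hypothetical row shape; real rows would carry more columns.
interface TailRow {
  id: string;
  timestamp: string; // ISO 8601
}

// Merge a new poll result into the rows already on screen:
// de-duplicate by id, sort newest first, and cap the list length.
function mergeLiveTail<T extends TailRow>(current: T[], incoming: T[], maxRows = 100): T[] {
  const byId = new Map<string, T>();
  for (const row of current) byId.set(row.id, row);
  for (const row of incoming) byId.set(row.id, row); // newer copy of a row wins
  return Array.from(byId.values())
    .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp))
    .slice(0, maxRows);
}
```

The default cap of 100 matches the row limit stated above. Keeping the merge as a pure function makes it easy to unit-test separately from the polling timer.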
Export
- CSV (current filtered list), JSON (full trace with spans + attributes), Copy KQL (the equivalent Log Analytics query)
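For the CSV action, the main pitfall is quoting: span names and messages can contain commas, quotes, and newlines. A small sketch of an RFC 4180-style serializer follows; `toCsv` is a hypothetical helper and the column names in the usage line are only illustrative.

```typescript
// Serialize rows to CSV, quoting any cell that contains a comma, quote, or line break.
function toCsv(rows: Array<Record<string, unknown>>, columns: string[]): string {
  const cell = (value: unknown): string => {
    const text = value === null || value === undefined ? "" : String(value);
    // Embedded quotes are doubled and the whole cell is wrapped in quotes.
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const lines = [columns.map(cell).join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => cell(row[column])).join(","));
  }
  return lines.join("\r\n");
}

// Example: export two columns of the currently filtered list.
const csvPreview = toCsv([{ name: "chat", duration: 12 }], ["name", "duration"]);
```

Object-valued cells such as `customDimensions` would need `JSON.stringify` before being passed in, since `String(value)` on an object does not produce useful output.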
Settings
- Sampling rate (% of requests sampled)
- Content recording toggle (warns about PII)
- Retention window (read-only, surfaces App Insights retention policy)
What Loom has
Current TracingEditor (apps/fiab-console/lib/editors/foundry-sub-editors.tsx lines 442–482) is real-REST wired to Application Insights via lib/azure/foundry-client.ts::queryTraces and BFF route GET /api/items/tracing. The client resolves the Foundry hub's bound applicationInsights resource id, runs a KQL union traces, dependencies, customEvents query, and returns up to 200 rows.
- Window (hrs) input (default 24, max enforced server-side at 168 hours)
- Operation filter input (free-text)
- Reload button
- Results table columns: Time, Operation, Name, Duration (ms), Success (true/false badge), Result (resultCode)
- Errors / not-deployed surface honestly via `ErrorBar` (e.g. when the hub has no App Insights bound: `NotDeployedError('Application Insights', 'Bind one in the hub workspace properties.')`)
That is: Loom can show a flat list of recent spans with filtering by time + operation name, but everything else — the span tree, attributes, latency breakdown, exception view, gen-AI-specific surface, KQL drawer, export — is missing.
Gaps for parity
- Spans tree: Loom shows a flat row per span. Foundry pivots on the root trace and renders the full parent / child tree with a Gantt bar. Needs `operation_Id` grouping + parent-id reconstruction in the query.
- Per-span detail pane: no attributes / events / input-output / exception sub-tabs. Currently no way to inspect a single span beyond what fits in the row.
- Gen-AI semantic conventions: Loom's columns are generic AppInsights (`name`, `operation_Name`, `success`, `resultCode`). Foundry surfaces `gen_ai.request.model`, `gen_ai.usage.*_tokens`, `tool.name`, `agent.id` as first-class columns / facets.
- Latency breakdown: no stacked-bar view of LLM vs Tool vs Retrieval time within a trace.
- Token usage / cost: no aggregation of prompt / completion / total tokens per trace; no cost calculation.
- Content recording (Inputs / Outputs): chat messages and tool I/O aren't extracted from `customDimensions`; today they're hidden behind the `customDimensions` JSON blob.
- Exception drill-down: no dedicated tab; failed spans only render their `resultCode`.
- Search across attributes: Loom only filters on `operation_Name`. No free-text search across `name`, `message`, `customDimensions.gen_ai.*`.
- Faceted filters: no facets for model, agent ID, user, session, status, span kind.
- Comparison view: no multi-select → side-by-side diff of two traces.
- Live tail: no streaming refresh.
- KQL preview / Export: no drawer showing the underlying query, no CSV / JSON / copy-KQL action.
- Settings: no sampling-rate / content-recording toggles (these write to the App Insights resource).
- Linked spans: no cross-trace navigation (e.g. evaluation trace → originating flow trace).
- Time-range presets: Loom uses a free-form hours input. Foundry has 5m / 1h / 24h / 7d / custom presets.
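Several of these gaps (gen-AI columns, token usage, content recording) come down to reading keys out of `customDimensions`. A minimal sketch of that extraction, assuming `customDimensions` arrives either as a JSON string or as an already-parsed object and that token counts use the attribute keys named in this spec; `extractSpanAttributes` and `tokenUsage` are hypothetical helpers:

```typescript
// Attribute prefixes this spec treats as first-class.
const ATTRIBUTE_PREFIXES = ["gen_ai.", "tool.", "agent.", "enduser."];

// Pick the semantic-convention keys out of a customDimensions payload.
function extractSpanAttributes(
  customDimensions: string | Record<string, unknown> | null | undefined
): Record<string, string> {
  let dims: unknown = customDimensions;
  if (typeof dims === "string") {
    try {
      dims = JSON.parse(dims);
    } catch (err) {
      return {}; // malformed JSON: surface no attributes rather than throw
    }
  }
  if (dims === null || typeof dims !== "object" || Array.isArray(dims)) return {};

  const record = dims as Record<string, unknown>;
  const attributes: Record<string, string> = {};
  for (const key of Object.keys(record)) {
    if (ATTRIBUTE_PREFIXES.some((prefix) => key.startsWith(prefix))) {
      attributes[key] = String(record[key]);
    }
  }
  return attributes;
}

// Derive prompt / completion / total token counts; missing or non-numeric values count as 0.
function tokenUsage(attributes: Record<string, string>) {
  const toInt = (value: string | undefined): number => {
    const parsed = Number.parseInt(value ?? "", 10);
    return Number.isFinite(parsed) ? parsed : 0;
  };
  const prompt = toInt(attributes["gen_ai.usage.prompt_tokens"]);
  const completion = toInt(attributes["gen_ai.usage.completion_tokens"]);
  return { prompt, completion, total: prompt + completion };
}
```

Instrumentation libraries do not all emit the same attribute keys, so the key names should be checked against real exported spans before the columns are hard-wired.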
Backend mapping
Single backend: App Insights Log Analytics-backed query API at https://management.azure.com/{appInsightsId}/api/query?api-version=2015-05-01 (already wired). Foundry-side annotations come from the same KQL — just richer projections and grouping.
| Loom surface | Backend KQL / API call |
|---|---|
| Traces list (flat, current) | union traces, dependencies, customEvents \| where timestamp > ago(<h>h) \| project timestamp, name, operation_Name, duration, success, resultCode, message, customDimensions (current; wired) |
| Traces list (root-rooted, target) | Same union, then summarize spanCount=count(), rootName=anyif(name, isempty(operation_ParentId)), totalDuration=sum(duration), totalTokens=sumif(toint(customDimensions["gen_ai.usage.total_tokens"]), isnotempty(customDimensions["gen_ai.usage.total_tokens"])) by operation_Id, operation_Name |
| Trace detail (one trace tree) | union traces, dependencies \| where operation_Id == "<id>" \| project timestamp, id, parentId=operation_ParentId, name, duration, customDimensions \| order by timestamp asc (then client-side parent/child stitch) |
| Span attributes | Already in customDimensions (JSON in the row); just parse keys matching gen_ai.*, tool.*, agent.*, enduser.* |
| Exception drill-down | exceptions \| where operation_Id == "<id>" |
| Token / cost aggregation | customDimensions["gen_ai.usage.prompt_tokens"] + _completion_tokens + price table lookup |
| Faceted counts | summarize count() by customDimensions["gen_ai.request.model"] etc. |
| Live tail | Same query with timestamp > ago(15s) polled every 10s |
| Sampling rate / content recording | PATCH {arm}/.../components/{appi}?api-version=2020-02-02-preview writes the sampling rate; content-recording is set at the SDK side via env vars and surfaced read-only here |
| Get bound App Insights | getWorkspaceInfo() already resolves ws.applicationInsights (wired) |
New helpers required in foundry-client.ts: queryTraceTree(operationId), queryTraceFacets(window, field), queryTokenUsage(window), queryLiveTail(sinceTimestamp). Most of these are KQL variations on the existing armFetch + App Insights /api/query path; no new RP integration needed.
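As an illustration of how small these KQL variations are, the query text for a helper such as `queryTraceTree` could be assembled as below. `buildTraceTreeKql` is a hypothetical name, and the 32-hex-character check assumes `operation_Id` carries an OpenTelemetry trace id; it would need relaxing if older, non-OTel operation ids must be supported.

```typescript
// Build the trace-detail query from the mapping table for a single operation id.
function buildTraceTreeKql(operationId: string): string {
  // Validate before interpolating so user-supplied ids cannot alter the query.
  if (!/^[0-9a-f]{32}$/i.test(operationId)) {
    throw new Error(`Unexpected operation id format: ${operationId}`);
  }
  return [
    "union traces, dependencies",
    `| where operation_Id == "${operationId}"`,
    "| project timestamp, id, parentId = operation_ParentId, name, duration, customDimensions",
    "| order by timestamp asc",
  ].join("\n");
}
```

The returned string would be sent through the existing `armFetch` query path; parent / child stitching still happens client-side on the returned rows.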
Required Azure resources
- Application Insights component (`Microsoft.Insights/components`) bound to the Foundry hub / project via `workspace.properties.applicationInsights`; already documented as required, and an honest `NotDeployedError` surfaces today when it's missing
- Log Analytics workspace that backs the App Insights (created automatically when App Insights is workspace-based, which is the default for new resources since 2022)
- Loom UAMI roles: `Monitoring Reader` on the App Insights resource (already wired); `Application Insights Component Contributor` to change sampling rate from the Settings panel
- For content recording: instrumented apps must set `AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED=true` (Python) or equivalent in their own code; Loom can surface a doc link / instruction in the Settings panel but cannot toggle it remotely
- Bicep: confirm `modules/observability/app-insights.bicep` (already present in the FiaB orchestrator) wires the App Insights id into the Foundry hub via `Microsoft.MachineLearningServices/workspaces.properties.applicationInsights`. Loom UAMI role assignments on the App Insights resource go in the same module.
MessageBar intent="warning" triggers: hub has no App Insights bound (already honest), Loom UAMI lacks Monitoring Reader (queries 403), App Insights retention < query window.
Estimated effort
3 sessions to reach grade B:
- Session N+1 (~2.5 hrs): Pivot the list from flat spans to root-traces (group by `operation_Id`, project tokens + span count + total duration). Add 5m / 1h / 24h / 7d preset chips. Parse `customDimensions` to surface `gen_ai.request.model`, token columns. Add faceted filters (model, agent ID, status).
- Session N+2 (~2.5 hrs): Trace detail pane with spans tree + Gantt bar. Per-span sub-tabs (Attributes / Events / Inputs-Outputs / Exceptions). Wire `queryTraceTree` helper. Surface the latency breakdown stacked bar.
- Session N+3 (~2 hrs): Comparison view (multi-select). Live tail toggle. KQL preview drawer. CSV / JSON export. Settings panel for sampling rate.
Grade A+ adds Vitest coverage on the parent-id stitching reducer (must produce a deterministic tree even when spans arrive out of order or with a missing parent), a Playwright walk against a seeded set of synthetic OTel spans (LangChain + tool + agent), and bicep additions documenting the App Insights → Foundry hub binding so a fresh deployment surfaces real traces without manual portal steps.
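A rough sketch of what the parent-id stitching reducer might look like is given here, to make the determinism requirement concrete. `SpanRow`, `SpanNode`, and `stitchSpans` are hypothetical names, the row shape mirrors the columns projected by the trace-detail query, and timestamps are assumed to be valid ISO 8601 strings. Spans whose parent is absent are promoted to roots, and siblings are ordered by timestamp and then id so the result does not depend on arrival order.

```typescript
// Hypothetical row shape, mirroring the trace-detail projection.
interface SpanRow {
  id: string;
  parentId: string | null; // empty or null for the root span
  name: string;
  timestamp: string; // ISO 8601
  durationMs: number;
}

interface SpanNode extends SpanRow {
  children: SpanNode[];
}

function stitchSpans(rows: SpanRow[]): SpanNode[] {
  // Pass 1: index every span by id, so a child can arrive before its parent.
  const byId = new Map<string, SpanNode>();
  for (const row of rows) {
    byId.set(row.id, { ...row, children: [] });
  }

  // Pass 2: attach each span to its parent; orphans become roots.
  const roots: SpanNode[] = [];
  byId.forEach((node) => {
    const parent = node.parentId ? byId.get(node.parentId) : undefined;
    if (parent && parent !== node) parent.children.push(node);
    else roots.push(node);
  });

  // Pass 3: sort siblings by (timestamp, id) so output is independent of input order.
  const compare = (a: SpanNode, b: SpanNode): number =>
    Date.parse(a.timestamp) - Date.parse(b.timestamp) || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);
  const sortDeep = (nodes: SpanNode[]): void => {
    nodes.sort(compare);
    nodes.forEach((node) => sortDeep(node.children));
  };
  sortDeep(roots);
  return roots;
}
```

This sketch does not detect parent cycles (spans in a cycle would silently drop out of the tree) and lets the last row win on duplicate ids; both are cases the Vitest suite would want to pin down explicitly.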