Loom Content Safety Editor — AI Foundry parity spec¶
Captured 2026-05-26 by catalog agent
fabric-parity-loop. Sources: Microsoft Learn — Content Safety in Foundry portal, Prompt Shields, Default safety policies, Configure content filters, Mitigate false results. Cross-checked againstapps/fiab-console/lib/editors/foundry-sub-editors.tsx::ContentSafetyEditor(lines 379–436) and BFF routeapp/api/items/content-safety/route.ts.
What it is¶
Azure AI Content Safety is a moderation service that classifies prompts and completions across four harm categories with four severity levels each, plus six specialised classifiers (Prompt Shields jailbreak / indirect, protected material text / code, groundedness, custom categories). It exposes both a stand-alone REST API and a content-filter integration inside Azure OpenAI / AI Foundry deployments.
The Foundry portal surfaces four operator workflows:
- Try it / Playground — interactive moderation of a single piece of text, image, or document with the current severity thresholds
- Blocklists — per-resource keyword / regex term lists that always block (or always allow) regardless of classifier output
- Custom categories — train and deploy a custom binary classifier (positive / negative term sets, model versions, deployment status)
- Content filter configurations — named bundles of category thresholds + Prompt Shields + protected material + custom blocklists, assigned to one or more Azure OpenAI deployments
- Monitoring — incidents and metrics from filters in production
UI components¶
Page chrome¶
- Title bar: resource name, region, pricing tier
- Right-side actions: + New configuration, Refresh, Settings, Share, Open in Azure portal
Try-it / Playground (default tab)¶
- Radio: Text / Image / Multimodal (text + image) / Document (with embedded prompt-injection check)
- Input area: text box or file upload (image / PDF)
- Harm category sliders (4 sliders, 0–7 severity scale, default 4 = Medium): Hate / Fairness, Sexual, Violence, Self-Harm
- Toggle row: Prompt Shields — direct (jailbreak), Prompt Shields — indirect (XPIA), Protected material — text, Protected material — code, Groundedness (when "Document" is selected)
- Custom categories multi-select: lists deployed custom classifier versions
- Blocklist multi-select: lists blocklists bound to this resource
- Run test button → calls
/text:analyzeor/image:analyzeor/text:shieldPromptand shows: - Per-category result chips: category name + severity score + pass/block badge
- Prompt-shield boolean (attackDetected) + sub-type (Attempt to change system rules / Conversation mockup / Role-Play / Encoding / Manipulated content / Allowing compromised LLM access / Information gathering / Availability / Fraud / Malware)
- Protected material match annotations (citation + license)
- Raw response JSON expander
- Compare with ground truth panel: paste expected verdict, get false-positive / false-negative diagnosis
Blocklists tab¶
- Grid of blocklists: Name, Description, Term count, Created, Modified
- + New blocklist → name, description, then per-blocklist editor: terms grid (
text,isRegexbool,description); bulk import from CSV - Per-term match type: exact / regex / fuzzy
- Test blocklist sub-panel: enter sample text, see which terms hit
Custom categories tab¶
- Grid: Name, Definition, # positive samples, # negative samples, Latest version, Status (NotStarted / Training / Succeeded / Failed)
- + Build a custom category wizard:
- Step 1: name + plain-English definition (used as the seed prompt for the classifier)
- Step 2: positive samples (≥50 recommended) + negative samples (≥50)
- Step 3: training options — model version, training compute size
- Step 4: review + submit; submission writes to
{resource}/text/categories/{name}/versions/{version}:build - Per-version detail: training metrics (accuracy, precision, recall, F1), deploy / undeploy actions, test panel
Content filter configurations tab¶
- Grid of configurations: Name, Used by # deployments, Created, Modified
- + Create configuration wizard (Azure OpenAI integration path):
- Step 1 — Basics: name, description
- Step 2 — Input filter (prompt): per-category severity threshold (Low / Medium / High / Off) + Prompt Shields (direct on/off, indirect on/off) + blocklist multi-select + custom-categories multi-select
- Step 3 — Output filter (completion): per-category severity + Protected material text on/off + Protected material code on/off + groundedness on/off + blocklist + custom categories
- Step 4 — Apply to deployments: multi-select Azure OpenAI deployments
- Step 5 — Review + create
- Per-config detail: deployment list, sample-request inspector, defaults reset button
Monitoring tab¶
- Time-windowed chart of: blocked-prompt rate, blocked-completion rate, total requests
- Top categories triggered
- Recent incidents table: timestamp, deployment, category, severity, snippet (subject to abuse-monitoring data-retention rules)
What Loom has¶
Current ContentSafetyEditor (apps/fiab-console/lib/editors/foundry-sub-editors.tsx lines 379–436) is real-REST wired to a Content Safety resource via lib/azure/foundry-client.ts::moderateText / moderateImage / listContentSafetyPolicies and BFF route GET|POST /api/items/content-safety. The client is env-gated: when LOOM_CONTENT_SAFETY_ENDPOINT is unset, the BFF returns 503 + notDeployed:true and the editor surfaces an honest "Not yet provisioned" warning bar.
- Loads the default policies summary (single hard-coded entry:
{ name: 'default', thresholds: { hate:4, selfHarm:4, sexual:4, violence:4 } }) - Text moderation card: textarea + Analyze text button → POSTs
{ kind:'text', text }, calls/contentsafety/text:analyze?api-version=2024-09-01, dumps result JSON in<pre> - Image moderation card: file picker (base64 client-side) + Analyze image button → POSTs
{ kind:'image', imageBase64 }, calls/contentsafety/image:analyze?api-version=2024-09-01 - Errors / not-deployed surface honestly via
ErrorBar - No blocklists, no custom categories, no filter configurations, no Prompt Shields, no monitoring
That is: Loom can run a single moderation call against the default category set and render raw JSON, but everything else in the Foundry portal is missing.
Gaps for parity¶
- Harm-category sliders — the four severity thresholds are not exposed in the UI; the call uses the resource default (Medium across the board). Foundry shows 4 sliders 0–7.
- Prompt Shields — no UI for
/text:shieldPrompt, no toggle for direct (jailbreak) vs indirect (XPIA / document-embedded) attacks, no rendering of the 9 sub-type categories. - Protected material — no toggle / no display of citation + license annotations when text or code matches a known protected source.
- Groundedness detection — no UI for document-grounded moderation (requires document embedding format).
- Blocklists — no list view, no editor, no per-term match-type picker, no CSV bulk import.
- Custom categories — no wizard to train a custom classifier, no version list, no train/deploy actions.
- Content filter configurations — no UI for the named-bundle abstraction (input vs output filters, deployment binding); this is the bridge between Content Safety and Azure OpenAI deployments.
- Multimodal (text + image) — only text or image alone; the combined call is missing.
- Document upload — no PDF / docx ingest with embedded-instruction detection.
- Ground-truth comparison panel — no false-positive / false-negative triage aid (called out in the Microsoft "Mitigate false results" guide as the recommended operator workflow).
- Monitoring — no incidents view, no blocked-rate chart, no top-categories surface.
- Policies stub is fake-honest —
listContentSafetyPoliciesreturns a single hard-coded "default" entry. Should be replaced with real reads from{resource}/text/blocklists,{resource}/text/categories, and (for OpenAI integration){aoai-resource}/contentFilters?api-version=...once those are wired.
Backend mapping¶
Two REST surfaces: the Content Safety data plane at https://{resource}.cognitiveservices.azure.com/contentsafety/... and the Azure OpenAI ARM resource provider at Microsoft.CognitiveServices/accounts/{aoai}/raiPolicies/... for filter configurations.
| Loom surface | Backend call |
|---|---|
| Analyze text (with sliders + custom categories + blocklists) | POST {ep}/contentsafety/text:analyze?api-version=2024-09-01 body { text, categories?, blocklistNames?, haltOnBlocklistHit?, outputType? } (wired; UI under-uses it) |
| Analyze image | POST {ep}/contentsafety/image:analyze?api-version=2024-09-01 (wired) |
| Multimodal | POST {ep}/contentsafety/multimodal:analyze?api-version=2024-09-15-preview |
| Prompt Shields | POST {ep}/contentsafety/text:shieldPrompt?api-version=2024-09-01 body { userPrompt, documents[] } |
| Detect protected material — text | POST {ep}/contentsafety/text:detectProtectedMaterial?api-version=2024-09-01 |
| Detect protected material — code | POST {ep}/contentsafety/text:detectProtectedMaterialForCode?api-version=2024-09-15-preview |
| Groundedness | POST {ep}/contentsafety/text:detectGroundedness?api-version=2024-09-15-preview |
| List blocklists | GET {ep}/contentsafety/text/blocklists?api-version=2024-09-01 |
| CRUD blocklist | PATCH / DELETE {ep}/contentsafety/text/blocklists/{name} |
| CRUD blocklist items | POST .../blocklists/{name}:addOrUpdateBlocklistItems, :removeBlocklistItems |
| List custom categories | GET {ep}/contentsafety/text/categories?api-version=2024-09-15-preview |
| Build custom category | POST {ep}/contentsafety/text/categories/{name}/versions/{ver}:build |
| Deploy custom category | POST {ep}/contentsafety/text/categories/{name}/versions/{ver}:deploy |
| List filter configurations (AOAI-side) | GET {arm}/.../accounts/{aoai}/raiPolicies?api-version=2024-10-01 |
| Create / update filter config | PUT {arm}/.../accounts/{aoai}/raiPolicies/{name} |
| Monitoring metrics | POST {arm}/.../accounts/{aoai}/providers/microsoft.insights/metrics:query for BlockedPrompts / BlockedCompletions |
New helpers required in foundry-client.ts: shieldPrompt, detectProtectedMaterial, detectGroundedness, listBlocklists, upsertBlocklist, addBlocklistItems, removeBlocklistItems, listCustomCategories, buildCustomCategory, deployCustomCategory, listRaiPolicies, upsertRaiPolicy.
Required Azure resources¶
- Azure AI Content Safety resource (
Microsoft.CognitiveServices/accountskind=ContentSafety) — env varLOOM_CONTENT_SAFETY_ENDPOINTalready gates the editor; Standard tier required for custom categories and blocklists (Free tier limits) - Loom UAMI roles:
Cognitive Services Useron the Content Safety account;Cognitive Services Contributorto write blocklists / categories / RAI policies - Azure OpenAI account (
Microsoft.CognitiveServices/accountskind=OpenAI) — required for the Content filter configurations tab (raiPolicies live on the AOAI resource); UAMI needsCognitive Services OpenAI Contributor - Storage — none required for moderation itself, but custom-category training data and exported results land in the workspace storage account
- Bicep: add
modules/cognitive/content-safety.bicep(already partially present in the AI Foundry orchestrator) and a role-assignment block for the Loom UAMI. Wire the endpoint intoapps[]env inadmin-plane/main.bicepasLOOM_CONTENT_SAFETY_ENDPOINT.
MessageBar intent="warning" triggers: LOOM_CONTENT_SAFETY_ENDPOINT unset (already honest), Free-tier resource selected (custom categories disabled), Loom UAMI missing Cognitive Services Contributor (blocklist / category writes will 403).
Estimated effort¶
3 sessions to reach grade B:
- Session N+1 (~2 hrs): Replace raw-JSON dump with structured result cards. Add the four harm-category sliders + Prompt Shields toggle + Protected material toggles to the Try-it tab. Wire
shieldPromptanddetectProtectedMaterialhelpers. - Session N+2 (~2.5 hrs): Blocklists tab — list, create, term editor with regex / exact / fuzzy, CSV bulk import, "Test blocklist" sub-panel. Replace the hard-coded
listContentSafetyPolicieswith reallistBlocklists+ category reads. - Session N+3 (~3 hrs): Custom categories tab (4-step training wizard, version list, deploy action). Content filter configurations tab (named bundles, input + output filters, deployment binding via
raiPoliciesPUT). Multimodal + Document analysis branches.
Grade A+ adds Vitest coverage on the blocklist regex validation, a Playwright walk against a seeded resource with one blocklist + one custom category + one filter config, and bicep wiring the Content Safety resource into the FiaB orchestrator (already documented in no-vaporware.md as a required infra side-car).