Azure AI Foundry — Comprehensive Guide¶
Architecture, deployment, and operational guide for Azure AI Foundry within the CSA-in-a-Box analytics platform.
Azure AI Foundry (formerly Azure AI Studio) is Microsoft's unified platform for building, evaluating, and deploying generative-AI applications. It consolidates model access, prompt engineering, evaluation, and responsible-AI tooling into a single control plane organised around Hubs and Projects. This guide covers everything a CSA-in-a-Box team needs to stand up AI workloads on top of their existing Data Landing Zone.
Architecture Overview¶
graph TB
subgraph Hub["AI Foundry Hub"]
direction TB
SHARED[Shared Resources<br/>Key Vault · Storage · ACR]
NET[Managed Network<br/>Private Endpoints]
end
subgraph Projects["Projects"]
P1[Project — RAG Chatbot]
P2[Project — Doc Intel Pipeline]
P3[Project — AI Agents]
end
subgraph Models["Model Catalog"]
AOAI[Azure OpenAI<br/>GPT-4o · GPT-4o-mini · o1 · o3]
OSS[Open Source<br/>Phi-4 · Llama 3 · Mistral]
end
subgraph Integration["CSA-in-a-Box Integration"]
GOLD[Gold Layer<br/>ADLS Gen2 / Delta]
SEARCH[Azure AI Search<br/>Vector + Keyword]
PIPE[Enrichment Pipelines<br/>Bronze → Silver → Gold]
AGENTS[AI Agents<br/>Semantic Kernel]
end
subgraph Runtime["Runtime"]
PF[Prompt Flow]
EVAL[Evaluations]
DEPLOY[Deployments<br/>Managed Endpoint / Serverless]
end
Hub --> Projects
P1 & P2 & P3 --> Models
P1 --> PF --> DEPLOY
P1 --> EVAL
P1 --> SEARCH --> GOLD
P3 --> AGENTS --> GOLD
P2 --> PIPE
SHARED -. secrets .-> P1 & P2 & P3
NET -. isolation .-> SEARCH & GOLD
style Hub fill:#e3f2fd,stroke:#1565c0
style Projects fill:#e8f5e9,stroke:#2e7d32
style Models fill:#fff3e0,stroke:#e65100
style Integration fill:#f3e5f5,stroke:#6a1b9a
style Runtime fill:#fce4ec,stroke:#c62828
Key relationships:
- A Hub owns shared infrastructure — Key Vault, Storage, Container Registry, and managed networking. One Hub serves many Projects.
- A Project is the working boundary for a team or workload. It has its own connections, compute, and deployments.
- CSA-in-a-Box's Gold layer feeds AI Search indexes, which ground RAG flows. Enrichment pipelines use model endpoints to move data from Bronze to Silver.
Hub & Project Setup¶
Hub Creation (Bicep)¶
@description('AI Foundry Hub: shared control plane for all AI projects')
resource aiHub 'Microsoft.MachineLearningServices/workspaces@2024-10-01' = {
  name: '${prefix}-ai-hub'
  location: location
  kind: 'Hub'
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: 'CSA AI Foundry Hub'
    description: 'Shared hub for CSA-in-a-Box AI workloads'
    keyVaultId: keyVault.id
    storageAccountId: storageAccount.id
    containerRegistryId: containerRegistry.id
    managedNetwork: {
      isolationMode: 'AllowInternetOutbound' // or 'AllowOnlyApprovedOutbound'
      outboundRules: {
        aoai: {
          type: 'PrivateEndpoint'
          destination: {
            serviceResourceId: openAiAccount.id
            subresourceTarget: 'account'
          }
        }
        search: {
          type: 'PrivateEndpoint'
          destination: {
            serviceResourceId: aiSearch.id
            subresourceTarget: 'searchService'
          }
        }
      }
    }
  }
}
Project Creation¶
resource aiProject 'Microsoft.MachineLearningServices/workspaces@2024-10-01' = {
  name: '${prefix}-ai-rag-project'
  location: location
  kind: 'Project'
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: 'RAG Chatbot Project'
    hubResourceId: aiHub.id
  }
}
Linked Services¶
Every Project inherits the Hub's connections but can add its own.
| Service | Connection Type | Purpose |
|---|---|---|
| Azure OpenAI | AzureOpenAI | GPT-4o, embeddings |
| Azure AI Search | CognitiveSearch | Vector & keyword index |
| ADLS Gen2 | AzureBlobStorage | Gold layer data products |
| Key Vault | AzureKeyVault | Secrets — API keys, connection strings |
| Application Insights | ApplicationInsights | Telemetry and tracing |
Managed Network Isolation¶
Production deployments must use managed networking
For any environment handling sensitive or regulated data, set isolationMode to AllowOnlyApprovedOutbound and explicitly approve every outbound destination via outboundRules. AllowInternetOutbound is acceptable only for dev/sandbox.
Managed networking creates a managed VNet behind the Hub. All outbound traffic from compute (Prompt Flow runtimes, managed endpoints) exits through this VNet. Private Endpoints declared in outboundRules are injected automatically.
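A pre-deployment check along the following lines can catch a Hub that is still configured for dev. This is a minimal sketch in plain Python, assuming the managedNetwork block has already been loaded into a dictionary (for example from a compiled template); the function name check_managed_network is hypothetical and not part of any Azure SDK.

```python
def check_managed_network(managed_network: dict) -> list[str]:
    """Return the problems that make a Hub's network config unfit for production."""
    problems = []
    mode = managed_network.get("isolationMode")
    if mode != "AllowOnlyApprovedOutbound":
        problems.append(
            f"isolationMode is {mode!r}, expected 'AllowOnlyApprovedOutbound'"
        )
    rules = managed_network.get("outboundRules", {})
    if not rules:
        problems.append("no outboundRules declared")
    for name, rule in rules.items():
        destination = rule.get("destination", {})
        # A Private Endpoint rule is useless without a target resource.
        if rule.get("type") == "PrivateEndpoint" and not destination.get("serviceResourceId"):
            problems.append(f"rule {name!r} has no serviceResourceId")
    return problems


# Hypothetical dev configuration mirroring the Hub template above.
dev_hub = {
    "isolationMode": "AllowInternetOutbound",
    "outboundRules": {
        "aoai": {
            "type": "PrivateEndpoint",
            "destination": {
                "serviceResourceId": "<openai-resource-id>",
                "subresourceTarget": "account",
            },
        }
    },
}

for problem in check_managed_network(dev_hub):
    print(problem)
```

An empty result means the block passes this particular check; it does not prove that every required destination has been approved.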
Model Catalog¶
Azure OpenAI Models¶
| Model | Use Case | Context Window | Strengths |
|---|---|---|---|
| GPT-4o | General reasoning, RAG generation, agents | 128 K | Best all-round quality/speed/cost balance |
| GPT-4o-mini | Classification, extraction, high-volume | 128 K | 10-15x cheaper than GPT-4o |
| o1 | Complex reasoning, math, code gen | 200 K | Extended thinking for hard problems |
| o3 | Scientific reasoning, planning | 200 K | Highest reasoning ceiling |
| text-embedding-3-large | Embeddings (1536-3072 dims) | 8 K | High recall for vector search |
| text-embedding-3-small | Embeddings (512-1536 dims) | 8 K | Cost-effective for large corpora |
Open-Source Models (Model-as-a-Service)¶
| Model | Publisher | Strengths | Deployment |
|---|---|---|---|
| Phi-4 | Microsoft | Small-model excellence, code, math | Serverless API |
| Llama 3.1 (8B / 70B) | Meta | Multilingual, long context | Serverless API or Managed Compute |
| Mistral Large | Mistral AI | European compliance, function calling | Serverless API |
| Cohere Command R+ | Cohere | RAG-optimised, citation grounding | Serverless API |
Deployment Options¶
flowchart TD
Q["Which deployment?"]
Q -->|"Low latency, guaranteed capacity"| PTU["Provisioned Throughput (PTU)"]
Q -->|"Variable / bursty traffic"| PAYG["Pay-As-You-Go (token)"]
Q -->|"OSS model, no Azure OpenAI"| SRV["Serverless API (MaaS)"]
Q -->|"Custom weights / LoRA"| MC["Managed Compute"]
PTU --> note_ptu["Reserve PTUs in 50-unit increments<br/>Best for production steady-state"]
PAYG --> note_payg["Billed per 1K tokens<br/>Best for dev, spiky workloads"]
SRV --> note_srv["No infra to manage<br/>Billed per 1K tokens, model-specific"]
MC --> note_mc["Deploy to dedicated GPU VMs<br/>Full control, highest cost"]
Provisioned Throughput vs Pay-Per-Token¶
| Dimension | Provisioned Throughput (PTU) | Pay-As-You-Go (PAYG) |
|---|---|---|
| Billing | Monthly per PTU | Per 1 K tokens |
| Latency | Guaranteed SLA | Best-effort, may throttle |
| Capacity | Reserved, always available | Shared, subject to quota |
| Scale | Fixed (must add PTUs) | Elastic up to quota |
| Min commitment | 1 month | None |
| Best for | Production, >50 K TPM | Dev, spiky, unpredictable loads |
| Cost at high volume | Lower per-token effective cost | Higher per-token cost |
Start with PAYG, graduate to PTU
Measure your steady-state tokens-per-minute in PAYG for 2-4 weeks. If you consistently exceed 50 K TPM, a PTU reservation will be cheaper and more predictable. Use the Azure OpenAI PTU Calculator to size your reservation.
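The tip above can be turned into a small decision helper. The sketch below is illustrative only: it uses the 50 K TPM threshold and the rough "1 PTU ≈ 6K TPM for GPT-4o" rule of thumb that this guide quotes, treats a peak of 10x the average as "bursty", and ignores minimum PTU commitments and purchase increments. The function name and the sample data are hypothetical; confirm any sizing with the PTU calculator.

```python
import math
from statistics import mean

PTU_THRESHOLD_TPM = 50_000  # threshold used in this guide
TPM_PER_PTU = 6_000         # guide's rule of thumb for GPT-4o (an approximation)
SPIKE_FACTOR = 10           # peaks at or above 10x the average count as bursty


def recommend_deployment(tpm_samples: list[float]) -> dict:
    """Suggest PAYG, PTU, or a hybrid from measured tokens-per-minute samples."""
    avg = mean(tpm_samples)
    peak = max(tpm_samples)
    if avg <= PTU_THRESHOLD_TPM:
        return {"plan": "PAYG", "ptus": 0}
    ptus = math.ceil(avg / TPM_PER_PTU)
    if peak >= SPIKE_FACTOR * avg:
        return {"plan": "PTU + PAYG overflow", "ptus": ptus}
    return {"plan": "PTU", "ptus": ptus}


steady = [60_000, 66_000, 54_000]
print(recommend_deployment(steady))
```

Feeding the function per-minute samples from the 2-4 week PAYG measurement window gives a first estimate to take into a capacity conversation.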
Prompt Flow¶
Prompt Flow is AI Foundry's visual + code-first orchestration engine for LLM applications. Flows are DAGs of nodes — each node is a Python function, LLM call, or tool invocation.
Flow Authoring¶
Flows can be authored in three ways:
- Visual editor — drag-and-drop in AI Foundry portal
- VS Code extension — local authoring with the Prompt Flow extension
- SDK / CLI (the pfazure CLI or the promptflow Python SDK)
RAG Flow Pattern¶
The canonical RAG flow for CSA-in-a-Box follows retrieve-augment-generate:
flowchart LR
A[User Query] --> B[Embed Query<br/>text-embedding-3-large]
B --> C[Retrieve<br/>Azure AI Search<br/>Hybrid: vector + keyword]
C --> D[Rerank<br/>Semantic Ranker]
D --> E[Augment Prompt<br/>System + Context + Query]
E --> F[Generate<br/>GPT-4o]
F --> G[Response + Citations]
Sample flow node (retrieve from AI Search):
import os

from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from promptflow.core import tool


@tool
def retrieve(query_embedding: list[float], top_k: int = 5) -> list[dict]:
    """Vector search against the Gold-layer index."""
    client = SearchClient(
        endpoint=os.environ["SEARCH_ENDPOINT"],
        index_name="gold-data-products",
        credential=DefaultAzureCredential(),
    )
    results = client.search(
        search_text=None,  # vector-only; pass the query text here for hybrid search
        vector_queries=[
            VectorizedQuery(
                vector=query_embedding,
                k_nearest_neighbors=top_k,
                fields="content_vector",
            )
        ],
        select=["title", "content", "source", "chunk_id"],
        top=top_k,
    )
    return [dict(r) for r in results]
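The "Augment Prompt" step of the flow (System + Context + Query) can be sketched in plain Python as follows. This is an illustration, not the project's actual prompt: the system prompt wording, the build_messages name, and the max_chars budget are assumptions, and the chunk fields mirror the title, content, and source fields selected by the retrieve node.

```python
SYSTEM_PROMPT = (
    "Answer using only the numbered sources below. "
    "Cite sources as [1], [2], and so on, and say so if the sources "
    "do not contain the answer."
)


def build_messages(query: str, chunks: list[dict], max_chars: int = 6000) -> list[dict]:
    """Assemble chat messages from retrieved chunks, numbering each for citation."""
    context_parts = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk['title']} ({chunk['source']})\n{chunk['content']}"
        if used + len(entry) > max_chars:
            break  # stop before the context budget is exceeded
        context_parts.append(entry)
        used += len(entry)
    context = "\n\n".join(context_parts)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Sources:\n{context}\n\nQuestion: {query}"},
    ]


messages = build_messages(
    "What does the sales data product contain?",
    [{"title": "Sales", "source": "gold/sales.md", "content": "Daily sales by region."}],
)
print(messages[1]["content"])
```

Numbering the sources in the prompt is what lets the generation step return citations that map back to chunk_id values.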
Evaluation Flows¶
Every RAG flow should have a companion evaluation flow that scores outputs against a labelled dataset. See the Evaluations section for built-in evaluators.
CI/CD for Flows¶
# .github/workflows/prompt-flow-ci.yml (excerpt)
- name: Run evaluation
  run: |
    pfazure run create \
      --flow ./flows/rag-chatbot \
      --data ./eval/test-dataset.jsonl \
      --name "ci-eval-${{ github.sha }}" \
      --column-mapping query='${data.question}' \
      --stream
- name: Check quality gate
  run: |
    # show-metrics prints the run's metrics as JSON. The "groundedness" key is
    # an assumption; it depends on how the evaluation flow names its metrics.
    SCORE=$(pfazure run show-metrics --name "ci-eval-${{ github.sha }}" \
      | jq -r '.groundedness')
    if (( $(echo "$SCORE < 3.5" | bc -l) )); then
      echo "::error::Groundedness score $SCORE is below threshold 3.5"
      exit 1
    fi
Connection Management¶
Connections store credentials for external services. They are scoped to a Project and encrypted at rest in Key Vault.
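# Create an Azure OpenAI connection in the Project.
# pfazure manages runs and flows, so the connection is created with the
# Azure ML CLI from a YAML definition. aoai-connection.yml is a hypothetical
# file that declares the connection name, type (azure_open_ai), endpoint, and API version.
az ml connection create \
  --file aoai-connection.yml \
  --resource-group "$RESOURCE_GROUP" \
  --workspace-name "$PROJECT_NAME"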
RAG with CSA-in-a-Box¶
This section bridges Tutorial 08 (RAG with Azure AI Search) and the Gold layer built in Tutorial 01. The pattern: Gold-layer data products are chunked, embedded, and indexed in Azure AI Search; RAG flows retrieve from those indexes at query time.
Indexing Gold-Layer Data Products¶
flowchart LR
GOLD[Gold Layer<br/>ADLS Gen2 / Delta] --> CHUNK[Chunker<br/>Recursive / Semantic]
CHUNK --> EMBED[Embedding Model<br/>text-embedding-3-large]
EMBED --> IDX[Azure AI Search Index<br/>vector + keyword + metadata]
Chunking strategies:
| Strategy | Chunk Size | Overlap | Best For |
|---|---|---|---|
| Fixed-size | 512 tokens | 50 tokens | Uniform docs (CSV descriptions) |
| Recursive | 512-1024 tokens | 100 tokens | Structured docs (Markdown, HTML) |
| Semantic | Variable | — | Long-form reports, legal docs |
| Document-level | Full doc | — | Short docs (metadata cards) |
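The fixed-size strategy in the table (512 tokens with a 50-token overlap) can be sketched as below. This is a simplified illustration: whitespace-separated words stand in for model tokens, and the chunk_fixed name is hypothetical. A production chunker would count tokens with the embedding model's tokenizer.

```python
def chunk_fixed(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Whitespace "tokens" are an approximation used only for this sketch.
words = [f"w{i}" for i in range(1000)]
chunks = chunk_fixed(words)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the tail of one chunk reappears at the head of the next, which helps retrieval when a relevant sentence straddles a chunk boundary.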
Embedding dimensions matter
text-embedding-3-large returns 3072 dimensions by default and can be shortened through the dimensions parameter (for example to 256, 1024, or 1536). CSA-in-a-Box defaults to 1536 to balance recall against storage cost. Pass dimensions=1536 in your embedding call and set the same value on the index's vector field.
Vector Search Configuration¶
{
"name": "gold-data-products",
"fields": [
{ "name": "chunk_id", "type": "Edm.String", "key": true },
{ "name": "title", "type": "Edm.String", "searchable": true },
{ "name": "content", "type": "Edm.String", "searchable": true },
{ "name": "source", "type": "Edm.String", "filterable": true },
{ "name": "domain", "type": "Edm.String", "filterable": true },
{
"name": "content_vector",
"type": "Collection(Edm.Single)",
"searchable": true,
"dimensions": 1536,
"vectorSearchProfile": "hnsw-profile"
}
],
"vectorSearch": {
"algorithms": [
{
"name": "hnsw-algo",
"kind": "hnsw",
"hnswParameters": { "m": 4, "efConstruction": 400, "efSearch": 500 }
}
],
"profiles": [{ "name": "hnsw-profile", "algorithm": "hnsw-algo" }]
},
"semantic": {
"configurations": [
{
"name": "default",
"prioritizedFields": {
"contentFields": [{ "fieldName": "content" }],
"titleField": { "fieldName": "title" }
}
}
]
}
}
Hybrid Search (Keyword + Vector)¶
Hybrid search combines BM25 keyword scoring with vector similarity and optional semantic reranking. This delivers the best recall across both exact-match and semantic queries.
from azure.search.documents.models import VectorizedQuery

# search_client, user_query, and query_embedding are assumed to be defined by the caller.
results = search_client.search(
    search_text=user_query,  # keyword leg
    vector_queries=[
        VectorizedQuery(
            vector=query_embedding,  # vector leg
            k_nearest_neighbors=10,
            fields="content_vector",
        )
    ],
    query_type="semantic",
    semantic_configuration_name="default",
    top=5,
)
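Azure AI Search merges the keyword and vector result lists with Reciprocal Rank Fusion (RRF) before any semantic reranking. The sketch below shows the idea in plain Python so the scoring is easier to reason about. It is not the service's implementation: the constant k=60 is a commonly used value and an assumption here, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_hits = ["c1", "c2", "c3"]  # hypothetical BM25 order
vector_hits = ["c3", "c1", "c4"]   # hypothetical vector order
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([doc_id for doc_id, _ in fused])
```

Documents that appear in both lists accumulate two contributions, which is why hybrid search tends to surface chunks that match on both exact terms and meaning.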
AI Agents¶
AI agents combine LLM reasoning with tool calling to perform multi-step tasks autonomously. CSA-in-a-Box uses Semantic Kernel as the primary agent framework (see ADR-0017 for the RAG service-layer decision and Tutorial 07 for the hands-on walkthrough).
Semantic Kernel Integration¶
import os

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

kernel = sk.Kernel()
kernel.add_service(AzureChatCompletion(
    deployment_name="gpt-4o",
    endpoint=os.environ["AOAI_ENDPOINT"],
    api_key=os.environ["AOAI_KEY"],  # or use DefaultAzureCredential
))

# Register a plugin that queries Gold-layer data
# (DataQueryPlugin is defined under "Function Calling over CSA Data Products")
kernel.add_plugin(DataQueryPlugin(), plugin_name="data_query")
Multi-Agent Patterns¶
graph LR
USER((User)) --> ORCH[Orchestrator Agent]
ORCH --> DA[DataAnalyst<br/>Queries Gold layer]
ORCH --> QR[QualityReviewer<br/>Validates outputs]
ORCH --> GO[GovernanceOfficer<br/>Checks compliance]
DA --> GOLD[(Gold Layer)]
QR --> EVAL[Evaluation Metrics]
GO --> PURVIEW[Microsoft Purview]
| Agent | Role | Tools | Grounding Source |
|---|---|---|---|
| DataAnalyst | Query data, generate insights | DataQueryPlugin, SQL, AI Search | Gold-layer tables and indexes |
| QualityReviewer | Validate accuracy, flag hallucination | Evaluation SDK | Ground-truth datasets |
| GovernanceOfficer | Check PII, enforce retention | Purview API, content safety | Data catalog policies |
Function Calling over CSA Data Products¶
Agents call functions that wrap CSA-in-a-Box services:
from semantic_kernel.functions import kernel_function


class DataQueryPlugin:
    @kernel_function(description="Query a Gold-layer data product by SQL")
    def query_gold(self, sql: str) -> str:
        """Execute a read-only query against the Gold layer."""
        # Validate SQL is SELECT-only (a coarse check; also enforce
        # read-only permissions on the warehouse itself)
        if not sql.strip().upper().startswith("SELECT"):
            return "Error: only SELECT queries are permitted."
        # Execute against Databricks SQL warehouse or Synapse serverless.
        # execute_read_only is a placeholder for your warehouse client.
        return execute_read_only(sql)
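A prefix check on "SELECT" lets through input such as a SELECT followed by a second statement. The sketch below shows a stricter, deliberately conservative guard in plain Python. It is an illustration only: is_read_only and the keyword list are assumptions, it can reject legitimate queries (for example a string literal containing the word "delete"), and it complements rather than replaces read-only database permissions.

```python
import re

# Keywords that should never appear in an agent-issued query (assumed list).
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE|EXEC|CALL)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input or more than one statement
    if "--" in statement or "/*" in statement:
        return False  # comments can hide a second statement
    if not re.match(r"(?i)(SELECT|WITH)\b", statement):
        return False
    return FORBIDDEN.search(statement) is None


print(is_read_only("SELECT region, SUM(amount) FROM gold.sales GROUP BY region"))
print(is_read_only("SELECT 1; DROP TABLE gold.sales"))
```

Word boundaries keep column names such as created_at from tripping the keyword filter.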
Evaluations¶
AI Foundry provides a built-in evaluation framework to measure RAG and generation quality. Run evaluations locally, in Prompt Flow, or via CI/CD.
Built-In Evaluators¶
| Evaluator | What It Measures | Scale | When to Use |
|---|---|---|---|
| Groundedness | Is the answer supported by retrieved context? | 1-5 | Every RAG flow |
| Relevance | Does the answer address the question? | 1-5 | Every RAG flow |
| Fluency | Is the language natural and readable? | 1-5 | User-facing outputs |
| Coherence | Is the answer logically consistent? | 1-5 | Multi-turn conversations |
| Similarity | How close is the output to a reference answer? | 1-5 | Regression testing |
| F1 Score | Token-level overlap with reference | 0-1 | Extractive QA |
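Unlike the model-graded 1-5 metrics, F1 is a plain token-overlap calculation. The following sketch shows the arithmetic, assuming lowercase whitespace tokenisation; the built-in evaluator may normalise text differently, so scores will not necessarily match it exactly. The token_f1 name is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("revenue grew in 2023", "revenue grew"))
```

In the example, two of the four predicted tokens match (precision 0.5) and both reference tokens are covered (recall 1.0), giving an F1 of about 0.67.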
Running Evaluations¶
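from azure.ai.evaluation import GroundednessEvaluator, RelevanceEvaluator, evaluate

# model_config is assumed to be defined elsewhere; it identifies the judge
# model (Azure OpenAI endpoint, deployment name, and credentials).
results = evaluate(
    data="eval/test-dataset.jsonl",
    evaluators={
        "groundedness": GroundednessEvaluator(model_config=model_config),
        "relevance": RelevanceEvaluator(model_config=model_config),
    },
    evaluator_config={
        "groundedness": {
            "column_mapping": {
                "query": "${data.question}",
                "context": "${data.context}",
                "response": "${data.answer}",
            },
        },
        "relevance": {
            "column_mapping": {
                "query": "${data.question}",
                "response": "${data.answer}",
            },
        },
    },
)
# evaluate() returns a dict; aggregate scores are under the "metrics" key.
# Metric key names vary by SDK version, so inspect the output before gating on it.
print(results["metrics"])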
CI/CD Quality Gates¶
Never ship a RAG flow without evaluation gates
Set minimum thresholds for groundedness (>= 3.5) and relevance (>= 3.5) in your CI pipeline. Flows that fall below these thresholds must not be deployed to production.
| Metric | Dev Threshold | Staging Threshold | Prod Threshold |
|---|---|---|---|
| Groundedness | >= 3.0 | >= 3.5 | >= 4.0 |
| Relevance | >= 3.0 | >= 3.5 | >= 4.0 |
| Fluency | >= 3.0 | >= 3.5 | >= 3.5 |
| Coherence | >= 3.0 | >= 3.5 | >= 3.5 |
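The threshold table can be encoded directly so that a CI step applies the right bar for each environment. This is a minimal sketch: the gate_failures name and the flat metric names are assumptions, and the metrics dictionary would need to be mapped from whatever keys your evaluation run actually emits.

```python
# Minimum scores per environment, copied from the table above.
THRESHOLDS = {
    "dev":     {"groundedness": 3.0, "relevance": 3.0, "fluency": 3.0, "coherence": 3.0},
    "staging": {"groundedness": 3.5, "relevance": 3.5, "fluency": 3.5, "coherence": 3.5},
    "prod":    {"groundedness": 4.0, "relevance": 4.0, "fluency": 3.5, "coherence": 3.5},
}


def gate_failures(metrics: dict[str, float], environment: str) -> list[str]:
    """List every metric that is missing or below the environment's minimum."""
    failures = []
    for name, minimum in THRESHOLDS[environment].items():
        score = metrics.get(name)
        if score is None:
            failures.append(f"{name}: missing")
        elif score < minimum:
            failures.append(f"{name}: {score} < {minimum}")
    return failures


# Hypothetical scores from an evaluation run.
run_metrics = {"groundedness": 3.8, "relevance": 4.1, "fluency": 4.0, "coherence": 3.9}
for env in ("staging", "prod"):
    print(env, gate_failures(run_metrics, env))
```

A pipeline step can exit non-zero whenever the returned list is not empty, which blocks promotion to that environment.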
Responsible AI¶
Content Safety Filters¶
Azure AI Foundry applies content safety filters to all Azure OpenAI deployments by default. Filters cover four harm categories (hate, sexual, violence, self-harm) at configurable severity levels.
| Filter | Default Level | Customisable | Applies To |
|---|---|---|---|
| Hate & Fairness | Medium | Yes | Prompt + Completion |
| Sexual | Medium | Yes | Prompt + Completion |
| Violence | Medium | Yes | Prompt + Completion |
| Self-Harm | Medium | Yes | Prompt + Completion |
| Prompt Shields | On | Yes | Prompt only |
| Protected Material | On | Limited | Completion only |
Purview DSPM for AI¶
Microsoft Purview Data Security Posture Management (DSPM) for AI monitors interactions with generative-AI services:
- Sensitive data detection — identifies PII, PHI, and classified data in prompts and completions
- Oversharing alerts — flags when sensitive data is sent to AI endpoints
- Activity explorer — audit trail of all AI interactions
- DLP policies — block or warn when prompts contain regulated data
Prompt Shields & Jailbreak Detection¶
Prompt Shields detect two attack vectors:
- Direct attacks (jailbreaks) — user attempts to override system instructions
- Indirect attacks (XPIA) — injected instructions in retrieved documents
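from azure.ai.contentsafety import ContentSafetyClient

# endpoint, credential, and user_prompt are assumed to be defined by the caller.
client = ContentSafetyClient(endpoint, credential)
# analyze_text scores the four harm categories and applies blocklists;
# Prompt Shields is a separate Content Safety operation.
response = client.analyze_text({
    "text": user_prompt,
    "categories": ["Hate", "Violence", "Sexual", "SelfHarm"],
    "blocklistNames": ["custom-blocked-terms"],
})
# Check response.categories_analysis for per-category severity scores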
Cost Management¶
Token Consumption Tracking¶
Track token usage at multiple levels:
- Per-deployment: Azure Monitor metrics (ProcessedPromptTokens, GeneratedCompletionTokens)
- Per-request: usage field in API response (prompt_tokens, completion_tokens)
- Per-project: aggregate via Application Insights custom dimensions
// KQL: daily token consumption by deployment
// properties_s is a JSON string, so parse it before reading fields.
// The token fields are assumed to be present in the logged payload; verify against your own logs.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where Category == "RequestResponse"
| extend props = parse_json(properties_s)
| extend promptTokens = toint(props.promptTokens)
| extend completionTokens = toint(props.completionTokens)
| summarize TotalPrompt=sum(promptTokens),
    TotalCompletion=sum(completionTokens)
    by bin(TimeGenerated, 1d), deploymentName=tostring(props.modelDeploymentName)
| order by TimeGenerated desc
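Per-request tracking can also be done in application code by summing the usage field of each response. The sketch below assumes each logged record carries a deployment name alongside the usage block; that record shape and the aggregate_usage name are hypothetical, while prompt_tokens and completion_tokens are the field names listed above.

```python
from collections import defaultdict


def aggregate_usage(records: list[dict]) -> dict[str, dict[str, int]]:
    """Sum prompt and completion tokens per deployment."""
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for record in records:
        usage = record["usage"]
        bucket = totals[record["deployment"]]
        bucket["prompt_tokens"] += usage["prompt_tokens"]
        bucket["completion_tokens"] += usage["completion_tokens"]
    return dict(totals)


# Hypothetical records captured from API responses.
records = [
    {"deployment": "gpt-4o", "usage": {"prompt_tokens": 1200, "completion_tokens": 300}},
    {"deployment": "gpt-4o-mini", "usage": {"prompt_tokens": 400, "completion_tokens": 50}},
    {"deployment": "gpt-4o", "usage": {"prompt_tokens": 800, "completion_tokens": 200}},
]
print(aggregate_usage(records))
```

Emitting the same totals as Application Insights custom dimensions gives the per-project view without relying on diagnostic logs.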
PTU vs PAYG Decision Tree¶
flowchart TD
A["Avg TPM over 2 weeks?"] -->|"> 50K TPM"| B["Steady or bursty?"]
A -->|"< 50K TPM"| PAYG["Stay on PAYG"]
B -->|"Steady"| PTU["Switch to PTU"]
B -->|"Bursty (10x spikes)"| SPLIT["Hybrid: PTU base + PAYG overflow"]
PTU --> SIZE["Size: avg_TPM / 6K = PTU count<br/>(1 PTU ≈ 6K TPM for GPT-4o)"]
Quota Management¶
Request quota increases early
Azure OpenAI quotas are per-subscription, per-region. For production workloads, request increases at least 2 weeks before go-live via the Azure Portal (Azure OpenAI > Quotas) or by filing a support request.
Model Retirement Planning¶
Azure OpenAI models follow a published retirement schedule. CSA-in-a-Box teams should:
- Subscribe to the Azure OpenAI model deprecation page
- Maintain a model_config abstraction (never hard-code model names)
- Test new model versions in a staging Project before promoting
- Allow at least 4 weeks for migration and evaluation before retirement date
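One way to realise the model_config abstraction from the list above is a small lookup that maps a logical role to a deployment name, with an environment variable override. This is a sketch under assumptions: the role names, the MODEL_<ROLE> variable convention, and resolve_deployment are all hypothetical, and the defaults simply reuse model names mentioned in this guide.

```python
import os
from typing import Mapping

# Logical roles mapped to default deployment names (assumed values).
DEFAULT_DEPLOYMENTS = {
    "chat": "gpt-4o",
    "extraction": "gpt-4o-mini",
    "embedding": "text-embedding-3-large",
}


def resolve_deployment(role: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the deployment for a role, preferring a MODEL_<ROLE> override."""
    if role not in DEFAULT_DEPLOYMENTS:
        raise KeyError(f"unknown model role: {role}")
    return env.get(f"MODEL_{role.upper()}", DEFAULT_DEPLOYMENTS[role])


# Application code asks for a role and never names a model directly.
print(resolve_deployment("chat", env={}))
print(resolve_deployment("chat", env={"MODEL_CHAT": "chat-staging-candidate"}))
```

With this in place, a retirement migration becomes a configuration change in the staging Project followed by an evaluation run, with no code edits.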
Security¶
Managed Identity for Model Access¶
Always prefer managed identity over API keys
API keys are a shared secret. Managed identity eliminates key rotation burden and reduces blast radius if a credential leaks.
// Grant the AI Project's managed identity access to Azure OpenAI.
// cognitiveServicesUserRole is assumed to hold the resource ID of the
// built-in "Cognitive Services OpenAI User" role definition.
resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: openAiAccount
  name: guid(aiProject.id, openAiAccount.id, cognitiveServicesUserRole)
  properties: {
    principalId: aiProject.identity.principalId
    roleDefinitionId: cognitiveServicesUserRole
    principalType: 'ServicePrincipal'
  }
}
Private Endpoints¶
All AI Foundry services should be reachable only over Private Endpoints in production. The Hub's managed network handles this when isolationMode is set to AllowOnlyApprovedOutbound.
| Service | Private Endpoint Sub-Resource | Required |
|---|---|---|
| Azure OpenAI | account | Yes |
| Azure AI Search | searchService | Yes |
| ADLS Gen2 | blob, dfs | Yes |
| Key Vault | vault | Yes |
| Container Registry | registry | Yes |
RBAC Roles¶
| Role | Scope | Grants |
|---|---|---|
| Azure AI Developer | Project | Create deployments, run flows, view evaluations |
| Azure AI Inference Deployment Operator | Project | Deploy and manage model endpoints |
| Cognitive Services OpenAI User | OpenAI Account | Call model endpoints (chat, embeddings) |
| Cognitive Services OpenAI Contributor | OpenAI Account | Create deployments, manage fine-tunes |
| Search Index Data Contributor | AI Search | Read/write index data |
| Reader | Hub | View Hub resources (no modifications) |
API Key Rotation¶
If API keys must be used (non-production, external integrations):
- Store keys in Key Vault, never in code or environment variables
- Rotate keys every 90 days
- Use Key Vault's expiration and notification policies
- After rotation, update every AI Foundry connection that still references the old key
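The 90-day rule in the list above is easy to check in a scheduled job. The sketch below only shows the date arithmetic; where the last-rotated timestamp comes from (for example a Key Vault secret's metadata) is left open, and the rotation_due name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ROTATION_PERIOD = timedelta(days=90)  # rotation interval from the list above


def rotation_due(last_rotated: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when a key is at least 90 days old."""
    now = now or datetime.now(timezone.utc)
    return now - last_rotated >= ROTATION_PERIOD


# Hypothetical timestamps; both must be timezone-aware.
check_time = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(rotation_due(check_time - timedelta(days=91), now=check_time))
print(rotation_due(check_time - timedelta(days=10), now=check_time))
```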
CSA-in-a-Box Integration Patterns¶
Enrichment Pipeline — AI-Powered Bronze-to-Silver¶
Use AI models to extract structured data from unstructured Bronze-layer documents.
flowchart LR
BRONZE[Bronze Layer<br/>Raw PDFs, emails, images] --> DOC[Document Intelligence<br/>Layout extraction]
DOC --> LLM[GPT-4o-mini<br/>Entity extraction + classification]
LLM --> SILVER[Silver Layer<br/>Structured tables in Delta]
SILVER --> GOLD[Gold Layer<br/>Curated data products]
Example (extract entities from regulatory filings):
from azure.ai.documentintelligence import DocumentIntelligenceClient

# endpoint, credential, and bronze_blob_url are assumed to be defined by the caller.
client = DocumentIntelligenceClient(endpoint, credential)
poller = client.begin_analyze_document(
    "prebuilt-layout",
    body={"urlSource": bronze_blob_url},
)
result = poller.result()

# Pass extracted text to GPT-4o-mini for entity extraction.
# extract_entities and write_to_silver are placeholders for pipeline helpers.
entities = extract_entities(result.content, model="gpt-4o-mini")
write_to_silver(entities, table="regulatory_entities")
Classification Pipeline¶
Classify incoming documents or records into domain categories using few-shot prompting:
system_prompt = """Classify the document into exactly one category:
- financial_report
- regulatory_filing
- correspondence
- technical_specification
Respond with JSON: {"category": "...", "confidence": 0.0-1.0}"""

# client is assumed to be an Azure OpenAI chat client (not the Document
# Intelligence client above); model is the name of your deployment.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": document_text[:4000]},
    ],
    response_format={"type": "json_object"},
    temperature=0,
)
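The model's JSON reply should be validated before it drives routing, because a JSON response format does not guarantee the category is one of the allowed values. The sketch below is one possible guard: parse_classification, the needs_review flag, and the 0.7 confidence cut-off are assumptions, while the category names come from the system prompt above.

```python
import json

CATEGORIES = {
    "financial_report",
    "regulatory_filing",
    "correspondence",
    "technical_specification",
}
REJECTED = {"category": None, "confidence": 0.0, "needs_review": True}


def parse_classification(raw: str, min_confidence: float = 0.7) -> dict:
    """Validate the model's JSON reply; flag anything malformed or low-confidence."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(REJECTED)
    if not isinstance(data, dict):
        return dict(REJECTED)
    category = data.get("category")
    confidence = data.get("confidence")
    if not isinstance(category, str) or category not in CATEGORIES:
        return dict(REJECTED)
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        return dict(REJECTED)
    return {
        "category": category,
        "confidence": float(confidence),
        "needs_review": confidence < min_confidence,
    }


print(parse_classification('{"category": "correspondence", "confidence": 0.92}'))
```

Records flagged with needs_review can be routed to a manual queue instead of being written straight to the Silver layer.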
Summarization Pipeline¶
Generate concise summaries of Gold-layer data products for the data catalog:
summary = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": (
            "Summarize the dataset in 2-3 sentences for a data catalog. "
            "Include: what the data represents, time range, key fields, "
            "and number of records."
        )},
        {"role": "user", "content": dataset_description},
    ],
    max_tokens=200,
    temperature=0.3,
)
Anti-Patterns¶
| Anti-Pattern | Why It Fails | Do This Instead |
|---|---|---|
| Hard-coding model names | Model retirements break production | Use a model_config mapping, abstracted behind env vars |
| Skipping evaluations | Silent quality degradation | Run evaluation flows in CI; enforce quality gates |
| Single large index | Slow queries, poor relevance | Create domain-specific indexes (one per data product family) |
| API keys in code | Security incident waiting to happen | Managed identity + Key Vault references |
| No chunking strategy | Retrieval quality is random | Match chunk size to content type; test with eval datasets |
| Ignoring content safety | Compliance and reputational risk | Keep default filters; add custom blocklists for domain terms |
| PTU over-provisioning | Wasted spend | Start with PAYG, size PTUs from 2+ weeks of usage data |
| Monolithic Prompt Flow | Untestable, hard to debug | Small flows with single-responsibility nodes; compose via subflows |
Pre-Flight Checklist¶
Before deploying an AI workload to production:
- Hub uses AllowOnlyApprovedOutbound isolation mode
- All services reachable via Private Endpoints only
- Managed identity used for all service-to-service auth (no API keys)
- RBAC roles assigned at minimum scope (Project, not Subscription)
- Content safety filters configured and tested
- Evaluation flow exists with quality gates in CI/CD
- Groundedness >= 4.0 and Relevance >= 4.0 on production eval dataset
- Token consumption alerts configured in Azure Monitor
- Model retirement dates tracked; migration plan documented
- Prompt Shields enabled for all user-facing endpoints
- PII detection enabled via Purview DSPM for AI
- Disaster recovery: Hub and Projects exist in paired region (or redeployable via IaC)
- Data products indexed with appropriate chunking strategy and tested recall
Related Resources¶
CSA-in-a-Box Tutorials¶
- Tutorial 06 — AI-First Analytics with Azure AI Foundry
- Tutorial 07 — Building AI Agents with Semantic Kernel
- Tutorial 08 — RAG with Azure AI Search
- Tutorial 09 — GraphRAG Knowledge Graphs
CSA-in-a-Box Architecture Decisions¶
- ADR-0007 — Azure OpenAI over Self-Hosted LLM
- ADR-0017 — RAG Service-Layer Extraction
- RAG vs Fine-Tuning vs Agents