Azure AI Foundry — Comprehensive Guide

Architecture, deployment, and operational guide for Azure AI Foundry within the CSA-in-a-Box analytics platform.

Azure AI Foundry (formerly Azure AI Studio) is Microsoft's unified platform for building, evaluating, and deploying generative-AI applications. It consolidates model access, prompt engineering, evaluation, and responsible-AI tooling into a single control plane organised around Hubs and Projects. This guide covers everything a CSA-in-a-Box team needs to stand up AI workloads on top of their existing Data Landing Zone.


Architecture Overview

graph TB
    subgraph Hub["AI Foundry Hub"]
        direction TB
        SHARED[Shared Resources<br/>Key Vault · Storage · ACR]
        NET[Managed Network<br/>Private Endpoints]
    end

    subgraph Projects["Projects"]
        P1[Project — RAG Chatbot]
        P2[Project — Doc Intel Pipeline]
        P3[Project — AI Agents]
    end

    subgraph Models["Model Catalog"]
        AOAI[Azure OpenAI<br/>GPT-4o · GPT-4o-mini · o1 · o3]
        OSS[Open Source<br/>Phi-4 · Llama 3 · Mistral]
    end

    subgraph Integration["CSA-in-a-Box Integration"]
        GOLD[Gold Layer<br/>ADLS Gen2 / Delta]
        SEARCH[Azure AI Search<br/>Vector + Keyword]
        PIPE[Enrichment Pipelines<br/>Bronze → Silver → Gold]
        AGENTS[AI Agents<br/>Semantic Kernel]
    end

    subgraph Runtime["Runtime"]
        PF[Prompt Flow]
        EVAL[Evaluations]
        DEPLOY[Deployments<br/>Managed Endpoint / Serverless]
    end

    Hub --> Projects
    P1 & P2 & P3 --> Models
    P1 --> PF --> DEPLOY
    P1 --> EVAL
    P1 --> SEARCH --> GOLD
    P3 --> AGENTS --> GOLD
    P2 --> PIPE
    SHARED -. secrets .-> P1 & P2 & P3
    NET -. isolation .-> SEARCH & GOLD

    style Hub fill:#e3f2fd,stroke:#1565c0
    style Projects fill:#e8f5e9,stroke:#2e7d32
    style Models fill:#fff3e0,stroke:#e65100
    style Integration fill:#f3e5f5,stroke:#6a1b9a
    style Runtime fill:#fce4ec,stroke:#c62828

Key relationships:

  • A Hub owns shared infrastructure — Key Vault, Storage, Container Registry, and managed networking. One Hub serves many Projects.
  • A Project is the working boundary for a team or workload. It has its own connections, compute, and deployments.
  • CSA-in-a-Box's Gold layer feeds AI Search indexes, which ground RAG flows. Enrichment pipelines use model endpoints to move data from Bronze to Silver.

Hub & Project Setup

Hub Creation (Bicep)

@description('AI Foundry Hub — shared control plane for all AI projects')
resource aiHub 'Microsoft.MachineLearningServices/workspaces@2024-10-01' = {
  name: '${prefix}-ai-hub'
  location: location
  kind: 'Hub'
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: 'CSA AI Foundry Hub'
    description: 'Shared hub for CSA-in-a-Box AI workloads'
    keyVaultId: keyVault.id
    storageAccountId: storageAccount.id
    containerRegistryId: containerRegistry.id
    managedNetwork: {
      isolationMode: 'AllowInternetOutbound'   // or 'AllowOnlyApprovedOutbound'
      outboundRules: {
        aoai: {
          type: 'PrivateEndpoint'
          destination: {
            serviceResourceId: openAiAccount.id
            subresourceTarget: 'account'
          }
        }
        search: {
          type: 'PrivateEndpoint'
          destination: {
            serviceResourceId: aiSearch.id
            subresourceTarget: 'searchService'
          }
        }
      }
    }
  }
}

Project Creation

resource aiProject 'Microsoft.MachineLearningServices/workspaces@2024-10-01' = {
  name: '${prefix}-ai-rag-project'
  location: location
  kind: 'Project'
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: 'RAG Chatbot Project'
    hubResourceId: aiHub.id
  }
}

Linked Services

Every Project inherits the Hub's connections but can add its own.

| Service | Connection Type | Purpose |
|---|---|---|
| Azure OpenAI | AzureOpenAI | GPT-4o, embeddings |
| Azure AI Search | CognitiveSearch | Vector & keyword index |
| ADLS Gen2 | AzureBlobStorage | Gold-layer data products |
| Key Vault | AzureKeyVault | Secrets — API keys, connection strings |
| Application Insights | ApplicationInsights | Telemetry and tracing |

Managed Network Isolation

Production deployments must use managed networking

For any environment handling sensitive or regulated data, set isolationMode to AllowOnlyApprovedOutbound and explicitly approve every outbound destination via outboundRules. AllowInternetOutbound is acceptable only for dev/sandbox.

Managed networking creates a managed VNet behind the Hub. All outbound traffic from compute (Prompt Flow runtimes, managed endpoints) exits through this VNet. Private Endpoints declared in outboundRules are injected automatically.


Model Catalog

Azure OpenAI Models

| Model | Use Case | Context Window | Strengths |
|---|---|---|---|
| GPT-4o | General reasoning, RAG generation, agents | 128 K | Best all-round quality/speed/cost balance |
| GPT-4o-mini | Classification, extraction, high-volume | 128 K | 10-15x cheaper than GPT-4o |
| o1 | Complex reasoning, math, code gen | 200 K | Extended thinking for hard problems |
| o3 | Scientific reasoning, planning | 200 K | Highest reasoning ceiling |
| text-embedding-3-large | Embeddings (1536-3072 dims) | 8 K | High recall for vector search |
| text-embedding-3-small | Embeddings (512-1536 dims) | 8 K | Cost-effective for large corpora |

Open-Source Models (Model-as-a-Service)

| Model | Publisher | Strengths | Deployment |
|---|---|---|---|
| Phi-4 | Microsoft | Small-model excellence, code, math | Serverless API |
| Llama 3.1 (8B / 70B) | Meta | Multilingual, long context | Serverless API or Managed Compute |
| Mistral Large | Mistral AI | European compliance, function calling | Serverless API |
| Cohere Command R+ | Cohere | RAG-optimised, citation grounding | Serverless API |
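
Serverless (MaaS) endpoints expose a common chat-completions API regardless of publisher. A minimal sketch, assuming the azure-ai-inference Python package; the endpoint and key environment variables are illustrative:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# One client works across Phi-4, Llama, Mistral, etc. deployed as serverless APIs
client = ChatCompletionsClient(
    endpoint=os.environ["MAAS_ENDPOINT"],          # e.g. a Phi-4 serverless endpoint
    credential=AzureKeyCredential(os.environ["MAAS_KEY"]),
)
response = client.complete(
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="Summarise hybrid search in one sentence."),
    ],
)
print(response.choices[0].message.content)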

Deployment Options

flowchart TD
    Q["Which deployment?"]
    Q -->|"Low latency, guaranteed capacity"| PTU["Provisioned Throughput (PTU)"]
    Q -->|"Variable / bursty traffic"| PAYG["Pay-As-You-Go (token)"]
    Q -->|"OSS model, no Azure OpenAI"| SRV["Serverless API (MaaS)"]
    Q -->|"Custom weights / LoRA"| MC["Managed Compute"]

    PTU --> note_ptu["Reserve PTUs in 50-unit increments<br/>Best for production steady-state"]
    PAYG --> note_payg["Billed per 1K tokens<br/>Best for dev, spiky workloads"]
    SRV --> note_srv["No infra to manage<br/>Billed per 1K tokens, model-specific"]
    MC --> note_mc["Deploy to dedicated GPU VMs<br/>Full control, highest cost"]

Provisioned Throughput vs Pay-Per-Token

| Dimension | Provisioned Throughput (PTU) | Pay-As-You-Go (PAYG) |
|---|---|---|
| Billing | Monthly per PTU | Per 1 K tokens |
| Latency | Guaranteed SLA | Best-effort, may throttle |
| Capacity | Reserved, always available | Shared, subject to quota |
| Scale | Fixed (must add PTUs) | Elastic up to quota |
| Min commitment | 1 month | None |
| Best for | Production, >50 K TPM | Dev, spiky, unpredictable loads |
| Cost at high volume | Lower per-token effective cost | Higher per-token cost |

Start with PAYG, graduate to PTU

Measure your steady-state tokens-per-minute in PAYG for 2-4 weeks. If you consistently exceed 50 K TPM, a PTU reservation will be cheaper and more predictable. Use the Azure OpenAI PTU Calculator to size your reservation.


Prompt Flow

Prompt Flow is AI Foundry's visual + code-first orchestration engine for LLM applications. Flows are DAGs of nodes — each node is a Python function, LLM call, or tool invocation.

Flow Authoring

Flows can be authored in three ways:

  1. Visual editor — drag-and-drop in AI Foundry portal
  2. VS Code extension — local authoring with the Prompt Flow extension
  3. SDK / CLI — pfazure CLI or promptflow Python SDK

RAG Flow Pattern

The canonical RAG flow for CSA-in-a-Box follows retrieve-augment-generate:

flowchart LR
    A[User Query] --> B[Embed Query<br/>text-embedding-3-large]
    B --> C[Retrieve<br/>Azure AI Search<br/>Hybrid: vector + keyword]
    C --> D[Rerank<br/>Semantic Ranker]
    D --> E[Augment Prompt<br/>System + Context + Query]
    E --> F[Generate<br/>GPT-4o]
    F --> G[Response + Citations]

Sample flow node — retrieve from AI Search:

import os

from promptflow.core import tool
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from azure.identity import DefaultAzureCredential

@tool
def retrieve(query_embedding: list[float], top_k: int = 5) -> list[dict]:
    """Hybrid search against the Gold-layer index."""
    client = SearchClient(
        endpoint=os.environ["SEARCH_ENDPOINT"],
        index_name="gold-data-products",
        credential=DefaultAzureCredential(),
    )
    results = client.search(
        search_text=None,
        vector_queries=[
            VectorizedQuery(
                vector=query_embedding,
                k_nearest_neighbors=top_k,
                fields="content_vector",
            )
        ],
        select=["title", "content", "source", "chunk_id"],
        top=top_k,
    )
    return [dict(r) for r in results]

Evaluation Flows

Every RAG flow should have a companion evaluation flow that scores outputs against a labelled dataset. See the Evaluations section for built-in evaluators.

CI/CD for Flows

# .github/workflows/prompt-flow-ci.yml (excerpt)
- name: Run evaluation
  run: |
      pfazure run create \
        --flow ./flows/rag-chatbot \
        --data ./eval/test-dataset.jsonl \
        --run-name "ci-eval-${{ github.sha }}" \
        --column-mapping query='${data.question}' \
        --stream
- name: Check quality gate
  run: |
      SCORE=$(pfazure run show --name "ci-eval-${{ github.sha }}" \
        --query "properties.metrics.groundedness" -o tsv)
      if (( $(echo "$SCORE < 3.5" | bc -l) )); then
        echo "::error::Groundedness score $SCORE is below threshold 3.5"
        exit 1
      fi

Connection Management

Connections store credentials for external services. They are scoped to a Project and encrypted at rest in Key Vault.

# Create an Azure OpenAI connection
pfazure connection create \
  --name aoai-connection \
  --type azure_open_ai \
  --api-base "$AOAI_ENDPOINT" \
  --api-type azure \
  --api-version 2024-06-01

RAG with CSA-in-a-Box

This section bridges Tutorial 08 (RAG with Azure AI Search) and the Gold layer built in Tutorial 01. The pattern: Gold-layer data products are chunked, embedded, and indexed in Azure AI Search; RAG flows retrieve from those indexes at query time.

Indexing Gold-Layer Data Products

flowchart LR
    GOLD[Gold Layer<br/>ADLS Gen2 / Delta] --> CHUNK[Chunker<br/>Recursive / Semantic]
    CHUNK --> EMBED[Embedding Model<br/>text-embedding-3-large]
    EMBED --> IDX[Azure AI Search Index<br/>vector + keyword + metadata]

Chunking strategies:

| Strategy | Chunk Size | Overlap | Best For |
|---|---|---|---|
| Fixed-size | 512 tokens | 50 tokens | Uniform docs (CSV descriptions) |
| Recursive | 512-1024 tokens | 100 tokens | Structured docs (Markdown, HTML) |
| Semantic | Variable | N/A | Long-form reports, legal docs |
| Document-level | Full doc | N/A | Short docs (metadata cards) |
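
A sketch of the fixed-size strategy, using a whitespace split as a crude token proxy (a production chunker would count model tokens, e.g. with tiktoken):

def chunk_fixed(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into `size`-token chunks that share `overlap` tokens of context."""
    tokens = text.split()          # crude token proxy; swap in a real tokenizer
    step = size - overlap
    return [
        " ".join(tokens[i : i + size])
        for i in range(0, max(len(tokens) - overlap, 1), step)
    ]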

Embedding dimensions matter

text-embedding-3-large supports 256, 1024, 1536, or 3072 dimensions. CSA-in-a-Box defaults to 1536 for the balance of recall and storage cost. Use dimensions=1536 in your embedding call and match the index field definition.
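
A minimal embedding call matching that default, assuming the openai Python package against an Azure OpenAI deployment named text-embedding-3-large:

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AOAI_ENDPOINT"],
    api_key=os.environ["AOAI_KEY"],      # or a token provider via managed identity
    api_version="2024-06-01",
)
# Request 1536-dim vectors so they match the index field definition below
resp = client.embeddings.create(
    model="text-embedding-3-large",      # deployment name (assumed)
    input=["chunk text to embed"],
    dimensions=1536,
)
vector = resp.data[0].embedding          # len(vector) == 1536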

Vector Search Configuration

{
    "name": "gold-data-products",
    "fields": [
        { "name": "chunk_id", "type": "Edm.String", "key": true },
        { "name": "title", "type": "Edm.String", "searchable": true },
        { "name": "content", "type": "Edm.String", "searchable": true },
        { "name": "source", "type": "Edm.String", "filterable": true },
        { "name": "domain", "type": "Edm.String", "filterable": true },
        {
            "name": "content_vector",
            "type": "Collection(Edm.Single)",
            "searchable": true,
            "dimensions": 1536,
            "vectorSearchProfile": "hnsw-profile"
        }
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "hnsw-algo",
                "kind": "hnsw",
                "parameters": { "m": 4, "efConstruction": 400, "efSearch": 500 }
            }
        ],
        "profiles": [{ "name": "hnsw-profile", "algorithm": "hnsw-algo" }]
    },
    "semantic": {
        "configurations": [
            {
                "name": "default",
                "prioritizedFields": {
                    "contentFields": [{ "fieldName": "content" }],
                    "titleField": { "fieldName": "title" }
                }
            }
        ]
    }
}

Hybrid Search (Keyword + Vector)

Hybrid search combines BM25 keyword scoring with vector similarity and optional semantic reranking. This delivers the best recall across both exact-match and semantic queries.

from azure.search.documents.models import VectorizedQuery

# `search_client` is the SearchClient from the retrieve example above
results = search_client.search(
    search_text=user_query,           # keyword leg
    vector_queries=[
        VectorizedQuery(
            vector=query_embedding,   # vector leg
            k_nearest_neighbors=10,
            fields="content_vector",
        )
    ],
    query_type="semantic",
    semantic_configuration_name="default",
    top=5,
)

AI Agents

AI agents combine LLM reasoning with tool calling to perform multi-step tasks autonomously. CSA-in-a-Box uses Semantic Kernel as the primary agent framework (see ADR-0017 for the RAG service-layer decision and Tutorial 07 for the hands-on walkthrough).

Semantic Kernel Integration

import os

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

kernel = sk.Kernel()
kernel.add_service(AzureChatCompletion(
    deployment_name="gpt-4o",
    endpoint=os.environ["AOAI_ENDPOINT"],
    api_key=os.environ["AOAI_KEY"],       # or use DefaultAzureCredential
))

# Register a plugin that queries Gold-layer data
kernel.add_plugin(DataQueryPlugin(), plugin_name="data_query")

Multi-Agent Patterns

graph LR
    USER((User)) --> ORCH[Orchestrator Agent]
    ORCH --> DA[DataAnalyst<br/>Queries Gold layer]
    ORCH --> QR[QualityReviewer<br/>Validates outputs]
    ORCH --> GO[GovernanceOfficer<br/>Checks compliance]
    DA --> GOLD[(Gold Layer)]
    QR --> EVAL[Evaluation Metrics]
    GO --> PURVIEW[Microsoft Purview]

| Agent | Role | Tools | Grounding Source |
|---|---|---|---|
| DataAnalyst | Query data, generate insights | DataQueryPlugin, SQL, AI Search | Gold-layer tables and indexes |
| QualityReviewer | Validate accuracy, flag hallucination | Evaluation SDK | Ground-truth datasets |
| GovernanceOfficer | Check PII, enforce retention | Purview API, content safety | Data catalog policies |
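
A framework-agnostic sketch of the hand-off pattern in the diagram; every handler name here is hypothetical, standing in for the real Semantic Kernel plugins:

from typing import Callable

def data_analyst(task: str) -> str:
    return f"[insights] {task}"      # would call DataQueryPlugin / AI Search

def quality_reviewer(draft: str) -> str:
    return f"[reviewed] {draft}"     # would score the draft with the Evaluation SDK

def governance_officer(draft: str) -> str:
    return f"[approved] {draft}"     # would check PII via Purview / content safety

PIPELINE: list[Callable[[str], str]] = [data_analyst, quality_reviewer, governance_officer]

def orchestrate(task: str) -> str:
    """Sequential hand-off: analyse, then review, then governance-check."""
    result = task
    for agent in PIPELINE:
        result = agent(result)
    return result

print(orchestrate("Summarise Q3 sales by region"))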

Function Calling over CSA Data Products

Agents call functions that wrap CSA-in-a-Box services:

from semantic_kernel.functions import kernel_function

class DataQueryPlugin:
    @kernel_function(description="Query a Gold-layer data product by SQL")
    def query_gold(self, sql: str) -> str:
        """Execute a read-only query against the Gold layer."""
        # Minimal guard: reject anything that is not a single SELECT statement
        statement = sql.strip().rstrip(";")
        if not statement.upper().startswith("SELECT") or ";" in statement:
            return "Error: only single SELECT queries are permitted."
        # Execute against Databricks SQL warehouse or Synapse serverless
        return execute_read_only(statement)

Evaluations

AI Foundry provides a built-in evaluation framework to measure RAG and generation quality. Run evaluations locally, in Prompt Flow, or via CI/CD.

Built-In Evaluators

| Evaluator | What It Measures | Scale | When to Use |
|---|---|---|---|
| Groundedness | Is the answer supported by retrieved context? | 1-5 | Every RAG flow |
| Relevance | Does the answer address the question? | 1-5 | Every RAG flow |
| Fluency | Is the language natural and readable? | 1-5 | User-facing outputs |
| Coherence | Is the answer logically consistent? | 1-5 | Multi-turn conversations |
| Similarity | How close is the output to a reference answer? | 1-5 | Regression testing |
| F1 Score | Token-level overlap with reference | 0-1 | Extractive QA |

Running Evaluations

from azure.ai.evaluation import evaluate, GroundednessEvaluator, RelevanceEvaluator

# model_config points the LLM-judge evaluators at a chat deployment (e.g. GPT-4o)
results = evaluate(
    data="eval/test-dataset.jsonl",
    evaluators={
        "groundedness": GroundednessEvaluator(model_config=model_config),
        "relevance": RelevanceEvaluator(model_config=model_config),
    },
    evaluator_config={
        "groundedness": {
            "query": "${data.question}",
            "context": "${data.context}",
            "response": "${data.answer}",
        },
        "relevance": {
            "query": "${data.question}",
            "response": "${data.answer}",
        },
    },
)
print(results.metrics)
# {'groundedness.score': 4.2, 'relevance.score': 4.5}

CI/CD Quality Gates

Never ship a RAG flow without evaluation gates

Set minimum thresholds for groundedness (>= 3.5) and relevance (>= 3.5) in your CI pipeline. Flows that fall below these thresholds must not be deployed to production.

| Metric | Dev Threshold | Staging Threshold | Prod Threshold |
|---|---|---|---|
| Groundedness | >= 3.0 | >= 3.5 | >= 4.0 |
| Relevance | >= 3.0 | >= 3.5 | >= 4.0 |
| Fluency | >= 3.0 | >= 3.5 | >= 3.5 |
| Coherence | >= 3.0 | >= 3.5 | >= 3.5 |
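
A sketch of enforcing these thresholds programmatically, assuming metrics shaped like the azure.ai.evaluation output shown earlier:

# Thresholds mirror the table above
THRESHOLDS = {
    "dev":     {"groundedness": 3.0, "relevance": 3.0, "fluency": 3.0, "coherence": 3.0},
    "staging": {"groundedness": 3.5, "relevance": 3.5, "fluency": 3.5, "coherence": 3.5},
    "prod":    {"groundedness": 4.0, "relevance": 4.0, "fluency": 3.5, "coherence": 3.5},
}

def passes_gate(metrics: dict[str, float], env: str) -> bool:
    """True only if every tracked metric meets its environment threshold."""
    return all(
        metrics.get(f"{name}.score", 0.0) >= floor
        for name, floor in THRESHOLDS[env].items()
    )

assert passes_gate({"groundedness.score": 4.2, "relevance.score": 4.5,
                    "fluency.score": 4.0, "coherence.score": 3.9}, "prod")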

Responsible AI

Content Safety Filters

Azure AI Foundry applies content safety filters to all Azure OpenAI deployments by default. Filters cover four harm categories (hate, sexual, violence, self-harm) at configurable severity levels.

| Filter | Default Level | Customisable | Applies To |
|---|---|---|---|
| Hate & Fairness | Medium | Yes | Prompt + Completion |
| Sexual | Medium | Yes | Prompt + Completion |
| Violence | Medium | Yes | Prompt + Completion |
| Self-Harm | Medium | Yes | Prompt + Completion |
| Prompt Shields | On | Yes | Prompt only |
| Protected Material | On | Limited | Completion only |

Purview DSPM for AI

Microsoft Purview Data Security Posture Management (DSPM) for AI monitors interactions with generative-AI services:

  • Sensitive data detection — identifies PII, PHI, and classified data in prompts and completions
  • Oversharing alerts — flags when sensitive data is sent to AI endpoints
  • Activity explorer — audit trail of all AI interactions
  • DLP policies — block or warn when prompts contain regulated data

Prompt Shields & Jailbreak Detection

Prompt Shields detect two attack vectors:

  1. Direct attacks (jailbreaks) — user attempts to override system instructions
  2. Indirect attacks (XPIA) — injected instructions in retrieved documents

from azure.ai.contentsafety import ContentSafetyClient

client = ContentSafetyClient(endpoint, credential)
response = client.analyze_text({
    "text": user_prompt,
    "categories": ["Hate", "Violence", "Sexual", "SelfHarm"],
    "blocklistNames": ["custom-blocked-terms"],
})
# Check response.categoriesAnalysis for severity scores

Cost Management

Token Consumption Tracking

Track token usage at multiple levels:

  1. Per-deployment — Azure Monitor metrics (ProcessedPromptTokens, GeneratedCompletionTokens)
  2. Per-request — usage field in API response (prompt_tokens, completion_tokens)
  3. Per-project — aggregate via Application Insights custom dimensions

// KQL — daily token consumption by deployment
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where Category == "RequestResponse"
| extend props = parse_json(properties_s)
| extend promptTokens = toint(props.promptTokens)
| extend completionTokens = toint(props.completionTokens)
| summarize TotalPrompt=sum(promptTokens),
            TotalCompletion=sum(completionTokens)
    by bin(TimeGenerated, 1d), deploymentName=tostring(props.modelDeploymentName)
| order by TimeGenerated desc
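
For per-request tracking, read the usage field directly at the call site. A minimal sketch, assuming the same AzureOpenAI chat client as earlier examples:

# `client` is an AzureOpenAI chat client, as in earlier examples
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)
usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
# Emit these as Application Insights custom dimensions for per-project rollups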

PTU vs PAYG Decision Tree

flowchart TD
    A["Avg TPM over 2 weeks?"] -->|"> 50K TPM"| B["Steady or bursty?"]
    A -->|"< 50K TPM"| PAYG["Stay on PAYG"]
    B -->|"Steady"| PTU["Switch to PTU"]
    B -->|"Bursty (10x spikes)"| SPLIT["Hybrid: PTU base + PAYG overflow"]
    PTU --> SIZE["Size: avg_TPM / 6 = PTU count<br/>(1 PTU ≈ 6K TPM for GPT-4o)"]
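
The sizing rule as a worked example; the 6 K TPM-per-PTU figure is the planning number from the diagram, not a guarantee:

import math

TPM_PER_PTU = 6_000       # planning assumption for GPT-4o (from the diagram above)
avg_tpm = 60_000          # example: measured steady-state tokens per minute

ptu_count = math.ceil(avg_tpm / TPM_PER_PTU)    # -> 10 PTUs of raw demand
reservation = 50 * math.ceil(ptu_count / 50)    # round up to 50-unit increments -> 50
print(ptu_count, reservation)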

Quota Management

Request quota increases early

Azure OpenAI quotas are per-subscription, per-region. For production workloads, request increases at least 2 weeks before go-live via the Azure Portal (Azure OpenAI > Quotas) or by filing a support request.

Model Retirement Planning

Azure OpenAI models follow a published retirement schedule. CSA-in-a-Box teams should:

  1. Subscribe to the Azure OpenAI model deprecation page
  2. Maintain a model_config abstraction (never hard-code model names; see the sketch after this list)
  3. Test new model versions in a staging Project before promoting
  4. Allow at least 4 weeks for migration and evaluation before retirement date
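
A minimal model_config sketch for item 2; the environment variable names and role keys are hypothetical:

import os

# Logical roles resolve to deployment names per environment; never hard-code models
MODEL_CONFIG = {
    "chat": os.environ.get("CHAT_DEPLOYMENT", "gpt-4o"),
    "extract": os.environ.get("EXTRACT_DEPLOYMENT", "gpt-4o-mini"),
    "embed": os.environ.get("EMBED_DEPLOYMENT", "text-embedding-3-large"),
}

def model_for(role: str) -> str:
    """Callers request a role; deployment swaps happen in config, not code."""
    return MODEL_CONFIG[role]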

Security

Managed Identity for Model Access

Always prefer managed identity over API keys

API keys are a shared secret. Managed identity eliminates key rotation burden and reduces blast radius if a credential leaks.

// Grant the AI Project's managed identity access to Azure OpenAI
resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: openAiAccount
  name: guid(aiProject.id, openAiAccount.id, cognitiveServicesUserRole)
  properties: {
    principalId: aiProject.identity.principalId
    roleDefinitionId: cognitiveServicesUserRole   // Cognitive Services OpenAI User
    principalType: 'ServicePrincipal'
  }
}

Private Endpoints

All AI Foundry services should be reachable only over Private Endpoints in production. The Hub's managed network handles this when isolationMode is set to AllowOnlyApprovedOutbound.

| Service | Private Endpoint Sub-Resource | Required |
|---|---|---|
| Azure OpenAI | account | Yes |
| Azure AI Search | searchService | Yes |
| ADLS Gen2 | blob, dfs | Yes |
| Key Vault | vault | Yes |
| Container Registry | registry | Yes |

RBAC Roles

| Role | Scope | Grants |
|---|---|---|
| Azure AI Developer | Project | Create deployments, run flows, view evaluations |
| Azure AI Inference Deployment Operator | Project | Deploy and manage model endpoints |
| Cognitive Services OpenAI User | OpenAI Account | Call model endpoints (chat, embeddings) |
| Cognitive Services OpenAI Contributor | OpenAI Account | Create deployments, manage fine-tunes |
| Search Index Data Contributor | AI Search | Read/write index data |
| Reader | Hub | View Hub resources (no modifications) |

API Key Rotation

If API keys must be used (non-production, external integrations):

  1. Store keys in Key Vault, never in code or environment variables
  2. Rotate keys every 90 days
  3. Use Key Vault's expiration and notification policies (see the sketch after this list)
  4. After rotation, update all connections in AI Foundry via pfazure connection update
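
A sketch of step 3, storing a rotated key with a 90-day expiry via azure-keyvault-secrets; the vault URL env var and secret name are illustrative:

import os
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url=os.environ["KEYVAULT_URL"],
    credential=DefaultAzureCredential(),
)
new_api_key = "<rotated-key-from-the-provider>"   # hypothetical placeholder
# An expiry lets Key Vault's notification policy alert before the key lapses
client.set_secret(
    "aoai-api-key",
    new_api_key,
    expires_on=datetime.now(timezone.utc) + timedelta(days=90),
)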

CSA-in-a-Box Integration Patterns

Enrichment Pipeline — AI-Powered Bronze-to-Silver

Use AI models to extract structured data from unstructured Bronze-layer documents.

flowchart LR
    BRONZE[Bronze Layer<br/>Raw PDFs, emails, images] --> DOC[Document Intelligence<br/>Layout extraction]
    DOC --> LLM[GPT-4o-mini<br/>Entity extraction + classification]
    LLM --> SILVER[Silver Layer<br/>Structured tables in Delta]
    SILVER --> GOLD[Gold Layer<br/>Curated data products]

Example — extract entities from regulatory filings:

from azure.ai.documentintelligence import DocumentIntelligenceClient

client = DocumentIntelligenceClient(endpoint, credential)
poller = client.begin_analyze_document(
    "prebuilt-layout",
    body={"urlSource": bronze_blob_url},
)
result = poller.result()

# Pass extracted text to GPT-4o-mini for entity extraction
# (extract_entities and write_to_silver are CSA pipeline helpers, not SDK calls)
entities = extract_entities(result.content, model="gpt-4o-mini")
write_to_silver(entities, table="regulatory_entities")

Classification Pipeline

Classify incoming documents or records into domain categories with a constrained JSON prompt (add few-shot examples to the system prompt for harder taxonomies):

system_prompt = """Classify the document into exactly one category:
- financial_report
- regulatory_filing
- correspondence
- technical_specification
Respond with JSON: {"category": "...", "confidence": 0.0-1.0}"""

import json

# `client` is an AzureOpenAI chat client, as in the earlier examples
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": document_text[:4000]},
    ],
    response_format={"type": "json_object"},
    temperature=0,
)
label = json.loads(response.choices[0].message.content)

Summarization Pipeline

Generate concise summaries of Gold-layer data products for the data catalog:

summary = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": (
            "Summarize the dataset in 2-3 sentences for a data catalog. "
            "Include: what the data represents, time range, key fields, "
            "and number of records."
        )},
        {"role": "user", "content": dataset_description},
    ],
    max_tokens=200,
    temperature=0.3,
)

Anti-Patterns

| Anti-Pattern | Why It Fails | Do This Instead |
|---|---|---|
| Hard-coding model names | Model retirements break production | Use a model_config mapping, abstracted behind env vars |
| Skipping evaluations | Silent quality degradation | Run evaluation flows in CI; enforce quality gates |
| Single large index | Slow queries, poor relevance | Create domain-specific indexes (one per data product family) |
| API keys in code | Security incident waiting to happen | Managed identity + Key Vault references |
| No chunking strategy | Retrieval quality is random | Match chunk size to content type; test with eval datasets |
| Ignoring content safety | Compliance and reputational risk | Keep default filters; add custom blocklists for domain terms |
| PTU over-provisioning | Wasted spend | Start with PAYG, size PTUs from 2+ weeks of usage data |
| Monolithic Prompt Flow | Untestable, hard to debug | Small flows with single-responsibility nodes; compose via subflows |

Pre-Flight Checklist

Before deploying an AI workload to production:

  • Hub uses AllowOnlyApprovedOutbound isolation mode
  • All services reachable via Private Endpoints only
  • Managed identity used for all service-to-service auth (no API keys)
  • RBAC roles assigned at minimum scope (Project, not Subscription)
  • Content safety filters configured and tested
  • Evaluation flow exists with quality gates in CI/CD
  • Groundedness >= 4.0 and Relevance >= 4.0 on production eval dataset
  • Token consumption alerts configured in Azure Monitor
  • Model retirement dates tracked; migration plan documented
  • Prompt Shields enabled for all user-facing endpoints
  • PII detection enabled via Purview DSPM for AI
  • Disaster recovery: Hub and Projects exist in paired region (or redeployable via IaC)
  • Data products indexed with appropriate chunking strategy and tested recall

Related resources:

  • CSA-in-a-Box Tutorials
  • CSA-in-a-Box Architecture Decisions
  • Microsoft Documentation