Azure AI Search¶
Overview¶
Azure AI Search (formerly Azure Cognitive Search) is the vector and hybrid search engine that powers every Retrieval-Augmented Generation (RAG) pattern in CSA-in-a-Box. It sits between the curated Gold data products in the medallion lakehouse and the Azure OpenAI models that generate grounded, citation-backed responses for end users, agents, and copilots.
In the CSA-in-a-Box architecture, AI Search serves three primary roles:
| Role | What It Provides |
|---|---|
| Vector store | High-dimensional embedding index for semantic similarity search over Gold data products |
| Hybrid search engine | Combined keyword, vector, and semantic ranking in a single query — maximizing recall and precision |
| AI enrichment pipeline | Built-in and custom skillsets that extract text (OCR), key phrases, entities, and embeddings during indexing |
Unlike a standalone vector database, AI Search delivers full-text search, faceted navigation, geo-spatial queries, and L2 semantic reranking alongside vector similarity — making it the natural choice when RAG workloads coexist with traditional search experiences.
Related Resources
| Resource | Purpose |
|---|---|
| Tutorial 08 — RAG with Azure AI Search | Hands-on lab: build an end-to-end RAG pipeline |
| Tutorial 06 — AI Analytics Foundry | Azure OpenAI provisioning and embedding model setup |
| Tutorial 07 — AI Agents with Semantic Kernel | Semantic Kernel retrieval plugin for AI agents |
| Tutorial 09 — GraphRAG Knowledge | Graph-based RAG for complex knowledge extraction |
| RAG vs Fine-tune vs Agents | Decision tree for choosing the right AI pattern |
| Azure OpenAI ADR | Why CSA-in-a-Box standardizes on Azure OpenAI |
| Security & Compliance | Network isolation and Zero Trust patterns |
Architecture¶
The following diagram shows how Azure AI Search fits into the CSA-in-a-Box data and AI pipeline — from Gold data products through indexing and enrichment to RAG-powered responses.
graph LR
subgraph DataLayer["Gold Data Products"]
ADLS["ADLS Gen2<br/>Delta tables"]
SQL["Azure SQL / Synapse"]
COSMOS["Cosmos DB"]
end
subgraph Enrichment["AI Enrichment Pipeline"]
IDX["Indexer<br/>(pull model)"]
SKILL["Skillset<br/>OCR · Key Phrase<br/>Entity · Embedding"]
end
subgraph SearchService["Azure AI Search"]
INDEX["Search Index<br/>text + vector fields"]
SEM["Semantic Ranker<br/>L2 reranking"]
end
subgraph RAG["RAG Pipeline"]
API["Search API<br/>hybrid query"]
AOAI["Azure OpenAI<br/>gpt-4o / gpt-5.4"]
end
USER((User Query)) --> API
API -->|keyword + vector<br/>+ semantic| INDEX
INDEX --> SEM
SEM -->|ranked results<br/>+ captions| API
API -->|context injection| AOAI
AOAI -->|grounded response<br/>with citations| USER
ADLS --> IDX
SQL --> IDX
COSMOS --> IDX
IDX --> SKILL
SKILL --> INDEX
style DataLayer fill:#e8f5e9,stroke:#2e7d32
style Enrichment fill:#fff3e0,stroke:#e65100
style SearchService fill:#e3f2fd,stroke:#1565c0
style RAG fill:#f3e5f5,stroke:#7b1fa2 Index Design¶
A well-designed index is the foundation of every search and RAG workload. CSA-in-a-Box indexes follow a consistent pattern that separates content fields, metadata fields, and vector fields.
Field Types¶
| EDM Type | Purpose | Example |
|---|---|---|
Edm.String | Full-text searchable content | content, title, summary |
Collection(Edm.Single) | Dense vector embedding | content_vector (1536 or 3072 dims) |
Edm.Int32 / Edm.Int64 | Numeric filters and facets | year, page_number |
Edm.DateTimeOffset | Temporal filters | published_date, last_modified |
Edm.Boolean | Binary filters | is_public, is_active |
Edm.GeographyPoint | Geo-spatial search | location |
Collection(Edm.String) | Multi-value filters and facets | tags, categories |
Analyzers¶
Analyzers control how text is tokenized and normalized at index and query time.
| Analyzer | When to Use |
|---|---|
standard.lucene | Default for general English content — tokenizes, lowercases, removes stop words |
keyword | Exact-match fields like IDs, codes, or status values — no tokenization |
en.microsoft | English content requiring stemming and lemmatization |
en.lucene | Lighter English analysis — good balance of precision and recall |
| Custom analyzer | Domain-specific tokenization (e.g., hyphenated part numbers, legal citations) |
Index vs Search Analyzer
You can set different analyzers for indexing and querying. Use a broader analyzer at index time (e.g., en.microsoft with stemming) and a stricter analyzer at query time to reduce false positives for exact-match requirements.
Scoring Profiles¶
Scoring profiles let you boost relevance based on field weights, freshness, distance, or tag matching — without changing the query itself.
{
"scoringProfiles": [
{
"name": "recency-boosted",
"text": {
"weights": {
"title": 3.0,
"summary": 2.0,
"content": 1.0
}
},
"functions": [
{
"type": "freshness",
"fieldName": "published_date",
"boost": 2.0,
"parameters": {
"boostingDuration": "P365D"
},
"interpolation": "linear"
}
]
}
],
"defaultScoringProfile": "recency-boosted"
}
Vector Search Configuration¶
Azure AI Search supports two algorithm families for approximate nearest neighbor (ANN) search on vector fields.
| Algorithm | Latency | Recall | Memory | Best For |
|---|---|---|---|---|
| HNSW (Hierarchical Navigable Small World) | Low (sub-50 ms) | High (95-99%) | Higher | Production workloads — default recommendation |
| Exhaustive KNN | Higher (scales linearly) | Perfect (100%) | Lower | Small indexes or ground-truth evaluation |
{
"vectorSearch": {
"algorithms": [
{
"name": "hnsw-config",
"kind": "hnsw",
"hnswParameters": {
"m": 4,
"efConstruction": 400,
"efSearch": 500,
"metric": "cosine"
}
}
],
"profiles": [
{
"name": "vector-profile",
"algorithmConfigurationName": "hnsw-config",
"vectorizer": "openai-vectorizer"
}
],
"vectorizers": [
{
"name": "openai-vectorizer",
"kind": "azureOpenAI",
"azureOpenAIParameters": {
"resourceUri": "https://<your-aoai>.openai.azure.com",
"deploymentId": "text-embedding-3-large",
"modelName": "text-embedding-3-large",
"apiKey": "<managed-identity-preferred>"
}
}
]
}
}
HNSW Tuning
Higher m and efConstruction values improve recall but increase index build time and memory consumption. Start with the defaults (m=4, efConstruction=400) and tune only when recall metrics from your evaluation set demand it.
Indexing¶
Push vs Pull Model¶
Azure AI Search supports two indexing strategies. CSA-in-a-Box uses both depending on the data source and latency requirements.
| Model | Mechanism | Best For |
|---|---|---|
| Pull (Indexer) | Scheduled crawler connects to a supported data source | Batch ingestion from ADLS, SQL, Cosmos DB — the primary pattern for Gold data products |
| Push (SDK/REST) | Application calls the Index Documents API directly | Real-time updates, custom chunking pipelines, streaming scenarios |
Indexers for Common Data Sources¶
| Data Source | Connector | Notes |
|---|---|---|
| ADLS Gen2 | azureblob / adlsgen2 | Parses PDF, DOCX, PPTX, JSON, CSV, and plain text; supports hierarchical paths |
| Azure SQL | azuresql | High-watermark or SQL integrated change tracking for incremental indexing |
| Cosmos DB | cosmosdb | Change feed for near-real-time incremental indexing |
| Azure Table Storage | azuretable | Simple key-value lookups |
| SharePoint Online | sharepoint | Enterprise document libraries (requires app registration) |
AI Enrichment Pipeline (Skillsets)¶
Skillsets attach to an indexer and run a sequence of built-in or custom skills over each document during indexing. This is how CSA-in-a-Box extracts structured data and embeddings from unstructured Gold assets.
graph LR
DOC["Source Document<br/>(PDF, DOCX, image)"] --> CRACK["Document Cracking<br/>text extraction"]
CRACK --> OCR["OCR Skill<br/>scanned images"]
OCR --> KP["Key Phrase<br/>Extraction"]
KP --> ENT["Entity<br/>Recognition"]
ENT --> EMBED["AzureOpenAI<br/>Embedding Skill"]
EMBED --> OUT["Enriched Output<br/>→ Search Index"]
style DOC fill:#fce4ec,stroke:#c62828
style OUT fill:#e8f5e9,stroke:#2e7d32 Built-in skills used in CSA-in-a-Box:
| Skill | Purpose |
|---|---|
| OCR | Extracts text from scanned images and image-based PDFs |
| Key Phrase Extraction | Identifies dominant terms for faceted search |
| Entity Recognition | Detects people, organizations, locations, dates |
| Language Detection | Routes content to the correct language analyzer |
| Text Split | Chunks large documents for embedding generation |
| Azure OpenAI Embedding | Generates text-embedding-3-large vectors during indexing |
Custom skills can be added via Azure Functions to run domain-specific enrichment (e.g., legal citation parsing, classification models, PII redaction).
Change Detection and Incremental Enrichment¶
| Strategy | How It Works |
|---|---|
| High-watermark | Indexer tracks a monotonically increasing column (e.g., last_modified) and only processes new/changed rows |
| SQL change tracking | Uses SQL Server integrated change tracking for row-level deltas |
| Cosmos DB change feed | Captures inserts and updates automatically via the Cosmos DB change feed |
| Soft delete | Detects a designated column value (e.g., is_deleted = true) and removes documents from the index |
| Incremental enrichment | Caches skillset outputs in a knowledge store; re-enriches only changed documents — saves Azure OpenAI embedding costs |
Enable Incremental Enrichment
Attach a knowledge store (Azure Blob container) to your skillset to enable caching. This avoids re-running expensive embedding generation on documents that have not changed — a significant cost saver on large indexes.
Vector Search¶
Embedding Generation¶
CSA-in-a-Box standardizes on Azure OpenAI text-embedding-3-large for embedding generation. This model produces 3072-dimensional vectors and supports dimensionality reduction via the dimensions parameter.
| Model | Dimensions | Max Tokens | Notes |
|---|---|---|---|
text-embedding-3-large | 3072 (default) | 8,191 | Highest quality; recommended for RAG |
text-embedding-3-large | 1536 (reduced) | 8,191 | 50% storage savings with minimal quality loss |
text-embedding-3-small | 1536 | 8,191 | Lower cost; suitable for high-volume, lower-precision use cases |
from openai import AzureOpenAI
client = AzureOpenAI(
azure_endpoint="https://<your-aoai>.openai.azure.com",
api_version="2024-10-21",
azure_ad_token_provider=token_provider, # managed identity
)
response = client.embeddings.create(
input=["CSA-in-a-Box Gold data product: enforcement summary"],
model="text-embedding-3-large",
dimensions=3072,
)
vector = response.data[0].embedding # List[float] of length 3072
Hybrid Search (Keyword + Vector + Semantic Ranking)¶
Hybrid search is the default query mode for CSA-in-a-Box RAG pipelines. It combines three retrieval strategies in a single request to maximize both recall and precision.
| Stage | Mechanism | What It Does |
|---|---|---|
| 1. Keyword | BM25 full-text search | Exact and stemmed term matching — catches acronyms, proper nouns, IDs |
| 2. Vector | HNSW approximate nearest neighbor | Semantic similarity — catches paraphrases and related concepts |
| 3. Semantic Ranking | Cross-encoder L2 reranker | Reranks the merged result set using a transformer model for deep relevance |
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
search_client = SearchClient(
endpoint="https://<your-search>.search.windows.net",
index_name="csa-gold-index",
credential=credential,
)
results = search_client.search(
search_text="antitrust enforcement actions 2025",
vector_queries=[
VectorizableTextQuery(
text="antitrust enforcement actions 2025",
k_nearest_neighbors=50,
fields="content_vector",
)
],
query_type="semantic",
semantic_configuration_name="csa-semantic-config",
top=5,
select=["title", "content", "source_url", "published_date"],
)
for result in results:
print(f"[{result['@search.reranker_score']:.2f}] {result['title']}")
print(f" Caption: {result['@search.captions'][0].text}")
Integrated Vectorization¶
With integrated vectorization, Azure AI Search generates embeddings at both index time and query time — eliminating the need for client-side embedding calls.
| Benefit | Description |
|---|---|
| No client-side embedding code | The search service calls Azure OpenAI directly |
| Consistent model version | Index-time and query-time embeddings always use the same model deployment |
| Simplified pipeline | Fewer moving parts in the ingestion and query paths |
Configure integrated vectorization by adding a vectorizer to your vector search profile (see the index definition in the Vector Search Configuration section above).
Semantic Ranking¶
Semantic ranking adds a transformer-based L2 reranking stage on top of keyword and vector results. It is the single most impactful feature for RAG relevance in CSA-in-a-Box.
Configuration¶
{
"semantic": {
"configurations": [
{
"name": "csa-semantic-config",
"prioritizedFields": {
"titleField": { "fieldName": "title" },
"contentFields": [{ "fieldName": "content" }],
"keywordsFields": [{ "fieldName": "tags" }]
}
}
]
}
}
Capabilities¶
| Feature | Description |
|---|---|
| L2 reranking | Cross-encoder reranks the top 50 results from the L1 retrieval stage for deeper relevance |
| Semantic captions | Extracts the most relevant passage from each document — ideal for RAG context injection |
| Semantic answers | Returns a direct extractive answer when the query has a clear factual answer in the corpus |
Semantic Ranking Limits
Semantic ranking reranks up to 50 results per query. It is available on Basic tier and above (free tier includes a limited number of semantic queries per month). See the Cost section for pricing details.
RAG Integration¶
Retrieval Pipeline¶
The CSA-in-a-Box RAG pipeline follows a four-step pattern for every user query.
sequenceDiagram
participant U as User / Agent
participant R as RAG Orchestrator
participant S as Azure AI Search
participant L as Azure OpenAI
U->>R: Natural language query
R->>S: Hybrid search (keyword + vector + semantic)
S-->>R: Top-k results with captions and scores
R->>R: Build prompt with retrieved context
R->>L: Completion request (system + context + query)
L-->>R: Grounded response with citations
R-->>U: Answer + source references Prompt Engineering with Search Results¶
Structure your system prompt to clearly separate instructions, retrieved context, and the user question. This pattern prevents hallucination and enables citation tracking.
## Instructions
You are a data analyst assistant for the CSA-in-a-Box platform.
Answer the user's question using ONLY the provided context.
If the context does not contain enough information, say so.
Cite sources using [Source N] notation.
## Context
{% for doc in search_results %}
[Source {{ loop.index }}] ({{ doc.source_url }})
{{ doc.content }}
{% endfor %}
## Question
{{ user_query }}
Citation and Grounding Patterns¶
| Pattern | Description |
|---|---|
| Inline citations | [Source 1] markers in the response text, mapped to search result metadata |
| Confidence filtering | Discard results below a reranker score threshold (e.g., @search.reranker_score < 1.0) |
| Chunk overlap | Use overlapping text chunks (e.g., 128-token overlap on 512-token chunks) to preserve context across chunk boundaries |
| Multi-index federation | Query multiple indexes (e.g., policies + case law) and merge results before prompt injection |
Semantic Kernel Retrieval Plugin¶
CSA-in-a-Box provides a Semantic Kernel retrieval plugin that wraps Azure AI Search as a native Kernel function. This is the recommended integration point for AI agents and copilots built in Tutorial 07.
// Semantic Kernel — Azure AI Search plugin registration
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Plugins.Memory;
var kernel = Kernel.CreateBuilder()
.AddAzureOpenAIChatCompletion("gpt-4o", endpoint, credential)
.Build();
var searchPlugin = new AzureAISearchPlugin(
searchEndpoint: "https://<your-search>.search.windows.net",
indexName: "csa-gold-index",
credential: credential,
semanticConfigurationName: "csa-semantic-config"
);
kernel.ImportPluginFromObject(searchPlugin, "SearchPlugin");
// The agent can now call SearchPlugin.Search() as a native function
var result = await kernel.InvokePromptAsync(
"{{SearchPlugin.Search $query}} Answer: {{$query}}",
new KernelArguments { ["query"] = "What enforcement actions occurred in 2025?" }
);
Cross-Reference: Agent Tutorials
See Tutorial 07 — AI Agents with Semantic Kernel for the full agent implementation and Tutorial 08 — RAG with Azure AI Search for the step-by-step RAG pipeline lab.
Security¶
Authentication and Authorization¶
| Method | When to Use |
|---|---|
| Azure AD RBAC (recommended) | Production — assign Search Index Data Reader, Search Index Data Contributor, or Search Service Contributor roles via Entra ID |
| API keys | Development and quick prototyping only — rotate regularly, never embed in client code |
Avoid API Keys in Production
API keys grant full access to all indexes on the service. Use Azure AD RBAC with least-privilege role assignments for every production workload. Managed identity eliminates the need to store credentials entirely.
Index-Level Security¶
Azure AI Search does not support native index-level RBAC. To isolate tenants or workloads, use one of these strategies:
| Strategy | Description |
|---|---|
| Separate indexes | One index per tenant or security boundary — simplest to reason about |
| Separate services | Complete network and data isolation — required for IL4/IL5 FedRAMP boundaries |
Document-Level Security Filters¶
For row-level access control within a shared index, add a security filter field and enforce it on every query.
# Index field: "allowed_groups": Collection(Edm.String), filterable
results = search_client.search(
search_text="budget allocation",
filter="allowed_groups/any(g: g eq 'finance-team')",
vector_queries=[...],
query_type="semantic",
semantic_configuration_name="csa-semantic-config",
)
Always Enforce Security Filters Server-Side
Never rely on the client to append security filters. Enforce them in your API middleware or backend service so that no query can bypass document-level access control.
Managed Identity for Indexer Connections¶
Configure indexer data source connections with managed identity to eliminate stored credentials for ADLS Gen2, Azure SQL, and Cosmos DB.
{
"name": "adls-gold-datasource",
"type": "adlsgen2",
"credentials": {
"connectionString": null
},
"identity": {
"@odata.type": "#Microsoft.Azure.Search.DataSourceManagedIdentity",
"userAssignedIdentity": "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity>"
},
"container": {
"name": "gold",
"query": "data-products/"
}
}
Private Endpoints¶
For network-isolated deployments (common in government and regulated industries), deploy Azure AI Search with a Private Endpoint and disable public network access.
| Component | Private Endpoint Target |
|---|---|
| Search service | searchService sub-resource |
| Indexer → ADLS Gen2 | blob or dfs sub-resource |
| Indexer → Azure SQL | sqlServer sub-resource |
| Indexer → Cosmos DB | Sql sub-resource |
| Search → Azure OpenAI (vectorizer) | account sub-resource |
Performance¶
Partition and Replica Sizing¶
| Dimension | Purpose | Guidance |
|---|---|---|
| Replicas | Query throughput and high availability | Start with 2 for HA; scale up for QPS |
| Partitions | Storage capacity and indexing throughput | Each partition adds ~25 GB (Standard) or ~100 GB (Storage Optimized) |
SLA Requires Replicas
Azure AI Search SLA for queries requires 2+ replicas. For read-write SLA (queries + indexing), you need 3+ replicas.
Index Optimization¶
| Technique | Impact |
|---|---|
Disable retrievable on large fields not needed in results | Reduces response payload size |
Use filterable and sortable only on fields that need it | Reduces index storage |
| Reduce vector dimensions (e.g., 3072 to 1536) | Halves vector storage with minimal recall loss |
Use stored: false on vector fields | Saves storage when you only need similarity search, not vector retrieval |
Query Performance Tuning¶
| Technique | Description |
|---|---|
Narrow select clause | Return only the fields your application needs |
Use $filter before search | Pre-filter narrows the candidate set before BM25 and vector scoring |
Limit top and skip | Deep pagination is expensive — use search_after for cursor-based paging |
| Warm the index | Run a representative query set after deployment to prime caches |
Cost¶
Service Tiers¶
| Tier | Max Indexes | Storage | Vector Dims | Semantic Ranker | Price Range (est.) |
|---|---|---|---|---|---|
| Free | 3 | 50 MB | Limited | Limited queries/month | $0 |
| Basic | 15 | 2 GB | Full support | Included | ~$75/mo |
| Standard S1 | 50 | 25 GB/partition | Full support | Included | ~$250/mo |
| Standard S2 | 200 | 100 GB/partition | Full support | Included | ~$1,000/mo |
| Standard S3 | 200 | 200 GB/partition | Full support | Included | ~$2,000/mo |
| Storage Optimized L1 | 10 | 1 TB/partition | Full support | Included | ~$2,500/mo |
| Storage Optimized L2 | 10 | 2 TB/partition | Full support | Included | ~$5,000/mo |
Start Small
Most CSA-in-a-Box pilots start on Basic or Standard S1. You can scale partitions and replicas independently without downtime. Promote to S2+ only when index size or QPS demands it.
Cost Drivers¶
| Driver | How to Optimize |
|---|---|
| Service tier and replica count | Right-size to actual index size and query load |
| Semantic ranker | Free allocation on Basic+; pay-per-query beyond threshold |
| Azure OpenAI embedding calls | Enable incremental enrichment to avoid re-embedding unchanged documents |
| Storage | Reduce vector dimensions; disable stored on vector fields; remove unused fields |
Anti-Patterns¶
Avoid these common mistakes when deploying Azure AI Search with CSA-in-a-Box.
Anti-Pattern: Indexing Raw Bronze Data
Problem: Indexing raw, uncleansed Bronze data pollutes the search index with duplicates, schema inconsistencies, and low-quality text. Fix: Always index from the Gold layer where data has been deduplicated, conformed, and enriched by the dbt transformation pipeline.
Anti-Pattern: Skipping Semantic Ranking
Problem: Using keyword-only or vector-only search for RAG retrieval. Keyword search misses paraphrases; vector search misses exact terms and acronyms. Fix: Always use hybrid search with semantic ranking enabled. The L2 reranker consistently improves answer quality for RAG workloads.
Anti-Pattern: Over-Sized Monolithic Indexes
Problem: Cramming every data product into a single massive index with hundreds of fields and millions of documents. Fix: Create domain-scoped indexes (e.g., policy-index, case-law-index, financial-index) and use multi-index federation in the RAG pipeline. This improves relevance, simplifies security filters, and enables independent scaling.
Anti-Pattern: Client-Side Embedding Without Versioning
Problem: Generating embeddings client-side with no guarantee that the model version matches the embeddings stored in the index. Fix: Use integrated vectorization so the search service controls both index-time and query-time embedding generation with the same model deployment.
Anti-Pattern: No Chunking Strategy
Problem: Indexing entire documents as single fields, leading to truncation at the embedding model's token limit and poor retrieval granularity. Fix: Implement a deliberate chunking strategy — typically 512 tokens with 128-token overlap. Use the built-in Text Split skill or a custom chunker in your push pipeline.
Cross-References¶
- Hands-on RAG lab: Tutorial 08 — RAG with Azure AI Search
- AI Foundry setup: Tutorial 06 — AI Analytics Foundry
- Agent integration: Tutorial 07 — AI Agents with Semantic Kernel
- Graph-based RAG: Tutorial 09 — GraphRAG Knowledge
- Decision guidance: RAG vs Fine-tune vs Agents
- Storage layer: Azure Data Lake Storage Gen2 Guide
- Compute engines: Azure Synapse Guide | Databricks Guide
- Governance: Purview Setup | Data Cataloging
- Security baseline: Security & Compliance
- Cost management: Cost Optimization