Source: `examples/fabric-data-agent/README.md` — this page is rendered live from that file.
# Fabric Data Agent — Read-Only Grounded Q&A over Microsoft Fabric Lakehouse
**Status:** Reference pattern. Production-deployable on Azure Commercial today; pre-GA in Azure Government (use the streaming-spine alternative documented below until Fabric reaches Gov GA).
A minimal, contract-governed, read-only AI agent that answers natural-language questions over a registered set of Fabric Lakehouse tables via the Lakehouse SQL endpoint and Azure OpenAI.
## What you get
| Component | Path | Purpose |
|---|---|---|
| Agent orchestrator | agent/agent.py | Wires retriever + generator + answer formatter |
| Retriever | agent/retriever.py | Schema-aware SQL generation, read-only enforcement (rejects anything that is not a bare SELECT/WITH), row cap |
| Generator | agent/generator.py | Azure OpenAI grounded answer with mandatory citation back to query results |
| Config | agent/config.py | Endpoint + auth + table-allow-list configuration (no secrets in code) |
| Data contract | contracts/fabric_query_contract.yaml | Schema + classification + scope guard for the agent's request/response surface |
| IaC | deploy/bicep/main.bicep | Fabric capacity + lakehouse + workspace identity + AOAI + Key Vault wiring |
| Sample data | sample_data/ | Synthetic tables for end-to-end testing without a real Fabric workspace |
| Gov note | GOV_NOTE.md | What to do today if you must deploy on Azure Government |
| Tests | agent/tests/ | Unit tests for retriever, generator, agent orchestration |
## Why this exists
A read-only "ask the lakehouse a question" surface is the most-requested AI pattern across our enterprise customer conversations. The dangerous version of this writes ad-hoc SQL with full credentials. This version:
- Refuses any SQL statement that isn't a bare `SELECT`/`WITH` — enforced before execution
- Caps result row count to prevent accidental table dumps
- Requires the LLM to cite the result rows it used (no answer without grounding)
- Restricts table access to an allow-list defined in `config.py`
- Emits a contract-governed response (`contracts/fabric_query_contract.yaml`) that can be validated in CI

The scope guard logic in `retriever.py` is the part most worth lifting into your own code if you build a similar surface against a different SQL endpoint.
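Two of the companion checks listed above (table allow-list and row cap) are easy to lift alongside the scope guard. A minimal sketch follows — `FABRIC_TABLE_ALLOW_LIST` and `MAX_RESULT_ROWS` are the names this example's `agent/config.py` uses, but the values and the extraction regex here are illustrative, not the repo's implementation:

```python
import re

# Assumed values — the real configuration lives in agent/config.py.
FABRIC_TABLE_ALLOW_LIST = {
    "csa_demo.customers", "csa_demo.orders",
    "csa_demo.products", "csa_demo.regions",
}
MAX_RESULT_ROWS = 1000

def tables_referenced(sql: str) -> set[str]:
    """Crude FROM/JOIN scan. Unqualified names (no dot) are treated as
    CTE aliases and skipped; qualified names must be on the allow-list."""
    names = re.findall(r"\b(?:from|join)\s+([a-z0-9_.]+)", sql, re.IGNORECASE)
    return {n.lower() for n in names if "." in n}

def check_allow_list(sql: str) -> bool:
    """Reject the query if it touches any table outside the allow-list."""
    refs = tables_referenced(sql)
    return bool(refs) and refs <= FABRIC_TABLE_ALLOW_LIST

def cap_rows(rows: list) -> list:
    """Hard stop on result size, applied after fetch."""
    return rows[:MAX_RESULT_ROWS]

print(check_allow_list("SELECT * FROM csa_demo.orders JOIN csa_demo.regions ON 1=1"))  # True
print(check_allow_list("SELECT * FROM dbo.secrets"))                                   # False
```

Deliberate simplifications: table-valued functions and system views would need their own handling, so pair checks like these with read-only RBAC on the workspace identity rather than relying on them alone.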
## Quickstart (Azure Commercial, ~30 min)

### Prerequisites

- Azure Commercial subscription with Fabric capacity SKU available (F2 minimum for dev)
- Azure OpenAI resource with `gpt-4o-mini` (or newer) deployed
- `az` CLI 2.60+, `python` 3.11+, `pytest`
### Deploy infra

```shell
cd examples/fabric-data-agent
# Fill in deploy/params.dev.json with your Fabric workspace name + AOAI endpoint
az deployment sub create \
  --location eastus \
  --template-file deploy/bicep/main.bicep \
  --parameters @deploy/params.dev.json
```
### Load sample data

```shell
# Upload sample_data/*.parquet into the Fabric lakehouse via OneLake CLI or
# the Fabric portal. Tables: customers, orders, products, regions.
```
### Run the agent locally

```shell
# Set env (use Key Vault refs in prod)
export FABRIC_WORKSPACE_ID="<workspace-guid>"
export FABRIC_LAKEHOUSE_NAME="csa_demo"
export AZURE_OPENAI_ENDPOINT="https://<aoai>.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"

python -m agent.agent \
  --question "What were total sales by region last quarter, and which region had the highest growth vs the prior quarter?"
```
Sample output:

```text
Q: What were total sales by region last quarter, and which region had
   the highest growth vs the prior quarter?

SQL (read-only check passed, row cap = 1000):

  WITH q AS (
    SELECT region_id,
           SUM(order_total) AS sales,
           DATE_TRUNC('quarter', order_date) AS qtr
    FROM csa_demo.orders
    GROUP BY region_id, DATE_TRUNC('quarter', order_date)
  )
  SELECT region_id, qtr, sales,
         LAG(sales) OVER (PARTITION BY region_id ORDER BY qtr) AS prior_sales
  FROM q
  WHERE qtr >= DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '6 months'
  ORDER BY region_id, qtr;

A: Last quarter (Q1 2026), total sales were:
   - Northeast: $4.2M (▲ 18% vs Q4)  ← highest growth
   - South:     $3.8M (▲ 6%)
   - Midwest:   $3.1M (▲ 2%)
   - West:      $5.6M (▼ 4%)

   Source rows: orders.region_id = {1,2,3,4}, qtr in {2025-Q4, 2026-Q1}.
   8 rows scanned. See attached SQL for the full query.
```
### Run the tests
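From the example root, with dev dependencies (including `pytest`) installed:

```shell
# Unit tests for the retriever, generator, and agent orchestration
python -m pytest agent/tests/ -v
```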
## Read-only enforcement — the part worth copying
The retriever rejects any SQL that isn't a bare read. This is the minimum bar for a "let users talk to the warehouse" surface:
```python
# agent/retriever.py (excerpt — see source for full impl)

_ALLOWED_LEADING = ("select", "with")
_FORBIDDEN_TOKENS = (
    "insert", "update", "delete", "merge", "drop", "alter",
    "create", "truncate", "grant", "revoke", "copy", "exec",
    "execute", "call", ";--", "/*", "*/",
)

def _is_read_only(sql: str) -> bool:
    s = sql.strip().lower()
    if not s.startswith(_ALLOWED_LEADING):
        return False
    # No multi-statement (we strip trailing ; but reject any embedded ;)
    s_no_trailing = s.rstrip(";").strip()
    if ";" in s_no_trailing:
        return False
    return not any(tok in s_no_trailing for tok in _FORBIDDEN_TOKENS)
```
This is defense in depth — the workspace identity itself should also have read-only RBAC on the lakehouse. Never rely on the in-process check alone in production.
## Data contract

Every request/response is governed by `contracts/fabric_query_contract.yaml`:

- `classification: unrestricted` — for sensitive lakehouses, change to `confidential` and require additional auth claims
- `pii: false` — declares this surface does not return PII; CI validators flag any column that smells like PII
- `schema.primary_key: [request_id]` — every Q&A round-trip is traceable
- `read_only: true` (in the `policy` block) — the agent will not even attempt non-read SQL
The contract is consumed by `csa_platform/governance/contracts/contract_validator.py` in CI.
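The contract file itself is not reproduced on this page; a sketch of the shape implied by the fields above might look like the following (illustrative only — field names beyond those cited above are assumptions, so treat `contracts/fabric_query_contract.yaml` as the source of truth):

```yaml
# Illustrative shape only — see contracts/fabric_query_contract.yaml for the real contract.
name: fabric_query_contract
classification: unrestricted   # raise to confidential for sensitive lakehouses
pii: false
schema:
  primary_key: [request_id]
  fields:
    - {name: request_id, type: string}
    - {name: question,   type: string}
    - {name: sql,        type: string}
    - {name: answer,     type: string}
policy:
  read_only: true
```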
## Production checklist
Before exposing this to real users:
- Workspace identity has read-only Fabric RBAC (not just the in-process guard)
- AOAI endpoint behind Private Endpoint, Key Vault references for the key
- `FABRIC_TABLE_ALLOW_LIST` config narrowed to only the tables the agent should see
- Row cap (`MAX_RESULT_ROWS`) set conservatively (default 1,000; tune per table)
- Rate limiter in front (see ADR 0021)
- Application Insights logging enabled with the `request_id` correlation key
- Output content filter on the AOAI deployment (Azure AI Content Safety)
- Eval suite per quarter using `apps/copilot/evals/framework` as a template
## Azure Government note
Fabric is pre-GA in Azure Gov (as of writing). Use the streaming-spine alternative documented in `GOV_NOTE.md`:
| Layer | Commercial (this example) | Gov alternative (today) |
|---|---|---|
| Storage | Fabric Lakehouse / OneLake | ADLS Gen2 + Delta |
| SQL endpoint | Lakehouse SQL endpoint | Synapse Serverless SQL or Databricks SQL |
| LLM | Azure OpenAI (Commercial) | Azure OpenAI (Gov, IL4 / IL5) |
| Agent code | unchanged | unchanged (swap `_load_fabric_client` for a DB-API 2.0 connection) |
The agent's read-only guard, generator, and orchestrator are fully portable — only the connection layer changes.
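Any driver exposing a DB-API 2.0 connection works for the Gov swap, because the agent only needs `cursor()`, `execute()`, and `fetchall()`. A minimal sketch — `load_sql_client` is a hypothetical stand-in for the repo's `_load_fabric_client`, and `sqlite3` is used only so the example runs anywhere; in Gov you would pass something like `pyodbc.connect` (Synapse Serverless) or `databricks.sql.connect` instead:

```python
import sqlite3

def load_sql_client(connect=sqlite3.connect, dsn=":memory:"):
    """Hypothetical replacement for _load_fabric_client.

    Any DB-API 2.0 connect() callable can be injected here; the rest of
    the agent code is unchanged because it only uses cursor/execute/fetch.
    """
    return connect(dsn)

conn = load_sql_client()
cur = conn.cursor()
cur.execute("SELECT 1 + 1")
print(cur.fetchone()[0])  # 2
```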