Source: examples/fabric-data-agent/README.md — this page is rendered live from that file.

Fabric Data Agent — Read-Only Grounded Q&A over Microsoft Fabric Lakehouse¶

Status: Reference pattern. Production-deployable on Azure Commercial today; pre-GA in Azure Government (use the streaming-spine alternative documented below until Fabric reaches Gov GA).

A minimal, contract-governed, read-only AI agent that answers natural-language questions over a registered set of Fabric Lakehouse tables via the Lakehouse SQL endpoint and Azure OpenAI.

What you get¶

Component	Path	Purpose
Agent orchestrator	`agent/agent.py`	Wires retriever + generator + answer formatter
Retriever	`agent/retriever.py`	Schema-aware SQL generation, read-only enforcement (rejects anything that is not a bare `SELECT`/`WITH`), row cap
Generator	`agent/generator.py`	Azure OpenAI grounded answer with mandatory citation back to query results
Config	`agent/config.py`	Endpoint + auth + table-allow-list configuration (no secrets in code)
Data contract	`contracts/fabric_query_contract.yaml`	Schema + classification + scope guard for the agent's request/response surface
IaC	`deploy/bicep/main.bicep`	Fabric capacity + lakehouse + workspace identity + AOAI + Key Vault wiring
Sample data	`sample_data/`	Synthetic tables for end-to-end testing without a real Fabric workspace
Gov note	`GOV_NOTE.md`	What to do today if you must deploy on Azure Government
Tests	`agent/tests/`	Unit tests for retriever, generator, agent orchestration

Why this exists¶

A read-only "ask the lakehouse a question" surface is the most-requested AI pattern across our enterprise customer conversations. The dangerous version of this writes ad-hoc SQL with full credentials. This version:

Refuses any SQL statement that isn't a bare SELECT/WITH — enforced before execution
Caps result row count to prevent accidental table dumps
Requires the LLM to cite the result rows it used (no answer without grounding)
Restricts table access to an allow-list defined in config.py
Emits a contract-governed response (contracts/fabric_query_contract.yaml) that can be validated in CI

The scope guard logic in retriever.py is the part most worth lifting into your own code if you build a similar surface against a different SQL endpoint.

Quickstart (Azure Commercial, ~30 min)¶

Prerequisites¶

Azure Commercial subscription with Fabric capacity SKU available (F2 minimum for dev)
Azure OpenAI resource with gpt-4o-mini (or newer) deployed
az cli 2.60+, python 3.11+, pytest

Deploy infra¶

cd examples/fabric-data-agent

# Fill in deploy/params.dev.json with your Fabric workspace name + AOAI endpoint
az deployment sub create \
  --location eastus \
  --template-file deploy/bicep/main.bicep \
  --parameters @deploy/params.dev.json

Load sample data¶

# Upload sample_data/*.parquet into the Fabric lakehouse via OneLake CLI or
# the Fabric portal. Tables: customers, orders, products, regions.

Run the agent locally¶

# Set env (use Key Vault refs in prod)
export FABRIC_WORKSPACE_ID="<workspace-guid>"
export FABRIC_LAKEHOUSE_NAME="csa_demo"
export AZURE_OPENAI_ENDPOINT="https://<aoai>.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"

python -m agent.agent \
  --question "What were total sales by region last quarter, and which region had the highest growth vs the prior quarter?"

Sample output:

Q: What were total sales by region last quarter, and which region had
   the highest growth vs the prior quarter?

SQL  (read-only check passed, row cap = 1000):
  WITH q AS (
    SELECT region_id,
           SUM(order_total) AS sales,
           DATE_TRUNC('quarter', order_date) AS qtr
    FROM csa_demo.orders
    GROUP BY region_id, DATE_TRUNC('quarter', order_date)
  )
  SELECT region_id, qtr, sales,
         LAG(sales) OVER (PARTITION BY region_id ORDER BY qtr) AS prior_sales
  FROM q
  WHERE qtr >= DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '6 months'
  ORDER BY region_id, qtr;

A: Last quarter (Q1 2026), total sales were:
   - Northeast: $4.2M  (▲ 18% vs Q4)   ← highest growth
   - South:     $3.8M  (▲  6%)
   - Midwest:   $3.1M  (▲  2%)
   - West:      $5.6M  (▼  4%)

   Source rows: orders.region_id = {1,2,3,4}, qtr in {2025-Q4, 2026-Q1}.
   8 rows scanned. See attached SQL for the full query.

Run the tests¶

cd examples/fabric-data-agent
pytest -q

Read-only enforcement — the part worth copying¶

The retriever rejects any SQL that isn't a bare read. This is the minimum bar for a "let users talk to the warehouse" surface:

# agent/retriever.py (excerpt — see source for full impl)

_ALLOWED_LEADING = ("select", "with")
_FORBIDDEN_TOKENS = (
    "insert", "update", "delete", "merge", "drop", "alter",
    "create", "truncate", "grant", "revoke", "copy", "exec",
    "execute", "call", ";--", "/*", "*/",
)

def _is_read_only(sql: str) -> bool:
    s = sql.strip().lower()
    if not s.startswith(_ALLOWED_LEADING):
        return False
    # No multi-statement (we strip trailing ; but reject any embedded ;)
    s_no_trailing = s.rstrip(";").strip()
    if ";" in s_no_trailing:
        return False
    return not any(tok in s_no_trailing for tok in _FORBIDDEN_TOKENS)

This is defense in depth — the workspace identity itself should also have read-only RBAC on the lakehouse. Never rely on the in-process check alone in production.

Data contract¶

Every request/response is governed by contracts/fabric_query_contract.yaml:

classification: unrestricted — for sensitive lakehouses, change to confidential and require additional auth claims
pii: false — declares this surface does not return PII; CI validators flag any column that smells like PII
schema.primary_key: [request_id] — every Q&A round-trip is traceable
read_only: true (in the policy block) — the agent will not even attempt non-read SQL

The contract is consumed by csa_platform/governance/contracts/contract_validator.py in CI.

Production checklist¶

Before exposing this to real users:

Workspace identity has read-only Fabric RBAC (not just the in-process guard)
AOAI endpoint behind Private Endpoint, Key Vault references for the key
FABRIC_TABLE_ALLOW_LIST config narrowed to only the tables the agent should see
Row cap (MAX_RESULT_ROWS) set conservatively (default 1,000; tune per table)
Rate limiter in front (see ADR 0021)
Application Insights logging enabled with the request_id correlation key
Output content filter on the AOAI deployment (Azure AI Content Safety)
Eval suite per quarter using apps/copilot/evals/ framework as a template

Azure Government note¶

Fabric is pre-GA in Azure Gov (as of writing). Use the streaming spine alternative documented in GOV_NOTE.md:

Layer	Commercial (this example)	Gov alternative (today)
Storage	Fabric Lakehouse / OneLake	ADLS Gen2 + Delta
SQL endpoint	Lakehouse SQL endpoint	Synapse Serverless SQL or Databricks SQL
LLM	Azure OpenAI (Commercial)	Azure OpenAI (Gov, IL4 / IL5)
Agent code	unchanged	unchanged (swap `_load_fabric_client` for a DB-API 2.0 connection)

The agent's read-only guard, generator, and orchestrator are fully portable — only the connection layer changes.