# Tutorial: SAP Data to Microsoft Fabric
Step-by-step: configure Fabric Mirroring for SAP HANA to replicate SAP data to OneLake, build Delta tables, create Power BI reports on SAP data, and integrate with the CSA-in-a-Box analytics layer.
!!! info "Prerequisites"
    - SAP HANA running on Azure VMs (or RISE with SAP on Azure)
    - Microsoft Fabric capacity (F64 or higher recommended)
    - Power BI Pro or Premium license
    - Network connectivity between SAP HANA and Fabric (VNet peering or private endpoints)
    - SAP HANA database user with SELECT privileges on target tables
    - Estimated time: 2--3 hours
## Architecture overview

```text
SAP HANA (Azure VM)                Microsoft Fabric
┌──────────────────┐               ┌──────────────────────────┐
│ VBAK (sales hdr) │──────────────►│ Mirrored DB: sap_mirror  │
│ VBAP (sales itm) │──CDC Stream──►│  ├── vbak (Delta)        │
│ EKKO (PO header) │──────────────►│  ├── vbap (Delta)        │
│ EKPO (PO items)  │──────────────►│  ├── ekko (Delta)        │
│ ACDOCA (finance) │──────────────►│  ├── ekpo (Delta)        │
│ MARA (materials) │──────────────►│  ├── acdoca (Delta)      │
└──────────────────┘               │  └── mara (Delta)        │
                                   └───────────┬──────────────┘
                                               │
                                   ┌───────────▼──────────────┐
                                   │ Fabric Lakehouse         │
                                   │  ├── Bronze (mirrored)   │
                                   │  ├── Silver (cleansed)   │
                                   │  └── Gold (aggregated)   │
                                   └───────────┬──────────────┘
                                               │
                                   ┌───────────▼──────────────┐
                                   │ Power BI (Direct Lake)   │
                                   │  ├── Sales Dashboard     │
                                   │  ├── Procurement Report  │
                                   │  └── Finance Overview    │
                                   └──────────────────────────┘
```
## Step 1: Prepare SAP HANA for Fabric Mirroring

### 1.1 Create a dedicated HANA user for Fabric
```sql
-- Connect to SAP HANA as SYSTEM

-- Create a database user for Fabric Mirroring
CREATE USER FABRIC_MIRROR PASSWORD "<SecurePassword123!>"
    NO FORCE_FIRST_PASSWORD_CHANGE;

-- Grant SELECT on target schemas
GRANT SELECT ON SCHEMA SAPS4H TO FABRIC_MIRROR;

-- Grant CDC-related privileges
GRANT CATALOG READ TO FABRIC_MIRROR;
GRANT SELECT ON SYS.M_TABLES TO FABRIC_MIRROR;
GRANT SELECT ON SYS.M_TABLE_COLUMNS TO FABRIC_MIRROR;
GRANT SELECT ON SYS.TABLE_COLUMNS TO FABRIC_MIRROR;

-- Enable change tracking on target tables
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
    SET ('change_tracking', 'enable') = 'true'
    WITH RECONFIGURE;
```
### 1.2 Verify network connectivity
```bash
# Create a private endpoint so Fabric can reach HANA over the VNet
# HANA SQL port is typically 3<instance>15 (e.g., 30015 for instance 00)
az network private-endpoint create \
  --resource-group rg-fabric-integration \
  --name pe-fabric-to-hana \
  --vnet-name vnet-csa-data \
  --subnet fabric-subnet \
  --private-connection-resource-id /subscriptions/$SUB_ID/resourceGroups/rg-sap-prod/providers/Microsoft.Network/privateLinkServices/pls-hana \
  --connection-name fabric-hana-connection
```
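The port pattern mentioned above (`3<instance>15`) is easy to get wrong by hand. As a minimal illustration, a small helper (hypothetical, not part of any SDK) derives the SQL port from a two-digit instance number:

```python
def hana_sql_port(instance_number: int) -> int:
    """Return the default HANA SQL port for a two-digit instance number.

    Pattern: 3<NN>15, e.g. instance 00 -> 30015, instance 10 -> 31015.
    """
    if not 0 <= instance_number <= 99:
        raise ValueError("HANA instance number must be between 00 and 99")
    return int(f"3{instance_number:02d}15")

print(hana_sql_port(0))   # 30015
print(hana_sql_port(10))  # 31015
```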
## Step 2: Configure Fabric Mirroring for SAP HANA

### 2.1 Create a mirrored database in Fabric
- Open the Microsoft Fabric portal (app.fabric.microsoft.com)
- Navigate to your workspace (e.g., `ws-sap-analytics`)
- Click + New item > Mirrored Database
- Select SAP HANA as the source
- Enter connection details:
| Field | Value |
|---|---|
| Server | vm-hana-prd.contoso.com (or private endpoint IP) |
| Port | 30015 |
| Database | S4H |
| Authentication | Username and password |
| Username | FABRIC_MIRROR |
| Password | <SecurePassword123!> |
| Connection name | SAP-HANA-Production |
### 2.2 Select tables for mirroring
Select the SAP tables to replicate. Start with high-value tables:
| SAP table | Description | Typical size | Refresh pattern |
|---|---|---|---|
| VBAK | Sales order header | 5--50M rows | Near-real-time CDC |
| VBAP | Sales order items | 10--100M rows | Near-real-time CDC |
| EKKO | Purchase order header | 2--20M rows | Near-real-time CDC |
| EKPO | Purchase order items | 5--50M rows | Near-real-time CDC |
| ACDOCA | Universal Journal (FI) | 50--500M rows | Near-real-time CDC |
| MARA | Material master | 500K--5M rows | Near-real-time CDC |
| MAKT | Material descriptions | 500K--5M rows | Near-real-time CDC |
| KNA1 / BUT000 | Customer master / Business Partner | 100K--2M rows | Near-real-time CDC |
| LFA1 / BUT000 | Vendor master / Business Partner | 50K--500K rows | Near-real-time CDC |
### 2.3 Start mirroring
- Click Mirror database to begin initial synchronization
- Monitor the initial sync progress in the Fabric portal
- Initial sync time depends on table sizes:
| Total data volume | Initial sync time (estimated) |
|---|---|
| < 10 GB | 15--30 minutes |
| 10--50 GB | 30--120 minutes |
| 50--200 GB | 2--6 hours |
| 200 GB+ | 6--24 hours |
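For capacity planning, the estimate table above can be expressed as a small lookup helper (an illustrative sketch; the thresholds are the rough estimates from this tutorial, not a Fabric guarantee):

```python
def estimate_initial_sync(total_gb: float) -> str:
    """Rough initial-sync estimate for Fabric Mirroring, per the table above."""
    if total_gb < 10:
        return "15-30 minutes"
    if total_gb < 50:
        return "30-120 minutes"
    if total_gb < 200:
        return "2-6 hours"
    return "6-24 hours"

print(estimate_initial_sync(120))  # 2-6 hours
```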
## Step 3: Build the Fabric Lakehouse (medallion architecture)

### 3.1 Create a Fabric Lakehouse
- In your workspace, click + New item > Lakehouse
- Name it `lh_sap_analytics`
- This lakehouse will contain the silver and gold layers
### 3.2 Create silver layer transformations (Fabric notebook)
```python
# Fabric Notebook: Transform SAP mirrored data to the silver layer.
# Reads from the mirrored database and writes cleansed data.
# The `spark` session is provided by the Fabric notebook runtime.
from pyspark.sql import functions as F

# Read mirrored SAP sales order data
df_vbak = spark.read.format("delta").load("Tables/sap_mirror/vbak")
df_vbap = spark.read.format("delta").load("Tables/sap_mirror/vbap")

# Silver layer: cleansed sales orders
silver_sales_orders = (
    df_vbak
    .filter(F.col("mandt") == "100")  # Filter to the production client
    .select(
        F.col("vbeln").alias("sales_order_number"),
        F.col("erdat").alias("created_date"),
        F.col("erzet").alias("created_time"),
        F.col("ernam").alias("created_by"),
        F.col("auart").alias("order_type"),
        F.col("vkorg").alias("sales_org"),
        F.col("vtweg").alias("distribution_channel"),
        F.col("spart").alias("division"),
        F.col("kunnr").alias("customer_number"),
        F.col("netwr").alias("net_value"),
        F.col("waerk").alias("currency"),
        F.col("bstnk").alias("po_number"),
        F.col("lifsk").alias("delivery_block"),
        F.col("faksk").alias("billing_block"),
        F.to_date("erdat", "yyyyMMdd").alias("order_date")
    )
)

# Write the silver layer
silver_sales_orders.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save("Tables/silver/sales_orders")

# Silver layer: sales order items
silver_sales_items = (
    df_vbap
    .filter(F.col("mandt") == "100")
    .select(
        F.col("vbeln").alias("sales_order_number"),
        F.col("posnr").alias("item_number"),
        F.col("matnr").alias("material_number"),
        F.col("arktx").alias("item_description"),
        F.col("kwmeng").alias("order_quantity"),
        F.col("vrkme").alias("sales_unit"),
        F.col("netwr").alias("net_value"),
        F.col("waerk").alias("currency"),
        F.col("werks").alias("plant"),
        F.col("lgort").alias("storage_location"),
        F.col("pstyv").alias("item_category")
    )
)

silver_sales_items.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save("Tables/silver/sales_order_items")

print("Silver layer transformation complete.")
```
### 3.3 Create gold layer aggregations
```python
# Gold layer: sales analytics aggregations

# Read the silver layer
df_orders = spark.read.format("delta").load("Tables/silver/sales_orders")
df_items = spark.read.format("delta").load("Tables/silver/sales_order_items")

# Gold: daily sales summary
gold_daily_sales = (
    df_orders
    .groupBy("order_date", "sales_org", "order_type")
    .agg(
        F.count("*").alias("order_count"),
        F.sum("net_value").alias("total_net_value"),
        F.countDistinct("customer_number").alias("unique_customers")
    )
    .orderBy("order_date")
)

gold_daily_sales.write \
    .format("delta") \
    .mode("overwrite") \
    .save("Tables/gold/daily_sales_summary")

# Gold: material sales performance.
# Join only the columns needed from the orders table so that net_value
# and currency (present in both DataFrames) remain unambiguous.
gold_material_sales = (
    df_items
    .join(df_orders.select("sales_order_number", "sales_org"), "sales_order_number")
    .groupBy("material_number", "item_description", "sales_org")
    .agg(
        F.sum("net_value").alias("total_revenue"),
        F.sum("order_quantity").alias("total_quantity"),
        F.count("*").alias("line_item_count")
    )
    .orderBy(F.desc("total_revenue"))
)

gold_material_sales.write \
    .format("delta") \
    .mode("overwrite") \
    .save("Tables/gold/material_sales_performance")

print("Gold layer aggregation complete.")
```
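The daily-sales aggregation above is worth unit-testing without a Spark cluster. A minimal plain-Python sketch (illustrative only; the notebook itself uses PySpark) reproduces the same group-by logic on dict rows:

```python
from collections import defaultdict

def daily_sales_summary(orders):
    """Plain-Python version of the gold aggregation:
    group by (order_date, sales_org, order_type), computing
    order_count, total_net_value, and unique_customers."""
    groups = defaultdict(lambda: {"order_count": 0, "total_net_value": 0.0, "customers": set()})
    for o in orders:
        g = groups[(o["order_date"], o["sales_org"], o["order_type"])]
        g["order_count"] += 1
        g["total_net_value"] += o["net_value"]
        g["customers"].add(o["customer_number"])
    return {
        key: {"order_count": v["order_count"],
              "total_net_value": v["total_net_value"],
              "unique_customers": len(v["customers"])}
        for key, v in groups.items()
    }

sample = [
    {"order_date": "2024-03-15", "sales_org": "1000", "order_type": "OR",
     "net_value": 100.0, "customer_number": "C1"},
    {"order_date": "2024-03-15", "sales_org": "1000", "order_type": "OR",
     "net_value": 250.0, "customer_number": "C2"},
    {"order_date": "2024-03-15", "sales_org": "1000", "order_type": "OR",
     "net_value": 50.0, "customer_number": "C1"},
]
print(daily_sales_summary(sample))
# {('2024-03-15', '1000', 'OR'): {'order_count': 3, 'total_net_value': 400.0, 'unique_customers': 2}}
```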
## Step 4: Create Power BI reports on SAP data

### 4.1 Create a Power BI semantic model
- In the Fabric Lakehouse, click New semantic model
- Select tables: `gold/daily_sales_summary`, `gold/material_sales_performance`, `silver/sales_orders`
- The semantic model uses Direct Lake mode (no data import; reads directly from Delta tables)
### 4.2 Build a sales dashboard
- Open Power BI Desktop or create a report in the Fabric portal
- Connect to the semantic model
- Create visualizations:
| Visualization | Data | Purpose |
|---|---|---|
| KPI card | SUM(total_net_value) | Total sales value |
| KPI card | SUM(order_count) | Total order count |
| Line chart | order_date (x) vs total_net_value (y) | Sales trend over time |
| Bar chart | material_number vs total_revenue | Top materials by revenue |
| Map | sales_org location vs total_net_value | Sales by region |
| Table | sales_order_number, customer_number, net_value, order_date | Order details |
| Slicer | sales_org, order_type, order_date | Interactive filtering |
### 4.3 DAX measures for SAP analytics
```dax
// Total Revenue
Total Revenue = SUM('daily_sales_summary'[total_net_value])

// Year-over-Year Growth
YoY Growth =
VAR CurrentYear = [Total Revenue]
VAR PreviousYear =
    CALCULATE(
        [Total Revenue],
        SAMEPERIODLASTYEAR('sales_orders'[order_date])
    )
RETURN DIVIDE(CurrentYear - PreviousYear, PreviousYear, 0)

// Average Order Value
Avg Order Value = DIVIDE([Total Revenue], SUM('daily_sales_summary'[order_count]))

// Customer Concentration (Top 10%)
// TOPN requires a whole number, so round the 10% customer count up.
Top 10% Customer Revenue =
CALCULATE(
    [Total Revenue],
    TOPN(
        ROUNDUP(CALCULATE(DISTINCTCOUNT('sales_orders'[customer_number])) * 0.1, 0),
        ALL('sales_orders'[customer_number]),
        [Total Revenue],
        DESC
    )
)
```
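The arithmetic behind the YoY Growth and Avg Order Value measures is easy to sanity-check outside DAX. A small Python sketch (illustrative only) mirrors DAX's `DIVIDE`, which returns an alternate value instead of dividing by zero:

```python
def safe_divide(numerator: float, denominator: float, alternate: float = 0.0) -> float:
    """Mimics DAX DIVIDE: returns `alternate` when the denominator is zero."""
    return numerator / denominator if denominator else alternate

def yoy_growth(current_revenue: float, previous_revenue: float) -> float:
    """(current - previous) / previous, with a zero-safe fallback."""
    return safe_divide(current_revenue - previous_revenue, previous_revenue)

print(yoy_growth(1_100_000, 1_000_000))  # 0.1
print(yoy_growth(500_000, 0))            # 0.0 (no prior-year data)
```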
## Step 5: Integrate with CSA-in-a-Box

### 5.1 Register SAP data in Purview
```bash
# Create a Purview account for governance
az purview account create \
  --resource-group rg-csa-governance \
  --name purview-csa-inabox \
  --location eastus2

# Scan SAP HANA metadata -- configure in the Purview portal:
# 1. Register SAP HANA as a data source
# 2. Create classification rules for SAP data (PII, financial)
# 3. Schedule a metadata scan
# 4. Review discovered assets in the Purview catalog
```
### 5.2 Connect to CSA-in-a-Box dbt models
```yaml
# dbt project: reference SAP data from the Fabric Lakehouse
# profiles.yml
sap_analytics:
  target: fabric
  outputs:
    fabric:
      type: fabric
      server: <workspace>.datawarehouse.fabric.microsoft.com
      database: lh_sap_analytics
      schema: gold
      threads: 4
```
```sql
-- dbt model: SAP procurement analytics
-- models/gold/procurement/purchase_order_summary.sql
{{ config(materialized='table', tags=['sap', 'procurement']) }}

SELECT
    po.purchasing_org,
    po.vendor_number,
    v.vendor_name,
    DATE_TRUNC('month', po.order_date) AS order_month,
    COUNT(*) AS po_count,
    SUM(po.net_value) AS total_po_value,
    AVG(po.net_value) AS avg_po_value
FROM {{ ref('silver_purchase_orders') }} po
LEFT JOIN {{ ref('dim_vendors') }} v
    ON po.vendor_number = v.vendor_number
GROUP BY 1, 2, 3, 4
```
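The `DATE_TRUNC('month', ...)` bucketing used in the dbt model has a one-line Python equivalent, shown here as a sketch for anyone validating the monthly roll-up logic locally:

```python
from datetime import date

def trunc_month(d: date) -> date:
    """Equivalent of DATE_TRUNC('month', ...): first day of the value's month."""
    return d.replace(day=1)

print(trunc_month(date(2024, 3, 15)))  # 2024-03-01
```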
### 5.3 Enable Azure AI on SAP data
```python
# Azure OpenAI: natural-language queries on SAP data
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://aoai-csa-inabox.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview"
)

# User asks a question about SAP data
user_question = "What were our top 5 selling materials last quarter?"

# Generate SQL from natural language
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "system",
        "content": """You are a data analyst. Generate Spark SQL queries against
these tables:
- gold.material_sales_performance (material_number, item_description,
  total_revenue, total_quantity, sales_org)
- gold.daily_sales_summary (order_date, sales_org, order_type,
  order_count, total_net_value)
Return only the SQL query, no explanation."""
    }, {
        "role": "user",
        "content": user_question
    }]
)

# Execute the generated SQL (validate model output before running it in production)
sql_query = response.choices[0].message.content.strip()
result_df = spark.sql(sql_query)
result_df.show()
```
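Passing model-generated SQL straight into `spark.sql()` is risky. As a minimal sketch (an illustrative guard, not a complete SQL validator), a conservative check can reject anything that is not a single SELECT statement before execution:

```python
import re

def is_safe_select(sql: str) -> bool:
    """Conservative guard for model-generated SQL: allow only a single
    SELECT statement and reject common DML/DDL keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # multiple statements
        return False
    if not re.match(r"(?is)^\s*select\b", stripped):
        return False
    forbidden = re.compile(
        r"(?i)\b(insert|update|delete|drop|alter|create|merge|truncate|grant)\b"
    )
    return not forbidden.search(stripped)

print(is_safe_select("SELECT material_number FROM gold.material_sales_performance LIMIT 5"))  # True
print(is_safe_select("DROP TABLE gold.daily_sales_summary"))  # False
```

In the notebook above, this would gate the final step: run `spark.sql(sql_query)` only when `is_safe_select(sql_query)` returns True.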
## Step 6: Schedule and monitor

### 6.1 Schedule notebook refresh
- In Fabric, open the notebook pipeline
- Click Schedule > set refresh frequency (e.g., every 4 hours for silver/gold layers)
- Fabric Mirroring runs continuously; CDC replication is near-real-time, so no schedule is needed
### 6.2 Monitor data freshness
```python
# Check mirroring status.
# Spark SQL's timestampdiff() computes the lag in minutes
# (T-SQL-style DATEDIFF(minute, ...) is not valid Spark SQL).
mirror_status = spark.sql("""
    SELECT table_name,
           last_sync_timestamp,
           row_count,
           timestampdiff(MINUTE, last_sync_timestamp, current_timestamp()) AS minutes_behind
    FROM information_schema.mirror_status
    WHERE database_name = 'sap_mirror'
""")
mirror_status.show()
```
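A `minutes_behind` figure is most useful when classified against thresholds. As an illustrative sketch (the threshold values here are example assumptions, not Fabric defaults), a helper can bucket the lag for alerting:

```python
def freshness_status(minutes_behind: int, warn_after: int = 15, alert_after: int = 60) -> str:
    """Classify mirror lag into healthy / warning / alert buckets.

    Thresholds are illustrative defaults; tune them to your SLA.
    """
    if minutes_behind >= alert_after:
        return "alert"
    if minutes_behind >= warn_after:
        return "warning"
    return "healthy"

print(freshness_status(5))   # healthy
print(freshness_status(30))  # warning
print(freshness_status(90))  # alert
```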
## What you built
In this tutorial, you:
- Configured SAP HANA for Fabric Mirroring (CDC-based replication)
- Created a mirrored database in Microsoft Fabric with key SAP tables
- Built a medallion architecture (bronze/silver/gold) in a Fabric Lakehouse
- Created Power BI reports with Direct Lake mode on SAP data
- Integrated with CSA-in-a-Box: Purview governance, dbt models, Azure AI
SAP transactional data now flows continuously into OneLake, is governed by Purview, transformed by dbt, visualized by Power BI, and available for AI-driven insights --- all within the CSA-in-a-Box platform.
Last updated: 2026-04-30 | Maintainers: CSA-in-a-Box core team | Related: Analytics Migration, Tutorial: Deploy SAP on Azure, Best Practices