π Retail & CPG: Demand Forecasting and Customer Analytics on Microsoft Fabric
Industry: Retail / Consumer Packaged Goods Compliance: PCI-DSS v4.0 Fabric SKU: F64 (P1 equivalent) Estimated Cost: ~$22,000/month

Executive Summary
A mid-market omnichannel retailer with 450 stores, $2.8B annual revenue, and 12 million loyalty members leverages Microsoft Fabric to unify point-of-sale streams, demand forecasting, inventory optimization, and customer analytics into a single lakehouse platform. The solution replaces a patchwork of legacy data warehouses, standalone forecasting tools, and siloed CRM systems with a cohesive medallion architecture powered by Delta Lake, Eventstream, and Direct Lake.
Key Business Outcomes
| Metric | Before Fabric | After Fabric | Impact |
| Forecast accuracy (WMAPE) | 38% | 18% | -20pp |
| Stockout rate | 8.2% | 3.1% | -62% |
| Markdown waste | $42M/yr | $28M/yr | -$14M |
| Promotion ROI visibility | Manual, 2 weeks | Automated, next-day | 10x faster |
| Customer 360 refresh | Weekly batch | Near-real-time | Continuous |
| Time-to-insight for merchants | 3β5 days | < 4 hours | 18x faster |
Estimated annual ROI: $18β24M from markdown reduction, stockout avoidance, and promotion optimization.
1. Business Context
1.1 Company Profile
| Attribute | Value |
| Revenue | $2.8B annually |
| Stores | 450 (big-box, express, online) |
| SKUs | 85,000 active |
| Daily POS transactions | ~4.2M |
| Loyalty members | 12M |
| Regions | 6 US regions |
| Distribution centers | 8 |
1.2 Pain Points
- Fragmented demand signals β POS, e-commerce, loyalty, and weather data live in separate systems with no unified view.
- Forecast lag β Legacy batch forecasting runs weekly; merchants need daily SKU-store-level predictions for replenishment.
- Markdown inefficiency β Without granular sell-through curves, buyers mark down too early or too late, eroding margin.
- Customer blindness β CRM captures transactions but lacks real-time behavioral signals; next-best-offer models are stale.
- Promotion black box β Marketing spends $180M/yr on promotions with limited incrementality measurement.
- Supply chain opacity β No consolidated view of vendor lead times, fill rates, or DC throughput.
1.3 Compliance Requirements (PCI-DSS v4.0)
| Requirement | Implementation |
| Req 3.4 β Render PAN unreadable | Tokenize card numbers at POS ingestion; only token + last-4 stored |
| Req 3.5 β Protect cryptographic keys | Azure Key Vault with HSM-backed keys |
| Req 7.1 β Limit access by business need | Fabric workspace RBAC; row-level security on gold tables |
| Req 8.3 β MFA for admin access | Entra ID Conditional Access enforced |
| Req 10.2 β Audit logging | Fabric audit logs β Eventhouse for real-time alerting |
| Req 12.3 β Risk assessment | Quarterly PCI scope review documented |
Critical rule: No Primary Account Number (PAN) enters the analytics lakehouse. All card data is tokenized at the edge (POS terminal or payment gateway) before streaming to Fabric.
2. Solution Architecture
2.1 High-Level Data Flow
graph LR
subgraph Sources
POS[POS Terminals<br/>450 stores]
ECOM[E-Commerce<br/>Platform]
LOYALTY[Loyalty<br/>Platform]
SUPPLY[Supply Chain<br/>ERP]
WEATHER[Weather<br/>API]
end
subgraph Fabric
ES[Eventstream]
EH[Eventhouse<br/>Real-Time Inventory]
BRZ[Bronze<br/>Lakehouse]
SLV[Silver<br/>Lakehouse]
GLD[Gold<br/>Lakehouse]
ML[ML Model<br/>Endpoints]
PBI[Power BI<br/>Direct Lake]
end
POS -->|Tokenized POS| ES
ECOM -->|Orders| ES
LOYALTY -->|Events| ES
SUPPLY -->|Batch/daily| BRZ
WEATHER -->|API pull| BRZ
ES --> EH
ES --> BRZ
BRZ --> SLV
SLV --> GLD
GLD --> ML
ML --> GLD
GLD --> PBI
EH --> PBI
2.2 Eventstream Configuration
| Stream | Source | Volume | Retention |
es-pos-transactions | POS via Event Hub | ~50K events/min peak | 7 days |
es-ecommerce-orders | Webhook β Event Hub | ~2K events/min | 7 days |
es-loyalty-events | CDC from loyalty DB | ~500 events/min | 3 days |
2.3 Eventhouse (Real-Time Inventory)
The Eventhouse provides sub-second inventory position queries for store associates and the e-commerce platform:
// Real-time inventory position by store-SKU
InventoryEvents
| where EventTime > ago(1h)
| summarize
LastQtyOnHand = arg_max(EventTime, QtyOnHand),
TotalSold = sumif(Qty, EventType == "SALE"),
TotalReceived = sumif(Qty, EventType == "RECEIPT")
by StoreId, SKU
| extend AvailableToPromise = LastQtyOnHand - TotalSold + TotalReceived
| where AvailableToPromise < 10
| order by AvailableToPromise asc
2.4 Medallion Lakehouse Layers
Bronze (lh_bronze)
| Table | Source | Grain | Volume |
bronze_pos_transactions | POS Eventstream | 1 row per line item | ~4.2M/day |
bronze_ecommerce_orders | E-commerce webhook | 1 row per order line | ~120K/day |
bronze_loyalty_events | Loyalty CDC | 1 row per event | ~500K/day |
bronze_product_master | ERP batch | 1 row per SKU | 85K (full refresh weekly) |
bronze_store_master | ERP batch | 1 row per store | 450 (full refresh weekly) |
bronze_weather_daily | Weather API | 1 row per store-day | 450/day |
Silver (lh_silver)
| Table | Logic | Grain |
silver_retail_sales | Deduplicated, validated, enriched with product/store dims | 1 row per txn line |
silver_customer_profile | Merged loyalty + POS, PII masked | 1 row per customer |
silver_inventory_daily | End-of-day snapshot from Eventhouse | 1 row per store-SKU-day |
Gold (lh_gold)
| Table | Logic | Grain | Refresh |
gold_retail_demand | Daily aggregated sales + forecast features | SKU-store-day | Daily |
gold_retail_customer_360 | RFM scores, CLV, segments, next-best-offer | Customer | Daily |
gold_promotion_lift | Causal impact per promotion-SKU | Promo-SKU | Post-campaign |
gold_supply_chain_kpi | Vendor fill rate, lead time, DC throughput | Vendor-day | Daily |
3. Demand Forecasting
3.1 Feature Engineering
The demand model operates at SKU Γ Store Γ Day granularity (~38M combinations for active items):
| Feature Category | Examples |
| Lag features | sales_lag_1d, sales_lag_7d, sales_lag_28d |
| Rolling stats | sales_ma_7d, sales_ma_28d, sales_std_7d |
| Seasonal indices | day_of_week_idx, week_of_year_idx, holiday_flag |
| Price signals | current_price, discount_pct, competitor_price_ratio |
| Promotion flags | is_promoted, promo_type, promo_week_number |
| Weather | temp_max, precip_flag, severe_weather_flag |
| Store attributes | store_format, region, sqft, traffic_index |
3.2 Model Architecture
Daily sales (SKU-Store-Day)
βββ Base forecast: LightGBM ensemble (7/14/28-day horizons)
βββ Seasonal adjustment: Fourier terms + holiday calendar
βββ Promotion uplift: Separate causal model (Double ML)
βββ Safety stock: Quantile regression (P95) for inventory planning
- Training cadence: Weekly retrain on 2 years of history
- Serving: Batch scoring via Fabric ML Model Endpoints; results written to
gold_retail_demand - Evaluation: WMAPE < 20% target; monitored via Power BI dashboard
3.3 Inventory Optimization
The forecast feeds downstream inventory decisions:
| Metric | Formula | Use |
| Safety stock | z_score Γ Ο_demand Γ βlead_time | Buffer against uncertainty |
| Reorder point | avg_daily_demand Γ lead_time + safety_stock | Trigger replenishment |
| Economic order qty | β(2 Γ demand Γ order_cost / holding_cost) | Optimal order size |
| Markdown timing | Sell-through curve < 60% at week 6 β markdown | Margin protection |
4. Customer 360 & Loyalty Analytics
4.1 RFM Scoring
# RFM calculation in gold layer
rfm = (
silver_sales
.groupBy("customer_id")
.agg(
datediff(current_date(), max("txn_date")).alias("recency_days"),
countDistinct("txn_id").alias("frequency"),
sum("line_total").alias("monetary")
)
)
# Score 1-5 using quintiles
for metric in ["recency_days", "frequency", "monetary"]:
rfm = rfm.withColumn(
f"{metric}_score",
ntile(5).over(Window.orderBy(col(metric)))
)
4.2 Customer Lifetime Value (CLV)
- Model: BG/NBD for purchase frequency Γ Gamma-Gamma for monetary value
- Horizon: 12-month forward-looking CLV
- Segments: Platinum (top 5%), Gold (5β20%), Silver (20β50%), Bronze (50β80%), At-Risk (bottom 20%)
4.3 Next-Best-Offer
| Signal | Weight | Source |
| Purchase history (category affinity) | 40% | silver_retail_sales |
| CLV tier | 20% | gold_retail_customer_360 |
| Promotion response history | 20% | gold_promotion_lift |
| Seasonal relevance | 10% | Calendar |
| Inventory position | 10% | silver_inventory_daily |
5.1 Methodology
Each promotion is evaluated for true incrementality using causal inference:
- Test vs. Control: Matched-store design (propensity score matching)
- Causal Impact: Bayesian structural time-series model
- Cannibalization: Cross-elasticity matrix within category
- Halo effect: Basket analysis for promoted vs. non-promoted items in same trip
| KPI | Definition |
| Incremental units | Units sold above baseline forecast |
| Lift % | (Promo sales β baseline) / baseline Γ 100 |
| Cost per incremental unit | Total promo cost / incremental units |
| Margin impact | Incremental revenue β (discount cost + cannibalization) |
| Payback period | Days until promoted SKU returns to baseline velocity |
6. Supply Chain Visibility
6.1 Vendor Scorecard
| Metric | Target | Source |
| Fill rate | β₯ 97% | PO receipts vs. orders |
| On-time delivery | β₯ 95% | Receipt date vs. requested date |
| Lead time variability | CV < 0.15 | Historical lead time distribution |
| Quality defect rate | < 0.5% | Returns tagged as defective |
6.2 Distribution Center Throughput
// DC throughput monitoring
DCOperations
| where Timestamp > ago(24h)
| summarize
UnitsReceived = sumif(Units, OperationType == "INBOUND"),
UnitsShipped = sumif(Units, OperationType == "OUTBOUND"),
AvgPickTime = avg(PickDurationMinutes)
by DCId, bin(Timestamp, 1h)
| extend Utilization = round(100.0 * UnitsShipped / Capacity, 1)
7. Power BI Dashboards
7.1 Dashboard Suite
| Dashboard | Audience | Refresh | Connection |
| Demand Command Center | Merchandising VPs | Every 15 min | Direct Lake |
| Store Inventory Pulse | Store managers | Real-time | Eventhouse + Direct Lake |
| Customer Insights | Marketing | Daily | Direct Lake |
| Promotion Effectiveness | Category managers | Post-campaign | Direct Lake |
| Supply Chain Ops | Supply chain team | Hourly | Direct Lake |
7.2 Row-Level Security
// RLS filter: Regional managers see only their region
[Region] = USERPRINCIPALNAME_REGION_LOOKUP()
// Store managers see only their store
[StoreId] = LOOKUPVALUE(
StoreAccess[StoreId],
StoreAccess[UserPrincipalName], USERPRINCIPALNAME()
)
8. PCI-DSS Implementation Details
8.1 Tokenization Flow
POS Terminal β Payment Gateway (tokenizes PAN) β Event Hub β Fabric
β
Only token + last4
enter the lakehouse
8.2 Data Classification
| Field | Classification | Handling |
card_token | Sensitive | Stored, never reversed in Fabric |
card_last4 | Internal | Display only, masked in exports |
loyalty_id | Internal | Pseudonymized in gold layer |
customer_name | PII | Hashed in silver, excluded from gold |
email | PII | Hashed in silver, excluded from gold |
txn_amount | Business | Available for analytics |
8.3 Audit Trail
All access to PCI-scoped tables is logged via Fabric audit logs routed to Eventhouse:
FabricAuditLogs
| where Timestamp > ago(24h)
| where TableName has "pos" or TableName has "card"
| summarize AccessCount = count() by UserId, TableName, Operation
| where AccessCount > 100
| order by AccessCount desc
9. Cost Model
9.1 Monthly Fabric Cost Breakdown
| Component | Configuration | Est. Cost/Month |
| Fabric F64 capacity | Shared with other workloads | $8,200 |
| Eventstream (POS + e-com) | ~50K events/min peak | $3,100 |
| Eventhouse | 500GB hot, 30-day retention | $2,800 |
| OneLake storage | ~12TB Delta Lake | $1,400 |
| ML Model Endpoints | 2 endpoints, auto-scale | $2,200 |
| Power BI Premium per-user | 25 report builders | $2,500 |
| Azure Key Vault (HSM) | Tokenization keys | $400 |
| Monitoring & logging | Eventhouse audit | $600 |
| Total | | ~$21,200 |
9.2 ROI Calculation
| Benefit | Annual Value |
| Markdown reduction (33%) | $14.0M |
| Stockout avoidance (62% reduction) | $5.2M |
| Promotion optimization (15% efficiency gain) | $2.7M |
| Labor savings (analyst productivity) | $1.1M |
| Total annual benefit | $23.0M |
| Total annual cost | $254K |
| Net ROI | ~90:1 |
10. Implementation Roadmap
| Phase | Duration | Deliverables |
| Phase 1: Foundation | 4 weeks | Eventstream ingestion, bronze layer, store/product dims |
| Phase 2: Silver + Gold | 3 weeks | Cleansed sales, customer profiles, daily aggregates |
| Phase 3: Forecasting | 4 weeks | Feature engineering, LightGBM training, batch scoring |
| Phase 4: Customer 360 | 3 weeks | RFM, CLV, segmentation, next-best-offer |
| Phase 5: Dashboards | 2 weeks | 5 Power BI dashboards with RLS |
| Phase 6: Promotion analytics | 3 weeks | Causal impact framework, cannibalization model |
| Phase 7: Supply chain | 2 weeks | Vendor scorecard, DC throughput monitoring |
Total: ~21 weeks (5 months) from kickoff to production.
11. Notebooks Reference
| Notebook | Layer | Purpose |
53_retail_pos.py | Bronze | Ingest POS transactions |
53_retail_cleansed.py | Silver | Deduplicate, validate, enrich |
53_retail_demand_forecast.py | Gold | Demand features, RFM, CLV |
Last Updated: 2026-04-27 Author: Fabric POC Team Compliance Review: PCI-DSS v4.0 scope validated