Tutorial 52: Telecom Churn Prediction & Network Quality Analytics¶

Build a complete telecom analytics platform on Microsoft Fabric for a regional wireless carrier with 3.5M subscribers and 8,000 cell sites. This tutorial covers churn prediction, network quality monitoring, and revenue assurance with CPNI compliance.

Learning Objectives¶

By the end of this tutorial you will:

Generate realistic CDR, subscriber, and cell site data
Build a medallion pipeline (Bronze -> Silver -> Gold) for telecom
Enrich CDRs with subscriber profiles and cell site metadata
Train a churn propensity model using Spark ML
Compute network quality KPIs by cell, sector, and hour
Implement CPNI compliance controls (access, consent, audit)

Architecture Overview¶

graph LR
    A[CDR / xDR Feed] -->|Eventstream| B[Eventhouse]
    A -->|Batch| C[Bronze Lakehouse]
    D[Subscriber CRM] --> C
    E[Cell Site Config] --> C
    B -->|15-min agg| C
    C -->|Cleanse + Join| F[Silver Lakehouse]
    F -->|Features + ML| G[Gold Lakehouse]
    G --> H[Power BI Direct Lake]

    subgraph Gold Tables
        G1[gold_telecom_churn]
        G2[gold_network_kpis]
        G3[gold_telecom_arpu]
    end
    G --> G1
    G --> G2
    G --> G3

Prerequisites¶

Requirement	Details
Microsoft Fabric workspace	F64 SKU or trial capacity
Lakehouses	`lh_bronze`, `lh_silver`, `lh_gold` created
Python environment	Python 3.10+, `pip install numpy pandas faker tqdm`
Fabric notebooks	Import capability enabled
CPNI training	Analyst team trained on CPNI obligations

Step 1: Generate Sample Data¶

1.1 Install Dependencies¶

cd data_generation
pip install -r requirements.txt

1.2 Generate CDR Data¶

python -m data_generation.generators.telecom.cdr_generator

This produces: - 5,000 CDR records with voice, SMS, and data sessions - 1,000 subscriber profiles with plan types and churn flags (~3% churn) - 200 cell sites (600 sectors) with coordinates and technology type

1.3 Verify Output¶

import pandas as pd

df = pd.read_parquet("data_generation/output/telecom_cdr_sample.parquet")
print(f"Records: {len(df):,}")
print(f"Call types: {df['call_type'].value_counts().to_dict()}")
print(f"Date range: {df['start_dt'].min()} to {df['start_dt'].max()}")

Expected output:

Records: 5,000
Call types: {'data': ~3000, 'voice': ~1250, 'sms': ~750}
Date range: [30 days of data]

Checkpoint: You should see ~60% data, ~25% voice, ~15% SMS distribution.

Step 2: Upload Data to Fabric¶

2.1 Upload to Lakehouse¶

Open your lh_bronze lakehouse in Fabric
Navigate to Files > output
Upload telecom_cdr_sample.parquet

2.2 Create Reference Tables¶

Upload subscriber and cell site reference data. You can generate these separately:

from data_generation.generators.telecom.cdr_generator import TelecomCDRGenerator

gen = TelecomCDRGenerator(seed=42, num_subscribers=1000, num_cell_sites=200)
gen.get_subscriber_df().to_parquet("subscribers.parquet", index=False)
gen.get_cell_site_df().to_parquet("cell_sites.parquet", index=False)

Upload both files to lh_bronze/Files/output/.

Checkpoint: Three parquet files visible in lh_bronze/Files/output/.

Step 3: Bronze Ingestion¶

3.1 Import Notebook¶

Import notebooks/bronze/56_telecom_cdr.py into your Fabric workspace.

3.2 Configure and Run¶

Attach to your lh_bronze lakehouse
Verify SOURCE_PATH matches your upload location
Run all cells

3.3 Verify Bronze Table¶

SELECT call_type, COUNT(*) as cnt
FROM lh_bronze.bronze_telecom_cdr
GROUP BY call_type
ORDER BY cnt DESC

Expected:

call_type	cnt
data	~3,000
voice	~1,250
sms	~750

Checkpoint: bronze_telecom_cdr table created with all records.

Step 4: Silver Enrichment¶

4.1 Import and Run¶

Import notebooks/silver/56_telecom_enriched.py and attach to both lh_bronze (source) and lh_silver (target) lakehouses.

4.2 What Happens¶

The Silver notebook performs:

Deduplication -- removes duplicate CDRs by cdr_id
Validation -- filters records where voice/data duration <= 0
Enrichment -- joins with subscriber profiles (plan, tenure, churn flag) and cell sites (coordinates, technology)
Quality metrics -- calculates throughput (Mbps) and data volume (MB)
Daily aggregation -- per-subscriber daily usage summaries

4.3 Verify Silver Tables¶

-- Enriched CDRs
SELECT subscriber_id, plan_type, call_type, throughput_mbps
FROM lh_silver.silver_telecom_usage
LIMIT 10

-- Daily aggregates
SELECT subscriber_id, call_date, voice_calls, data_volume_mb
FROM lh_silver.silver_telecom_daily_usage
ORDER BY call_date DESC
LIMIT 10

Checkpoint: Both silver_telecom_usage and silver_telecom_daily_usage created.

Step 5: Gold Analytics¶

5.1 Import and Run¶

Import notebooks/gold/56_telecom_churn.py and attach to lh_silver and lh_gold.

5.2 Churn Propensity Scores¶

The notebook trains a Gradient Boosted Tree classifier on subscriber features:

Feature	Description
tenure_months	Months since activation
monthly_charge	Plan price
total_voice_calls	Total calls in period
total_sms_count	SMS messages sent
data_usage_gb	Total data consumed
arpu	Estimated monthly revenue
mean_throughput_mbps	Average download speed
distinct_active_days	Days with any activity
voice_minutes	Total voice minutes

Risk tiers: - High (>0.7): Immediate retention intervention - Medium (0.4-0.7): Targeted offer via SMS/app - Low (<0.4): Standard engagement

5.3 Network KPIs¶

The network KPI table aggregates per cell/sector/hour: - Dropped call rate (%) - Average throughput (Mbps) - Total data volume (MB) - Unique subscriber count

5.4 Verify Gold Tables¶

-- Churn scores by risk tier
SELECT risk_tier, COUNT(*) as subscribers, AVG(churn_propensity) as avg_score
FROM lh_gold.gold_telecom_churn
GROUP BY risk_tier

-- Worst cells by dropped call rate
SELECT cell_id, sector, dropped_call_rate, voice_attempts
FROM lh_gold.gold_network_kpis
WHERE voice_attempts > 10
ORDER BY dropped_call_rate DESC
LIMIT 10

-- ARPU by plan
SELECT * FROM lh_gold.gold_telecom_arpu

Checkpoint: All three Gold tables created with meaningful data.

Step 6: CPNI Compliance Controls¶

6.1 Access Control¶

Configure workspace RBAC to restrict CPNI-containing tables:

Table	Access Level	Roles
bronze_telecom_cdr	Restricted	Data Engineers, CPNI Authorised
silver_telecom_usage	Restricted	Data Engineers, CPNI Authorised
gold_telecom_churn	Broader	Analysts, Marketing (no PII)
gold_network_kpis	Broader	NOC, Network Planning

The cpni_consent flag on subscriber profiles controls data usage:

-- Only use CDRs from consented subscribers for marketing analytics
SELECT s.*
FROM silver_telecom_usage s
JOIN bronze_telecom_subscribers b ON s.subscriber_id = b.subscriber_id
WHERE b.cpni_consent = true

6.3 Audit Trail¶

Enable SQL audit logs on all lakehouses. Review access patterns monthly:

FabricAuditLogs
| where TableName contains "telecom"
| where Operation in ("SELECT", "EXPORT")
| summarize AccessCount = count() by UserPrincipal, TableName, bin(Timestamp, 1h)
| where AccessCount > 100
| order by AccessCount desc

Checkpoint: RBAC configured, consent queries tested, audit logging enabled.

Step 7: Power BI Dashboard¶

7.1 Create Direct Lake Semantic Model¶

Connect to the Gold lakehouse and add: - gold_telecom_churn - gold_network_kpis - gold_telecom_arpu

7.2 Suggested Report Pages¶

Churn Command Centre -- risk tier donut, propensity histogram, top-risk subscriber list
Network Health -- cell site map with DCR overlay, throughput trends, worst-cell table
Revenue Dashboard -- ARPU by segment bar chart, usage trends, plan distribution

7.3 Key DAX Measures¶

Churn Rate =
DIVIDE(
    COUNTROWS(FILTER(gold_telecom_churn, gold_telecom_churn[label] = 1)),
    COUNTROWS(gold_telecom_churn)
)

Avg ARPU =
AVERAGE(gold_telecom_churn[arpu])

High Risk Subscribers =
COUNTROWS(FILTER(gold_telecom_churn, gold_telecom_churn[risk_tier] = "high"))

Checkpoint: Direct Lake model created, at least one report page rendering.

Validation¶

Run the unit tests to verify the data generator:

pytest validation/unit_tests/telecom/test_cdr_generator.py -v

Expected: 6 tests passed.

Summary¶

Component	File	Output
Data Generator	`data_generation/generators/telecom/cdr_generator.py`	CDRs, subscribers, cell sites
Bronze Notebook	`notebooks/bronze/56_telecom_cdr.py`	`bronze_telecom_cdr`
Silver Notebook	`notebooks/silver/56_telecom_enriched.py`	`silver_telecom_usage`, `silver_telecom_daily_usage`
Gold Notebook	`notebooks/gold/56_telecom_churn.py`	`gold_telecom_churn`, `gold_network_kpis`, `gold_telecom_arpu`
Unit Tests	`validation/unit_tests/telecom/test_cdr_generator.py`	6 passing tests
Use Case Doc	`docs/use-cases/telecom-churn-network.md`	Architecture + compliance reference

CPNI Rules (47 CFR 64)
Microsoft Fabric Real-Time Intelligence
Spark ML GBTClassifier
Tutorial 01: Casino Slot Telemetry (similar medallion pattern)

Next: Tutorial 53 >>

← Previous51 — Energy & UtilitiesRead more →Next →53 — Pharma Clinical TrialsRead more →