Tutorial 52: Telecom Churn Prediction & Network Quality Analytics¶
Build a complete telecom analytics platform on Microsoft Fabric for a regional wireless carrier with 3.5M subscribers and 8,000 cell sites. This tutorial covers churn prediction, network quality monitoring, and revenue assurance with CPNI compliance.
Learning Objectives¶
By the end of this tutorial you will:
- Generate realistic CDR, subscriber, and cell site data
- Build a medallion pipeline (Bronze -> Silver -> Gold) for telecom
- Enrich CDRs with subscriber profiles and cell site metadata
- Train a churn propensity model using Spark ML
- Compute network quality KPIs by cell, sector, and hour
- Implement CPNI compliance controls (access, consent, audit)
Architecture Overview¶
graph LR
A[CDR / xDR Feed] -->|Eventstream| B[Eventhouse]
A -->|Batch| C[Bronze Lakehouse]
D[Subscriber CRM] --> C
E[Cell Site Config] --> C
B -->|15-min agg| C
C -->|Cleanse + Join| F[Silver Lakehouse]
F -->|Features + ML| G[Gold Lakehouse]
G --> H[Power BI Direct Lake]
subgraph Gold Tables
G1[gold_telecom_churn]
G2[gold_network_kpis]
G3[gold_telecom_arpu]
end
G --> G1
G --> G2
G --> G3
Prerequisites¶
| Requirement | Details |
|---|---|
| Microsoft Fabric workspace | F64 SKU or trial capacity |
| Lakehouses | lh_bronze, lh_silver, lh_gold created |
| Python environment | Python 3.10+, pip install numpy pandas faker tqdm |
| Fabric notebooks | Import capability enabled |
| CPNI training | Analyst team trained on CPNI obligations |
Step 1: Generate Sample Data¶
1.1 Install Dependencies¶
1.2 Generate CDR Data¶
This produces: - 5,000 CDR records with voice, SMS, and data sessions - 1,000 subscriber profiles with plan types and churn flags (~3% churn) - 200 cell sites (600 sectors) with coordinates and technology type
1.3 Verify Output¶
import pandas as pd
df = pd.read_parquet("data_generation/output/telecom_cdr_sample.parquet")
print(f"Records: {len(df):,}")
print(f"Call types: {df['call_type'].value_counts().to_dict()}")
print(f"Date range: {df['start_dt'].min()} to {df['start_dt'].max()}")
Expected output:
Records: 5,000
Call types: {'data': ~3000, 'voice': ~1250, 'sms': ~750}
Date range: [30 days of data]
Checkpoint: You should see ~60% data, ~25% voice, ~15% SMS distribution.
Step 2: Upload Data to Fabric¶
2.1 Upload to Lakehouse¶
- Open your lh_bronze lakehouse in Fabric
- Navigate to Files > output
- Upload
telecom_cdr_sample.parquet
2.2 Create Reference Tables¶
Upload subscriber and cell site reference data. You can generate these separately:
from data_generation.generators.telecom.cdr_generator import TelecomCDRGenerator
gen = TelecomCDRGenerator(seed=42, num_subscribers=1000, num_cell_sites=200)
gen.get_subscriber_df().to_parquet("subscribers.parquet", index=False)
gen.get_cell_site_df().to_parquet("cell_sites.parquet", index=False)
Upload both files to lh_bronze/Files/output/.
Checkpoint: Three parquet files visible in lh_bronze/Files/output/.
Step 3: Bronze Ingestion¶
3.1 Import Notebook¶
Import notebooks/bronze/56_telecom_cdr.py into your Fabric workspace.
3.2 Configure and Run¶
- Attach to your lh_bronze lakehouse
- Verify
SOURCE_PATHmatches your upload location - Run all cells
3.3 Verify Bronze Table¶
SELECT call_type, COUNT(*) as cnt
FROM lh_bronze.bronze_telecom_cdr
GROUP BY call_type
ORDER BY cnt DESC
Expected:
| call_type | cnt |
|---|---|
| data | ~3,000 |
| voice | ~1,250 |
| sms | ~750 |
Checkpoint:
bronze_telecom_cdr table created with all records.
Step 4: Silver Enrichment¶
4.1 Import and Run¶
Import notebooks/silver/56_telecom_enriched.py and attach to both lh_bronze (source) and lh_silver (target) lakehouses.
4.2 What Happens¶
The Silver notebook performs:
- Deduplication -- removes duplicate CDRs by
cdr_id - Validation -- filters records where voice/data duration <= 0
- Enrichment -- joins with subscriber profiles (plan, tenure, churn flag) and cell sites (coordinates, technology)
- Quality metrics -- calculates throughput (Mbps) and data volume (MB)
- Daily aggregation -- per-subscriber daily usage summaries
4.3 Verify Silver Tables¶
-- Enriched CDRs
SELECT subscriber_id, plan_type, call_type, throughput_mbps
FROM lh_silver.silver_telecom_usage
LIMIT 10
-- Daily aggregates
SELECT subscriber_id, call_date, voice_calls, data_volume_mb
FROM lh_silver.silver_telecom_daily_usage
ORDER BY call_date DESC
LIMIT 10
Checkpoint: Both
silver_telecom_usage and silver_telecom_daily_usage created.
Step 5: Gold Analytics¶
5.1 Import and Run¶
Import notebooks/gold/56_telecom_churn.py and attach to lh_silver and lh_gold.
5.2 Churn Propensity Scores¶
The notebook trains a Gradient Boosted Tree classifier on subscriber features:
| Feature | Description |
|---|---|
| tenure_months | Months since activation |
| monthly_charge | Plan price |
| total_voice_calls | Total calls in period |
| total_sms_count | SMS messages sent |
| data_usage_gb | Total data consumed |
| arpu | Estimated monthly revenue |
| mean_throughput_mbps | Average download speed |
| distinct_active_days | Days with any activity |
| voice_minutes | Total voice minutes |
Risk tiers: - High (>0.7): Immediate retention intervention -
Medium (0.4-0.7): Targeted offer via SMS/app -
Low (<0.4): Standard engagement
5.3 Network KPIs¶
The network KPI table aggregates per cell/sector/hour: - Dropped call rate (%) - Average throughput (Mbps) - Total data volume (MB) - Unique subscriber count
5.4 Verify Gold Tables¶
-- Churn scores by risk tier
SELECT risk_tier, COUNT(*) as subscribers, AVG(churn_propensity) as avg_score
FROM lh_gold.gold_telecom_churn
GROUP BY risk_tier
-- Worst cells by dropped call rate
SELECT cell_id, sector, dropped_call_rate, voice_attempts
FROM lh_gold.gold_network_kpis
WHERE voice_attempts > 10
ORDER BY dropped_call_rate DESC
LIMIT 10
-- ARPU by plan
SELECT * FROM lh_gold.gold_telecom_arpu
Checkpoint: All three Gold tables created with meaningful data.
Step 6: CPNI Compliance Controls¶
6.1 Access Control¶
Configure workspace RBAC to restrict CPNI-containing tables:
| Table | Access Level | Roles |
|---|---|---|
| bronze_telecom_cdr | Restricted | Data Engineers, CPNI Authorised |
| silver_telecom_usage | Restricted | Data Engineers, CPNI Authorised |
| gold_telecom_churn | Broader | Analysts, Marketing (no PII) |
| gold_network_kpis | Broader | NOC, Network Planning |
6.2 Consent Enforcement¶
The cpni_consent flag on subscriber profiles controls data usage:
-- Only use CDRs from consented subscribers for marketing analytics
SELECT s.*
FROM silver_telecom_usage s
JOIN bronze_telecom_subscribers b ON s.subscriber_id = b.subscriber_id
WHERE b.cpni_consent = true
6.3 Audit Trail¶
Enable SQL audit logs on all lakehouses. Review access patterns monthly:
FabricAuditLogs
| where TableName contains "telecom"
| where Operation in ("SELECT", "EXPORT")
| summarize AccessCount = count() by UserPrincipal, TableName, bin(Timestamp, 1h)
| where AccessCount > 100
| order by AccessCount desc
Checkpoint: RBAC configured, consent queries tested, audit logging enabled.
Step 7: Power BI Dashboard¶
7.1 Create Direct Lake Semantic Model¶
Connect to the Gold lakehouse and add: - gold_telecom_churn - gold_network_kpis - gold_telecom_arpu
7.2 Suggested Report Pages¶
- Churn Command Centre -- risk tier donut, propensity histogram, top-risk subscriber list
- Network Health -- cell site map with DCR overlay, throughput trends, worst-cell table
- Revenue Dashboard -- ARPU by segment bar chart, usage trends, plan distribution
7.3 Key DAX Measures¶
Churn Rate =
DIVIDE(
COUNTROWS(FILTER(gold_telecom_churn, gold_telecom_churn[label] = 1)),
COUNTROWS(gold_telecom_churn)
)
Avg ARPU =
AVERAGE(gold_telecom_churn[arpu])
High Risk Subscribers =
COUNTROWS(FILTER(gold_telecom_churn, gold_telecom_churn[risk_tier] = "high"))
Checkpoint: Direct Lake model created, at least one report page rendering.
Validation¶
Run the unit tests to verify the data generator:
Expected: 6 tests passed.
Summary¶
| Component | File | Output |
|---|---|---|
| Data Generator | data_generation/generators/telecom/cdr_generator.py | CDRs, subscribers, cell sites |
| Bronze Notebook | notebooks/bronze/56_telecom_cdr.py | bronze_telecom_cdr |
| Silver Notebook | notebooks/silver/56_telecom_enriched.py | silver_telecom_usage, silver_telecom_daily_usage |
| Gold Notebook | notebooks/gold/56_telecom_churn.py | gold_telecom_churn, gold_network_kpis, gold_telecom_arpu |
| Unit Tests | validation/unit_tests/telecom/test_cdr_generator.py | 6 passing tests |
| Use Case Doc | docs/use-cases/telecom-churn-network.md | Architecture + compliance reference |
Related Resources¶
- CPNI Rules (47 CFR 64)
- Microsoft Fabric Real-Time Intelligence
- Spark ML GBTClassifier
- Tutorial 01: Casino Slot Telemetry (similar medallion pattern)
Next: Tutorial 53 >>