🔄 Tutorial 10: Teradata to Microsoft Fabric Migration¶
Last Updated: 2026-04-15 | Version: 2.0 Status: ✅ Final | Maintainer: Documentation Team
Third-party references — publicly sourced, good-faith comparison
This page references non-Microsoft products and services. That information is drawn from each vendor's publicly available documentation and is offered for honest, good-faith comparison only. This is a personal project written from a Microsoft Fabric and Azure perspective; it does not claim expertise in, or authority over, any third-party product, and nothing here is an official statement by, or endorsed by, those vendors. Capabilities, pricing, and features change often — always verify against the vendor's current official documentation. Where a third-party offering is the stronger choice, we say so plainly.
🔄 Tutorial 10: Teradata to Microsoft Fabric Migration¶
| Difficulty | ⭐⭐⭐ Advanced |
| Time | ⏱️ 120-180 minutes |
| Focus | Data Migration & Modernization |
📊 Progress Tracker¶
┌────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┐
│ 00 │ 01 │ 02 │ 03 │ 04 │ 05 │ 06 │ 07 │ 08 │ 09 │ 10 │ 11 │
│ SETUP │ BRONZE │ SILVER │ GOLD │ RT │ PBI │ PIPES │ GOV │ MIRROR │ AI/ML │TERADATA│ SAS │
├────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ ✅ │ ✅ │ ✅ │ ✅ │ ✅ │ ✅ │ ✅ │ ✅ │ ✅ │ ✅ │ 🔵 │ ⬚ │
└────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘
▲
YOU ARE HERE
| Navigation | |
|---|---|
| ⬅️ Previous | 09-Advanced AI/ML |
| ➡️ Next | 11-SAS Connectivity |
📖 Overview¶
This tutorial provides a guide for migrating from a Teradata data warehouse to Microsoft Fabric. You will learn migration strategies, SQL translation patterns, ETL conversion techniques, and practical steps for a data warehouse modernization. Teradata-side behavior described here is based on Teradata's publicly available documentation (as of this doc's date); always verify against Teradata's current official documentation.
Teradata is a mature, capable enterprise data warehouse platform that has served large analytics workloads for decades and remains a strong choice for many organizations. Microsoft Fabric is a unified analytics platform that some teams choose as a target when consolidating onto the Microsoft stack, offering: - Unified Data Lake with OneLake and Delta Lake format - Lakehouse architecture combining data lake and warehouse patterns - Integrated analytics from ingestion to Power BI reporting - Pay-per-use pricing with capacity-based or consumption models
The right platform depends on your requirements; this playbook focuses on the mechanics of moving a workload to Fabric when that is the chosen direction.
🎯 Learning Objectives¶
By the end of this tutorial, you will be able to:
- Understand Teradata to Fabric migration architecture
- Assess your Teradata workload for migration readiness
- Configure Teradata connectivity in Fabric Data Factory
- Translate Teradata SQL to Fabric T-SQL/Spark
- Migrate data using pipelines and COPY INTO
- Convert BTEQ scripts to Fabric notebooks
- Use Microsoft Migration Assistant for schema conversion
- Validate migrated data for accuracy
- Implement ongoing synchronization patterns
- Convert advanced BTEQ patterns (MultiLoad, TPump, error handling)
- Benchmark and optimize query performance after migration
- Map Teradata data types to Fabric Warehouse and Lakehouse types
- Migrate Teradata stored procedures to parameterized notebooks
🏗️ Migration Architecture Overview¶
flowchart TB
subgraph Teradata["Teradata Source"]
TDB[(Teradata DW)]
BTEQ[BTEQ Scripts]
TPT[TPT Jobs]
ETL_OLD[Informatica/Ab Initio]
end
subgraph Migration["Migration Layer"]
ADF[Data Factory Pipelines]
NB[Fabric Notebooks]
COPY[COPY INTO]
TRANS[SQL Translator]
end
subgraph Fabric["Microsoft Fabric"]
OL[(OneLake)]
LH[Lakehouse]
WH[Data Warehouse]
SM[Semantic Model]
PBI[Power BI]
end
TDB -->|Teradata Connector| ADF
BTEQ -->|Convert| NB
TPT -->|Refactor| ADF
ETL_OLD -->|Migrate| ADF
ADF -->|Parquet/CSV| OL
TRANS -->|T-SQL| WH
COPY -->|Bulk Load| LH
NB -->|PySpark| LH
LH --> SM
WH --> SM
SM --> PBI Migration Approaches¶
| Approach | Description | Best For |
|---|---|---|
| Lift and Shift | Direct data copy with minimal changes | Quick migrations, testing |
| Refactor | Modernize SQL and processes | Long-term value, optimization |
| Replatform | Use Fabric-native features | Maximum Fabric benefits |
| Hybrid | Keep some data in Teradata | Phased migration |
📋 Prerequisites¶
Before starting this tutorial, ensure you have:
- Completed Tutorial 00: Environment Setup
- Completed Tutorials 01-03: Medallion Architecture
- Fabric workspace with F64+ capacity (recommended for large migrations)
- Access to source Teradata environment
- Teradata user with SELECT privileges on migration scope
- Network connectivity between Fabric and Teradata (VPN/ExpressRoute for on-premises)
- Teradata JDBC Driver (for custom tooling)
💡 Tip: For testing without a Teradata instance, you can use sample Teradata DDL scripts and synthetic data to practice SQL translation patterns.
🛠️ Step 1: Assess Your Teradata Environment¶
1.1 Inventory Assessment¶
Create a comprehensive inventory of objects to migrate:
-- Teradata: List all databases and sizes
SELECT
DatabaseName,
SUM(CurrentPerm) / 1024 / 1024 / 1024 AS SizeGB,
COUNT(DISTINCT TableName) AS TableCount
FROM DBC.TableSizeV
WHERE DatabaseName NOT IN ('DBC', 'SYSLIB', 'SYSSPATIAL', 'TD_SYSFNLIB')
GROUP BY DatabaseName
ORDER BY SizeGB DESC;
-- List all tables with row counts
SELECT
DatabaseName,
TableName,
TableKind,
RowCount,
CurrentPerm / 1024 / 1024 AS SizeMB
FROM DBC.TablesV t
LEFT JOIN DBC.TableSizeV s
ON t.DatabaseName = s.DatabaseName
AND t.TableName = s.TableName
WHERE t.DatabaseName = 'YOUR_DATABASE'
ORDER BY SizeMB DESC;
1.2 Identify Migration Complexity¶
| Complexity Factor | Assessment Query | Impact |
|---|---|---|
| SET Tables | Count tables using SET (unique rows only) | Requires DISTINCT handling |
| MULTISET Tables | Count MULTISET tables | Direct migration |
| Primary Index | Identify PI columns | Rethink distribution |
| Partitioning | Check partition expressions | Map to Fabric partitioning |
| Stored Procedures | Count procedures and UDFs | Manual conversion |
| Views | Catalog view dependencies | SQL translation |
| BTEQ Scripts | Inventory automation scripts | Notebook conversion |
-- Count SET vs MULTISET tables
SELECT
TableKind,
CASE WHEN TableKind = 'T' THEN 'Standard Table'
WHEN TableKind = 'O' THEN 'SET Table (No Duplicates)'
WHEN TableKind = 'V' THEN 'View'
ELSE 'Other' END AS TableType,
COUNT(*) AS TableCount
FROM DBC.TablesV
WHERE DatabaseName = 'YOUR_DATABASE'
GROUP BY TableKind;
-- List stored procedures
SELECT
DatabaseName,
ProcedureName,
CreateTimeStamp
FROM DBC.ProceduresV
WHERE DatabaseName = 'YOUR_DATABASE';
1.3 Complexity Scoring Matrix¶
quadrantChart
title Teradata Migration Complexity Assessment
x-axis Low SQL Complexity --> High SQL Complexity
y-axis Small Data Volume --> Large Data Volume
quadrant-1 Complex Migration - Phased Approach
quadrant-2 Complex but Manageable - Plan Carefully
quadrant-3 Quick Win - Simple Data Copy
quadrant-4 Data Challenge - Optimize Transfer
Simple Tables: [0.2, 0.3]
Views with Joins: [0.5, 0.4]
Stored Procedures: [0.8, 0.3]
Large Fact Tables: [0.3, 0.8]
Complex Analytics: [0.7, 0.7] 🛠️ Step 2: Configure Teradata Connectivity in Fabric¶
2.1 Create Teradata Connection in Data Factory¶

Source: Create your first pipeline in Data Factory
- Open your Fabric workspace
- Navigate to Data Factory > Manage > Linked Services
- Click + New > Teradata
2.2 Connection Configuration¶
| Setting | Description | Example |
|---|---|---|
| Name | Descriptive connection name | ls_teradata_casino_dw |
| Server | Teradata server hostname | teradata.casino.com |
| Database | Default database | CASINO_DW |
| Authentication | Auth method | Basic or LDAP |
| Username | Service account | fabric_migration_user |
| Password | Azure Key Vault reference | @Microsoft.KeyVault(...) |
2.3 Connection String Options¶
{
"name": "ls_teradata_casino_dw",
"properties": {
"type": "Teradata",
"typeProperties": {
"server": "teradata.casino.com",
"authenticationType": "Basic",
"username": "fabric_migration_user",
"password": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "ls_keyvault",
"type": "LinkedServiceReference"
},
"secretName": "teradata-password"
}
},
"connectVia": {
"referenceName": "SelfHostedIR",
"type": "IntegrationRuntimeReference"
}
}
}
⚠️ Warning: For on-premises Teradata, you must configure a Self-Hosted Integration Runtime. Install on a Windows server with network access to both Teradata and Azure.
2.4 Test Connection¶
# Fabric Notebook: Test Teradata connectivity via Data Factory
from notebookutils import mssparkutils
# Test connection using Data Factory linked service
connection_result = mssparkutils.data.connect(
linkedService="ls_teradata_casino_dw"
)
print(f"Connection Status: {connection_result}")
🛠️ Step 3: SQL Translation - Teradata to Fabric¶
3.1 Key SQL Differences¶
Teradata SQL is an ANSI SQL dialect with Teradata-specific extensions. Fabric supports T-SQL (for Warehouse) and Spark SQL (for Lakehouse).
flowchart LR
TD[Teradata SQL] --> TRANS{Translation}
TRANS --> TSQL[T-SQL<br/>Fabric Warehouse]
TRANS --> SPARK[Spark SQL<br/>Fabric Lakehouse]
subgraph Patterns["Key Patterns"]
Q[QUALIFY]
SET[SET Tables]
DT[Date/Time]
TOP[TOP N]
end 3.2 QUALIFY Clause Translation¶
Teradata's QUALIFY clause filters window function results. Fabric requires a CTE approach.
Teradata (Original):
-- Get latest record per player using QUALIFY
SELECT
player_id,
session_timestamp,
total_spend,
loyalty_tier
FROM casino.player_sessions
QUALIFY ROW_NUMBER() OVER (
PARTITION BY player_id
ORDER BY session_timestamp DESC
) = 1;
Fabric T-SQL (Converted):
-- Use CTE with ROW_NUMBER and filter in WHERE
WITH ranked_sessions AS (
SELECT
player_id,
session_timestamp,
total_spend,
loyalty_tier,
ROW_NUMBER() OVER (
PARTITION BY player_id
ORDER BY session_timestamp DESC
) AS rn
FROM casino.player_sessions
)
SELECT
player_id,
session_timestamp,
total_spend,
loyalty_tier
FROM ranked_sessions
WHERE rn = 1;
Fabric Spark SQL (Alternative):
from pyspark.sql.functions import row_number, desc
from pyspark.sql.window import Window
df = spark.table("casino.player_sessions")
window_spec = Window.partitionBy("player_id").orderBy(desc("session_timestamp"))
df_latest = df.withColumn("rn", row_number().over(window_spec)) \
.filter("rn = 1") \
.drop("rn")
3.3 SET Table Migration (Duplicate Handling)¶
Teradata SET tables automatically eliminate duplicate rows. Fabric requires explicit handling.
Teradata (Original):
-- SET table - duplicates automatically rejected
CREATE SET TABLE casino.slot_events (
event_id INTEGER,
machine_id VARCHAR(50),
event_type VARCHAR(20),
event_timestamp TIMESTAMP
) PRIMARY INDEX (event_id);
Fabric Options:
Option A: Add UNIQUE constraint (Warehouse)
-- Fabric Warehouse with unique constraint
CREATE TABLE casino.slot_events (
event_id INT NOT NULL,
machine_id VARCHAR(50),
event_type VARCHAR(20),
event_timestamp DATETIME2,
CONSTRAINT PK_slot_events PRIMARY KEY NONCLUSTERED (event_id)
);
Option B: Deduplicate in pipeline/notebook (Lakehouse)
# PySpark: Remove duplicates during ingestion
df = spark.read.parquet("Files/raw/slot_events/")
# Remove exact duplicates
df_deduped = df.dropDuplicates()
# Or remove duplicates by key
df_deduped = df.dropDuplicates(["event_id"])
df_deduped.write.mode("overwrite").saveAsTable("casino.slot_events")
3.4 Date and Time Function Translation¶
| Teradata | Fabric T-SQL | Fabric Spark SQL |
|---|---|---|
CURRENT_DATE | CAST(GETDATE() AS DATE) | current_date() |
CURRENT_TIMESTAMP | GETDATE() | current_timestamp() |
DATE '2024-01-15' | CAST('2024-01-15' AS DATE) | to_date('2024-01-15') |
ADD_MONTHS(date, n) | DATEADD(MONTH, n, date) | add_months(date, n) |
date + INTERVAL '7' DAY | DATEADD(DAY, 7, date) | date_add(date, 7) |
EXTRACT(MONTH FROM date) | MONTH(date) | month(date) |
date1 - date2 (days) | DATEDIFF(DAY, date2, date1) | datediff(date1, date2) |
3.5 TOP N and SAMPLE Translation¶
Teradata (Original):
-- Top 100 players by spend
SELECT TOP 100 WITH TIES
player_id,
SUM(total_spend) AS lifetime_spend
FROM casino.player_transactions
GROUP BY player_id
ORDER BY lifetime_spend DESC;
-- Random 1% sample
SELECT * FROM casino.large_table SAMPLE 0.01;
Fabric T-SQL (Converted):
-- TOP 100 WITH TIES equivalent using RANK
WITH ranked AS (
SELECT
player_id,
SUM(total_spend) AS lifetime_spend,
RANK() OVER (ORDER BY SUM(total_spend) DESC) AS rnk
FROM casino.player_transactions
GROUP BY player_id
)
SELECT player_id, lifetime_spend
FROM ranked
WHERE rnk <= 100;
-- TABLESAMPLE for random sampling
SELECT * FROM casino.large_table
TABLESAMPLE (1 PERCENT);
Fabric Spark (Converted):
# Top 100 with ties
from pyspark.sql.functions import sum, rank
from pyspark.sql.window import Window
df = spark.table("casino.player_transactions")
df_agg = df.groupBy("player_id").agg(sum("total_spend").alias("lifetime_spend"))
window = Window.orderBy(desc("lifetime_spend"))
df_ranked = df_agg.withColumn("rnk", rank().over(window))
df_top = df_ranked.filter("rnk <= 100").drop("rnk")
# Random sample
df_sample = spark.table("casino.large_table").sample(0.01)
3.6 Common Function Mappings¶
| Teradata Function | Fabric T-SQL | Fabric Spark SQL |
|---|---|---|
NVL(a, b) | ISNULL(a, b) or COALESCE(a, b) | coalesce(a, b) |
NULLIFZERO(x) | NULLIF(x, 0) | nullif(x, 0) |
ZEROIFNULL(x) | ISNULL(x, 0) | coalesce(x, 0) |
TRIM(x) | TRIM(x) | trim(x) |
SUBSTR(s, start, len) | SUBSTRING(s, start, len) | substring(s, start, len) |
INDEX(s, pattern) | CHARINDEX(pattern, s) | locate(pattern, s) |
OREPLACE(s, old, new) | REPLACE(s, old, new) | replace(s, old, new) |
CAST(x AS FORMAT 'YYYY-MM-DD') | FORMAT(x, 'yyyy-MM-dd') | date_format(x, 'yyyy-MM-dd') |
🛠️ Step 4: Migrate Data Using Data Factory Pipelines¶
4.1 Create Migration Pipeline¶

Source: Copy activity in Data Factory
flowchart LR
subgraph Source["Source Activities"]
LOOKUP[Lookup Tables]
FOREACH[ForEach Table]
end
subgraph Copy["Copy Activities"]
COPY_TD[Copy from Teradata]
STAGING[Stage to Files]
end
subgraph Load["Load Activities"]
COPY_LH[COPY INTO Lakehouse]
VALIDATE[Validate Counts]
end
LOOKUP --> FOREACH
FOREACH --> COPY_TD
COPY_TD --> STAGING
STAGING --> COPY_LH
COPY_LH --> VALIDATE 4.2 Pipeline JSON Definition¶
{
"name": "pl_teradata_migration_full",
"properties": {
"activities": [
{
"name": "Get Tables to Migrate",
"type": "Lookup",
"typeProperties": {
"source": {
"type": "TeradataSource",
"query": "SELECT DatabaseName, TableName FROM DBC.TablesV WHERE DatabaseName = 'CASINO_DW' AND TableKind = 'T'"
},
"dataset": {
"referenceName": "ds_teradata_query",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "ForEach Table",
"type": "ForEach",
"typeProperties": {
"items": {
"value": "@activity('Get Tables to Migrate').output.value",
"type": "Expression"
},
"isSequential": false,
"batchCount": 4,
"activities": [
{
"name": "Copy Table to Lakehouse",
"type": "Copy",
"typeProperties": {
"source": {
"type": "TeradataSource",
"query": {
"value": "SELECT * FROM @{item().DatabaseName}.@{item().TableName}",
"type": "Expression"
}
},
"sink": {
"type": "ParquetSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
}
}
}
}
]
}
}
]
}
}
4.3 Incremental Migration Pattern¶
For large tables, use watermark-based incremental loads:
# Fabric Notebook: Incremental Teradata Migration
from pyspark.sql import SparkSession
from pyspark.sql.functions import max as spark_max
from datetime import datetime
# Configuration
teradata_table = "CASINO_DW.SLOT_TRANSACTIONS"
fabric_table = "bronze.slot_transactions"
watermark_column = "transaction_timestamp"
# Get current watermark from Fabric
try:
current_watermark = spark.table(fabric_table) \
.select(spark_max(watermark_column)) \
.collect()[0][0]
except:
current_watermark = datetime(2020, 1, 1) # Default start
print(f"Current watermark: {current_watermark}")
# Read incremental data from Teradata
query = f"""
SELECT * FROM {teradata_table}
WHERE {watermark_column} > TIMESTAMP '{current_watermark}'
ORDER BY {watermark_column}
"""
# Use JDBC connection (requires Teradata JDBC driver)
df_incremental = spark.read \
.format("jdbc") \
.option("url", "jdbc:teradata://teradata.casino.com/CASINO_DW") \
.option("dbtable", f"({query}) AS subq") \
.option("user", mssparkutils.credentials.getSecret("keyvault", "teradata-user")) \
.option("password", mssparkutils.credentials.getSecret("keyvault", "teradata-password")) \
.option("driver", "com.teradata.jdbc.TeraDriver") \
.load()
# Append to Fabric table
row_count = df_incremental.count()
print(f"Rows to migrate: {row_count}")
if row_count > 0:
df_incremental.write.mode("append").saveAsTable(fabric_table)
print(f"Successfully migrated {row_count} rows")
4.4 Large Table Migration with Partitioning¶
# Fabric Notebook: Partitioned Large Table Migration
from pyspark.sql.functions import col, year, month
# Configuration
teradata_table = "CASINO_DW.SLOT_TRANSACTIONS_HISTORY"
fabric_table = "bronze.slot_transactions_history"
partition_column = "transaction_date"
# Read with partition pruning
df = spark.read \
.format("jdbc") \
.option("url", "jdbc:teradata://teradata.casino.com/CASINO_DW") \
.option("dbtable", teradata_table) \
.option("user", mssparkutils.credentials.getSecret("keyvault", "teradata-user")) \
.option("password", mssparkutils.credentials.getSecret("keyvault", "teradata-password")) \
.option("partitionColumn", partition_column) \
.option("lowerBound", "2020-01-01") \
.option("upperBound", "2024-12-31") \
.option("numPartitions", 48) \
.option("fetchsize", 100000) \
.load()
# Write with Fabric partitioning
df.write \
.mode("overwrite") \
.partitionBy("transaction_year", "transaction_month") \
.format("delta") \
.saveAsTable(fabric_table)
# Optimize the table
spark.sql(f"OPTIMIZE {fabric_table}")
🛠️ Step 5: Convert BTEQ Scripts to Fabric Notebooks¶
5.1 BTEQ to Notebook Mapping¶
| BTEQ Command | Fabric Notebook Equivalent |
|---|---|
.LOGON | Spark JDBC connection or linked service |
.SET variables | Notebook parameters or Python variables |
.EXPORT | df.write.csv() or df.write.parquet() |
.IMPORT | spark.read.csv() or COPY INTO |
.RUN FILE | %run magic or modular notebooks |
.IF / .THEN / .ELSE | Python if/else statements |
.REPEAT | Python for loops |
| SQL statements | spark.sql() or DataFrame API |
5.2 BTEQ Script Conversion Example¶
Original BTEQ Script:
.LOGON teradata.casino.com/fabric_user,password
.SET WIDTH 65531
-- Daily slot summary extraction
SELECT
CAST(transaction_date AS DATE) AS report_date,
machine_id,
COUNT(*) AS spin_count,
SUM(coin_in) AS total_coin_in,
SUM(coin_out) AS total_coin_out
FROM CASINO_DW.SLOT_TRANSACTIONS
WHERE transaction_date = CURRENT_DATE - 1
GROUP BY 1, 2
ORDER BY total_coin_in DESC;
.EXPORT DATA FILE=/data/exports/daily_slots.csv
SELECT * FROM tmp_daily_slots;
.LOGOFF
Converted Fabric Notebook:
# Cell 1: Configuration
# ====================
from datetime import datetime, timedelta
from pyspark.sql.functions import col, count, sum as spark_sum
# Parameters
report_date = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
export_path = f"Files/exports/daily_slots_{report_date}.csv"
print(f"Processing report for: {report_date}")
# Cell 2: Read Source Data
# ========================
# Option A: Read from already-migrated Lakehouse table
df_transactions = spark.table("bronze.slot_transactions")
# Option B: Read directly from Teradata (if still connected)
# df_transactions = spark.read.format("jdbc").option(...).load()
# Cell 3: Transform - Daily Slot Summary
# ======================================
df_daily_summary = df_transactions \
.filter(col("transaction_date") == report_date) \
.groupBy("machine_id") \
.agg(
count("*").alias("spin_count"),
spark_sum("coin_in").alias("total_coin_in"),
spark_sum("coin_out").alias("total_coin_out")
) \
.withColumn("report_date", lit(report_date)) \
.orderBy(col("total_coin_in").desc())
# Cell 4: Export to CSV
# =====================
df_daily_summary.coalesce(1) \
.write \
.mode("overwrite") \
.option("header", "true") \
.csv(export_path)
print(f"Exported {df_daily_summary.count()} rows to {export_path}")
# Cell 5: Save to Silver Layer
# ============================
df_daily_summary.write \
.mode("append") \
.saveAsTable("silver.daily_slot_summary")
5.3 BTEQ Export/Import Pattern Conversion¶
BTEQ FastLoad Pattern:
.LOGTABLE CASINO_DW.FL_LOG;
.LOGON teradata.casino.com/user,pwd;
.BEGIN LOADING CASINO_DW.LARGE_TABLE;
DEFINE
col1 (VARCHAR(50)),
col2 (INTEGER),
col3 (DECIMAL(18,2))
FILE=/data/import/large_file.csv;
INSERT INTO CASINO_DW.LARGE_TABLE VALUES (:col1, :col2, :col3);
.END LOADING;
.LOGOFF;
Fabric COPY INTO Equivalent:
-- Fabric Warehouse: COPY INTO for high-performance bulk load
COPY INTO casino.large_table
FROM 'https://onelake.dfs.fabric.microsoft.com/workspace/lakehouse/Files/import/large_file.csv'
WITH (
FILE_TYPE = 'CSV',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
ENCODING = 'UTF8',
MAXERRORS = 0
);
🛠️ Step 6: Use Microsoft Migration Assistant¶
6.1 Fabric Migration Assistant Overview¶

Source: Migration assistant for Fabric data warehouse
The Fabric Migration Assistant is a native tool for automating schema and data migration.
flowchart LR
subgraph Source["Source Analysis"]
SCHEMA[Schema Discovery]
ASSESS[Compatibility Assessment]
end
subgraph Convert["Conversion"]
DDL[DDL Translation]
AI[AI-Powered Fixes]
end
subgraph Deploy["Deployment"]
CREATE[Create Objects]
DATA[Copy Data]
VALIDATE[Validate]
end
SCHEMA --> ASSESS
ASSESS --> DDL
DDL --> AI
AI --> CREATE
CREATE --> DATA
DATA --> VALIDATE 6.2 Supported Migration Sources¶
| Source | Native Support | Notes |
|---|---|---|
| Azure Synapse Dedicated SQL Pool | ✅ Full support | Schema + Data |
| SQL Server | ✅ Full support | OLAP workloads |
| Teradata | 🔶 Via third-party | See partner tools |
| Snowflake | 🔶 Via connectors | Data Factory |
| Oracle | 🔶 Via third-party | Partner solutions |
6.3 Third-Party Migration Tools for Teradata¶
Third-party tools — publicly sourced, not an endorsement
The following third-party tools are summarized from each vendor's publicly available documentation. Capabilities change often — verify current behavior, support, and pricing with each vendor directly. Listing here is not an endorsement.
Datometry (Network-Level Translation)¶
Per Datometry's public documentation, Datometry provides SQL translation at the network layer, with the goal of allowing Teradata applications to run against a new target with reduced code changes.
| Feature | Capability (per vendor docs) |
|---|---|
| SQL Translation | Teradata-to-T-SQL conversion |
| Application Compatibility | Aims to run existing apps with minimal modification |
| Performance | Query optimization for the target platform |
| Migration Speed | Vendor reports reduced migration effort; validate for your workload |
Raven (Datametica/Onix)¶
Per the vendor's public documentation, Raven automates code conversion for SQL, ETL, and stored procedures.
| Feature | Capability (per vendor docs) |
|---|---|
| SQL Conversion | Teradata to Spark SQL / T-SQL |
| ETL Migration | Informatica/Ab Initio to Data Factory |
| Stored Procedures | Convert to notebooks/procedures |
| Validation | Automated testing framework |
X2X Suite (Travinto)¶
| Tool | Purpose |
|---|---|
| X2XConverter | Automated code conversion |
| X2XAnalyzer | Assess migration complexity |
| X2XValidator | Post-migration validation |
🛠️ Step 7: Validate Migrated Data¶

Source: What is a lakehouse in Microsoft Fabric?
7.1 Row Count Validation¶
# Fabric Notebook: Row Count Validation
from pyspark.sql import SparkSession
import pandas as pd
# Tables to validate
validation_tables = [
("CASINO_DW.SLOT_TRANSACTIONS", "bronze.slot_transactions"),
("CASINO_DW.PLAYER_SESSIONS", "bronze.player_sessions"),
("CASINO_DW.CAGE_OPERATIONS", "bronze.cage_operations"),
]
results = []
for teradata_table, fabric_table in validation_tables:
# Get Fabric count
fabric_count = spark.table(fabric_table).count()
# Get Teradata count (via JDBC)
teradata_count_df = spark.read \
.format("jdbc") \
.option("url", "jdbc:teradata://teradata.casino.com/CASINO_DW") \
.option("query", f"SELECT COUNT(*) AS cnt FROM {teradata_table}") \
.load()
teradata_count = teradata_count_df.collect()[0]["cnt"]
# Calculate difference
diff = abs(teradata_count - fabric_count)
diff_pct = (diff / teradata_count * 100) if teradata_count > 0 else 0
status = "✅ PASS" if diff_pct < 0.01 else "❌ FAIL"
results.append({
"Source Table": teradata_table,
"Target Table": fabric_table,
"Source Count": teradata_count,
"Target Count": fabric_count,
"Difference": diff,
"Diff %": f"{diff_pct:.4f}%",
"Status": status
})
# Display results
df_results = pd.DataFrame(results)
display(df_results)
7.2 Data Checksum Validation¶
# Fabric Notebook: Checksum Validation
from pyspark.sql.functions import sum as spark_sum, count, md5, concat_ws
def calculate_table_checksum(df, key_columns, numeric_columns):
"""Calculate a checksum for data validation."""
return df.agg(
count("*").alias("row_count"),
*[spark_sum(col).alias(f"sum_{col}") for col in numeric_columns]
).collect()[0]
# Compare slot_transactions
df_fabric = spark.table("bronze.slot_transactions")
fabric_checksum = calculate_table_checksum(
df_fabric,
key_columns=["transaction_id"],
numeric_columns=["coin_in", "coin_out", "jackpot_contribution"]
)
print("Fabric Checksum:")
print(f" Row Count: {fabric_checksum['row_count']:,}")
print(f" Sum Coin In: ${fabric_checksum['sum_coin_in']:,.2f}")
print(f" Sum Coin Out: ${fabric_checksum['sum_coin_out']:,.2f}")
# Compare against Teradata (run equivalent query on source)
7.3 Sample Data Comparison¶
# Fabric Notebook: Sample Row Comparison
from pyspark.sql.functions import col
# Get sample records from Fabric
df_fabric_sample = spark.table("bronze.slot_transactions") \
.filter(col("transaction_date") == "2024-01-15") \
.limit(100) \
.toPandas()
# Get same sample from Teradata
query = """
SELECT * FROM CASINO_DW.SLOT_TRANSACTIONS
WHERE transaction_date = DATE '2024-01-15'
SAMPLE 100
"""
df_teradata_sample = spark.read.format("jdbc") \
.option("query", query) \
.load() \
.toPandas()
# Compare schemas
print("Schema Comparison:")
print(f"Fabric columns: {list(df_fabric_sample.columns)}")
print(f"Teradata columns: {list(df_teradata_sample.columns)}")
# Merge and compare (on key column)
comparison = df_fabric_sample.merge(
df_teradata_sample,
on="transaction_id",
how="outer",
suffixes=("_fabric", "_teradata"),
indicator=True
)
print(f"\nMatching records: {len(comparison[comparison['_merge'] == 'both'])}")
print(f"Fabric only: {len(comparison[comparison['_merge'] == 'left_only'])}")
print(f"Teradata only: {len(comparison[comparison['_merge'] == 'right_only'])}")
🛠️ Step 8: Ongoing Synchronization Options¶
8.1 Synchronization Patterns¶
| Pattern | Latency | Use Case | Implementation |
|---|---|---|---|
| One-time Migration | N/A | Cutover | Data Factory full copy |
| Daily Batch | 24 hours | Reporting | Scheduled pipeline |
| Hourly Incremental | 1 hour | Near real-time | Watermark-based |
| Real-time CDC | Minutes | Live analytics | Open Mirroring + Qlik |
8.2 Open Mirroring with Qlik Replicate¶
For real-time Teradata replication to Fabric:
flowchart LR
TD[(Teradata)] -->|CDC Logs| QLIK[Qlik Replicate]
QLIK -->|Parquet Files| OL[(OneLake<br/>Landing Zone)]
OL -->|Auto-Ingest| DT[Delta Tables]
DT -->|Query| LH[Lakehouse] Configuration Steps: 1. Install Qlik Replicate with Teradata source connector 2. Configure OneLake as target (Open Mirroring landing zone) 3. Map Teradata schemas to Fabric tables 4. Enable CDC capture on Teradata tables 5. Start replication task
💡 Tip: Per Qlik's public documentation, Qlik Replicate supports many heterogeneous sources including Teradata, and can serve as a CDC option for complex environments. Verify current source support and licensing with Qlik.
🛠️ Step 9: Advanced BTEQ Conversion Scripts¶
Step 5 covered basic BTEQ-to-notebook conversion. This section tackles three advanced Teradata loading patterns frequently found in casino data warehouses: MultiLoad merges, TPump continuous feeds, and error-handling scripts.
9.1 MultiLoad Pattern → Delta Lake MERGE¶
Teradata MultiLoad performs upsert operations against large tables. The Fabric equivalent uses Delta Lake MERGE INTO inside a notebook.
Original BTEQ MultiLoad Script:
.LOGTABLE CASINO_DW.ML_LOG;
.LOGON teradata.casino.com/etl_user,password;
.BEGIN MLOAD TABLES CASINO_DW.PLAYER_LOYALTY_ACCOUNTS;
-- Apply table (staging)
.LAYOUT player_update;
.FIELD player_id * VARCHAR(20);
.FIELD loyalty_points * INTEGER;
.FIELD tier_status * VARCHAR(10);
.FIELD last_visit * TIMESTAMP;
.DML LABEL upd_existing;
UPDATE CASINO_DW.PLAYER_LOYALTY_ACCOUNTS
SET loyalty_points = :loyalty_points,
tier_status = :tier_status,
last_visit = :last_visit
WHERE player_id = :player_id;
.DML LABEL ins_new;
INSERT INTO CASINO_DW.PLAYER_LOYALTY_ACCOUNTS
VALUES (:player_id, :loyalty_points, :tier_status, :last_visit);
.IMPORT INFILE /data/feeds/loyalty_daily_update.csv
LAYOUT player_update
APPLY upd_existing
APPLY ins_new;
.END MLOAD;
.LOGOFF;
Converted Fabric Notebook (Delta Lake MERGE):
# Cell 1: Configuration
# ====================
from pyspark.sql.functions import col, current_timestamp
from delta.tables import DeltaTable
staging_path = "Files/feeds/loyalty_daily_update.csv"
target_table = "silver.player_loyalty_accounts"
# Cell 2: Read staging data (replaces .IMPORT INFILE)
# ====================================================
df_updates = spark.read \
.option("header", "true") \
.option("inferSchema", "true") \
.csv(staging_path) \
.withColumn("_loaded_at", current_timestamp())
print(f"Staging records to merge: {df_updates.count():,}")
# Cell 3: Delta Lake MERGE (replaces MultiLoad DML)
# ==================================================
target = DeltaTable.forName(spark, target_table)
target.alias("tgt").merge(
df_updates.alias("src"),
"tgt.player_id = src.player_id"
).whenMatchedUpdate(set={
"loyalty_points": "src.loyalty_points",
"tier_status": "src.tier_status",
"last_visit": "src.last_visit",
"_loaded_at": "src._loaded_at"
}).whenNotMatchedInsert(values={
"player_id": "src.player_id",
"loyalty_points": "src.loyalty_points",
"tier_status": "src.tier_status",
"last_visit": "src.last_visit",
"_loaded_at": "src._loaded_at"
}).execute()
# Cell 4: Post-merge validation (replaces .LOGTABLE check)
# =========================================================
history = spark.sql(f"DESCRIBE HISTORY {target_table} LIMIT 1").collect()[0]
print(f"Operation: {history['operation']}")
print(f"Rows updated: {history['operationMetrics']['numTargetRowsUpdated']}")
print(f"Rows inserted: {history['operationMetrics']['numTargetRowsInserted']}")
9.2 TPump (Continuous Load) → Eventstreams + Structured Streaming¶
Teradata TPump provides continuous micro-batch loading for near-real-time data. The Fabric equivalent combines Eventstreams with Spark Structured Streaming.
flowchart LR
subgraph Teradata["Teradata Pattern"]
SRC_FILE[Flat Files<br/>Continuous Drop] -->|TPump| TD_TBL[(Target Table)]
end
subgraph Fabric["Fabric Pattern"]
EH[Event Hub /<br/>Eventstream] -->|Structured<br/>Streaming| DELTA[(Delta Table<br/>Bronze Layer)]
DELTA -->|Triggered<br/>Notebook| SILVER[(Silver Table)]
end Original BTEQ TPump Script:
.LOGON teradata.casino.com/etl_user,password;
.BEGIN LOAD SESSIONS 4
TENACITY 4
SLEEP 60
PACK 2000;
.LAYOUT slot_events;
.FIELD machine_id * VARCHAR(20);
.FIELD event_type * VARCHAR(30);
.FIELD event_ts * TIMESTAMP;
.FIELD coin_in * DECIMAL(12,2);
.FIELD coin_out * DECIMAL(12,2);
.DML LABEL ins_slot_event;
INSERT INTO CASINO_DW.SLOT_EVENTS_RT
VALUES (:machine_id, :event_type, :event_ts, :coin_in, :coin_out);
.IMPORT INFILE /data/realtime/slot_feed_*.csv
LAYOUT slot_events
APPLY ins_slot_event;
.END LOAD;
.LOGOFF;
Converted Fabric Notebook (Structured Streaming):
# Cell 1: Configuration
# ====================
from pyspark.sql.functions import col, from_json, current_timestamp
from pyspark.sql.types import (
StructType, StructField, StringType, TimestampType, DecimalType
)
# Eventstream connection (replaces TPump INFILE)
eh_connection_string = mssparkutils.credentials.getSecret(
"keyvault", "slot-events-eventhub-conn"
)
target_table = "bronze.slot_events_rt"
checkpoint_path = "Files/_checkpoints/slot_events_rt"
# Cell 2: Define schema (replaces .LAYOUT)
# =========================================
slot_event_schema = StructType([
StructField("machine_id", StringType(), False),
StructField("event_type", StringType(), False),
StructField("event_ts", TimestampType(), False),
StructField("coin_in", DecimalType(12, 2), True),
StructField("coin_out", DecimalType(12, 2), True)
])
# Cell 3: Structured Streaming (replaces TPump continuous load)
# =============================================================
df_stream = spark.readStream \
.format("eventhubs") \
.option("eventhubs.connectionString",
sc._jvm.org.apache.spark.eventhubs.EventHubsUtils
.encrypt(eh_connection_string)) \
.option("eventhubs.startingPosition",
'{"offset":"-1","seqNo":-1,"enqueuedTime":null,"isInclusive":true}') \
.load()
# Parse JSON payload into typed columns
df_parsed = df_stream \
.select(from_json(col("body").cast("string"), slot_event_schema).alias("data")) \
.select("data.*") \
.withColumn("_ingest_ts", current_timestamp())
# Cell 4: Write to Delta Lake (replaces INSERT DML)
# ==================================================
# Micro-batch append mirrors TPump SLEEP/PACK cadence
query = df_parsed.writeStream \
.format("delta") \
.outputMode("append") \
.option("checkpointLocation", checkpoint_path) \
.trigger(processingTime="60 seconds") \
.toTable(target_table)
print(f"Streaming query started: {query.id}")
print(f"Processing every 60 seconds (equivalent to TPump SLEEP 60)")
9.3 BTEQ Error Handling (.IF ACTIVITYCOUNT) → Python try/except¶
Teradata BTEQ uses .IF ACTIVITYCOUNT and .IF ERRORCODE for flow control. The Fabric equivalent uses Python exception handling with explicit row count validation.
Original BTEQ Script with Error Handling:
.LOGON teradata.casino.com/etl_user,password;
-- Step 1: Load daily cage transactions into staging
DELETE FROM CASINO_DW.STG_CAGE_TRANSACTIONS;
.IF ERRORCODE <> 0 THEN .GOTO ERROR_EXIT;
INSERT INTO CASINO_DW.STG_CAGE_TRANSACTIONS
SELECT * FROM CASINO_DW.CAGE_TRANSACTIONS_RAW
WHERE transaction_date = CURRENT_DATE - 1;
.IF ACTIVITYCOUNT = 0 THEN .GOTO NO_DATA_EXIT;
-- Step 2: Apply CTR flagging (>$10,000)
UPDATE CASINO_DW.STG_CAGE_TRANSACTIONS
SET ctr_flag = 'Y'
WHERE transaction_amount >= 10000;
.IF ERRORCODE <> 0 THEN .GOTO ERROR_EXIT;
-- Step 3: Move to production
INSERT INTO CASINO_DW.CAGE_TRANSACTIONS_DAILY
SELECT * FROM CASINO_DW.STG_CAGE_TRANSACTIONS;
.IF ACTIVITYCOUNT <> (SELECT COUNT(*) FROM CASINO_DW.STG_CAGE_TRANSACTIONS)
THEN .GOTO COUNT_MISMATCH;
.LOGOFF;
.QUIT 0;
.LABEL ERROR_EXIT;
.LOGOFF;
.QUIT 8;
.LABEL NO_DATA_EXIT;
.LOGOFF;
.QUIT 4;
.LABEL COUNT_MISMATCH;
.LOGOFF;
.QUIT 12;
Converted Fabric Notebook (Python try/except):
# Cell 1: Configuration
# ====================
from pyspark.sql.functions import col, lit, current_timestamp, when
from datetime import datetime, timedelta
import sys
report_date = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
staging_table = "bronze.stg_cage_transactions"
target_table = "silver.cage_transactions_daily"
ctr_threshold = 10000
exit_code = 0 # Mirrors .QUIT return codes
# Cell 2: Step 1 - Load staging (replaces DELETE + INSERT + .IF ACTIVITYCOUNT)
# =============================================================================
try:
# Clear staging (replaces DELETE FROM)
spark.sql(f"TRUNCATE TABLE {staging_table}")
# Load from raw
df_raw = spark.table("bronze.cage_transactions_raw") \
.filter(col("transaction_date") == report_date)
staging_count = df_raw.count()
if staging_count == 0:
print(f"WARNING: No data found for {report_date}")
exit_code = 4 # Mirrors .QUIT 4 (NO_DATA_EXIT)
mssparkutils.notebook.exit(f"NO_DATA|exit_code={exit_code}")
df_raw.write.mode("overwrite").saveAsTable(staging_table)
print(f"Step 1 PASS: {staging_count:,} rows loaded to staging")
except Exception as e:
print(f"Step 1 FAIL: {str(e)}")
exit_code = 8 # Mirrors .QUIT 8 (ERROR_EXIT)
raise
# Cell 3: Step 2 - CTR flagging (replaces UPDATE + .IF ERRORCODE)
# ================================================================
try:
spark.sql(f"""
UPDATE {staging_table}
SET ctr_flag = 'Y'
WHERE transaction_amount >= {ctr_threshold}
""")
ctr_count = spark.table(staging_table) \
.filter(col("ctr_flag") == "Y").count()
print(f"Step 2 PASS: {ctr_count:,} transactions flagged for CTR")
except Exception as e:
print(f"Step 2 FAIL: {str(e)}")
exit_code = 8
raise
# Cell 4: Step 3 - Move to production (replaces INSERT + .IF ACTIVITYCOUNT check)
# =================================================================================
try:
df_staging = spark.table(staging_table)
pre_count = df_staging.count()
df_staging.write.mode("append").saveAsTable(target_table)
# Validate row count (replaces .IF ACTIVITYCOUNT <> SELECT COUNT)
post_count = spark.table(target_table) \
.filter(col("transaction_date") == report_date).count()
if post_count != pre_count:
print(f"Step 3 FAIL: Count mismatch - staging={pre_count:,}, target={post_count:,}")
exit_code = 12 # Mirrors .QUIT 12 (COUNT_MISMATCH)
raise ValueError(f"Row count mismatch: {pre_count} vs {post_count}")
print(f"Step 3 PASS: {post_count:,} rows written to {target_table}")
print(f"Pipeline completed successfully (exit_code={exit_code})")
except ValueError:
raise
except Exception as e:
print(f"Step 3 FAIL: {str(e)}")
exit_code = 8
raise
💡 Tip: Map BTEQ exit codes to notebook exit values so downstream pipeline activities can branch on success/failure just as legacy BTEQ schedulers did.
🛠️ Step 10: Performance Benchmarks¶
After migrating from Teradata to Microsoft Fabric, validating query performance is critical. This section provides benchmark patterns and optimization strategies for common casino analytics queries.
10.1 Query Performance Comparison¶
The following table shows illustrative, non-authoritative benchmark figures for a 500 million-row slot_transactions fact table. These are hypothetical reference points for planning, not measured results, and Teradata is a high-performance platform whose tuned numbers will often differ from those shown. Teradata numbers assume a 10-node system; Fabric numbers use an F64 capacity with V-ORDER enabled. Always benchmark both platforms with your own data and configuration.
| Query Pattern | Example | Teradata (10-node) | Fabric Lakehouse (F64) | Fabric Warehouse (F64) | Notes |
|---|---|---|---|---|---|
| Point lookup (indexed) | Single transaction by ID | 0.3 s | 0.8 s | 0.4 s | Teradata PI advantage; Fabric improves with Z-ORDER |
| Full scan + aggregation | Daily revenue across all machines | 12 s | 9 s | 11 s | Fabric columnar + V-ORDER performs well on scans |
| Complex join + window | Player lifetime value with ranking | 45 s | 28 s | 35 s | Spark scales well with window functions |
| QUALIFY-equivalent | Latest session per player | 18 s | 14 s | 16 s | CTE approach in Fabric is efficient |
| Time-series rollup | Hourly coin-in aggregation, 90 days | 22 s | 10 s | 15 s | Delta Lake partition pruning is very effective |
| Ad-hoc with LIKE | Search player notes by keyword | 8 s | 6 s | 5 s | Fabric predicate pushdown on string columns |
⚠️ Note: Actual performance varies with data distribution, cluster load, and query complexity. Always benchmark with your own data.
10.2 Benchmark Query Examples¶
Point Lookup:
-- Fabric Warehouse: Single transaction lookup
SELECT transaction_id, machine_id, coin_in, coin_out, transaction_timestamp
FROM casino.slot_transactions
WHERE transaction_id = 'TXN-2024-00438271';
Full Scan + Aggregation:
-- Fabric Warehouse: Daily revenue summary
SELECT
CAST(transaction_timestamp AS DATE) AS report_date,
COUNT(*) AS total_spins,
SUM(coin_in) AS total_coin_in,
SUM(coin_out) AS total_coin_out,
SUM(coin_in) - SUM(coin_out) AS net_revenue
FROM casino.slot_transactions
WHERE transaction_timestamp >= DATEADD(DAY, -30, GETDATE())
GROUP BY CAST(transaction_timestamp AS DATE)
ORDER BY report_date;
Complex Join + Window Function:
# Fabric Spark: Player lifetime value with tier ranking
from pyspark.sql.functions import sum as spark_sum, count, datediff, \
current_date, dense_rank, col
from pyspark.sql.window import Window
df_players = spark.table("silver.player_profiles")
df_txns = spark.table("silver.slot_transactions")
df_ltv = df_txns.groupBy("player_id").agg(
spark_sum("coin_in").alias("lifetime_coin_in"),
spark_sum("coin_out").alias("lifetime_coin_out"),
count("*").alias("total_sessions"),
spark_sum(col("coin_in") - col("coin_out")).alias("lifetime_net_revenue")
)
# Rank players by net revenue (replaces Teradata QUALIFY with DENSE_RANK)
window_spec = Window.orderBy(col("lifetime_net_revenue").desc())
df_ranked = df_ltv.withColumn("revenue_rank", dense_rank().over(window_spec))
df_result = df_ranked.join(df_players, "player_id") \
.select("player_id", "player_name", "loyalty_tier",
"lifetime_coin_in", "lifetime_net_revenue", "revenue_rank")
display(df_result.filter("revenue_rank <= 100"))
Time-Series Rollup:
-- Fabric Warehouse: Hourly coin-in rollup for floor monitoring
SELECT
CAST(transaction_timestamp AS DATE) AS txn_date,
DATEPART(HOUR, transaction_timestamp) AS txn_hour,
zone_id,
COUNT(*) AS spin_count,
SUM(coin_in) AS hourly_coin_in,
AVG(coin_in) AS avg_bet_size
FROM casino.slot_transactions
WHERE transaction_timestamp >= DATEADD(DAY, -90, GETDATE())
GROUP BY
CAST(transaction_timestamp AS DATE),
DATEPART(HOUR, transaction_timestamp),
zone_id
ORDER BY txn_date, txn_hour, zone_id;
10.3 Optimization Recommendations¶
| Optimization | When to Use | Fabric Command | Impact |
|---|---|---|---|
| Z-ORDER | Point lookups and filtered scans on known columns | OPTIMIZE table ZORDER BY (col) | 2-10x faster lookups |
| Partition pruning | Date-partitioned fact tables with time-range filters | Partition by transaction_year, transaction_month | Skips irrelevant files entirely |
| Predicate pushdown | Filters pushed to storage layer before read | Automatic with Delta Lake; ensure filters are in WHERE | Reduces I/O dramatically |
| V-ORDER | Direct Lake semantic models over large tables | Enabled by default in Fabric; verify with table properties | Optimal for Power BI scans |
| OPTIMIZE (bin-compaction) | After heavy append workloads (post-migration) | OPTIMIZE table | Reduces small file overhead |
| Caching | Repeated interactive queries on same dataset | spark.catalog.cacheTable("table") | Near-instant repeated queries |
Apply Z-ORDER after migration:
-- Optimize the slot_transactions table for common query patterns
OPTIMIZE casino.slot_transactions
ZORDER BY (machine_id, transaction_timestamp);
-- Verify optimization
DESCRIBE DETAIL casino.slot_transactions;
V-ORDER for Direct Lake:
V-ORDER is a write-time optimization that sorts data within Parquet row groups for maximum scan performance in Direct Lake mode. Fabric applies V-ORDER by default, but you can verify and force rewrite if migrated data was loaded without it.
# Force V-ORDER rewrite on a migrated table
df = spark.table("gold.daily_slot_summary")
df.write \
.mode("overwrite") \
.option("vorder", "true") \
.format("delta") \
.saveAsTable("gold.daily_slot_summary")
print("V-ORDER rewrite complete - Direct Lake scans optimized")
🛠️ Step 11: Data Type Mapping Reference¶
When converting Teradata DDL to Fabric, data type mapping is one of the most error-prone steps. This reference covers every common Teradata type and its Fabric equivalents.
11.1 Numeric Types¶
| Teradata Type | Range / Precision | Fabric Warehouse (T-SQL) | Fabric Lakehouse (Spark) | Notes |
|---|---|---|---|---|
BYTEINT | -128 to 127 | TINYINT | ByteType | T-SQL TINYINT is unsigned (0-255); add CHECK constraint if negative values exist |
SMALLINT | -32,768 to 32,767 | SMALLINT | ShortType | Direct mapping |
INTEGER | -2.1B to 2.1B | INT | IntegerType | Direct mapping |
BIGINT | -9.2E18 to 9.2E18 | BIGINT | LongType | Direct mapping |
DECIMAL(p,s) / NUMERIC(p,s) | Up to (38,38) | DECIMAL(p,s) | DecimalType(p,s) | Direct mapping; always preserve precision for financial amounts (coin_in, coin_out) |
FLOAT / REAL | ~15 significant digits | FLOAT | DoubleType | Avoid for financial data; use DECIMAL instead |
NUMBER | Flexible precision | DECIMAL(38,18) | DecimalType(38,18) | Teradata NUMBER has variable precision; choose an explicit scale for Fabric |
11.2 Character Types¶
| Teradata Type | Description | Fabric Warehouse (T-SQL) | Fabric Lakehouse (Spark) | Notes |
|---|---|---|---|---|
CHAR(n) | Fixed-length, up to 64,000 | CHAR(n) (max 8,000) | StringType | If n > 8,000 use VARCHAR(MAX) in Warehouse |
VARCHAR(n) | Variable-length, up to 64,000 | VARCHAR(n) (max 8,000) | StringType | If n > 8,000 use VARCHAR(MAX) |
CLOB | Up to 2 GB text | VARCHAR(MAX) | StringType | Verify string length fits within Spark limits |
GRAPHIC(n) | Fixed-length Unicode | NCHAR(n) | StringType | Spark StringType is always UTF-8 |
VARGRAPHIC(n) | Variable-length Unicode | NVARCHAR(n) | StringType | Use NVARCHAR for multilingual player names |
11.3 Date and Time Types¶
| Teradata Type | Format / Range | Fabric Warehouse (T-SQL) | Fabric Lakehouse (Spark) | Notes |
|---|---|---|---|---|
DATE | YYYY-MM-DD | DATE | DateType | Direct mapping |
TIME | HH:MI:SS.ssssss | TIME | StringType (no native TIME) | Store as string or convert to TIMESTAMP in Spark |
TIMESTAMP | Date + Time, microsecond | DATETIME2(6) | TimestampType | Use DATETIME2 not DATETIME to preserve microsecond precision |
TIMESTAMP WITH TIME ZONE | Timestamp + TZ | DATETIMEOFFSET | TimestampType (UTC) | Spark stores UTC; add separate TZ column if offset needed |
INTERVAL YEAR | Year span | INT (year count) | IntegerType | No direct interval type; store as integer count |
INTERVAL DAY TO SECOND | Day-second span | INT (total seconds) | LongType | Convert to seconds for calculations |
11.4 Special / Complex Types¶
| Teradata Type | Description | Fabric Warehouse (T-SQL) | Fabric Lakehouse (Spark) | Notes |
|---|---|---|---|---|
PERIOD(DATE) | Date range (begin, end) | Two columns: start_date DATE, end_date DATE | Two columns: DateType | Teradata PERIOD has no Fabric equivalent; split into two columns |
PERIOD(TIMESTAMP) | Timestamp range | Two columns: start_ts DATETIME2, end_ts DATETIME2 | Two columns: TimestampType | Same splitting pattern |
ARRAY | Ordered collection | VARCHAR(MAX) (JSON) | ArrayType(elementType) | Serialize as JSON in Warehouse; use native ArrayType in Spark |
JSON / JSON(n) | JSON document | VARCHAR(MAX) + ISJSON() | StringType (parse with from_json) | Fabric Warehouse has JSON functions; Spark has full JSON support |
BLOB | Binary large object up to 2 GB | VARBINARY(MAX) | BinaryType | Store externally in OneLake Files for objects > 100 MB |
XML | XML document | VARCHAR(MAX) | StringType | Parse with T-SQL XML methods or Spark xpath functions |
11.5 Conversion Gotchas¶
⚠️ Critical items to watch for during casino data migration:
-
BYTEINT signedness - Teradata BYTEINT is signed (-128 to 127); T-SQL TINYINT is unsigned (0 to 255). If your Teradata column stores negative values (e.g., adjustment codes), use SMALLINT in Fabric instead.
-
TIMESTAMP precision - Teradata TIMESTAMP stores up to 6 fractional digits. Use
DATETIME2(6)in Fabric Warehouse, notDATETIME(which only has 3.33 ms precision). -
PERIOD splitting - Teradata PERIOD types (used for player comp validity ranges, promotional periods) have no Fabric equivalent. Always split into two explicit columns and add a CHECK constraint:
end_date >= start_date. -
NUMBER without precision - Teradata
NUMBERwithout explicit precision is flexible. Fabric requires explicit precision. Audit your data to determine max scale, then useDECIMAL(38, n). -
Character set differences - Teradata GRAPHIC/VARGRAPHIC types use fixed double-byte encoding. Fabric NVARCHAR uses UTF-16. Character counts may differ; test string lengths post-migration.
-
NULL semantics in SET tables - Teradata SET tables treat rows with NULLs as potential duplicates differently than Fabric. After migration, verify deduplication logic handles NULLs correctly.
# Utility: Auto-detect and report type mapping issues post-migration
from pyspark.sql.functions import col, min as spark_min, max as spark_max
def audit_type_mapping(table_name, checks):
"""Run type-mapping audit checks on a migrated table."""
df = spark.table(table_name)
print(f"Type Mapping Audit: {table_name}")
print("=" * 60)
for col_name, expected_type, check_fn in checks:
actual_type = str(df.schema[col_name].dataType)
type_ok = expected_type in actual_type
range_result = check_fn(df, col_name) if check_fn else "N/A"
status = "PASS" if type_ok else "FAIL"
print(f" [{status}] {col_name}: {actual_type} "
f"(expected {expected_type}) | Range: {range_result}")
# Example: Audit slot_transactions
def check_range(df, col_name):
row = df.select(spark_min(col(col_name)), spark_max(col(col_name))).collect()[0]
return f"[{row[0]} .. {row[1]}]"
audit_type_mapping("bronze.slot_transactions", [
("coin_in", "Decimal", check_range),
("coin_out", "Decimal", check_range),
("transaction_timestamp", "Timestamp", check_range),
("machine_id", "String", None),
])
🛠️ Step 12: Stored Procedure Migration¶
Teradata stored procedures encapsulate business logic that runs inside the database engine. In Fabric, these become parameterized notebooks or Warehouse stored procedures. This section demonstrates the conversion pattern for casino-specific business logic.
12.1 Migration Strategy¶
flowchart TB
subgraph Teradata["Teradata Stored Procedure"]
SP["CREATE PROCEDURE<br/>sp_daily_slot_hold"]
CALL["CALL sp_daily_slot_hold('2024-01-15')"]
SCHED["Teradata Scheduler<br/>(cron / TWS)"]
end
subgraph Fabric["Fabric Equivalent"]
NB["Parameterized Notebook<br/>nb_daily_slot_hold"]
PIPE["Data Factory Pipeline<br/>or Scheduled Run"]
API["Notebook API<br/>mssparkutils.notebook.run()"]
end
SP -->|Convert| NB
CALL -->|Replace| API
SCHED -->|Replace| PIPE 12.2 Stored Procedure Conversion Example¶
The following Teradata stored procedure calculates daily slot hold percentages per machine and zone, then writes results to a summary table.
Original Teradata Stored Procedure:
REPLACE PROCEDURE CASINO_DW.sp_daily_slot_hold (
IN p_report_date DATE,
OUT p_machines_processed INTEGER,
OUT p_status VARCHAR(20)
)
BEGIN
DECLARE v_row_count INTEGER DEFAULT 0;
DECLARE v_error_msg VARCHAR(200);
-- Handler for SQL exceptions (SQLCODE/SQLSTATE)
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
SET p_status = 'FAILED';
INSERT INTO CASINO_DW.ETL_ERROR_LOG
VALUES (CURRENT_TIMESTAMP, 'sp_daily_slot_hold',
SQLSTATE, SQLCODE, v_error_msg);
END;
-- Step 1: Clear staging for the target date
DELETE FROM CASINO_DW.STG_SLOT_HOLD
WHERE report_date = p_report_date;
-- Step 2: Calculate hold percentage per machine
INSERT INTO CASINO_DW.STG_SLOT_HOLD
SELECT
p_report_date AS report_date,
t.machine_id,
m.zone_id,
m.denomination,
SUM(t.coin_in) AS total_coin_in,
SUM(t.coin_out) AS total_coin_out,
CASE WHEN SUM(t.coin_in) > 0
THEN (SUM(t.coin_in) - SUM(t.coin_out)) / SUM(t.coin_in) * 100
ELSE 0
END AS hold_percentage
FROM CASINO_DW.SLOT_TRANSACTIONS t
JOIN CASINO_DW.MACHINE_MASTER m ON t.machine_id = m.machine_id
WHERE CAST(t.transaction_timestamp AS DATE) = p_report_date
GROUP BY t.machine_id, m.zone_id, m.denomination;
-- Capture row count (ACTIVITY_COUNT equivalent)
SET v_row_count = ACTIVITY_COUNT;
SET p_machines_processed = v_row_count;
-- Step 3: Dynamic SQL - merge into production table
EXECUTE IMMEDIATE
'INSERT INTO CASINO_DW.DAILY_SLOT_HOLD '
|| 'SELECT * FROM CASINO_DW.STG_SLOT_HOLD '
|| 'WHERE report_date = ''' || CAST(p_report_date AS VARCHAR(10)) || '''';
SET p_status = 'SUCCESS';
-- Log completion
INSERT INTO CASINO_DW.ETL_AUDIT_LOG
VALUES (CURRENT_TIMESTAMP, 'sp_daily_slot_hold',
p_report_date, v_row_count, p_status);
END;
-- Invocation
CALL CASINO_DW.sp_daily_slot_hold(DATE '2024-01-15', ?, ?);
Converted Fabric Notebook (nb_daily_slot_hold):
# Cell 1: Parameters (replaces IN/OUT procedure parameters)
# ==========================================================
# These are notebook parameters - configurable via pipeline or API
p_report_date = "2024-01-15" # IN parameter
# OUT values returned via mssparkutils.notebook.exit()
# Cell 2: Imports and Setup
# =========================
from pyspark.sql.functions import (
col, sum as spark_sum, when, lit, current_timestamp, count
)
from datetime import datetime
import json
staging_table = "silver.stg_slot_hold"
target_table = "gold.daily_slot_hold"
audit_table = "bronze.etl_audit_log"
error_table = "bronze.etl_error_log"
machines_processed = 0
status = "PENDING"
# Cell 3: Step 1 - Clear staging (replaces DELETE FROM)
# =====================================================
try:
spark.sql(f"""
DELETE FROM {staging_table}
WHERE report_date = '{p_report_date}'
""")
print(f"Staging cleared for {p_report_date}")
except Exception as e:
# Replaces DECLARE EXIT HANDLER FOR SQLEXCEPTION
status = "FAILED"
spark.sql(f"""
INSERT INTO {error_table}
VALUES (current_timestamp(), 'nb_daily_slot_hold',
'SPARK_ERROR', -1, '{str(e)[:200]}')
""")
raise
# Cell 4: Step 2 - Calculate hold percentage (replaces INSERT..SELECT)
# =====================================================================
try:
df_transactions = spark.table("silver.slot_transactions")
df_machines = spark.table("silver.machine_master")
df_hold = df_transactions \
.filter(col("transaction_date") == p_report_date) \
.join(df_machines, "machine_id") \
.groupBy("machine_id", "zone_id", "denomination") \
.agg(
spark_sum("coin_in").alias("total_coin_in"),
spark_sum("coin_out").alias("total_coin_out")
) \
.withColumn("hold_percentage",
when(col("total_coin_in") > 0,
(col("total_coin_in") - col("total_coin_out"))
/ col("total_coin_in") * 100
).otherwise(0)
) \
.withColumn("report_date", lit(p_report_date))
# Write to staging
df_hold.write.mode("append").saveAsTable(staging_table)
# Capture row count (replaces ACTIVITY_COUNT)
machines_processed = df_hold.count()
print(f"Hold calculated for {machines_processed:,} machines")
except Exception as e:
status = "FAILED"
spark.sql(f"""
INSERT INTO {error_table}
VALUES (current_timestamp(), 'nb_daily_slot_hold',
'SPARK_ERROR', -1, '{str(e)[:200]}')
""")
raise
# Cell 5: Step 3 - Merge into production (replaces EXECUTE IMMEDIATE)
# ====================================================================
try:
# Dynamic SQL equivalent using spark.sql with f-string
# (replaces EXECUTE IMMEDIATE || concatenation)
spark.sql(f"""
INSERT INTO {target_table}
SELECT * FROM {staging_table}
WHERE report_date = '{p_report_date}'
""")
status = "SUCCESS"
except Exception as e:
status = "FAILED"
spark.sql(f"""
INSERT INTO {error_table}
VALUES (current_timestamp(), 'nb_daily_slot_hold',
'SPARK_ERROR', -1, '{str(e)[:200]}')
""")
raise
# Cell 6: Audit log and return (replaces ETL_AUDIT_LOG insert + OUT params)
# ==========================================================================
spark.sql(f"""
INSERT INTO {audit_table}
VALUES (current_timestamp(), 'nb_daily_slot_hold',
'{p_report_date}', {machines_processed}, '{status}')
""")
# Return OUT parameters as JSON (replaces procedure OUT params)
result = json.dumps({
"machines_processed": machines_processed,
"status": status,
"report_date": p_report_date
})
print(f"Procedure complete: {result}")
mssparkutils.notebook.exit(result)
12.3 CALL Statement → Notebook Scheduling¶
| Teradata Pattern | Fabric Equivalent | Implementation |
|---|---|---|
CALL sp_name(params) | mssparkutils.notebook.run() | Inline from another notebook |
| Teradata Scheduler / TWS | Data Factory Pipeline Schedule | Trigger → Notebook Activity |
BTEQ .RUN FILE sp_call.bteq | Pipeline with Notebook Activity | Parameterized pipeline |
| Chained CALL statements | Pipeline with sequential activities | Notebook activities in series |
Invoke from another notebook (replaces CALL):
# Replaces: CALL CASINO_DW.sp_daily_slot_hold(DATE '2024-01-15', ?, ?);
result_json = mssparkutils.notebook.run(
"nb_daily_slot_hold",
timeout_seconds=600,
arguments={"p_report_date": "2024-01-15"}
)
result = json.loads(result_json)
print(f"Machines processed: {result['machines_processed']}")
print(f"Status: {result['status']}")
Invoke from a Data Factory pipeline (replaces Teradata Scheduler):
{
"name": "Run Daily Slot Hold",
"type": "TridentNotebook",
"typeProperties": {
"notebookId": "nb_daily_slot_hold",
"parameters": {
"p_report_date": {
"value": {
"value": "@formatDateTime(adddays(utcnow(), -1), 'yyyy-MM-dd')",
"type": "Expression"
},
"type": "string"
}
}
}
}
12.4 Error Handling Reference¶
| Teradata Construct | Fabric Equivalent | Notes |
|---|---|---|
SQLCODE | Python exception type | except AnalysisException, except Py4JJavaError |
SQLSTATE | Exception message parsing | Extract state from str(exception) |
DECLARE EXIT HANDLER | try/except block | Wraps each logical step |
ACTIVITY_COUNT | df.count() after write | Explicit count call required |
EXECUTE IMMEDIATE | spark.sql(f"...") | F-strings replace string concatenation |
SET variable = expr | Python assignment | variable = expr |
| Procedure OUT params | mssparkutils.notebook.exit(json) | Return JSON for structured results |
💡 Tip: When migrating complex stored procedures with multiple OUT parameters, use
mssparkutils.notebook.exit()with a JSON payload. The calling pipeline or notebook can parse the JSON to make branching decisions, replicating the control flow that BTEQ scripts achieved with.IFand return codes.
✅ Validation Checklist¶
Before considering migration complete, verify:
- Schema Migrated - All tables, views, and indexes recreated
- Data Migrated - Row counts match within acceptable tolerance
- Checksums Validated - Numeric column sums match
- SQL Translated - All queries tested and functional
- ETL Converted - BTEQ/TPT scripts converted to notebooks/pipelines
- Performance Tested - Key queries meet SLA requirements
- Reports Validated - Business reports produce correct results
- Sync Established - Ongoing synchronization (if required) working
🔍 Final Validation Queries
### Comprehensive Validation Script# Fabric Notebook: Final Migration Validation
import pandas as pd
from datetime import datetime
validation_report = {
"Migration ID": f"MIG-{datetime.now().strftime('%Y%m%d-%H%M')}",
"Source": "Teradata CASINO_DW",
"Target": "Fabric casino-fabric-poc",
"Validation Date": datetime.now().isoformat(),
"Results": []
}
# 1. Table count validation
tables = spark.catalog.listTables("bronze")
validation_report["Results"].append({
"Check": "Tables Created",
"Expected": 15, # Update with your expected count
"Actual": len(tables),
"Status": "✅ PASS" if len(tables) >= 15 else "❌ FAIL"
})
# 2. Total row count
total_rows = sum([spark.table(f"bronze.{t.name}").count() for t in tables])
validation_report["Results"].append({
"Check": "Total Rows Migrated",
"Expected": "> 1,000,000",
"Actual": f"{total_rows:,}",
"Status": "✅ PASS" if total_rows > 1000000 else "⚠️ REVIEW"
})
# 3. Key table validation
key_tables = ["slot_transactions", "player_sessions", "cage_operations"]
for table in key_tables:
count = spark.table(f"bronze.{table}").count()
validation_report["Results"].append({
"Check": f"{table} Row Count",
"Expected": "> 0",
"Actual": f"{count:,}",
"Status": "✅ PASS" if count > 0 else "❌ FAIL"
})
# 4. Schema validation
df_slots = spark.table("bronze.slot_transactions")
required_columns = ["transaction_id", "machine_id", "coin_in", "coin_out", "transaction_timestamp"]
missing = [c for c in required_columns if c not in df_slots.columns]
validation_report["Results"].append({
"Check": "Schema Completeness",
"Expected": "All required columns present",
"Actual": f"Missing: {missing}" if missing else "All present",
"Status": "✅ PASS" if not missing else "❌ FAIL"
})
# Display report
print("=" * 60)
print(f"MIGRATION VALIDATION REPORT - {validation_report['Migration ID']}")
print("=" * 60)
print(f"Source: {validation_report['Source']}")
print(f"Target: {validation_report['Target']}")
print(f"Date: {validation_report['Validation Date']}")
print("-" * 60)
for result in validation_report["Results"]:
print(f"{result['Status']} {result['Check']}")
print(f" Expected: {result['Expected']}")
print(f" Actual: {result['Actual']}")
print()
# Overall status
all_passed = all("PASS" in r["Status"] for r in validation_report["Results"])
print("=" * 60)
print(f"OVERALL STATUS: {'✅ MIGRATION VALIDATED' if all_passed else '❌ ISSUES FOUND'}")
print("=" * 60)
🔧 Troubleshooting¶
Common Issues¶
| Issue | Cause | Resolution |
|---|---|---|
| Connection timeout | Firewall blocking | Configure VNet/ExpressRoute |
| Authentication failed | Wrong credentials | Verify Teradata user/password |
| Out of memory | Large table without partitioning | Add partition options to JDBC read |
| SQL syntax error | Untranslated Teradata syntax | Review SQL translation patterns |
| Data type mismatch | Incompatible types | Add explicit CAST operations |
| Slow data transfer | Single-threaded copy | Enable parallel copy in pipeline |
| QUALIFY not supported | Teradata-specific syntax | Convert to CTE with ROW_NUMBER |
Performance Optimization¶
# Optimized JDBC read settings for large tables
df = spark.read \
.format("jdbc") \
.option("url", jdbc_url) \
.option("dbtable", table_name) \
.option("partitionColumn", "date_column") \
.option("lowerBound", "2020-01-01") \
.option("upperBound", "2024-12-31") \
.option("numPartitions", 48) \
.option("fetchsize", 100000) \
.option("batchsize", 100000) \
.load()
📚 Best Practices¶
- Start with Assessment - Thoroughly analyze Teradata workload before migration
- Prioritize by Complexity - Migrate simple tables first, complex ones later
- Test SQL Translation - Validate all translated SQL against known results
- Use Incremental Loads - For large tables, use watermark-based incremental migration
- Parallelize Transfers - Use Data Factory's parallel copy capabilities
- Validate Continuously - Check row counts and checksums at each stage
- Document Mappings - Maintain a translation guide for SQL and schemas
- Plan for Rollback - Keep Teradata running until migration is validated
- Train Teams - Ensure analysts and engineers learn Fabric-native patterns
- Optimize After Migration - Use OPTIMIZE and Z-ORDER on Delta tables
🎉 Summary¶
Congratulations! You have completed the Teradata to Microsoft Fabric migration tutorial. You have learned to:
- ✅ Assess Teradata environments for migration readiness
- ✅ Configure Teradata connectivity in Fabric Data Factory
- ✅ Translate Teradata SQL to Fabric T-SQL and Spark SQL
- ✅ Migrate data using pipelines with incremental patterns
- ✅ Convert BTEQ scripts to Fabric notebooks
- ✅ Validate migrated data for accuracy
- ✅ Understand ongoing synchronization options
- ✅ Convert advanced BTEQ patterns (MultiLoad, TPump, error handling)
- ✅ Benchmark query performance and apply Fabric optimizations
- ✅ Map Teradata data types accurately to Fabric equivalents
- ✅ Migrate stored procedures to parameterized Fabric notebooks
➡️ Next Steps¶
Continue to Tutorial 11: SAS Connectivity to learn how to connect SAS to Microsoft Fabric using OLEDB and ODBC connectors.
📁 Included Resources¶
This tutorial includes the following supplementary files:
| Resource | Description |
|---|---|
scripts/teradata_migration_utils.py | Python utilities for migration |
scripts/sql_translation_templates.sql | SQL translation examples |
scripts/sample_teradata_ddl.sql | Sample Teradata schema for practice |
templates/migration_assessment.md | Assessment template |
templates/migration_checklist.md | Migration checklist |
diagrams/migration-architecture.md | Architecture diagrams |
📚 Additional Resources¶
- Microsoft Fabric Migration Guide
- Teradata Database Connector
- Fabric Migration Assistant
- Teradata SQL to Fabric T-SQL Differences
- Data Migration ETL for Teradata
- Third-Party Migration Tools
- Teradata AI Unlimited in Fabric
🧭 Navigation¶
| ⬅️ Previous | ⬆️ Up | ➡️ Next |
|---|---|---|
| 09-Advanced AI/ML | Tutorials Index | 11-SAS Connectivity |
💬 Questions or issues? Open an issue in the GitHub repository.