Home > Tutorials > IBM DB2 Source

🏢 Tutorial 25: IBM DB2 as a Source for Microsoft Fabric¶

Last Updated: 2026-04-15 | Version: 2.0 Status: ✅ Final | Maintainer: Documentation Team

🏢 Tutorial 25: IBM DB2 as a Source for Microsoft Fabric¶


Difficulty	⭐⭐⭐ Advanced
Time	150-210 minutes
Focus	Mainframe & Legacy Data Migration

Progress Tracker¶

┌────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┐
│   00   │   01   │   02   │   03   │   04   │   05   │   06   │   07   │   08   │   09   │   10   │   11   │   12   │   13   │
│ SETUP  │ BRONZE │ SILVER │  GOLD  │  RT    │  PBI   │ PIPES  │  GOV   │ MIRROR │  AI/ML │TERADATA│  SAS   │ CI/CD  │ MIGPLN │
├────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │
└────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘

┌────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┐
│   14   │   15   │   16   │   17   │   18   │   19   │   20   │   21   │   22   │   23   │   24   │   25   │   26   │   ...  │
│SECNET  │ COST   │  PERF  │ MONALR │ SHARE  │COPILOT │ WKBEST │GEOARCG │NETCONN │SHIRGW  │SNOWFLK │IBM DB2 │STREAM  │        │
├────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │   ✅   │  🔵   │   ⬚   │   ⬚   │
└────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘
                                                                                                      ▲
                                                                                                 YOU ARE HERE

Navigation
Previous	24-Snowflake to Fabric
➡️ Next	26-Multi-Source Streaming

Third-party references — publicly sourced, good-faith comparison

This page references non-Microsoft products and services. That information is drawn from each vendor's publicly available documentation and is offered for honest, good-faith comparison only. This is a personal project written from a Microsoft Fabric and Azure perspective; it does not claim expertise in, or authority over, any third-party product, and nothing here is an official statement by, or endorsed by, those vendors. Capabilities, pricing, and features change often — always verify against the vendor's current official documentation. Where a third-party offering is the stronger choice, we say so plainly.

📖 Overview¶

This tutorial provides a guide for connecting IBM DB2 as a data source for Microsoft Fabric. Many casino and gaming organizations run core systems --- cage transaction processing, player tracking, regulatory reporting --- on IBM DB2 mainframes, a robust and long-proven platform that has been in production for decades. DB2 behavior, catalog structures, and connectivity details described here are based on IBM's publicly available documentation (as of this doc's date); always verify against IBM's current official documentation.

IBM DB2 comes in three primary variants, each with distinct connectivity considerations:

DB2 Variant	Platform	Typical Casino Use Case
DB2 for z/OS	IBM Mainframe (z14/z15/z16)	Core cage operations, compliance reporting, mainframe player history
DB2 LUW	Linux, Unix, Windows	Departmental analytics, mid-tier player databases, marketing
DB2 for iSeries	IBM Power Systems (AS/400)	Property management, legacy loyalty systems, back-office accounting

Microsoft Fabric provides a unified analytics platform that can ingest from all three DB2 variants, offering unified OneLake storage, real-time intelligence, lakehouse architecture, and integrated governance through Microsoft Purview for NIGC MICS compliance lineage.

🎯 Learning Objectives¶

By the end of this tutorial, you will be able to:

Assess IBM DB2 environments across z/OS, LUW, and iSeries variants
Configure DB2 connectivity using Data Gateway and Self-Hosted Integration Runtime
Map DB2 data types to Fabric T-SQL and Spark types (including EBCDIC and packed decimal)
Translate DB2 SQL patterns to Fabric-native equivalents
Build Data Factory pipelines with DB2 source connectors
Implement CDC patterns using InfoSphere CDC, Q Replication, and Debezium
Handle z/OS-specific challenges (DRDA, EBCDIC, packed decimal)
Validate migrated data for row counts, checksums, and character encoding integrity

🏗️ Migration Architecture Overview¶

flowchart TB
    subgraph DB2["IBM DB2 Sources"]
        ZOS[(DB2 for z/OS<br/>Mainframe)]
        LUW[(DB2 LUW<br/>Linux/Unix/Windows)]
        ISE[(DB2 for iSeries<br/>AS/400)]
    end

    subgraph Gateway["Connectivity Layer"]
        GW[On-Premises<br/>Data Gateway]
        SHIR[Self-Hosted<br/>Integration Runtime]
        JDBC[IBM Data Server<br/>JDBC Driver]
    end

    subgraph Migration["Migration Layer"]
        ADF[Data Factory Pipelines]
        NB[Fabric Notebooks]
        CDC[CDC Engine]
        ES[Eventstreams]
    end

    subgraph Fabric["Microsoft Fabric"]
        OL[(OneLake)]
        LH[Lakehouse]
        EH[Eventhouse / KQL]
        SM[Semantic Model]
        PBI[Power BI]
    end

    ZOS -->|DRDA Protocol| GW
    LUW -->|TCP/IP| SHIR
    ISE -->|DDM/DRDA| SHIR
    GW --> JDBC
    SHIR --> JDBC
    JDBC --> ADF
    JDBC --> NB
    CDC --> ES
    ADF --> OL
    NB --> OL
    ES --> EH
    OL --> LH
    LH --> SM
    EH --> SM
    SM --> PBI

Migration Approaches¶

Approach	Description	Best For	Latency
Batch Extract	Scheduled full/incremental JDBC reads	Historical data, large tables	Hours
Real-Time CDC	InfoSphere CDC or Debezium capture	Live cage transactions, slot telemetry	Seconds-Minutes
Hybrid	CDC for hot tables, batch for cold	Mixed workloads	Varies
Q Replication	DB2 native replication to Kafka/Eventstreams	z/OS environments with MQ infrastructure	Seconds

📋 Prerequisites¶

Completed Tutorial 00: Environment Setup
Completed Tutorials 01-03: Medallion Architecture
Completed Tutorial 23: SHIR & Data Gateways (recommended)
Fabric workspace with F64+ capacity
Access to source IBM DB2 environment (z/OS, LUW, or iSeries)
DB2 user with SELECT privileges on target schemas
Network connectivity between Fabric and DB2 (VPN/ExpressRoute)
IBM Data Server Driver for JDBC (db2jcc4.jar)
On-premises Data Gateway or Self-Hosted Integration Runtime installed

💡 Tip: For testing without a live DB2 instance, use the sample DDL scripts and synthetic data to practice SQL translation patterns.

🛠️ Step 1: Assess Your DB2 Environment¶

1.1 z/OS System Catalog Queries¶

-- DB2 z/OS: List tables with row counts and encoding
SELECT
    CREATOR AS SCHEMA_NAME, NAME AS TABLE_NAME, TYPE,
    COLCOUNT AS COLUMN_COUNT, CARDF AS ESTIMATED_ROWS, ENCODING_SCHEME
FROM SYSIBM.SYSTABLES
WHERE CREATOR = 'CASINO' AND TYPE = 'T'
ORDER BY CARDF DESC;

-- DB2 z/OS: Column details for a table
SELECT
    TBNAME AS TABLE_NAME, NAME AS COLUMN_NAME, COLNO,
    COLTYPE AS DATA_TYPE, LENGTH, SCALE, NULLS, CCSID
FROM SYSIBM.SYSCOLUMNS
WHERE TBCREATOR = 'CASINO' AND TBNAME = 'CAGE_TRANSACTIONS'
ORDER BY COLNO;

1.2 LUW System Catalog Queries¶

-- DB2 LUW: List tables with size estimates
SELECT
    T.TABSCHEMA, T.TABNAME, T.CARD AS ESTIMATED_ROWS,
    T.NPAGES * TS.PAGESIZE / 1024 / 1024 AS SIZE_MB, T.COLCOUNT
FROM SYSCAT.TABLES T
JOIN SYSCAT.TABLESPACES TS ON T.TBSPACEID = TS.TBSPACEID
WHERE T.TABSCHEMA = 'CASINO' AND T.TYPE = 'T'
ORDER BY SIZE_MB DESC;

-- DB2 LUW: Column details
SELECT TABNAME, COLNAME, COLNO, TYPENAME, LENGTH, SCALE, NULLS, CODEPAGE
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'CASINO' AND TABNAME = 'PLAYER_HISTORY'
ORDER BY COLNO;

1.3 iSeries (AS/400) Catalog Queries¶

-- DB2 iSeries: Table details
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TEXT AS DESCRIPTION,
       NUMBER_ROWS, DATA_SIZE / 1024 / 1024 AS SIZE_MB
FROM QSYS2.SYSTABLESTAT
WHERE TABLE_SCHEMA = 'CASINOLIB'
ORDER BY DATA_SIZE DESC;

1.4 Complexity Factors¶

Factor	Assessment	Impact	Mitigation
EBCDIC Encoding	z/OS CCSID 37/500	Character conversion required	PySpark EBCDIC decoder
Packed Decimal (COMP-3)	z/OS financial data	Binary-to-decimal conversion	Custom UDF
REXX Stored Procedures	z/OS business logic	No direct equivalent	Rewrite as Python/Spark
GRAPHIC/VARGRAPHIC	DBCS double-byte fields	Type mapping to NVARCHAR	Explicit CAST
ROWID Columns	DB2-generated identifiers	Not portable	Map to BIGINT surrogate
TIMESTAMP(12)	Extended precision	Precision loss	Truncate to TIMESTAMP(6)
Temporal Tables	System/business-time	Redesign versioning	Map to Delta time travel

quadrantChart
    title DB2 Migration Complexity Assessment
    x-axis Low Technical Complexity --> High Technical Complexity
    y-axis Small Data Volume --> Large Data Volume
    quadrant-1 Complex Migration - Phased Approach
    quadrant-2 Data Volume Challenge - Optimize Transfer
    quadrant-3 Quick Win - Standard JDBC Extract
    quadrant-4 Technical Challenge - Handle Encodings
    LUW Simple Tables: [0.2, 0.3]
    z/OS EBCDIC Tables: [0.6, 0.5]
    z/OS Packed Decimal: [0.8, 0.6]
    z/OS Large Partitioned: [0.4, 0.9]
    REXX Stored Procedures: [0.9, 0.2]

🛠️ Step 2: Configure DB2 Connectivity¶

2.1 On-Premises Data Gateway¶

Data Gateway Architecture

Source: What is an on-premises data gateway?

Download the On-premises Data Gateway on a Windows server with DB2 network access
Sign in with your Microsoft Entra account and register with your Fabric tenant
Install the IBM Data Server Driver for JDBC on the gateway machine

2.2 Self-Hosted Integration Runtime¶

flowchart LR
    subgraph OnPrem["On-Premises Network"]
        DB2[(IBM DB2)]
        SHIR2["SHIR Node"]
        JDBC_DRV["db2jcc4.jar"]
    end
    subgraph Cloud["Microsoft Fabric"]
        ADF2["Data Factory"]
        LH2[(Lakehouse)]
    end
    DB2 -->|DRDA / TCP| SHIR2
    SHIR2 --- JDBC_DRV
    SHIR2 -->|HTTPS 443| ADF2
    ADF2 -->|Delta| LH2

Install SHIR, register with the authentication key, then place db2jcc4.jar in the SHIR lib directory.

2.3 JDBC Driver Components¶

Component	File	Purpose
JDBC Driver	`db2jcc4.jar`	Type 4 pure Java driver
z/OS + iSeries License	`db2jcc_license_cisuz.jar`	Required for z/OS and iSeries
LUW License	`db2jcc_license_cu.jar`	Required for LUW

⚠️ Important: Without db2jcc_license_cisuz.jar, z/OS and iSeries connections fail with SQL1598N licensing errors.

2.4 Connection String Formats¶

DB2 for z/OS:

jdbc:db2://mainframe.casino.com:5023/DSNDB2Z:currentSchema=CASINO;sslConnection=true;

DB2 LUW:

jdbc:db2://db2server.casino.com:50000/CASINODB:currentSchema=CASINO;sslConnection=true;

DB2 for iSeries:

jdbc:as400://as400.casino.com/CASINOLIB;translate binary=true;prompt=false;

2.5 SSL/TLS Configuration¶

# Import DB2 server certificate into a trust store
keytool -importcert -alias db2_server -file db2_server_cert.cer \
    -keystore db2_truststore.jks -storepass changeit -noprompt

2.6 Test Connection Notebook¶

# Fabric Notebook: Test IBM DB2 Connectivity
from notebookutils import mssparkutils

db2_host = mssparkutils.credentials.getSecret("keyvault", "db2-host")
db2_port = mssparkutils.credentials.getSecret("keyvault", "db2-port")
db2_database = mssparkutils.credentials.getSecret("keyvault", "db2-database")
db2_user = mssparkutils.credentials.getSecret("keyvault", "db2-user")
db2_password = mssparkutils.credentials.getSecret("keyvault", "db2-password")

jdbc_url = f"jdbc:db2://{db2_host}:{db2_port}/{db2_database}:currentSchema=CASINO;"

df_test = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("query", "SELECT CURRENT TIMESTAMP AS TEST_TS FROM SYSIBM.SYSDUMMY1") \
    .option("user", db2_user).option("password", db2_password) \
    .option("driver", "com.ibm.db2.jcc.DB2Driver") \
    .load()

print(f"Connection successful! DB2 time: {df_test.collect()[0]['TEST_TS']}")

🛠️ Step 3: Data Type Mapping¶

3.1 DB2 to Fabric Type Reference¶

DB2 Data Type	Fabric T-SQL	Spark / Delta	Notes
`SMALLINT`	`SMALLINT`	`ShortType`	Direct
`INTEGER`	`INT`	`IntegerType`	Direct
`BIGINT`	`BIGINT`	`LongType`	Direct
`DECIMAL(p,s)`	`DECIMAL(p,s)`	`DecimalType(p,s)`	Direct
`DECFLOAT(16)`	`DECIMAL(34,6)`	`DecimalType(34,6)`	Approximate
`REAL` / `DOUBLE`	`REAL` / `FLOAT`	`FloatType` / `DoubleType`	Direct
`CHAR(n)` / `VARCHAR(n)`	`CHAR(n)` / `VARCHAR(n)`	`StringType`	EBCDIC conversion for z/OS
`GRAPHIC(n)`	`NCHAR(n)`	`StringType`	DBCS double-byte
`VARGRAPHIC(n)`	`NVARCHAR(n)`	`StringType`	DBCS double-byte
`CLOB` / `BLOB`	`VARCHAR(MAX)` / `VARBINARY(MAX)`	`StringType` / `BinaryType`	Size limits differ
`DATE` / `TIME`	`DATE` / `TIME`	`DateType` / `StringType`	Spark has no TIME type
`TIMESTAMP`	`DATETIME2(6)`	`TimestampType`	Default 6-digit precision
`TIMESTAMP(12)`	`DATETIME2(7)`	`TimestampType`	Truncated to 7/6 digits
`ROWID`	`BIGINT`	`LongType`	Map to surrogate key
`XML`	`VARCHAR(MAX)`	`StringType`	Parse with XPath

3.2 EBCDIC to UTF-8 Conversion¶

# EBCDIC decoder UDF for z/OS binary character fields
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def ebcdic_to_utf8(binary_data, ccsid=37):
    if binary_data is None:
        return None
    try:
        return binary_data.decode(f'cp{ccsid:03d}').strip()
    except (UnicodeDecodeError, AttributeError):
        return None

ebcdic_udf = udf(lambda x: ebcdic_to_utf8(x), StringType())

# Usage: df.withColumn("player_name", ebcdic_udf("player_name_raw"))

3.3 Packed Decimal (COMP-3) Handling¶

# Packed decimal converter for z/OS financial data
from pyspark.sql.functions import udf
from pyspark.sql.types import DecimalType

def unpack_decimal(packed_bytes, scale=2):
    """Convert COMP-3 packed decimal to Python float.
    Format: each byte = 2 BCD digits, last nibble = sign (C=+, D=-)"""
    if packed_bytes is None:
        return None
    result = 0
    for i, byte in enumerate(packed_bytes):
        high, low = (byte >> 4) & 0x0F, byte & 0x0F
        if i < len(packed_bytes) - 1:
            result = result * 100 + high * 10 + low
        else:
            result = result * 10 + high
            if low == 0x0D:
                result = -result
    return result / (10 ** scale)

packed_udf = udf(lambda x: unpack_decimal(x, 2), DecimalType(15, 2))

# Usage: df.withColumn("amount", packed_udf("txn_amt_comp3"))

3.4 TIMESTAMP(12) Precision¶

from pyspark.sql.functions import col, substring, to_timestamp

# DB2 TIMESTAMP(12) arrives as "2024-01-15-14.30.45.123456789012"
# Truncate to microseconds (6 digits) for Spark TimestampType
df = df_raw.withColumn(
    "event_timestamp",
    to_timestamp(substring(col("db2_ts_str"), 1, 26), "yyyy-MM-dd-HH.mm.ss.SSSSSS")
)

🛠️ Step 4: SQL Translation Patterns¶

4.1 FETCH FIRST N ROWS¶

	DB2	Fabric T-SQL	Spark SQL
Syntax	`FETCH FIRST 50 ROWS ONLY`	`TOP 50`	`LIMIT 50`

-- DB2: Top cage transactions
SELECT transaction_id, player_id, transaction_amount
FROM CASINO.CAGE_TRANSACTIONS
WHERE transaction_date = CURRENT DATE
ORDER BY transaction_amount DESC
FETCH FIRST 50 ROWS ONLY;

-- Fabric T-SQL equivalent
SELECT TOP 50 transaction_id, player_id, transaction_amount
FROM casino.cage_transactions
WHERE transaction_date = CAST(GETDATE() AS DATE)
ORDER BY transaction_amount DESC;

4.2 WITH UR (Uncommitted Read)¶

-- DB2: Reporting query with uncommitted read to avoid locks
SELECT cage_id, SUM(cash_in) AS total_cash_in, COUNT(*) AS txn_count
FROM CASINO.CAGE_TRANSACTIONS
WHERE shift_date = CURRENT DATE - 1 DAY
GROUP BY cage_id WITH UR;

-- Fabric T-SQL: SET isolation or NOLOCK hint
SELECT cage_id, SUM(cash_in) AS total_cash_in, COUNT(*) AS txn_count
FROM casino.cage_transactions WITH (NOLOCK)
WHERE shift_date = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
GROUP BY cage_id;

💡 Note: In Spark/Delta Lake, MVCC handles concurrency automatically --- no isolation hint needed.

4.3 Date/Time Functions¶

DB2	Fabric T-SQL	Spark SQL
`CURRENT TIMESTAMP`	`GETDATE()`	`current_timestamp()`
`CURRENT DATE`	`CAST(GETDATE() AS DATE)`	`current_date()`
`CURRENT DATE - 7 DAYS`	`DATEADD(DAY, -7, CAST(GETDATE() AS DATE))`	`date_sub(current_date(), 7)`
`DAYS(date)`	`DATEDIFF(DAY, '0001-01-01', date)`	`datediff(date, '0001-01-01')`

4.4 Concatenation Operator (||)¶

-- DB2: String concatenation with || operator
SELECT player_id,
    first_name || ' ' || last_name AS display_name,
    'CTR-' || CHAR(YEAR(txn_date)) || '-' || LPAD(CHAR(ctr_seq), 6, '0') AS ctr_ref
FROM CASINO.PLAYER_MASTER WHERE loyalty_tier = 'DIAMOND';

-- Fabric T-SQL: CONCAT function
SELECT player_id,
    CONCAT(first_name, ' ', last_name) AS display_name,
    CONCAT('CTR-', YEAR(txn_date), '-', RIGHT('000000' + CAST(ctr_seq AS VARCHAR), 6)) AS ctr_ref
FROM casino.player_master WHERE loyalty_tier = 'DIAMOND';

4.5 Common Function Mappings¶

DB2 Function	Fabric T-SQL	Spark SQL
`VALUE(a, b)`	`ISNULL(a, b)`	`coalesce(a, b)`
`STRIP(x)`	`TRIM(x)`	`trim(x)`
`SUBSTR(s, p, n)`	`SUBSTRING(s, p, n)`	`substring(s, p, n)`
`LOCATE(pat, s)` / `POSSTR(s, pat)`	`CHARINDEX(pat, s)`	`locate(pat, s)`
`LENGTH(s)`	`LEN(s)`	`length(s)`
`CHAR(n)`	`CAST(n AS VARCHAR)`	`cast(n as string)`
`DIGITS(n)`	`RIGHT('00..0' + CAST(n AS VARCHAR), len)`	`lpad(cast(n as string), len, '0')`
`HEX(x)`	`CONVERT(VARCHAR, x, 2)`	`hex(x)`
`RAISE_ERROR(state, msg)`	`THROW 50000, msg, 1`	`raise Exception(msg)`

4.6 Stored Procedure Conversion¶

DB2 stored procedures (especially REXX on z/OS) convert to Fabric notebooks:

# Fabric Notebook: CTR Threshold Check (replaces DB2 CASINO.CHECK_CTR_THRESHOLD)
from pyspark.sql.functions import col, sum as spark_sum, when, lit, current_date

CTR_THRESHOLD = 10000.00

df_daily = spark.table("silver.cage_transactions") \
    .filter((col("transaction_date") == current_date()) &
            (col("transaction_type").isin("CASH_IN", "CHIP_PURCHASE"))) \
    .groupBy("player_id", "transaction_date") \
    .agg(spark_sum("transaction_amount").alias("total_amount")) \
    .withColumn("ctr_required", when(col("total_amount") >= CTR_THRESHOLD, lit("Y")).otherwise(lit("N")))

# Write CTR candidates to Gold layer
df_daily.filter(col("ctr_required") == "Y") \
    .write.mode("append").saveAsTable("gold.ctr_candidates")

🛠️ Step 5: Batch Data Migration¶

5.1 Data Factory Pipeline Architecture¶

Copy Activity Configuration

Source: Copy activity in Data Factory

flowchart LR
    LOOKUP[Lookup: Get Table List] --> FOREACH[ForEach: Iterate Tables]
    FOREACH --> COPY[Copy from DB2 via JDBC]
    COPY --> WRITE[Write Delta to Lakehouse]
    WRITE --> VALIDATE[Validate Row Counts]

5.2 Pipeline JSON Definition¶

{
    "name": "pl_db2_migration_casino",
    "properties": {
        "activities": [
            {
                "name": "Get DB2 Tables",
                "type": "Lookup",
                "typeProperties": {
                    "source": {
                        "type": "Db2Source",
                        "query": "SELECT TABNAME AS TABLE_NAME, CARD AS ROW_COUNT FROM SYSCAT.TABLES WHERE TABSCHEMA = 'CASINO' AND TYPE = 'T'"
                    },
                    "firstRowOnly": false
                }
            },
            {
                "name": "ForEach Table",
                "type": "ForEach",
                "typeProperties": {
                    "items": "@activity('Get DB2 Tables').output.value",
                    "batchCount": 4,
                    "activities": [
                        {
                            "name": "Copy to Lakehouse",
                            "type": "Copy",
                            "typeProperties": {
                                "source": {
                                    "type": "Db2Source",
                                    "query": "SELECT * FROM CASINO.@{item().TABLE_NAME}"
                                },
                                "sink": { "type": "LakehouseTableSink", "tableActionOption": "Overwrite" }
                            }
                        }
                    ]
                }
            }
        ]
    }
}

5.3 JDBC Read with Partitioning¶

# Fabric Notebook: Partitioned large table extract from DB2
jdbc_url = "jdbc:db2://db2server.casino.com:50000/CASINODB:currentSchema=CASINO;"

df = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", "CASINO.CAGE_TRANSACTIONS") \
    .option("user", db2_user).option("password", db2_password) \
    .option("driver", "com.ibm.db2.jcc.DB2Driver") \
    .option("partitionColumn", "TRANSACTION_DATE") \
    .option("lowerBound", "2020-01-01") \
    .option("upperBound", "2025-01-01") \
    .option("numPartitions", 48) \
    .option("fetchsize", 50000) \
    .load()

# Lowercase columns (DB2 returns uppercase)
for c in df.columns:
    df = df.withColumnRenamed(c, c.lower())

df.write.mode("overwrite").format("delta").saveAsTable("bronze.db2_cage_transactions")
spark.sql("OPTIMIZE bronze.db2_cage_transactions")

5.4 Handling NULL Indicators and Code Pages¶

# DB2 z/OS batch exports may include NULL indicator columns
from pyspark.sql.functions import col, when, lit

null_indicators = {"txn_amount": "txn_amount_ni", "player_id": "player_id_ni"}

for data_col, null_col in null_indicators.items():
    if null_col in df_raw.columns:
        df_raw = df_raw.withColumn(
            data_col, when(col(null_col) == -1, lit(None)).otherwise(col(data_col))
        ).drop(null_col)

🛠️ Step 6: CDC Patterns for DB2¶

6.1 InfoSphere CDC (ASN Capture)¶

flowchart LR
    subgraph DB2S["DB2 Source"]
        LOG[(Transaction Logs)]
        ASN[ASN Capture Agent]
    end
    subgraph CDCE["InfoSphere CDC"]
        CAP[Capture Engine]
        APPLY[Apply Engine]
    end
    subgraph FabricT["Microsoft Fabric"]
        ES[Eventstreams]
        LH3[(Lakehouse Delta)]
    end
    LOG --> ASN --> CAP --> APPLY -->|Kafka| ES --> LH3

Key ASN Tables:

Table	Purpose
`IBMSNAP_REGISTER`	Source tables registered for capture
`IBMSNAP_PRUNCNTL`	Pruning control for change tables
`IBMSNAP_SUBS_SET`	Subscription set definitions

-- Check registered CDC tables
SELECT SOURCE_OWNER, SOURCE_TABLE, CD_TABLE, CCD_CONDENSED
FROM ASN.IBMSNAP_REGISTER WHERE SOURCE_OWNER = 'CASINO';

6.2 CDC via JDBC Polling¶

# Watermark-based polling for environments without InfoSphere CDC
from delta.tables import DeltaTable
from datetime import datetime

target_table = "silver.cage_transactions"
watermark_col = "last_modified_ts"

# Get current watermark
wm = spark.table(target_table).select(spark_max(watermark_col)).collect()[0][0] or datetime(2020,1,1)

# Read changes
query = f"SELECT * FROM CASINO.CAGE_TRANSACTIONS WHERE {watermark_col} > TIMESTAMP('{wm}') FETCH FIRST 500000 ROWS ONLY"
df_changes = spark.read.format("jdbc") \
    .option("url", jdbc_url).option("query", query) \
    .option("user", db2_user).option("password", db2_password) \
    .option("driver", "com.ibm.db2.jcc.DB2Driver").load()

# Merge into target
if df_changes.count() > 0 and spark.catalog.tableExists(target_table):
    DeltaTable.forName(spark, target_table).alias("t") \
        .merge(df_changes.alias("s"), "t.transaction_id = s.TRANSACTION_ID") \
        .whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

6.3 Q Replication to Kafka to Eventstreams¶

flowchart LR
    LOG2[(DB2 z/OS Logs)] --> QCAP[Q Capture]
    QCAP -->|MQ| SNDQ[Send Queue]
    SNDQ -->|MQ Channel| RCVQ[Receive Queue]
    RCVQ --> MQSRC[MQ Kafka Connector]
    MQSRC --> KAFKA[Kafka]
    KAFKA -->|Mirror| ES2[Eventstreams]
    ES2 --> EH2[Eventhouse]

6.4 Debezium DB2 Connector¶

Debezium provides open-source CDC for DB2 LUW via ASN capture tables.

{
    "name": "db2-casino-connector",
    "config": {
        "connector.class": "io.debezium.connector.db2.Db2Connector",
        "database.hostname": "db2server.casino.com",
        "database.port": "50000",
        "database.dbname": "CASINODB",
        "database.cdcschema": "ASNCDC",
        "topic.prefix": "casino.db2",
        "table.include.list": "CASINO.CAGE_TRANSACTIONS,CASINO.PLAYER_SESSIONS,CASINO.SLOT_TELEMETRY"
    }
}

💡 Note: Debezium for DB2 polls ASN change data capture tables --- it requires the ASN capture agent to be configured on the DB2 LUW instance.

🛠️ Step 7: z/OS Specific Patterns¶

7.1 DRDA Protocol and Subsystem Access¶

flowchart TB
    subgraph MF["IBM z/OS Mainframe"]
        DB2SS[DB2 Subsystem]
        DDF[Distributed Data Facility]
    end
    subgraph GW2["Gateway Server"]
        SHIR3[Self-Hosted IR]
        DRV[db2jcc4.jar + cisuz license]
    end
    SHIR3 -->|DRDA Port 5023| DDF --> DB2SS

Parameter	Description	Typical Value
DDF Port	Distributed Data Facility port	`5023`
Location Name	DB2 subsystem location	`DB2Z_PROD`
Package Collection	Bound package schema	`NULLID`
Security	Auth mechanism	`ENCRYPTED_USER_AND_DATA_SECURITY`

7.2 BIND PACKAGE Requirements¶

-- Run on z/OS by DBA to bind JDBC packages
BIND PACKAGE(NULLID) MEMBER(SYSSH200) ACTION(REPLACE)
    ISOLATION(CS) DYNAMICRULES(RUN) ENCODING(EBCDIC) OWNER(FABRIC_USER);

7.3 Comprehensive EBCDIC + Packed Decimal Processing¶

# Fabric Notebook: Process z/OS cage transaction extract
EBCDIC_PAGES = {37: 'cp037', 500: 'cp500', 1047: 'cp1047', 1140: 'cp1140'}

def create_ebcdic_decoder(ccsid=37):
    encoding = EBCDIC_PAGES.get(ccsid, 'cp037')
    def decode(data):
        if data is None: return None
        try: return data.decode(encoding).strip()
        except: return f"[DECODE_ERROR:CCSID{ccsid}]"
    return udf(decode, StringType())

ebcdic_037 = create_ebcdic_decoder(37)

df_processed = spark.table("bronze.db2_zos_cage_raw") \
    .withColumn("player_name", ebcdic_037(col("player_name_raw"))) \
    .withColumn("cage_location", ebcdic_037(col("cage_loc_raw"))) \
    .withColumn("transaction_amount", packed_udf(col("txn_amt_comp3"))) \
    .drop("player_name_raw", "cage_loc_raw", "txn_amt_comp3")

df_processed.write.mode("overwrite").saveAsTable("silver.cage_transactions_zos")

🛠️ Step 8: Ongoing Synchronization¶

8.1 Strategy by Table Type¶

Category	Example	Strategy	Frequency
High-frequency	`cage_transactions`, `slot_telemetry`	Real-time CDC	Continuous
Session data	`player_sessions`, `table_events`	JDBC polling	Every 15 min
Reference/dimension	`player_master`, `machine_catalog`	Batch pipeline	Daily
Historical	`transaction_history`	One-time + incremental	Weekly
Compliance	`ctr_filings`, `sar_reports`	Batch with audit	Daily

8.2 Hybrid Approach¶

flowchart TB
    subgraph DB2Source["DB2 Casino Systems"]
        HOT[Hot: cage_transactions]
        WARM[Warm: player_sessions]
        COLD[Cold: player_master]
    end
    subgraph Sync["Sync Layer"]
        CDC3[InfoSphere CDC / Debezium]
        POLL2[JDBC Polling - 15 min]
        BATCH2[Data Factory - Daily]
    end
    subgraph Fabric4["Fabric"]
        RT2[Eventhouse Real-Time]
        SIL[(Silver Delta Tables)]
        GOLD2[(Gold Aggregations)]
    end
    HOT --> CDC3 --> RT2
    CDC3 --> SIL
    WARM --> POLL2 --> SIL
    COLD --> BATCH2 --> SIL
    SIL --> GOLD2

🛠️ Step 9: Validate Migrated Data¶

9.1 Row Count Comparison¶

# Row count validation across DB2 and Fabric
validation_tables = [
    ("CASINO.CAGE_TRANSACTIONS", "bronze.db2_cage_transactions"),
    ("CASINO.PLAYER_SESSIONS", "bronze.db2_player_sessions"),
    ("CASINO.PLAYER_MASTER", "bronze.db2_player_master"),
]
results = []
for db2_tbl, fabric_tbl in validation_tables:
    fabric_count = spark.table(fabric_tbl).count()
    db2_count = spark.read.format("jdbc") \
        .option("url", jdbc_url).option("query", f"SELECT COUNT(*) AS CNT FROM {db2_tbl}") \
        .option("user", db2_user).option("password", db2_password) \
        .option("driver", "com.ibm.db2.jcc.DB2Driver").load().collect()[0]["CNT"]
    diff_pct = abs(db2_count - fabric_count) / db2_count * 100 if db2_count > 0 else 0
    results.append({"DB2": db2_tbl, "Fabric": fabric_tbl,
        "DB2 Count": db2_count, "Fabric Count": fabric_count,
        "Diff%": f"{diff_pct:.4f}%", "Status": "PASS" if diff_pct < 0.01 else "FAIL"})
display(pd.DataFrame(results))

9.2 Checksum Validation¶

from pyspark.sql.functions import sum as spark_sum, count, avg

checksums = spark.table("bronze.db2_cage_transactions").agg(
    count("*").alias("rows"),
    spark_sum("transaction_amount").alias("sum_amount"),
    avg("transaction_amount").alias("avg_amount")
).collect()[0]

print(f"Rows: {checksums['rows']:,}  Sum: ${checksums['sum_amount']:,.2f}  Avg: ${checksums['avg_amount']:,.2f}")
# Compare against: SELECT COUNT(*), SUM(TRANSACTION_AMOUNT), AVG(TRANSACTION_AMOUNT) FROM CASINO.CAGE_TRANSACTIONS

9.3 EBCDIC / UTF-8 Character Validation¶

from pyspark.sql.functions import col, when, lit

df = spark.table("silver.player_master")
total = df.count()
bad = df.filter(
    col("first_name").contains("\uFFFD") | col("last_name").contains("\uFFFD")
).count()

print(f"Total: {total:,}  Encoding issues: {bad:,} ({bad/total*100:.2f}%)")
print(f"Status: {'PASS' if bad == 0 else 'REVIEW'}")

🔧 Troubleshooting¶

Issue	Cause	Resolution
`SQL1598N` License error	Missing `db2jcc_license_cisuz.jar`	Place license JAR alongside `db2jcc4.jar` in SHIR lib
`SQLCODE -204` Object not found	Wrong schema/table name (case-sensitive z/OS)	Verify with `SYSIBM.SYSTABLES` catalog
`SQLCODE -551` No privilege	User lacks SELECT	`GRANT SELECT ON CASINO.table TO fabric_user`
`SQLCODE -30081` Comm error	Firewall blocking DRDA port	Open port 5023 (z/OS) or 50000 (LUW)
`SQLCODE -30082` Security error	Auth failure or SSL mismatch	Verify credentials and trust store
`SQLCODE -20528` Package not bound	JDBC packages missing	Run BIND PACKAGE or enable auto-bind
Connection timeout	Network latency or wrong port	Increase `loginTimeout` in JDBC URL
Out of memory in Spark	Large table without partitioning	Add `partitionColumn`, `numPartitions`
Garbled characters	EBCDIC not decoded	Use `cp037` decoder; verify `translate binary=true` for iSeries
Decimal precision loss	DECFLOAT mapped to DOUBLE	Use DECIMAL(38,n) for financial amounts
Slow JDBC reads	Single-threaded fetch	Increase `fetchsize` (50000+) and `numPartitions`
iSeries record locking	AS/400 file-level locks	Add `block size=512;block criteria=2` to JDBC URL

📚 Best Practices¶

Assess all three DB2 variants independently --- z/OS, LUW, and iSeries have different catalogs, protocols, and encoding challenges.
Install the correct JDBC license JARs --- db2jcc_license_cisuz.jar is mandatory for z/OS and iSeries; missing it causes cryptic errors.
Handle EBCDIC conversion at ingestion --- Convert all z/OS character data to UTF-8 during Bronze layer load.
Convert packed decimal fields explicitly --- Use known precision and scale from the COBOL copybook, never rely on automatic inference.
Partition JDBC reads for large tables --- Tables over 1M rows should use partitionColumn and numPartitions for parallel extract.
Use CDC for high-frequency tables --- Cage transactions and slot telemetry benefit from real-time CDC; batch is fine for reference tables.
Validate sort order after migration --- EBCDIC and UTF-8 have different sort orders; re-validate ORDER BY logic.
Keep DB2 running during validation --- Row count and checksum comparisons require concurrent access to source and target.
Coordinate with the z/OS DBA --- BIND PACKAGE, GRANTs, DDF config, and ASN capture setup require mainframe authority.
Document all type mappings --- Maintain a living reference of DB2-to-Fabric mappings including CCSID values and packed decimal layouts.

🎉 Summary¶

Congratulations! You have completed the IBM DB2 as a Source for Microsoft Fabric tutorial. You have learned to:

✅ Assess DB2 environments across z/OS, LUW, and iSeries using system catalog queries
✅ Configure DB2 connectivity through Data Gateway and Self-Hosted Integration Runtime
✅ Map DB2 data types to Fabric, including GRAPHIC, DECFLOAT, and ROWID
✅ Translate DB2 SQL patterns (FETCH FIRST, WITH UR, || concatenation) to Fabric equivalents
✅ Build Data Factory pipelines and PySpark notebooks for batch migration
✅ Implement CDC using InfoSphere CDC, Q Replication, and Debezium
✅ Handle z/OS-specific challenges: EBCDIC encoding, packed decimal, DRDA protocol
✅ Validate migrated data for row counts, checksums, and character encoding integrity

➡️ Next Steps¶

Continue to Tutorial 26: Multi-Source Real-Time Intelligence to learn how to combine multiple data sources --- including DB2 CDC streams --- into a unified real-time analytics pipeline in Microsoft Fabric.

📁 Included Resources¶

Resource	Description
`scripts/db2_migration_utils.py`	Python utilities (EBCDIC, packed decimal converters)
`scripts/db2_type_mapping.sql`	Complete type mapping reference
`scripts/db2_sql_translation.sql`	SQL translation examples
`scripts/db2_catalog_queries.sql`	System catalog queries for all variants
`templates/db2_migration_checklist.md`	Migration checklist

📚 Additional Resources¶

Previous	Up	➡️ Next
24-Snowflake to Fabric	Tutorials Index	26-Multi-Source Streaming

Questions or issues? Open an issue in the GitHub repository.

⬆️ Back to Top | 📚 Tutorials | 🏠 Home

← Previous24 — Snowflake to FabricRead more →Next →26 — Multi-Source StreamingRead more →

🏢 Tutorial 25: IBM DB2 as a Source for Microsoft Fabric¶

🏢 Tutorial 25: IBM DB2 as a Source for Microsoft Fabric¶

Progress Tracker¶

📖 Overview¶

🎯 Learning Objectives¶

🏗️ Migration Architecture Overview¶

Migration Approaches¶

📋 Prerequisites¶

🛠️ Step 1: Assess Your DB2 Environment¶

1.1 z/OS System Catalog Queries¶

1.2 LUW System Catalog Queries¶

1.3 iSeries (AS/400) Catalog Queries¶

1.4 Complexity Factors¶

🛠️ Step 2: Configure DB2 Connectivity¶

2.1 On-Premises Data Gateway¶

2.2 Self-Hosted Integration Runtime¶

2.3 JDBC Driver Components¶

2.4 Connection String Formats¶

2.5 SSL/TLS Configuration¶

2.6 Test Connection Notebook¶

🛠️ Step 3: Data Type Mapping¶

3.1 DB2 to Fabric Type Reference¶

3.2 EBCDIC to UTF-8 Conversion¶

3.3 Packed Decimal (COMP-3) Handling¶

3.4 TIMESTAMP(12) Precision¶

🛠️ Step 4: SQL Translation Patterns¶

4.1 FETCH FIRST N ROWS¶

4.2 WITH UR (Uncommitted Read)¶

4.3 Date/Time Functions¶

4.4 Concatenation Operator (||)¶

4.5 Common Function Mappings¶

4.6 Stored Procedure Conversion¶

🛠️ Step 5: Batch Data Migration¶

5.1 Data Factory Pipeline Architecture¶

5.2 Pipeline JSON Definition¶

5.3 JDBC Read with Partitioning¶

5.4 Handling NULL Indicators and Code Pages¶

🛠️ Step 6: CDC Patterns for DB2¶

6.1 InfoSphere CDC (ASN Capture)¶

6.2 CDC via JDBC Polling¶

6.3 Q Replication to Kafka to Eventstreams¶

6.4 Debezium DB2 Connector¶

🛠️ Step 7: z/OS Specific Patterns¶

7.1 DRDA Protocol and Subsystem Access¶

7.2 BIND PACKAGE Requirements¶

7.3 Comprehensive EBCDIC + Packed Decimal Processing¶

🛠️ Step 8: Ongoing Synchronization¶

8.1 Strategy by Table Type¶

8.2 Hybrid Approach¶

🛠️ Step 9: Validate Migrated Data¶

9.1 Row Count Comparison¶

9.2 Checksum Validation¶

9.3 EBCDIC / UTF-8 Character Validation¶

🔧 Troubleshooting¶

📚 Best Practices¶

🎉 Summary¶

➡️ Next Steps¶

📁 Included Resources¶

📚 Additional Resources¶

🧭 Navigation¶