Data Migration -- IBM Db2 to Azure SQL¶
Audience: DBAs, Data Engineers, Migration Engineers
Purpose: Comprehensive guide for moving data from IBM Db2 (z/OS and LUW) to Azure SQL, covering SSMA data migration, Db2 utilities, Azure Data Factory, BULK INSERT, and BCP for large tables.
Data migration strategy overview¶
Data migration from Db2 to Azure SQL follows one of four patterns depending on data volume, acceptable downtime, and source platform:
| Strategy | Data volume | Downtime tolerance | Best for |
|---|---|---|---|
| SSMA data migration | < 100 GB | Hours | Small databases, initial migrations, validation |
| Db2 EXPORT + AzCopy + BULK INSERT | 100 GB - 5 TB | Extended window (overnight/weekend) | Large tables, batch migration |
| Azure Data Factory Db2 connector | Any | Minimal (incremental) | Ongoing replication, phased cutover |
| BCP + parallel load | 1 TB+ | Extended window | Very large tables, maximum throughput |
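Table and index sizes drive the strategy choice. On Db2 for LUW, a sizing sketch against the SYSIBMADM.ADMINTABINFO administrative view (physical sizes are reported in KB) lists the largest tables; on z/OS, real-time statistics such as SYSIBM.SYSTABLESPACESTATS serve the same purpose. The schema filter and row limit below are illustrative:
-- Db2 for LUW: approximate physical size per table, largest first
SELECT TABSCHEMA,
       TABNAME,
       SUM(DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE + LOB_OBJECT_P_SIZE) / 1024 AS size_mb
FROM SYSIBMADM.ADMINTABINFO
WHERE TABSCHEMA NOT LIKE 'SYS%'
GROUP BY TABSCHEMA, TABNAME
ORDER BY size_mb DESC
FETCH FIRST 50 ROWS ONLY;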
1. SSMA data migration¶
SSMA for Db2 includes integrated data migration that reads source data via DRDA or native client and writes to the target Azure SQL database via bulk operations.
When to use SSMA data migration¶
- Databases under 100 GB total size
- Initial proof-of-concept or pilot migrations
- Small-to-medium tables where simplicity matters
- Validation testing alongside schema conversion
Configuration for data migration¶
In SSMA Project Settings, configure data migration parameters:
| SSMA setting | Recommended value | Notes |
|---|---|---|
| Batch size | 10,000 | Rows per batch; increase for simple schemas, decrease for wide rows |
| Extended data migration | Enable | Uses client-side data pump for better performance |
| Parallel data migration | Enable (4-8 threads) | Migrates multiple tables simultaneously |
| Table lock during migration | Enable | Improves insert throughput via TABLOCK hint |
| Data migration timeout | 7200 (seconds) | Increase for very large tables |
Running SSMA data migration¶
- Complete schema conversion and apply to target database.
- In SSMA, right-click the source database and select Migrate Data.
- SSMA reads source tables via the Db2 connection and performs bulk insert into target tables.
- Monitor progress in the SSMA output window. Failed rows are logged with error details.
- After completion, review the data migration report for table-level row counts and error summaries.
Handling SSMA data migration errors¶
Common errors and resolutions:
| Error | Cause | Resolution |
|---|---|---|
| String truncation | Source data exceeds target column width | Increase target column width or use VARCHAR(MAX) |
| Code page conversion | EBCDIC characters without Unicode mapping | Map problematic characters; see mainframe guide |
| Arithmetic overflow | DECFLOAT values exceeding DECIMAL precision | Increase target DECIMAL precision or use FLOAT |
| Foreign key violation | Parent table not yet migrated | Migrate parent tables first or disable FK constraints during load |
| Timeout | Large table exceeds default timeout | Increase timeout setting; consider alternative strategy |
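For the foreign key violation case above, a common pattern is to disable constraint checking on the target during the load and re-validate afterwards. A minimal T-SQL sketch (the finance.transactions table name is illustrative):
-- Disable all foreign key constraints on the target table before the load
ALTER TABLE finance.transactions NOCHECK CONSTRAINT ALL;

-- ... run the SSMA data migration ...

-- Re-enable and re-validate; WITH CHECK verifies existing rows so the constraints remain trusted
ALTER TABLE finance.transactions WITH CHECK CHECK CONSTRAINT ALL;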
2. Db2 EXPORT + AzCopy + BULK INSERT¶
For tables exceeding 100 GB or when SSMA throughput is insufficient, export data from Db2 to flat files, transfer to Azure, and bulk-load into Azure SQL.
Step 1: Export from Db2¶
-- Db2 EXPORT command for delimited format
EXPORT TO /data/export/transactions.csv OF DEL
MODIFIED BY COLDEL| CHARDEL"" TIMESTAMPFORMAT="YYYY-MM-DD HH:MM:SS.UUUUUU"
MESSAGES /data/export/transactions.msg
SELECT * FROM finance.transactions;
Export options explained:
- OF DEL -- delimited format (pipe-delimited recommended for data containing commas)
- COLDEL| -- column delimiter (the pipe character avoids conflicts with data)
- CHARDEL"" -- character delimiter (double quote for strings)
- TIMESTAMPFORMAT -- ensures a consistent timestamp format for SQL Server import
- MESSAGES -- captures row counts and warnings
For z/OS, use the Db2 UNLOAD utility or DSNTIAUL instead of EXPORT:
//UNLOAD   EXEC PGM=IKJEFT01
//* Output dataset names and space allocations below are illustrative
//SYSTSPRT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSPUNCH DD DSN=FINANCE.TRANS.SYSPUNCH,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(TRK,(1,1))
//SYSREC00 DD DSN=FINANCE.TRANS.UNLOAD,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE)
//SYSTSIN  DD *
  DSN SYSTEM(DB2P)
  RUN PROGRAM(DSNTIAUL) PLAN(DSNTIAUL) -
      PARMS('SQL')
  END
//SYSIN    DD *
  SELECT * FROM FINANCE.TRANSACTIONS;
/*
Step 2: Transfer to Azure with AzCopy¶
# Install AzCopy (if not already available)
# Transfer exported files to Azure Blob Storage
azcopy copy "/data/export/transactions.csv" \
"https://migrationstorage.blob.core.usgovcloudapi.net/db2-export/transactions.csv" \
--put-md5
# For large exports with multiple files, copy the entire directory
azcopy copy "/data/export/" \
"https://migrationstorage.blob.core.usgovcloudapi.net/db2-export/" \
--recursive \
--put-md5
For z/OS data: use IBM z/OS Connect, IBM Sterling File Gateway, or FTP/SFTP from the mainframe to transfer files to an intermediary server, then run AzCopy from that server to Azure. AzCopy does not run directly on z/OS.
Step 3: BULK INSERT into Azure SQL¶
-- Create a database-scoped credential for Blob Storage access
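-- Note (assumption about environment state): a database master key must exist before a
-- database-scoped credential can be created. If one does not exist:
--   CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';
-- Provide the SAS token without the leading '?' character.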
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<SAS-token>';
-- Create an external data source
CREATE EXTERNAL DATA SOURCE BlobStorage
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://migrationstorage.blob.core.usgovcloudapi.net/db2-export',
CREDENTIAL = BlobCredential
);
-- BULK INSERT with format options matching Db2 EXPORT
BULK INSERT finance.transactions
FROM 'transactions.csv'
WITH (
DATA_SOURCE = 'BlobStorage',
FORMAT = 'CSV',
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\n',
FIELDQUOTE = '"',
FIRSTROW = 1,
TABLOCK,
BATCHSIZE = 100000,
MAXERRORS = 100
);
Performance tuning for BULK INSERT¶
| Technique | Impact | How |
|---|---|---|
| TABLOCK hint | 30-50% faster | Minimizes row-level locking |
| Batch size 100K-500K | Optimal throughput | Balance between transaction log usage and commit frequency |
| Minimal logging | 2-5x faster | SQL Server on Azure VMs only: set the recovery model to BULK_LOGGED during the load (Azure SQL Database and Managed Instance always use FULL) |
| Disable indexes | 20-40% faster | Drop non-clustered indexes, load, rebuild |
| Disable constraints | 10-20% faster | Disable FK checks during load with ALTER TABLE NOCHECK |
| Split large files | Better parallelism | Split into 1 GB files and load in parallel |
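The index-related techniques above can be scripted directly. A minimal sketch, assuming an illustrative nonclustered index name (disable only nonclustered indexes; disabling the clustered index makes the table inaccessible):
-- Disable a nonclustered index before the bulk load
ALTER INDEX IX_transactions_trans_date ON finance.transactions DISABLE;

-- ... run BULK INSERT / BCP load ...

-- Rebuild after the load (rebuilding also re-enables the index)
ALTER INDEX IX_transactions_trans_date ON finance.transactions REBUILD;

-- Or rebuild every index on the table in one statement
ALTER INDEX ALL ON finance.transactions REBUILD;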
3. Azure Data Factory Db2 connector¶
ADF provides a native Db2 connector that reads data from Db2 for LUW and Db2 for z/OS (via DRDA). This is the recommended approach for ongoing data movement, incremental loads, and phased cutover migrations.
Setting up the Db2 linked service¶
{
"name": "Db2LinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "Db2",
"typeProperties": {
"connectionString": "Server=db2server.example.com;Database=FINANCEDB;AuthenticationType=Basic",
"username": "db2admin",
"password": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "KeyVaultLinkedService",
"type": "LinkedServiceReference"
},
"secretName": "db2-admin-password"
}
},
"connectVia": {
"referenceName": "SelfHostedIR",
"type": "IntegrationRuntimeReference"
}
}
}
Integration runtime: For Db2 on-premises or in a government data center, deploy a Self-Hosted Integration Runtime (SHIR) in the network where Db2 is accessible. See docs/SELF_HOSTED_IR.md for SHIR deployment patterns.
Copy activity for full table migration¶
{
"name": "CopyTransactions",
"type": "Copy",
"inputs": [
{
"referenceName": "Db2TransactionsDataset",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "AzureSqlTransactionsDataset",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "Db2Source",
"query": "SELECT * FROM FINANCE.TRANSACTIONS WHERE TRANS_DATE >= '2025-01-01'"
},
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "insert",
"sqlWriterUseTableLock": true,
"disableMetricsCollection": false
},
"enableStaging": true,
"stagingSettings": {
"linkedServiceName": {
"referenceName": "StagingBlobStorage",
"type": "LinkedServiceReference"
},
"path": "adf-staging"
},
"parallelCopies": 8
}
}
Incremental load pattern¶
For phased cutover, use watermark-based incremental loads:
{
"name": "IncrementalLoadTransactions",
"type": "Copy",
"typeProperties": {
"source": {
"type": "Db2Source",
"query": {
"value": "SELECT * FROM FINANCE.TRANSACTIONS WHERE LAST_MODIFIED > '@{activity('GetWatermark').output.firstRow.watermark}'",
"type": "Expression"
}
},
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "upsert",
"upsertSettings": {
"useTempDB": true,
"keys": ["TRANS_ID"]
}
}
}
}
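The GetWatermark activity referenced in the query expression is assumed to be a Lookup activity that reads a small control table in the target database; the table and column names below are illustrative, not part of the ADF connector:
-- Control table holding the last replicated LAST_MODIFIED value per source table
CREATE TABLE dbo.migration_watermark (
    table_name VARCHAR(128) NOT NULL PRIMARY KEY,
    watermark  DATETIME2    NOT NULL
);

-- Query run by the GetWatermark (Lookup) activity before the copy
SELECT watermark
FROM dbo.migration_watermark
WHERE table_name = 'FINANCE.TRANSACTIONS';

-- Statement run after a successful copy to advance the watermark
UPDATE dbo.migration_watermark
SET watermark = (SELECT MAX(last_modified) FROM finance.transactions)
WHERE table_name = 'FINANCE.TRANSACTIONS';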
ADF performance tuning¶
| Setting | Recommended value | Impact |
|---|---|---|
| Parallel copies | 4-16 | Parallel read/write threads |
| DIU (Data Integration Units) | 16-64 | Cloud compute for data movement |
| Staging via Blob | Enabled | Enables PolyBase-like bulk loading |
| Batch size | 100,000 | Rows per batch write |
| Table lock | Enabled | Reduces lock contention |
4. BCP for large table migration¶
For the largest tables (40M+ rows, 500 GB+), BCP (Bulk Copy Program) provides maximum throughput with fine-grained control.
Export from Db2 to BCP-compatible format¶
First, export from Db2 to a delimited format that BCP can read:
# Db2 CLP export
db2 "EXPORT TO /data/export/large_table.dat OF DEL \
MODIFIED BY COLDEL0x7C NOCHARDEL \
TIMESTAMPFORMAT=\"YYYY-MM-DD HH:MM:SS.UUUUUU\" \
SELECT * FROM SCHEMA.LARGE_TABLE"
BCP import to Azure SQL MI¶
# BCP import with pipe delimiter
bcp "SCHEMA.LARGE_TABLE" in /data/export/large_table.dat \
-S "sqlmi-instance.database.usgovcloudapi.net" \
-d "TargetDB" \
-U "sqladmin" \
-P "$DB_PASSWORD" \
-t "|" \
-b 100000 \
-h "TABLOCK" \
-a 65535 \
-e /data/export/large_table_errors.log
BCP parameter reference:
| Parameter | Value | Purpose |
|---|---|---|
| -c | (flag) | Character (text) mode for reading delimited files |
| -t | "\|" | Field terminator (pipe) |
| -b | 100000 | Batch size |
| -h | "TABLOCK" | Table lock hint |
| -a | 65535 | Network packet size (max) |
| -e | error file | Error output file |
Parallel BCP for maximum throughput¶
Split large exports into range-based partitions and load in parallel:
#!/bin/bash
# Parallel BCP load for partitioned data
TABLE="SCHEMA.LARGE_TABLE"
TARGET_DB="TargetDB"
SERVER="sqlmi.database.usgovcloudapi.net"
# Split by date range (or any partition key)
for YEAR in 2020 2021 2022 2023 2024 2025; do
echo "Loading year $YEAR..."
bcp "$TABLE" in "/data/export/large_table_${YEAR}.dat" \
-S "$SERVER" -d "$TARGET_DB" \
-U "sqladmin" -P "$DB_PASSWORD" \
-t "|" -b 100000 -h "TABLOCK" -a 65535 \
-e "/data/export/errors_${YEAR}.log" &
done
wait
echo "All partitions loaded."
5. Data validation and reconciliation¶
Row count validation¶
-- Run on Db2
SELECT 'TRANSACTIONS' AS table_name, COUNT(*) AS row_count
FROM FINANCE.TRANSACTIONS
UNION ALL
SELECT 'ACCOUNTS', COUNT(*) FROM FINANCE.ACCOUNTS
UNION ALL
SELECT 'CUSTOMERS', COUNT(*) FROM FINANCE.CUSTOMERS;
-- Run on Azure SQL and compare
SELECT 'TRANSACTIONS' AS table_name, COUNT(*) AS row_count
FROM FINANCE.TRANSACTIONS
UNION ALL
SELECT 'ACCOUNTS', COUNT(*) FROM FINANCE.ACCOUNTS
UNION ALL
SELECT 'CUSTOMERS', COUNT(*) FROM FINANCE.CUSTOMERS;
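For repeatable reconciliation across many tables, the target-side counts can be captured in a small control table and compared against the Db2 counts or EXPORT message files after each load. A minimal sketch (the dbo.migration_rowcounts name is illustrative):
-- Capture target-side counts with a timestamp
CREATE TABLE dbo.migration_rowcounts (
    table_name  VARCHAR(128) NOT NULL,
    row_count   BIGINT       NOT NULL,
    captured_at DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME()
);

INSERT INTO dbo.migration_rowcounts (table_name, row_count)
SELECT 'FINANCE.TRANSACTIONS', COUNT(*) FROM finance.transactions
UNION ALL
SELECT 'FINANCE.ACCOUNTS', COUNT(*) FROM finance.accounts
UNION ALL
SELECT 'FINANCE.CUSTOMERS', COUNT(*) FROM finance.customers;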
Checksum validation¶
-- Db2: aggregate checksum for numeric columns
SELECT
COUNT(*) AS row_count,
SUM(CAST(amount AS DECIMAL(31,2))) AS total_amount,
MIN(trans_date) AS min_date,
MAX(trans_date) AS max_date
FROM FINANCE.TRANSACTIONS;
-- Azure SQL: same query for comparison
SELECT
COUNT(*) AS row_count,
SUM(CAST(amount AS DECIMAL(31,2))) AS total_amount,
MIN(trans_date) AS min_date,
MAX(trans_date) AS max_date
FROM FINANCE.TRANSACTIONS;
Sampling validation¶
For very large tables, validate a random sample:
-- Azure SQL: validate random sample
SELECT TOP 1000 *
FROM FINANCE.TRANSACTIONS
ORDER BY NEWID();
-- Compare specific rows by primary key against source Db2
Character encoding validation¶
Critical for z/OS migrations where EBCDIC-to-Unicode conversion may introduce issues:
-- Azure SQL: find rows with replacement characters (encoding issues)
SELECT *
FROM FINANCE.CUSTOMERS
WHERE name LIKE '%' + NCHAR(0xFFFD) + '%' -- Unicode replacement character
OR address LIKE '%' + NCHAR(0xFFFD) + '%';
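Some EBCDIC conversion paths substitute the ASCII SUB control character (0x1A) instead of U+FFFD for unmappable bytes, so a complementary check for that character can be worthwhile (same illustrative columns as above):
-- Azure SQL: find rows containing the ASCII SUB substitution character (0x1A)
SELECT *
FROM FINANCE.CUSTOMERS
WHERE name LIKE '%' + CHAR(26) + '%'
   OR address LIKE '%' + CHAR(26) + '%';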
6. Cutover strategies¶
Big-bang cutover¶
- Freeze Db2 writes (application downtime).
- Run final data migration (delta since last sync).
- Validate row counts and checksums.
- Switch application connection strings to Azure SQL.
- Monitor for 24-48 hours before decommissioning Db2.
Best for: Small databases (< 50 GB), short acceptable downtime windows.
Phased cutover with dual-write¶
- Set up ADF incremental pipeline from Db2 to Azure SQL.
- Run dual-write period: application writes to Db2, ADF replicates to Azure SQL.
- Validate Azure SQL data continuously during dual-write.
- Switch read queries to Azure SQL first (read traffic cutover).
- Switch write queries to Azure SQL (write traffic cutover).
- Maintain Db2 as read-only fallback for 2-4 weeks.
- Decommission Db2 after validation period.
Best for: Large databases, minimal downtime requirements.
Table-by-table cutover¶
- Migrate and cut over individual tables/domains independently.
- Use linked servers or ADF for cross-database queries during transition.
- Complete migration when all tables are on Azure SQL.
Best for: Complex databases with independent table groups.
7. Post-migration data integration with CSA-in-a-Box¶
After data migration to Azure SQL, establish the analytics integration:
- Enable Fabric Mirroring from Azure SQL MI or Azure SQL DB to bring data into OneLake as Delta tables.
- Configure Purview scanning to classify and catalog the migrated data.
- Build Power BI semantic models over the mirrored Delta tables using Direct Lake mode.
- Set up ADF pipelines for any incremental data feeds that continue from the Db2 environment during the transition period.
Related resources¶
- Schema Migration -- prepare the target schema first
- Tutorial: SSMA Migration -- step-by-step SSMA walkthrough
- Mainframe Considerations -- z/OS-specific data movement
- Best Practices -- validation methodology
Maintainers: csa-inabox core team
Last updated: 2026-04-30