📋 Copy Job - Continuous Ingestion with Change Data Capture¶
Low-Code Incremental Data Ingestion for Microsoft Fabric
Last Updated: 2026-04-13 | Version: 1.0.0
📑 Table of Contents¶
- 🎯 Overview
- 🏗️ Architecture
- ⚙️ Configuration
- 🔄 CDC Patterns
- 📐 Source & Destination Mapping
- 🎰 Casino Implementation
- 🏛️ Federal Agency Implementation
- 📊 Monitoring and Troubleshooting
- ⚡ Performance Optimization
- ⚠️ Limitations
- 📚 References
🎯 Overview¶
Copy Job is a Microsoft Fabric workspace item (Preview) designed for continuous, low-code data ingestion with built-in change data capture (CDC). It fills a critical gap in the Fabric data movement spectrum -- between the one-shot Copy Activity (which runs once per pipeline execution) and database Mirroring (which provides full real-time replication). Copy Job provides a middle ground: scheduled incremental ingestion that automatically tracks what has changed since the last run and loads only the delta.
Think of Copy Job as a managed, low-code alternative to building custom watermark-based incremental notebooks. Instead of writing PySpark logic to track high-water marks, manage checkpoints, and handle schema drift, Copy Job handles all of this through a visual configuration experience.
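For comparison, the manual pattern that Copy Job replaces typically looks something like the sketch below: a notebook that reads a checkpoint, queries the source for rows past the watermark, appends them, and advances the checkpoint. Table, connection, and checkpoint names are illustrative assumptions, not part of any Copy Job API.
# Sketch of the manual watermark pattern that Copy Job automates (Fabric notebook,
# default Lakehouse attached). Names are illustrative; the Oracle JDBC driver is
# assumed to be available on the Spark pool.
from pyspark.sql import functions as F

CHECKPOINT_TABLE = "lh_bronze.etl_watermarks"          # hypothetical checkpoint table
TARGET_TABLE = "lh_bronze.bronze_slot_transactions"

# 1. Read the last high-water mark for this table.
rows = (spark.table(CHECKPOINT_TABLE)
             .filter(F.col("table_name") == "SLOT_TRANSACTIONS")
             .select("watermark").collect())
last_wm = rows[0]["watermark"] if rows else "1970-01-01 00:00:00"

# 2. Query only rows newer than the watermark from the source.
delta_df = (spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/GAMING_OPS")
    .option("user", "etl_reader").option("password", "<secret>")
    .option("query", "SELECT * FROM SLOT_TRANSACTIONS "
                     f"WHERE TRANSACTION_TIMESTAMP > TIMESTAMP '{last_wm}'")
    .load())
delta_df = delta_df.toDF(*[c.lower() for c in delta_df.columns])   # match Bronze casing

# 3. Append the delta and advance the watermark.
if delta_df.count() > 0:
    delta_df.write.mode("append").saveAsTable(TARGET_TABLE)
    new_wm = delta_df.agg(F.max("transaction_timestamp")).collect()[0][0]
    spark.sql(f"UPDATE {CHECKPOINT_TABLE} "
              f"SET watermark = '{new_wm}' "
              "WHERE table_name = 'SLOT_TRANSACTIONS'")
Copy Job performs the same read-filter-append-advance loop per table, but the checkpoint storage, retries, and schema mapping are managed for you.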
Where Copy Job Fits¶
flowchart LR
subgraph Spectrum["📊 Data Movement Spectrum"]
CA["Copy Activity<br/>One-shot, pipeline-driven<br/>Full or filtered load"]
CJ["📋 Copy Job<br/>Scheduled incremental<br/>Watermark-based CDC"]
MR["Mirroring<br/>Real-time replication<br/>Full database CDC"]
end
CA -->|"More automation"| CJ -->|"More real-time"| MR
style CJ fill:#6C3483,stroke:#4A235A,color:#fff
style CA fill:#2471A3,stroke:#1A5276,color:#fff
style MR fill:#E67E22,stroke:#CA6F1E,color:#fff
Key Capabilities¶
| Capability | Description |
|---|---|
| Scheduled Incremental Loads | Configure load frequency from every 5 minutes to daily; only changed/new data is loaded |
| Automatic Watermark Tracking | Copy Job manages high-water mark state internally -- no manual checkpoint management |
| Multi-Table Support | Ingest multiple tables from the same source in a single Copy Job with independent watermarks |
| Schema Mapping | Visual column mapping with support for type conversion and column renaming |
| Change Tracking Integration | Native support for SQL Server Change Tracking and timestamp-based incremental detection |
| Delta Lake Output | All data lands in OneLake as Delta tables, ready for Lakehouse querying |
| Built-In Monitoring | Execution history, rows loaded, duration, and error details per table per run |
| Low-Code Experience | No PySpark or SQL required -- fully visual configuration in the Fabric portal |
Copy Job vs. Alternatives¶
| Feature | Copy Activity | Copy Job | Mirroring |
|---|---|---|---|
| Setup Complexity | Medium (pipeline + parameters) | Low (visual wizard) | Low (connection + select tables) |
| Incremental Support | Manual (watermark logic) | Built-in (automatic watermarks) | Automatic (CDC log tailing) |
| Schedule Flexibility | Pipeline triggers | Built-in scheduler (5 min - daily) | Continuous (near real-time) |
| Multi-Table | One per activity (or ForEach) | Multiple tables per job | Entire database |
| Target Format | Delta, Parquet, CSV | Delta only | Delta (read-only mirror) |
| Write Mode | Overwrite, Append, Upsert | Append, Upsert | Mirror (system-managed) |
| Source Changes | Full requery per run | Watermark-filtered queries | Log-based CDC |
| Best For | One-time loads, complex transforms | Scheduled incremental ETL | Full database replication |
🏗️ Architecture¶
Copy Job operates as a managed orchestration item within the Fabric workspace. Each Copy Job encapsulates source connectivity, table mappings, watermark state, scheduling, and execution monitoring.
Component Architecture¶
flowchart LR
subgraph Source["📡 Data Sources"]
SQL["SQL Server"]
ORA["Oracle"]
PG["PostgreSQL"]
MY["MySQL"]
REST["REST API"]
BLOB["Azure Blob"]
end
subgraph CopyJob["📋 Copy Job"]
CONN["Connection<br/>Manager"]
MAP["Table & Column<br/>Mapping"]
WM["Watermark<br/>Tracker"]
SCH["Scheduler<br/>(5 min - daily)"]
EXEC["Execution<br/>Engine"]
end
subgraph Target["💾 OneLake"]
DT["Delta Tables<br/>(Lakehouse)"]
end
Source --> CONN --> MAP --> WM --> EXEC --> Target
SCH -->|"Trigger"| EXEC
style Source fill:#2471A3,stroke:#1A5276,color:#fff
style CopyJob fill:#6C3483,stroke:#4A235A,color:#fff
style Target fill:#27AE60,stroke:#1E8449,color:#fff
Execution Flow¶
sequenceDiagram
participant SCH as Scheduler
participant CJ as Copy Job Engine
participant WM as Watermark Store
participant SRC as Source Database
participant OL as OneLake (Delta)
SCH->>CJ: Trigger scheduled run
CJ->>WM: Read last watermark per table
WM-->>CJ: watermark = 2026-04-12T18:30:00Z
loop For Each Mapped Table
CJ->>SRC: SELECT * WHERE modified_date > '2026-04-12T18:30:00Z'
SRC-->>CJ: Changed rows (delta)
CJ->>OL: Write delta rows to Delta table (append or upsert)
CJ->>WM: Update watermark to MAX(modified_date) from delta
end
CJ->>CJ: Log execution metrics (rows, duration, status)
Note over CJ,SRC: Only changed data is queried
Note over CJ,OL: Delta format with ACID guarantees
Copy Job Lifecycle¶
stateDiagram-v2
[*] --> Created: New Copy Job
Created --> Configured: Map tables & columns
Configured --> Scheduled: Enable schedule
Scheduled --> Running: Trigger fires
Running --> Succeeded: All tables loaded
Running --> PartiallyFailed: Some tables failed
Running --> Failed: Critical error
Succeeded --> Scheduled: Wait for next trigger
PartiallyFailed --> Scheduled: Retry on next run
Failed --> Scheduled: Auto-retry (configurable)
Scheduled --> Paused: Manual pause
Paused --> Scheduled: Resume
⚙️ Configuration¶
Step 1: Create the Copy Job Item¶
Workspace → + New → Copy Job (Preview)
Name: cj-casino-slot-logs
Description: Incremental ingestion of slot machine transaction logs from Oracle
Step 2: Configure the Source Connection¶
Copy Job → Source → + Add Connection
Connection Type: Oracle
Server: <oracle-host>:1521
Database: GAMING_OPS
Authentication: Service Principal (recommended)
Advanced:
├── Connection Timeout: 30 seconds
├── Command Timeout: 600 seconds
└── Encryption: Required
Step 3: Select and Map Tables¶
Copy Job → Tables → + Add Tables
☑ SLOT_TRANSACTIONS → bronze_slot_transactions
☑ PLAYER_SESSIONS → bronze_player_sessions
☑ MACHINE_STATUS_LOG → bronze_machine_status
☑ COMPLIANCE_EVENTS → bronze_compliance_events
Step 4: Configure Incremental Column¶
For each table, specify the column used to detect changes:
| Source Table | Incremental Column | Type | Write Mode |
|---|---|---|---|
| SLOT_TRANSACTIONS | TRANSACTION_TIMESTAMP | Timestamp | Append |
| PLAYER_SESSIONS | SESSION_END_TIME | Timestamp | Append |
| MACHINE_STATUS_LOG | STATUS_CHANGE_ID | Integer (monotonic) | Append |
| COMPLIANCE_EVENTS | MODIFIED_DATE | Timestamp | Upsert (on event_id) |
Step 5: Configure Column Mapping¶
Copy Job → Tables → SLOT_TRANSACTIONS → Column Mapping
Source Column → Target Column Type Conversion
─────────────────────────────────────────────────────────────────
TRANSACTION_ID → transaction_id VARCHAR → string
MACHINE_ID → machine_id VARCHAR → string
TRANSACTION_TIMESTAMP → transaction_timestamp TIMESTAMP → timestamp
TRANSACTION_TYPE → transaction_type VARCHAR → string
AMOUNT → amount NUMBER(18,2) → decimal(18,2)
PLAYER_ID → player_id VARCHAR → string (nullable)
DENOMINATION → denomination NUMBER(5,2) → decimal(5,2)
FLOOR_LOCATION → floor_location VARCHAR → string
Exclude Columns:
☐ INTERNAL_AUDIT_FLAG (internal use only)
☐ RAW_SAS_PAYLOAD (too large for bronze)
Step 6: Configure Schedule¶
Copy Job → Schedule
Frequency: Every 15 minutes
Start Time: Immediate
End Time: None (continuous)
Advanced:
├── Retry on Failure: 3 attempts with 5-minute backoff
├── Timeout per Table: 10 minutes
├── Concurrent Table Loads: 4 (parallel)
└── Alert on Failure: Email data-engineering@casino.com
Step 7: Initial Load¶
Before starting scheduled incremental runs, perform an initial full load:
Copy Job → Run → Initial Load
Mode: Full (load all existing data)
Tables: All mapped tables
Progress:
├── SLOT_TRANSACTIONS █████████████████████░ 95% (4.2M rows)
├── PLAYER_SESSIONS ████████████████████░░ 88% (1.8M rows)
├── MACHINE_STATUS_LOG █████████████████████████ 100% (320K rows)
└── COMPLIANCE_EVENTS █████████████████████████ 100% (45K rows)
📝 Note: The initial full load establishes the baseline watermark for each table. Subsequent scheduled runs will only load rows with incremental column values greater than the stored watermark.
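A quick row-count check from a notebook can confirm the baseline before the schedule is enabled. This is an optional sanity check, not a Copy Job feature; table names follow the mapping above and the expected counts are illustrative figures from the initial run.
# Optional post-initial-load check from a notebook attached to lh_bronze.
# Expected counts are the figures reported by the initial run (illustrative).
expected = {
    "bronze_slot_transactions": 4_200_000,
    "bronze_player_sessions": 1_800_000,
    "bronze_machine_status": 320_000,
    "bronze_compliance_events": 45_000,
}

for table, expected_rows in expected.items():
    actual = spark.table(f"lh_bronze.{table}").count()
    drift = abs(actual - expected_rows) / expected_rows
    print(f"{table}: {actual:,} rows loaded ({drift:.1%} off the reported figure)")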
🔄 CDC Patterns¶
Copy Job supports three change data capture patterns. Choose the pattern that matches your source system's capabilities.
Pattern Comparison¶
| Pattern | Mechanism | Source Requirement | Accuracy | Performance |
|---|---|---|---|---|
| Watermark (Timestamp) | Query rows where modified_date > last_watermark | Table has a reliable modified_date column | High (if timestamp is always updated) | Good (index on timestamp column recommended) |
| Watermark (Integer) | Query rows where id > last_watermark | Table has a monotonically increasing integer column | High (for append-only tables) | Excellent (primary key scan) |
| Change Tracking | Query SQL Server Change Tracking system tables | SQL Server with Change Tracking enabled on source tables | Exact (includes deletes) | Good (CT overhead on source) |
Pattern 1: Timestamp-Based Watermark¶
The most common pattern. Requires a column that is updated whenever the row is modified (e.g., modified_date, updated_at, last_change_timestamp).
flowchart LR
subgraph Source["📡 Source Table"]
R1["Row 1: modified_date = Apr 10"]
R2["Row 2: modified_date = Apr 11"]
R3["Row 3: modified_date = Apr 12"]
R4["Row 4: modified_date = Apr 13 ← NEW"]
end
subgraph CopyJob["📋 Copy Job"]
WM["Watermark: Apr 12 00:00"]
QRY["SELECT * WHERE<br/>modified_date > Apr 12"]
end
subgraph Target["💾 Delta Table"]
D["Append: Row 4"]
end
Source --> QRY --> Target
WM -.->|"Filter"| QRY
style CopyJob fill:#6C3483,stroke:#4A235A,color:#fff
Configuration:
{
"table": "SLOT_TRANSACTIONS",
"incremental_column": "TRANSACTION_TIMESTAMP",
"incremental_type": "timestamp",
"write_mode": "append",
"watermark_initial": "1970-01-01T00:00:00Z"
}
Advantages:
- Works with any database that supports timestamp columns
- Simple to implement and understand
- Good performance with an index on the timestamp column
Considerations:
- Does not detect deletes (deleted rows are not captured)
- Requires the timestamp column to be updated on every modification
- Clock skew between source servers can cause missed rows (rare)
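If late-arriving rows or clock skew are a concern and you are building a custom notebook instead of (or alongside) Copy Job, a common mitigation is to re-read a small overlap window behind the watermark and anti-join against the target before appending. This is a custom-notebook technique, not a Copy Job setting; connection details and the 5-minute window below are assumptions.
# Overlap-window read for late-arriving rows: re-read slightly behind the
# watermark, then anti-join against the target so already-loaded rows are not
# appended twice. Names follow the slot-log example above.
from datetime import datetime, timedelta
from pyspark.sql import functions as F

last_watermark = datetime(2026, 4, 13, 17, 30)     # from your checkpoint store
safe_wm = last_watermark - timedelta(minutes=5)    # 5-minute overlap window

incoming = (spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/GAMING_OPS")
    .option("user", "etl_reader").option("password", "<secret>")
    .option("query", "SELECT * FROM SLOT_TRANSACTIONS WHERE TRANSACTION_TIMESTAMP > "
                     f"TIMESTAMP '{safe_wm:%Y-%m-%d %H:%M:%S}'")
    .load())
incoming = incoming.toDF(*[c.lower() for c in incoming.columns])

# Rows in the overlap window that the previous run already loaded.
already_loaded = (spark.table("lh_bronze.bronze_slot_transactions")
    .filter(F.col("transaction_timestamp") > F.lit(safe_wm))
    .select("transaction_id"))

(incoming.join(already_loaded, on="transaction_id", how="left_anti")
         .write.mode("append")
         .saveAsTable("lh_bronze.bronze_slot_transactions"))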
Pattern 2: Integer-Based Watermark¶
Best for append-only tables with an auto-increment primary key. The watermark tracks the highest loaded ID value.
{
"table": "MACHINE_STATUS_LOG",
"incremental_column": "STATUS_CHANGE_ID",
"incremental_type": "integer",
"write_mode": "append",
"watermark_initial": 0
}
Advantages:
- Extremely efficient (primary key index scan)
- No false positives -- each ID is loaded exactly once
- No dependency on timestamps or clock synchronization
Considerations:
- Only works for append-only tables (no updates to existing rows)
- ID gaps (from rollbacks or deletes) are harmless but may seem unexpected
Pattern 3: SQL Server Change Tracking¶
For SQL Server sources with Change Tracking enabled, Copy Job can query the CT system tables directly. This is the most accurate pattern because it captures inserts, updates, and deletes.
flowchart LR
subgraph Source["📡 SQL Server"]
CT["Change Tracking<br/>System Tables"]
TBL["Source Table"]
end
subgraph CopyJob["📋 Copy Job"]
QRY["CHANGETABLE(CHANGES ..., @last_version)"]
MERGE["Merge Logic:<br/>INSERT / UPDATE / DELETE"]
end
subgraph Target["💾 Delta Table"]
UP["Upsert changed rows"]
DEL["Soft-delete removed rows"]
end
CT --> QRY --> MERGE --> Target
style CopyJob fill:#6C3483,stroke:#4A235A,color:#fff
Configuration:
{
"table": "COMPLIANCE_EVENTS",
"incremental_column": null,
"incremental_type": "change_tracking",
"write_mode": "upsert",
"upsert_key": ["event_id"],
"handle_deletes": "soft_delete",
"soft_delete_column": "_is_deleted"
}
Advantages:
- Captures inserts, updates, AND deletes
- Exact change detection (no false positives or missed rows)
- Minimal source query impact (reads CT tables, not full table scans)
Considerations:
- SQL Server only (not available for Oracle, PostgreSQL, MySQL)
- Change Tracking must be enabled on the source database and tables
- CT retention on source must be longer than the Copy Job schedule interval
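Enabling Change Tracking is a source-side prerequisite that Copy Job relies on but does not perform for you. The sketch below runs the standard T-SQL via pyodbc; the connection string, database, table, and retention values are examples.
# Source-side prerequisite for Pattern 3: enable SQL Server Change Tracking.
# Requires ALTER DATABASE / ALTER TABLE permission on the source; values below
# are examples only.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=sql-host;"
    "DATABASE=OperationsDB;UID=etl_admin;PWD=<secret>;Encrypt=yes",
    autocommit=True)
cur = conn.cursor()

# Database level: retention must exceed the Copy Job schedule interval.
cur.execute("ALTER DATABASE OperationsDB SET CHANGE_TRACKING = ON "
            "(CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);")

# Table level: enable CT on each table mapped in the Copy Job.
cur.execute("ALTER TABLE dbo.COMPLIANCE_EVENTS ENABLE CHANGE_TRACKING;")

# The current CT version is the baseline the first incremental run starts from.
print(cur.execute("SELECT CHANGE_TRACKING_CURRENT_VERSION();").fetchone()[0])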
Upsert (Merge) Write Mode¶
When using the Upsert write mode, Copy Job performs a Delta Lake MERGE operation:
-- Conceptual merge logic executed by Copy Job
MERGE INTO bronze_compliance_events AS target
USING (SELECT * FROM staging_batch) AS source
ON target.event_id = source.event_id
WHEN MATCHED THEN
UPDATE SET
target.event_type = source.event_type,
target.amount = source.amount,
target.modified_date = source.modified_date,
target._last_updated = current_timestamp()
WHEN NOT MATCHED THEN
INSERT (event_id, event_type, amount, modified_date, _last_updated)
VALUES (source.event_id, source.event_type, source.amount,
source.modified_date, current_timestamp());
💡 Tip: Use Append mode for high-volume, insert-only tables (telemetry, logs, transactions). Use Upsert mode for slowly changing dimension tables or tables where existing rows are updated (compliance events, player profiles, machine configurations).
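If you need the same merge behavior in a notebook (for example, when promoting Bronze rows into a Silver table), the Delta Lake Python API expresses it as shown below. This is a sketch, not what Copy Job executes internally; staging_df is assumed to hold the incoming batch.
# Notebook-side equivalent of the conceptual MERGE above, using the Delta Lake
# Python API. Table and key names follow the compliance example.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "lh_bronze.bronze_compliance_events")

(target.alias("t")
    .merge(staging_df.alias("s"), "t.event_id = s.event_id")
    .whenMatchedUpdateAll()       # update all mapped columns on match
    .whenNotMatchedInsertAll()    # insert brand-new events
    .execute())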
📐 Source & Destination Mapping¶
Supported Sources¶
| Source | Version | Auth Methods | Change Detection |
|---|---|---|---|
| SQL Server | 2016+ | SQL Auth, Windows Auth, Entra ID | Timestamp, Integer, Change Tracking |
| Azure SQL Database | All | SQL Auth, Entra ID, Managed Identity | Timestamp, Integer, Change Tracking |
| Oracle | 12c+ | Oracle Auth | Timestamp, Integer |
| PostgreSQL | 12+ | PostgreSQL Auth, Entra ID | Timestamp, Integer |
| MySQL | 8.0+ | MySQL Auth | Timestamp, Integer |
| Azure Blob Storage | All | Account Key, SAS, Managed Identity | File modified date |
| Azure Data Lake Gen2 | All | Account Key, SAS, Managed Identity | File modified date |
| Amazon S3 | All | Access Key, IAM Role | File modified date |
| REST API | HTTP/HTTPS | API Key, OAuth, Bearer Token | Response-based (pagination token) |
| Snowflake | All | Snowflake Auth | Timestamp, Integer |
Destination Configuration¶
All Copy Job destinations are Delta tables in a Fabric Lakehouse:
Destination:
Lakehouse: lh_bronze
Schema: dbo (default)
Table Prefix: bronze_ (optional, applied to all tables)
Delta Settings:
├── Optimize Write: true (auto-compact small files)
├── Auto-Compact: true (merge small Delta files)
├── Liquid Clustering: false (configure post-load)
└── Partition Columns: none (configure post-load)
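Because partitioning and clustering are deferred to post-load configuration, periodic table maintenance from a notebook or pipeline is worth scheduling separately. A minimal sketch using standard Delta commands; the table name follows the casino example and the retention window is illustrative.
# Post-load maintenance for Copy Job targets (run from a notebook or pipeline).
# OPTIMIZE compacts the small files that frequent incremental writes leave
# behind; VACUUM removes unreferenced files older than the retention window.
spark.sql("OPTIMIZE lh_bronze.bronze_slot_transactions")
spark.sql("VACUUM lh_bronze.bronze_slot_transactions RETAIN 168 HOURS")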
Type Mapping Reference¶
| Source Type (Oracle) | Source Type (SQL Server) | Delta Lake Type |
|---|---|---|
| VARCHAR2 | VARCHAR / NVARCHAR | string |
| NUMBER(p,s) | DECIMAL(p,s) | decimal(p,s) |
| NUMBER (no scale) | INT / BIGINT | long |
| DATE | DATETIME / DATETIME2 | timestamp |
| TIMESTAMP | DATETIME2(7) | timestamp |
| CLOB | NVARCHAR(MAX) | string |
| BLOB | VARBINARY(MAX) | binary |
| RAW | BINARY | binary |
🎰 Casino Implementation¶
Casino gaming systems generate high-volume transactional data in operational databases (typically Oracle or SQL Server). Copy Job provides a low-code way to continuously ingest this data into the Fabric Bronze layer without building custom incremental notebooks.
Casino Ingestion Architecture¶
flowchart TB
subgraph Sources["🎰 Casino Source Systems"]
SAS["SAS/G2S Protocol<br/>Slot Machine Events"]
TGS["Table Game System<br/>RFID Chip Tracking"]
CMS["Casino Mgmt System<br/>Player & Compliance"]
end
subgraph Oracle["📡 Oracle Gaming DB"]
T1["SLOT_TRANSACTIONS<br/>~500K rows/day"]
T2["PLAYER_SESSIONS<br/>~50K rows/day"]
T3["MACHINE_STATUS<br/>~100K rows/day"]
T4["COMPLIANCE_EVENTS<br/>~5K rows/day"]
end
subgraph CopyJob["📋 Copy Job: cj-casino-bronze"]
WM["Watermark Tracking<br/>per table"]
SCH["Schedule: Every 15 min"]
end
subgraph Bronze["💾 Lakehouse: lh_bronze"]
B1["bronze_slot_transactions"]
B2["bronze_player_sessions"]
B3["bronze_machine_status"]
B4["bronze_compliance_events"]
end
Sources --> Oracle
Oracle --> CopyJob --> Bronze
style Sources fill:#E67E22,stroke:#CA6F1E,color:#fff
style Oracle fill:#2471A3,stroke:#1A5276,color:#fff
style CopyJob fill:#6C3483,stroke:#4A235A,color:#fff
style Bronze fill:#27AE60,stroke:#1E8449,color:#fff
Incremental Oracle Slot Logs (Every 15 Minutes)¶
The primary casino ingestion scenario: slot machine transaction logs are loaded incrementally from Oracle every 15 minutes.
Copy Job Configuration:
{
"copy_job_name": "cj-casino-slot-logs",
"source": {
"type": "Oracle",
"connection": "<oracle-host>:1521/GAMING_OPS",
"authentication": "service_principal"
},
"tables": [
{
"source_table": "GAMING_OPS.SLOT_TRANSACTIONS",
"target_table": "bronze_slot_transactions",
"target_lakehouse": "lh_bronze",
"incremental_column": "TRANSACTION_TIMESTAMP",
"incremental_type": "timestamp",
"write_mode": "append",
"column_mapping": [
{"source": "TRANSACTION_ID", "target": "transaction_id"},
{"source": "MACHINE_ID", "target": "machine_id"},
{"source": "TRANSACTION_TIMESTAMP", "target": "transaction_timestamp"},
{"source": "TRANSACTION_TYPE", "target": "transaction_type"},
{"source": "AMOUNT", "target": "amount"},
{"source": "PLAYER_ID", "target": "player_id"},
{"source": "DENOMINATION", "target": "denomination"},
{"source": "FLOOR_LOCATION", "target": "floor_location"}
]
}
],
"schedule": {
"frequency": "every_15_minutes",
"retry_count": 3,
"retry_interval_minutes": 5,
"timeout_minutes": 10
}
}
Compliance Event Upsert¶
Compliance events (CTR, SAR, W-2G) can be modified after initial creation (status changes, review outcomes). Use Upsert mode to keep the Bronze layer current:
{
"source_table": "GAMING_OPS.COMPLIANCE_EVENTS",
"target_table": "bronze_compliance_events",
"incremental_column": "MODIFIED_DATE",
"write_mode": "upsert",
"upsert_key": ["event_id"],
"column_mapping": [
{"source": "EVENT_ID", "target": "event_id"},
{"source": "EVENT_TYPE", "target": "event_type"},
{"source": "PLAYER_ID", "target": "player_id"},
{"source": "AMOUNT", "target": "amount"},
{"source": "STATUS", "target": "status"},
{"source": "REVIEWER_ID", "target": "reviewer_id"},
{"source": "CREATED_DATE", "target": "created_date"},
{"source": "MODIFIED_DATE", "target": "modified_date"}
]
}
⚠️ Warning: Compliance data (CTR, SAR, W-2G) must be ingested completely and accurately. Configure alerts for any Copy Job failures on compliance tables. NIGC MICS requires that all reportable transactions are captured within regulatory time windows.
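In addition to the built-in failure alerts, a lightweight freshness check against the Bronze table can catch silent gaps (for example, a run that succeeds but loads zero compliance rows). A sketch, assuming timestamps are stored in UTC and a 30-minute staleness threshold.
# Freshness check for the compliance Bronze table, intended to run on its own
# schedule. Threshold is illustrative; assumes modified_date is stored in UTC.
from datetime import datetime, timedelta
from pyspark.sql import functions as F

latest = (spark.table("lh_bronze.bronze_compliance_events")
               .agg(F.max("modified_date").alias("latest"))
               .collect()[0]["latest"])

threshold = datetime.utcnow() - timedelta(minutes=30)
if latest is None or latest < threshold:
    # Raise so the scheduling pipeline marks the run failed and alerts fire.
    raise RuntimeError(f"bronze_compliance_events looks stale; last change: {latest}")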
🏛️ Federal Agency Implementation¶
Federal agencies can use Copy Job to incrementally ingest data from agency source systems, public APIs, and partner databases into the Fabric Bronze layer.
USDA -- Commodity Price Delta Loads¶
USDA commodity price data is updated daily in a PostgreSQL database maintained by the NASS QuickStats system. Copy Job loads only the new or updated price records.
flowchart LR
subgraph USDA["🌾 USDA Source"]
PG["PostgreSQL<br/>NASS QuickStats<br/>commodity_prices"]
end
subgraph CJ["📋 Copy Job"]
WM["Watermark:<br/>load_timestamp"]
SCH["Schedule: Daily 6 AM"]
end
subgraph Bronze["💾 lh_bronze"]
BT["bronze_usda_<br/>commodity_prices"]
end
USDA --> CJ --> Bronze
style USDA fill:#27AE60,stroke:#1E8449,color:#fff
style CJ fill:#6C3483,stroke:#4A235A,color:#fff
Configuration:
{
"copy_job_name": "cj-usda-commodity-prices",
"source": {
"type": "PostgreSQL",
"connection": "nass-quickstats.usda.gov:5432/quickstats",
"authentication": "entra_id"
},
"tables": [
{
"source_table": "public.commodity_prices",
"target_table": "bronze_usda_commodity_prices",
"target_lakehouse": "lh_bronze",
"incremental_column": "load_timestamp",
"incremental_type": "timestamp",
"write_mode": "append"
},
{
"source_table": "public.crop_production_estimates",
"target_table": "bronze_usda_crop_production",
"target_lakehouse": "lh_bronze",
"incremental_column": "release_date",
"incremental_type": "timestamp",
"write_mode": "upsert",
"upsert_key": ["state_fips", "commodity_code", "year"]
}
],
"schedule": {
"frequency": "daily",
"time": "06:00",
"timezone": "America/New_York"
}
}
DOT/FAA -- Flight Status Delta Loads¶
The DOT Bureau of Transportation Statistics updates flight performance data in near-real-time. Copy Job loads delta records from the BTS REST API.
Configuration:
{
"copy_job_name": "cj-dot-flight-status",
"source": {
"type": "REST_API",
"base_url": "https://api.bts.dot.gov/v1/flights",
"authentication": {
"type": "api_key",
"header": "X-API-Key"
},
"pagination": {
"type": "offset",
"page_size": 10000
}
},
"tables": [
{
"source_endpoint": "/on-time-performance",
"target_table": "bronze_dot_flight_performance",
"target_lakehouse": "lh_bronze",
"incremental_column": "last_updated",
"incremental_type": "timestamp",
"write_mode": "upsert",
"upsert_key": ["flight_id", "flight_date"]
},
{
"source_endpoint": "/delays",
"target_table": "bronze_dot_flight_delays",
"target_lakehouse": "lh_bronze",
"incremental_column": "delay_record_id",
"incremental_type": "integer",
"write_mode": "append"
}
],
"schedule": {
"frequency": "every_30_minutes"
}
}
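Conceptually, an offset-paginated incremental pull against an API like this looks like the sketch below. Copy Job performs the equivalent internally; the endpoint, parameter names, and watermark filter here are assumptions based on the configuration above, not the documented BTS API contract.
# Conceptual offset pagination with an incremental filter, roughly what Copy
# Job does against a REST source. URL, parameters, and API key are assumptions.
import requests

BASE_URL = "https://api.bts.dot.gov/v1/flights/on-time-performance"
HEADERS = {"X-API-Key": "<api-key>"}
PAGE_SIZE = 10_000
last_watermark = "2026-04-13T17:30:00Z"   # stored from the previous run

rows, offset = [], 0
while True:
    resp = requests.get(
        BASE_URL,
        headers=HEADERS,
        params={"last_updated_after": last_watermark,
                "limit": PAGE_SIZE, "offset": offset},
        timeout=60)
    resp.raise_for_status()
    page = resp.json()
    if not page:
        break                    # empty page ends pagination
    rows.extend(page)
    offset += PAGE_SIZE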
Multi-Agency Copy Job Summary¶
| Agency | Source | Tables | Schedule | CDC Pattern | Daily Volume |
|---|---|---|---|---|---|
| USDA | PostgreSQL | commodity_prices, crop_production | Daily 6 AM | Timestamp | ~50K rows |
| SBA | Azure SQL DB | ppp_loans, 7a_loans, disaster_loans | Daily 7 AM | Timestamp | ~30K rows |
| NOAA | REST API | observations, alerts, storm_events | Every 30 min | Timestamp | ~200K rows |
| EPA | PostgreSQL | tri_releases, aqs_daily, facility_registry | Daily 5 AM | Timestamp | ~100K rows |
| DOI | REST API | earthquake_events, land_transactions | Every hour | Integer | ~20K rows |
| DOT/FAA | REST API | flight_performance, delay_records | Every 30 min | Timestamp/Integer | ~150K rows |
💡 Tip: Group agency tables by source system into a single Copy Job. For example, all USDA tables that come from the same PostgreSQL instance should be in one Copy Job to minimize connection overhead and simplify monitoring.
📊 Monitoring and Troubleshooting¶
Execution History¶
Copy Job provides built-in execution history accessible from the Fabric portal:
Copy Job → Monitor → Run History
Run ID Start Time Duration Tables Rows Loaded Status
────────────────────────────────────────────────────────────────────────────
run-2026-0413-1 2026-04-13 18:15:00 2m 34s 4 12,847 ✅ Succeeded
run-2026-0413-2 2026-04-13 18:00:00 2m 18s 4 11,230 ✅ Succeeded
run-2026-0413-3 2026-04-13 17:45:00 3m 02s 4 15,102 ✅ Succeeded
run-2026-0413-4 2026-04-13 17:30:00 8m 15s 4 14,890 ⚠️ Partial
run-2026-0413-5 2026-04-13 17:15:00 2m 45s 4 13,445 ✅ Succeeded
Per-Table Run Details¶
Copy Job → Monitor → Run Details (run-2026-0413-4)
Table Rows Duration Status Error
──────────────────────────────────────────────────────────────
bronze_slot_transactions 8,230 1m 45s ✅ —
bronze_player_sessions 2,100 0m 32s ✅ —
bronze_machine_status 4,560 0m 48s ✅ —
bronze_compliance_events 0 5m 10s ❌ Failed ORA-12170: TNS:Connect timeout
Watermark State Inspection¶
Copy Job → Monitor → Watermark State
Table Current Watermark Last Updated
─────────────────────────────────────────────────────────────────────────
bronze_slot_transactions 2026-04-13T18:14:59.123Z 2026-04-13 18:17:34
bronze_player_sessions 2026-04-13T18:12:30.456Z 2026-04-13 18:17:34
bronze_machine_status STATUS_CHANGE_ID = 4,823,107 2026-04-13 18:17:34
bronze_compliance_events 2026-04-13T17:28:00.000Z 2026-04-13 17:33:15 ← STALE
Common Issues and Resolutions¶
| Issue | Symptom | Root Cause | Resolution |
|---|---|---|---|
| Connection Timeout | ORA-12170: TNS:Connect timeout | Source database unreachable | Check network connectivity, VPN/ExpressRoute, firewall rules |
| Stale Watermark | Table shows old watermark, new data not loading | Previous run failed before watermark update | Manually reset watermark or retry |
| Duplicate Rows | Target table has duplicate records | Source timestamp column not unique | Add upsert key; switch to Upsert write mode (see the dedup sketch after this table) |
| Schema Drift | Column 'NEW_COL' not found in mapping | Source table added new column | Update column mapping in Copy Job configuration |
| Slow Initial Load | Initial full load takes hours | Large source table with no partitioning | Use parallel load (increase concurrent table limit) |
| Throttling | 429 Too Many Requests (REST API source) | API rate limit exceeded | Reduce schedule frequency or add backoff settings |
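For the Duplicate Rows case, existing duplicates can be cleaned up once from a notebook before switching the table to Upsert mode. A sketch that keeps the most recent row per key; key and ordering columns follow the compliance example.
# One-time de-duplication of a Bronze table before switching the Copy Job to
# Upsert mode. Keeps the most recent row per event_id.
from pyspark.sql import functions as F, Window

src = spark.table("lh_bronze.bronze_compliance_events")
w = Window.partitionBy("event_id").orderBy(F.col("modified_date").desc())

deduped = (src.withColumn("_rn", F.row_number().over(w))
              .filter(F.col("_rn") == 1)
              .drop("_rn"))

# localCheckpoint materializes the result so the same table can be overwritten.
deduped = deduped.localCheckpoint()
deduped.write.mode("overwrite").saveAsTable("lh_bronze.bronze_compliance_events")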
⚡ Performance Optimization¶
Source Query Optimization¶
| Optimization | Description | Impact |
|---|---|---|
| Index on Incremental Column | Ensure the watermark column has an index on the source database | 10-100x faster incremental queries |
| Partition Elimination | If source table is partitioned by date, watermark queries benefit from partition pruning | Significant for large tables |
| Column Projection | Map only needed columns; exclude large BLOB/CLOB columns | Reduces network transfer and target storage |
| Read Replicas | Point Copy Job to a read replica instead of the primary database | Zero impact on OLTP performance |
Copy Job Tuning¶
| Setting | Default | Recommendation | Description |
|---|---|---|---|
| Concurrent Tables | 4 | 4-8 (depends on source capacity) | Number of tables loaded in parallel per run |
| Batch Size | 10,000 rows | 50,000-100,000 for high-volume tables | Rows per write batch to Delta Lake |
| Timeout per Table | 10 minutes | Adjust based on expected load time | Fail individual table if it exceeds timeout |
| Schedule Frequency | 15 minutes | Match to source change velocity | More frequent = lower latency but more CU |
| Optimize Write | On | Keep On | Auto-compacts small Delta files during write |
CU Consumption Estimates¶
| Scenario | Tables | Rows/Run | Schedule | Est. CU/Day |
|---|---|---|---|---|
| Casino Slot Logs | 4 tables | ~13K rows | Every 15 min | 2-4% of F64 |
| USDA Daily Load | 2 tables | ~50K rows | Daily | < 0.5% of F64 |
| NOAA Observations | 3 tables | ~200K rows | Every 30 min | 1-2% of F64 |
| Full Federal (6 agencies) | 12 tables | ~550K rows | Mixed | 3-5% of F64 |
💡 Tip: Copy Job is significantly more CU-efficient than running equivalent Spark notebooks for simple incremental loads. The managed execution engine avoids Spark session startup overhead (which can add 30-60 seconds per run).
⚠️ Limitations¶
Current Limitations (Preview)¶
| Limitation | Details | Workaround |
|---|---|---|
| Preview Status | Feature is in Public Preview; API and configuration may change | Avoid production-critical workflows during preview |
| Supported Sources | Limited to sources listed in the Supported Sources section | Use Copy Activity in a pipeline for unsupported sources |
| Delta Output Only | Cannot write to Parquet, CSV, or Warehouse tables | Use Delta Lake; convert downstream if needed |
| No Complex Transforms | Column mapping and type conversion only; no expressions or lookups | Use Silver-layer notebooks for complex transformations |
| Max Tables per Job | 50 tables per Copy Job | Create multiple Copy Jobs for larger source systems |
| Max Parallelism | 8 concurrent table loads per Copy Job | Split into multiple jobs if more parallelism is needed |
| No Delete Detection | Timestamp and Integer patterns do not detect source deletes | Use Change Tracking pattern (SQL Server only) or periodic full refresh |
| Schedule Minimum | 5 minutes minimum schedule interval | Use Mirroring or Eventstream for sub-minute latency requirements |
| Watermark Reset | Resetting watermark requires manual intervention in portal | No API for programmatic watermark reset (expected at GA) |
| Schema Evolution | New columns in source are not auto-added to mapping | Manually update column mapping when source schema changes |
Feature Roadmap¶
| Enhancement | Expected Timeline | Description |
|---|---|---|
| Expression Columns | GA (H2 2026) | Add computed columns during ingestion (e.g., CONCAT, UPPER) |
| CDC for All Sources | GA (H2 2026) | Log-based CDC for Oracle, PostgreSQL (beyond SQL Server) |
| API Management | GA (H2 2026) | Programmatic Copy Job management via REST API |
| Auto-Schema Evolution | Post-GA | Automatically add new source columns to mapping |
| Cross-Workspace Target | Post-GA | Write to Lakehouses in different workspaces |
📝 Note: Copy Job is designed for structured, incremental ingestion. For complex ETL with transformations, joins, and business logic, continue using Spark notebooks or Dataflow Gen2 in the Silver layer.
📚 References¶
| Resource | URL |
|---|---|
| Copy Job Overview | https://learn.microsoft.com/fabric/data-factory/copy-job-overview |
| Data Factory in Fabric | https://learn.microsoft.com/fabric/data-factory/data-factory-overview |
| Copy Activity (comparison) | https://learn.microsoft.com/fabric/data-factory/copy-data-activity |
| Database Mirroring (comparison) | https://learn.microsoft.com/fabric/database/mirrored-database/overview |
| Delta Lake MERGE | https://learn.microsoft.com/fabric/data-engineering/lakehouse-delta-merge |
| SQL Server Change Tracking | https://learn.microsoft.com/sql/relational-databases/track-changes/about-change-tracking-sql-server |
🔗 Related Documents¶
- Real-Time Intelligence -- For sub-second latency, use Eventstreams instead of Copy Job
- Fabric IQ -- Query Copy Job-loaded Bronze tables with natural language
- Data Governance Deep Dive -- Governance for incremental ingestion
- Performance Best Practices -- Delta table optimization post-ingestion
- Fabric CI/CD Deployment -- Deploy Copy Job configurations via CI/CD
- Architecture -- System architecture overview
📝 Document Metadata
- Author: Documentation Team
- Reviewers: Data Engineering, Data Factory, Platform Operations
- Classification: Internal
- Next Review: 2026-07-13