PowerCenter Migration Guide: PowerCenter to ADF + dbt¶
A comprehensive guide for migrating Informatica PowerCenter mappings, workflows, sessions, and transformations to Azure Data Factory and dbt.
Overview¶
PowerCenter is the workhorse of the Informatica portfolio. Enterprises typically have hundreds to thousands of mappings built over a decade or more, with complex interdependencies between workflows, sessions, and shared components. This guide provides a systematic approach to converting every PowerCenter concept to its Azure equivalent.
The target architecture is:
- dbt -- all transformations (SQL-first; replaces PowerCenter mappings)
- ADF -- orchestration and data movement (replaces PowerCenter workflows and sessions)
- ADF Mapping Data Flows -- visual transformations for analyst-facing flows (optional; for teams that need a GUI)
Architecture mapping¶
graph LR
subgraph "PowerCenter (Source)"
PC_Map[Mapping] --> PC_Sess[Session]
PC_Sess --> PC_WF[Workflow]
PC_Mapplet[Mapplet] --> PC_Map
PC_Param[Parameter File] --> PC_Sess
PC_Sched[Scheduler] --> PC_WF
PC_Conn[Connection] --> PC_Sess
end
subgraph "Azure (Target)"
dbt_Model[dbt Model] --> ADF_Act[ADF Activity]
ADF_Act --> ADF_Pipe[ADF Pipeline]
dbt_Macro[dbt Macro] --> dbt_Model
dbt_Var[dbt Vars / ADF Params] --> ADF_Act
ADF_Trig[ADF Trigger] --> ADF_Pipe
ADF_LS[Linked Service] --> ADF_Act
end
PC_Map -.->|converts to| dbt_Model
PC_Mapplet -.->|converts to| dbt_Macro
PC_Sess -.->|converts to| ADF_Act
PC_WF -.->|converts to| ADF_Pipe
PC_Param -.->|converts to| dbt_Var
PC_Sched -.->|converts to| ADF_Trig
PC_Conn -.->|converts to| ADF_LS
Transformation-by-transformation mapping¶
This is the definitive reference for converting every PowerCenter transformation to its dbt or ADF equivalent.
Source/Target transformations¶
| PowerCenter transformation | Azure equivalent | Example |
|---|---|---|
| Source Qualifier | dbt source() reference + CTE | WITH src AS (SELECT * FROM {{ source('erp', 'orders') }} WHERE order_date > '2024-01-01') |
| Target | dbt materialization (table, view, incremental) | {{ config(materialized='table') }} |
| Flat File Source | ADF Copy Activity (Blob/ADLS source) | Copy from ADLS to staging table; dbt reads staging |
| XML Source | ADF Mapping Data Flow (XML connector) | Parse XML in MDF; output to staging table |
| XML Generator | ADF Mapping Data Flow (XML sink) | Generate XML in MDF sink |
Row-level transformations¶
| PowerCenter transformation | Azure equivalent | dbt SQL pattern |
|---|---|---|
| Expression | dbt SELECT projection | SELECT col_a, col_b * 1.1 AS col_b_adjusted, UPPER(col_c) AS col_c_clean FROM ... |
| Filter | dbt WHERE clause | SELECT * FROM {{ ref('stg_orders') }} WHERE status != 'cancelled' |
| Router | Multiple dbt models with WHERE | Model 1: WHERE region = 'east'; Model 2: WHERE region = 'west'; each references same staging model |
| Update Strategy | dbt incremental materialization | {{ config(materialized='incremental', unique_key='order_id') }} with {% if is_incremental() %} block |
| Data Masking | Azure SQL Dynamic Data Masking or dbt macro | {{ mask_pii(column_name) }} macro using CASE WHEN logic |
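For the Update Strategy row above, here is a minimal sketch of the incremental pattern; the model and column names are illustrative:
-- models/marts/fct_orders.sql -- illustrative names
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT
    order_id,
    customer_id,
    status,
    amount,
    modified_date
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
-- only rows newer than the target already holds; DD_INSERT/DD_UPDATE logic becomes a merge on unique_key
WHERE modified_date > (SELECT MAX(modified_date) FROM {{ this }})
{% endif %}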
Set-level transformations¶
| PowerCenter transformation | Azure equivalent | dbt SQL pattern |
|---|---|---|
| Aggregator | dbt GROUP BY | SELECT customer_id, SUM(amount) AS total_amount, COUNT(*) AS order_count FROM ... GROUP BY customer_id |
| Sorter | dbt ORDER BY | Generally unnecessary in intermediate models; use in final mart if needed |
| Rank | SQL window function | ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn |
| Normalizer | SQL UNPIVOT or CROSS APPLY | SELECT id, month, value FROM src UNPIVOT (value FOR month IN (jan, feb, mar)) |
| Union | dbt UNION ALL | SELECT * FROM {{ ref('model_a') }} UNION ALL SELECT * FROM {{ ref('model_b') }} |
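For the Rank conversion, a common pattern keeps only the latest row per key. A sketch, assuming a stg_orders model with an order_date column:
WITH ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY order_date DESC
        ) AS rn
    FROM {{ ref('stg_orders') }}
)
SELECT *
FROM ranked
WHERE rn = 1  -- equivalent to a PowerCenter Rank returning the top row per group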
Lookup transformations¶
| PowerCenter transformation | Azure equivalent | dbt SQL pattern |
|---|---|---|
| Lookup (connected, equality) | dbt LEFT JOIN | SELECT o.*, c.customer_name FROM orders o LEFT JOIN customers c ON o.customer_id = c.customer_id |
| Lookup (connected, range) | dbt JOIN with range condition | ... ON o.order_date BETWEEN r.start_date AND r.end_date |
| Lookup (unconnected) | dbt macro returning scalar | See Unconnected Lookup Pattern below |
| Lookup (persistent cache) | dbt table model or CTE | Materialize the lookup as a table so many models reuse one result set (closest analog to a persistent cache); ephemeral models are inlined as CTEs and recomputed per consumer |
| Joiner | dbt JOIN (any type) | Supports INNER, LEFT, RIGHT, FULL, CROSS; dbt handles dialect differences |
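The range-condition lookup reads naturally as a join. A sketch assuming a hypothetical effective-dated rate table ref_tax_rates:
SELECT
    o.order_id,
    o.order_amount,
    r.tax_rate
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('ref_tax_rates') }} r
    ON o.order_date BETWEEN r.effective_from AND r.effective_to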
Key generation¶
| PowerCenter transformation | Azure equivalent | dbt SQL pattern |
|---|---|---|
| Sequence Generator | SQL ROW_NUMBER() | ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + COALESCE((SELECT MAX(sk) FROM target), 0) |
| Surrogate key (hash) | dbt-utils generate_surrogate_key | {{ dbt_utils.generate_surrogate_key(['customer_id', 'product_id']) }} |
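Usage of the dbt-utils surrogate key inside a model (column names are illustrative):
SELECT
    {{ dbt_utils.generate_surrogate_key(['customer_id', 'product_id']) }} AS order_line_sk,
    customer_id,
    product_id,
    quantity
FROM {{ ref('stg_order_lines') }}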
Advanced transformations¶
| PowerCenter transformation | Azure equivalent | Complexity | Notes |
|---|---|---|---|
| Stored Procedure | dbt pre/post-hook or run_operation | Low | {{ config(pre_hook="EXEC sp_cleanup") }} |
| SQL Transformation | dbt model (native) | Low | dbt IS SQL; simplest conversion |
| Transaction Control | ADF pipeline error handling | Medium | ADF handles commit/rollback at pipeline level |
| Custom Transformation (Java) | Azure Function or dbt Python model | High | Rewrite Java in Python or .NET; call from ADF |
| HTTP Transformation | ADF Web activity or REST connector | Medium | ADF REST connector handles auth, pagination |
| Java Transformation | Azure Function | High | Rewrite in Python/C#; expose as HTTP endpoint |
| External Procedure | Azure Function or dbt run_operation | Medium | Depends on what the procedure does |
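When a stored procedure must run outside any one model's lifecycle, run_operation invokes a macro directly. A sketch, with a hypothetical procedure name:
-- macros/run_cleanup.sql
{% macro run_cleanup() %}
    {% do run_query("EXEC dbo.sp_cleanup") %}
{% endmacro %}
Invoke it with dbt run-operation run_cleanup, for example as a standalone ADF activity before the main dbt run.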
SCD and history¶
| PowerCenter transformation | Azure equivalent | dbt SQL pattern |
|---|---|---|
| Slowly Changing Dimension Type 1 | dbt incremental with merge | {{ config(materialized='incremental', unique_key='customer_id', incremental_strategy='merge') }} |
| Slowly Changing Dimension Type 2 | dbt snapshot | {% snapshot customer_snapshot %} ... {% endsnapshot %} with check_cols or updated_at |
| SCD Type 3 | dbt model with previous-value columns | Custom SQL maintaining current_value, previous_value, change_date |
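For the Type 3 case, a minimal sketch tracking one column (region); all model and column names are illustrative:
{{ config(
    materialized='incremental',
    unique_key='customer_id',
    incremental_strategy='merge'
) }}

{% if is_incremental() %}
SELECT
    s.customer_id,
    s.region AS current_region,
    -- when the tracked value changes, shift the old value into the history column
    CASE WHEN t.current_region <> s.region THEN t.current_region
         ELSE t.previous_region END AS previous_region,
    CASE WHEN t.current_region <> s.region THEN GETDATE()
         ELSE t.region_change_date END AS region_change_date
FROM {{ ref('stg_customers') }} s
LEFT JOIN {{ this }} t
    ON t.customer_id = s.customer_id
{% else %}
-- first run: no history yet
SELECT
    customer_id,
    region AS current_region,
    CAST(NULL AS VARCHAR(100)) AS previous_region,
    CAST(NULL AS DATETIME2) AS region_change_date
FROM {{ ref('stg_customers') }}
{% endif %}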
Unconnected lookup pattern¶
PowerCenter's unconnected lookup is a common pattern that has no single SQL equivalent. Here is the dbt macro pattern:
PowerCenter: :LKP.LKP_GET_CUSTOMER_NAME(customer_id)
dbt macro (macros/lkp_get_customer_name.sql):
{% macro lkp_get_customer_name(customer_id_column) %}
(
    -- TOP 1 guards against duplicate lookup keys (T-SQL; use LIMIT 1 on other warehouses)
    SELECT TOP 1 customer_name
    FROM {{ ref('dim_customer') }}
    WHERE customer_id = {{ customer_id_column }}
)
{% endmacro %}
Usage in dbt model:
SELECT
order_id,
customer_id,
{{ lkp_get_customer_name('customer_id') }} AS customer_name,
order_amount
FROM {{ ref('stg_orders') }}
Performance note: Correlated subqueries can be slow on large datasets. For production, prefer explicit JOINs. Use the macro pattern only when the lookup is optional or the cardinality is low.
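For reference, the same lookup expressed as the explicit JOIN the note recommends:
SELECT
    o.order_id,
    o.customer_id,
    c.customer_name,
    o.order_amount
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('dim_customer') }} c
    ON o.customer_id = c.customer_id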
Workflow to ADF pipeline mapping¶
Workflow components¶
| PowerCenter component | ADF equivalent | Notes |
|---|---|---|
| Workflow | Pipeline | Top-level orchestration unit |
| Session (mapping execution) | Activity (dbt run, Copy Data, Notebook) | One session = one or more activities |
| Worklet | Sub-pipeline (Execute Pipeline activity) | Nested pipeline for reuse |
| Decision (link condition) | If Condition activity / Switch activity | Conditional branching |
| Email task | Web activity -> Logic Apps | Logic Apps sends to Teams, email, Slack |
| Command task | Web activity or Azure Function activity | Shell command equivalent |
| Event Wait | Event trigger or Storage Events trigger | Wait for file arrival |
| Timer | Wait activity | Configurable delay |
| Assignment (variable) | Set Variable activity | Set pipeline variables |
| Pre-session SQL | dbt pre-hook | {{ config(pre_hook="TRUNCATE TABLE staging.target") }} |
| Post-session SQL | dbt post-hook | {{ config(post_hook="EXEC sp_update_watermark") }} |
| Session recovery | ADF rerun from failure | ADF tracks activity-level completion |
Scheduling migration¶
| PowerCenter scheduler | ADF trigger type | When to use |
|---|---|---|
| Time-based schedule | Schedule trigger | Fixed time (daily, hourly, custom cron) |
| Recurring schedule | Tumbling window trigger | Fixed-interval processing with backfill support |
| Event-based (file arrival) | Storage Events trigger / Custom Events trigger | React to new files in ADLS/Blob |
| On-demand | Manual trigger / REST API | Ad-hoc execution via Portal or API |
| Dependency-based | Pipeline dependency (After success/failure) | Chain pipelines with activity dependencies |
Connection mapping¶
| PowerCenter connection | ADF Linked Service | Integration Runtime |
|---|---|---|
| Oracle (on-prem) | Oracle connector | Self-Hosted IR |
| SQL Server (on-prem) | SQL Server connector | Self-Hosted IR |
| SQL Server (Azure) | Azure SQL Database connector | Azure IR (no Self-Hosted IR needed) |
| DB2 (on-prem) | DB2 connector | Self-Hosted IR |
| Teradata | Teradata connector | Self-Hosted IR |
| SAP (ABAP, HANA) | SAP Table / SAP CDC / SAP HANA | Self-Hosted IR |
| Salesforce | Salesforce connector | Azure IR |
| Oracle (Azure) | Azure Database for PostgreSQL (migrated) or Oracle via IR | Depends on migration status |
| Flat file (local) | ADLS / Blob Storage (upload first) | Azure IR after upload |
| Flat file (FTP/SFTP) | FTP / SFTP connector | Azure IR or Self-Hosted IR |
| S3 | Amazon S3 connector | Azure IR |
| GCS | Google Cloud Storage connector | Azure IR |
| REST/SOAP API | REST connector / HTTP connector | Azure IR |
| ODBC (generic) | ODBC connector | Self-Hosted IR |
Worked example: complex PowerCenter mapping to dbt¶
PowerCenter mapping: m_CUSTOMER_DIMENSION¶
This mapping loads a customer dimension using SCD Type 2 with these transformations:
- Source Qualifier (SQ_CUSTOMER) -- reads from CRM.CUSTOMER with a filter on MODIFIED_DATE
- Expression (EXP_DERIVE) -- derives FULL_NAME, EMAIL_DOMAIN, and the IS_ACTIVE flag
- Lookup (LKP_REGION) -- connected lookup to REF.REGION on REGION_CODE
- Lookup (LKP_EXISTING) -- connected lookup to DW.DIM_CUSTOMER on CUSTOMER_ID (for SCD change detection)
- Router (RTR_SCD) -- routes to an INSERT group (new customers) and an UPDATE group (changed customers)
- Update Strategy (UPD_EXPIRE) -- marks existing records as expired (SCD Type 2 close-out)
- Target (DIM_CUSTOMER) -- inserts new/updated records
dbt equivalent¶
Step 1: Staging model (models/staging/crm/stg_crm__customers.sql):
{{ config(materialized='incremental', unique_key='customer_id') }}

WITH source AS (
    SELECT *
    FROM {{ source('crm', 'customer') }}
    {% if is_incremental() %}
    -- first run loads full history; later runs take only new/changed rows
    WHERE modified_date > (SELECT MAX(modified_date) FROM {{ this }})
    {% endif %}
)
SELECT
customer_id,
first_name,
last_name,
CONCAT(first_name, ' ', last_name) AS full_name,
email,
SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email)) AS email_domain,
CASE WHEN status = 'A' THEN TRUE ELSE FALSE END AS is_active,
region_code,
modified_date
FROM source
Step 2: Intermediate model with lookup (models/intermediate/int_customers__enriched.sql):
SELECT
c.customer_id,
c.full_name,
c.email,
c.email_domain,
c.is_active,
r.region_name,
r.region_group,
c.modified_date
FROM {{ ref('stg_crm__customers') }} c
LEFT JOIN {{ ref('stg_ref__regions') }} r
ON c.region_code = r.region_code
Step 3: Snapshot for SCD Type 2 (snapshots/snap_dim_customer.sql):
{% snapshot snap_dim_customer %}
{{
config(
target_schema='dw',
unique_key='customer_id',
strategy='check',
check_cols=['full_name', 'email', 'is_active', 'region_name']
)
}}
SELECT * FROM {{ ref('int_customers__enriched') }}
{% endsnapshot %}
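Downstream marts then read the current rows from the snapshot; dbt maintains the dbt_valid_from/dbt_valid_to columns automatically:
-- models/marts/dim_customer.sql
SELECT
    customer_id,
    full_name,
    email,
    is_active,
    region_name,
    dbt_valid_from
FROM {{ ref('snap_dim_customer') }}
WHERE dbt_valid_to IS NULL  -- current version of each customer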
Step 4: Tests (models/staging/crm/_crm__models.yml):
version: 2

models:
  - name: stg_crm__customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - not_null
      - name: is_active
        tests:
          - accepted_values:
              values: [true, false]
Key differences from PowerCenter¶
| Aspect | PowerCenter | dbt |
|---|---|---|
| SCD Type 2 | Manual Router + Update Strategy | dbt snapshot (built-in) |
| Lookup | Separate transformation in mapping | SQL JOIN in model |
| Expression | Separate transformation | SQL in SELECT clause |
| Testing | Manual QA | dbt test (automated) |
| Deployment | Repository export | git push + CI/CD |
| Documentation | Separate wiki | Auto-generated from YAML |
Migration execution process¶
Step 1: Inventory and categorize (Weeks 1-2)¶
Export the PowerCenter repository metadata:
-- Query PowerCenter repository for mapping inventory
SELECT
m.MAPPING_NAME,
m.DESCRIPTION,
COUNT(DISTINCT t.TRANSFORMATION_NAME) AS transformation_count,
w.WORKFLOW_NAME,
s.SESSION_NAME
FROM REP_MAPPINGS m
JOIN REP_WIDGET_INST t ON m.MAPPING_ID = t.MAPPING_ID
LEFT JOIN REP_SESS_WIDGET_INST sw ON m.MAPPING_ID = sw.MAPPING_ID
LEFT JOIN REP_SESSIONS s ON sw.SESSION_ID = s.SESSION_ID
LEFT JOIN REP_WORKFLOWS w ON s.WORKFLOW_ID = w.WORKFLOW_ID
GROUP BY m.MAPPING_NAME, m.DESCRIPTION, w.WORKFLOW_NAME, s.SESSION_NAME
ORDER BY transformation_count DESC;
Categorize each mapping:
| Tier | Transformation count | Estimated effort | Action |
|---|---|---|---|
| A (Simple) | 1-5 transformations | 1-2 days | Direct convert to dbt model |
| B (Medium) | 6-15 transformations | 3-5 days | Decompose into multiple dbt models |
| C (Complex) | 16+ transformations | 5-15 days | Re-architect; may need multiple models + macros |
| D (Decommission) | Any | 0 | Archive metadata; do not migrate |
Step 2: Set up dbt project (Weeks 2-3)¶
dbt_project/
├── dbt_project.yml
├── models/
│ ├── staging/ # 1:1 with source tables
│ │ ├── crm/
│ │ ├── erp/
│ │ └── ref/
│ ├── intermediate/ # Business logic (joins, enrichment)
│ └── marts/ # Final dimensional models
├── macros/ # Reusable logic (replaces mapplets)
├── snapshots/ # SCD Type 2 (replaces SCD mappings)
├── seeds/ # Reference data (replaces flat file lookups)
└── tests/ # Custom data tests (replaces IDQ rules)
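A minimal dbt_project.yml matching this layout; the project and profile names are placeholders:
# dbt_project.yml (minimal sketch)
name: pc_migration
version: '1.0.0'
profile: azure_dw   # placeholder profile name

models:
  pc_migration:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table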
Step 3: Convert mappings by wave (Weeks 3-36)¶
For each wave (business domain):
- Convert Source Qualifiers to dbt source() definitions (see the YAML sketch after this list)
- Convert simple transformations (Expression, Filter, Lookup) to dbt SQL
- Convert complex transformations (Router, SCD) using patterns above
- Convert workflows to ADF pipelines
- Implement dbt tests for every critical data quality assertion
- Run in parallel with PowerCenter for 14+ days
- Cut over downstream consumers
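A sketch of the source() definition for the first item in the list; the schema and freshness values are placeholders:
# models/staging/crm/_crm__sources.yml
version: 2

sources:
  - name: crm
    schema: crm_raw    # landing schema loaded by the ADF Copy Activity
    tables:
      - name: customer
        loaded_at_field: modified_date
        freshness:
          warn_after: {count: 24, period: hour}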
Step 4: Validate and decommission (per wave)¶
- Daily reconciliation: row counts plus key metric values (sums of amounts, distinct counts)
- Repoint downstream consumers
- Pause PowerCenter workflows (do not delete until a 30-day stable period has elapsed)
- After 30 stable days, archive PowerCenter metadata to Purview and decommission
Common conversion challenges¶
Challenge 1: Mappings with 50+ transformations¶
Problem: PowerCenter allows arbitrarily complex mappings with dozens of transformations chained together.
Solution: Decompose into multiple dbt models following the staging/intermediate/marts pattern. Each model should do one thing. A 50-transformation mapping typically becomes 5-10 dbt models.
Challenge 2: Reusable mapplets used across 100+ mappings¶
Problem: A shared mapplet change in PowerCenter affects all mappings instantly. In dbt, shared logic lives in macros, and changes require re-running downstream models.
Solution: Convert mapplets to dbt macros. Use dbt run --select tag:uses_customer_macro to re-run affected models after macro changes. dbt's ref() graph ensures correct execution order.
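A sketch of a mapplet converted to a macro; the cleansing logic shown is hypothetical:
-- macros/clean_email.sql
{% macro clean_email(column_name) %}
    LOWER(LTRIM(RTRIM({{ column_name }})))
{% endmacro %}
Models that call the macro declare the tag, e.g. {{ config(tags=['uses_customer_macro']) }}, so the --select command above picks them up after a macro change.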
Challenge 3: Session-level overrides¶
Problem: PowerCenter sessions can override mapping-level properties (target table name, connection, SQL override). Different sessions using the same mapping may produce different results.
Solution: Use dbt's {{ config() }} block with environment-specific variables. Or create separate dbt models per session variant if the logic diverges significantly.
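A sketch of the variable-driven approach; the variable and default schema names are placeholders:
{{ config(
    materialized='table',
    schema=var('target_schema', 'analytics')  -- per-environment target, like a session-level override
) }}

SELECT *
FROM {{ ref('int_customers__enriched') }}
Run per environment with dbt run --vars '{target_schema: analytics_dev}'.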
Challenge 4: Pre/post-session SQL¶
Problem: Sessions often run SQL before (truncate target, disable indexes) and after (rebuild indexes, update watermarks) the mapping.
Solution: Use dbt pre-hooks and post-hooks:
{{ config(
materialized='table',
pre_hook="TRUNCATE TABLE {{ this }}",
post_hook="UPDATE watermarks SET last_run = GETDATE() WHERE table_name = '{{ this.name }}'"
) }}
Challenge 5: High-volume data movement¶
Problem: Some PowerCenter sessions move large data volumes (100M+ rows) with optimized bulk load.
Solution: Use ADF Copy Activity for data movement (it is optimized for bulk). dbt handles transformations after data lands. The pattern is: ADF copies raw data to staging, dbt transforms staging to marts.
Related resources¶
- Tutorial: Mapping to dbt -- Hands-on walkthrough
- Tutorial: Workflow to ADF -- Hands-on walkthrough
- Complete Feature Mapping -- All features mapped
- IICS Migration Guide -- For cloud-based Informatica
- Best Practices -- Migration execution guidance
- Migration Playbook -- End-to-end migration guide
Last updated: 2026-04-30 Maintainers: CSA-in-a-Box core team