
Troubleshooting Pipeline Issues in Azure Synapse Analytics


This guide covers common pipeline issues in Azure Synapse Analytics, providing diagnostic approaches and solutions for data integration workflows, activity failures, and pipeline orchestration problems.

Common Pipeline Issue Categories

Pipeline issues in Azure Synapse Analytics typically fall into these categories:

  1. Connectivity Issues: Linked service connection failures and networking problems
  2. Activity Failures: Errors in specific pipeline activities like Copy, Mapping Data Flow, or custom activities
  3. Trigger Problems: Issues with scheduled, tumbling window, or event-based triggers
  4. Performance Bottlenecks: Slow-running pipelines and optimization challenges
  5. Integration Failures: Problems with external systems and services
  6. Monitoring and Debugging: Challenges with monitoring pipelines and troubleshooting failures

Connectivity Issues

Linked Service Connection Failures

Symptoms:

  • "Connection timed out" or "Cannot connect to server" errors
  • Authentication failures when accessing data sources
  • Intermittent connection issues to specific services

Solutions:

  1. Verify connection string and configuration:
     • Check the linked service configuration for typos or incorrect parameters
     • Test the connection in the Synapse Studio UI
     • Validate credentials, account names, and endpoint URLs
// Example: Azure SQL Database linked service configuration
{
  "name": "AzureSqlDatabaseLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:server.database.windows.net,1433;Database=mydb;User ID=admin;Password=xxxx;Encrypt=true;Connection Timeout=30"
    },
    "connectVia": {
      "referenceName": "AutoResolveIntegrationRuntime",
      "type": "IntegrationRuntimeReference"
    }
  }
}
  2. Check network access and firewall rules:
     • Verify IP address restrictions and firewall settings
     • Check private endpoint configurations if used
     • Ensure that network security groups allow required traffic

Common ports required for different services:

Service                   Port   Protocol
Azure SQL                 1433   TCP
Azure Storage             443    HTTPS
On-premises SQL Server    1433   TCP
REST API                  443    HTTPS
  3. Validate credentials and permissions:
     • Check whether the service account or identity has the required permissions
     • For managed identity, verify role assignments
     • Test authentication independently with the same credentials
# PowerShell: Check managed identity role assignments
$workspace = Get-AzSynapseWorkspace -Name "workspace" -ResourceGroupName "resourcegroup"
Get-AzRoleAssignment -ObjectId $workspace.Identity.PrincipalId

Key Vault Integration Problems

Symptoms:

  • "Access to Azure Key Vault is forbidden" errors
  • Cannot retrieve secrets from Key Vault
  • Credentials stored in Key Vault not resolving

Solutions:

  1. Check Key Vault access policies:
     • Ensure the Synapse managed identity has Get and List permissions for secrets
     • Verify Key Vault firewall settings allow access from Synapse
# PowerShell: Grant Key Vault permissions to Synapse managed identity
$workspace = Get-AzSynapseWorkspace -Name "workspace" -ResourceGroupName "resourcegroup"
Set-AzKeyVaultAccessPolicy -VaultName "keyvault" -ObjectId $workspace.Identity.PrincipalId -PermissionsToSecrets Get,List
  2. Verify the Key Vault linked service:
     • Test the Key Vault linked service connection
     • Check for correct secret names and versions
     • Ensure the Key Vault base URL is in the proper format
// Example: Azure Key Vault linked service
{
  "name": "AzureKeyVaultLinkedService",
  "properties": {
    "type": "AzureKeyVault",
    "typeProperties": {
      "baseUrl": "https://keyvault.vault.azure.net/"
    }
  }
}
  3. Test secret retrieval manually:
     • Use the Azure Portal or PowerShell to test secret access
     • Verify the secret value and expiration
     • Check for specific errors in the activity output

Integration Runtime Issues

Symptoms:

  • "Integration runtime is not available" errors
  • Self-hosted integration runtime connectivity problems
  • Performance issues with specific integration runtimes

Solutions:

  1. Check integration runtime status:
     • Verify Azure IR or self-hosted IR status in Synapse Studio
     • Check for alerts or monitoring data indicating issues
     • Ensure sufficient capacity for the workload

  2. Troubleshoot the self-hosted integration runtime:
     • Check self-hosted IR logs in Event Viewer (Application and Services Logs > Microsoft > Integration Runtime)
     • Verify outbound connectivity on port 443
     • Check for machine resource constraints (CPU, memory)
# PowerShell: Restart self-hosted integration runtime service
Restart-Service -Name "DIAHostService"
  3. Configure high availability for critical workloads:
     • Set up multiple nodes for the self-hosted integration runtime
     • Implement proper monitoring and alerting
     • Consider auto-scaling for the Azure integration runtime

Activity Failures

Copy Activity Issues

Symptoms:

  • Copy activity fails with specific error messages
  • Slow performance during data transfer
  • Unexpected data transformation issues

Solutions:

  1. Analyze activity error details:
     • Review the error message and stack trace in the monitoring view
     • Check specific error codes and failure categories
     • Identify which phase of the copy activity failed (pre-copy, copy, post-copy)

  2. Address common copy activity errors:

Error                    Common Cause                              Solution
Credential issue         Invalid connection string or secret       Verify credentials and test the connection
Source table not found   Invalid table name or permissions         Check source object existence and permissions
Column mapping error     Schema mismatch between source and sink   Review column mappings and data types
File format error        Incorrect format settings                 Validate format settings match the actual data
Network error            Connectivity or firewall issues           Check network settings and firewall rules
  3. Optimize copy performance:
     • Use parallel copies and partitioning for large datasets
     • Configure an appropriate integration runtime
     • Use staging for complex transformations
// Example: Copy activity with performance optimizations
{
  "name": "OptimizedCopyActivity",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT * FROM MyTable",
      "partitionOption": "PhysicalPartitionsOfTable"
    },
    "sink": {
      "type": "DelimitedTextSink",
      "storeSettings": {
        "type": "AzureBlobFSWriteSettings"
      }
    },
    "enableStaging": true,
    "stagingSettings": {
      "linkedServiceName": {
        "referenceName": "AzureBlobStorage",
        "type": "LinkedServiceReference"
      },
      "path": "staging"
    },
    "parallelCopies": 32,
    "dataIntegrationUnits": 128
  }
}

Mapping Data Flow Problems

Symptoms:

  • Data flow fails during execution
  • Unexpected transformations or data results
  • Performance issues with complex transformations

Solutions:

  1. Debug with data flow monitoring:
     • Use the data preview feature to verify transformations
     • Enable debug mode for detailed inspection
     • Check row counts and data samples at each step

  2. Address common data flow errors:
     • Data type mismatches: validate the schema and use explicit casting
     • Expression errors: test expressions in the expression builder
     • Memory issues: optimize partitioning and enable debugging with optimized mode
// Example: Explicit data type handling in data flow expression
toInteger(trim(movieId))

// Handling null values
iifNull(rating, 0.0)
  3. Optimize data flow performance:
     • Configure an appropriate TTL for debug sessions
     • Use partitioning strategies for large datasets
     • Adjust optimization settings for performance
// Example: Data flow activity with optimization settings
{
  "name": "DataFlowActivity",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "TransformMovieRatings",
      "type": "DataFlowReference"
    },
    "compute": {
      "coreCount": 32,
      "computeType": "General"
    },
    "staging": {
      "linkedService": {
        "referenceName": "AzureBlobStorage",
        "type": "LinkedServiceReference"
      },
      "folderPath": "staging/dataflow"
    }
  }
}

Spark Activity Issues

Symptoms:

  • Spark notebook or job activities failing
  • Long-running Spark activities timing out
  • Resource constraints during execution

Solutions:

  1. Review Spark application logs:
     • Check Spark driver and executor logs for errors
     • Look for out-of-memory exceptions or task failures
     • Analyze the Spark UI for performance bottlenecks

  2. Address common Spark issues:
     • Memory problems: adjust executor and driver memory
     • Job failures: check for code errors or data issues
     • Dependency issues: verify required libraries and versions
// Example: Spark activity with custom configuration
{
  "name": "SparkActivity",
  "type": "SynapseNotebook",
  "typeProperties": {
    "notebook": {
      "referenceName": "ProcessData",
      "type": "NotebookReference"
    },
    "parameters": {
      "date": "2023-04-01"
    },
    "conf": {
      "spark.dynamicAllocation.enabled": "true",
      "spark.dynamicAllocation.minExecutors": "2",
      "spark.dynamicAllocation.maxExecutors": "10"
    },
    "numExecutors": 4
  },
  "linkedServiceName": {
    "referenceName": "SynapseSparkPool",
    "type": "LinkedServiceReference"
  }
}
  3. Optimize Spark configuration:
     • Configure an appropriate Spark pool and size
     • Use dynamic allocation for variable workloads
     • Implement proper partitioning strategies
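As a rough companion to the partitioning guidance above, a common rule of thumb is to size partitions at roughly 100–200 MB each; the 128 MB target below is an illustrative assumption, not a Synapse default:

```python
def suggest_partitions(input_bytes: int, target_partition_mb: int = 128,
                       min_partitions: int = 1) -> int:
    """Rough partition count: total input size divided by a target
    per-partition size, rounded up (ceiling division)."""
    target = target_partition_mb * 1024 * 1024
    return max(min_partitions, -(-input_bytes // target))

# e.g. a 10 GiB dataset at ~128 MB per partition
print(suggest_partitions(10 * 1024**3))  # 80
```

The result would feed a repartition call (or a shuffle-partition setting) in the notebook itself; tune the target size against your executor memory.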

Trigger Problems

Schedule Trigger Issues

Symptoms:

  • Pipeline not running at expected times
  • Inconsistent schedule execution
  • Missing pipeline runs

Solutions:

  1. Verify the trigger definition:
     • Check timezone configuration and daylight saving time handling
     • Validate the recurrence schedule (frequency, interval, hours, and minutes) for correctness
     • Ensure the pipeline reference is correct
// Example: Schedule trigger configuration
{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2023-01-01T00:00:00Z",
        "timeZone": "UTC",
        "schedule": {
          "hours": [1],
          "minutes": [30]
        }
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "DailyProcessingPipeline",
          "type": "PipelineReference"
        },
        "parameters": {
          "WindowStart": "@trigger().scheduledTime",
          "WindowEnd": "@trigger().scheduledTime"
        }
      }
    ]
  }
}
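As a sanity check on a recurrence like the one above (daily at hour 1, minute 30 UTC), enumerating the next few expected run times offline can quickly expose timezone or schedule mistakes. A minimal sketch in plain Python (a simplified model of the trigger, not Synapse code):

```python
from datetime import datetime, timedelta, timezone

def next_runs(start, hours, minutes, count=3):
    """Enumerate upcoming run times for a daily schedule with explicit
    hour/minute lists (simplified model of a schedule trigger)."""
    runs = []
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while len(runs) < count:
        for h in sorted(hours):
            for m in sorted(minutes):
                t = day.replace(hour=h, minute=m)
                if t >= start:
                    runs.append(t)
        day += timedelta(days=1)
    return runs[:count]

start = datetime(2023, 1, 1, tzinfo=timezone.utc)
for t in next_runs(start, hours=[1], minutes=[30]):
    print(t.isoformat())  # 2023-01-01T01:30:00+00:00, then the next two days
```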
  2. Check trigger activation status:
     • Verify the trigger is started in Synapse Studio
     • Look for overlapping schedules or conflicts
     • Check resource constraints that may delay execution

  3. Monitor and analyze trigger history:
     • Review trigger run history in the monitoring view
     • Check for failed trigger executions
     • Analyze patterns in delayed or skipped executions
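One way to spot skipped executions in trigger history is to compare expected run times against the actual ones and flag the gaps. A hedged sketch in plain Python (the 5-minute tolerance is an arbitrary assumption):

```python
from datetime import datetime, timedelta, timezone

def missed_runs(expected, actual, tolerance_seconds=300):
    """Expected run times that have no actual run within the tolerance."""
    return [t for t in expected
            if not any(abs((t - a).total_seconds()) <= tolerance_seconds
                       for a in actual)]

base = datetime(2023, 1, 1, tzinfo=timezone.utc)
expected = [base + timedelta(hours=i) for i in range(3)]
actual = [base + timedelta(minutes=1), base + timedelta(hours=2)]
print(missed_runs(expected, actual))  # only the 01:00 run is missing
```

The actual run times could come from the monitoring API or from Log Analytics trigger-run logs.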

Tumbling Window Trigger Issues

Symptoms:

  • Gaps in tumbling window execution
  • Dependency issues between window runs
  • Reprocessing or backfill problems

Solutions:

  1. Check window configuration:
     • Verify window size and delay settings
     • Check dependency settings for correctness
     • Validate start and end times
// Example: Tumbling window trigger with dependencies
{
  "name": "TumblingWindowTrigger",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2023-01-01T00:00:00Z",
      "delay": "00:10:00",
      "maxConcurrency": 3,
      "retryPolicy": {
        "count": 3,
        "intervalInSeconds": 30
      },
      "dependsOn": [
        {
          "type": "TumblingWindowTriggerDependencyReference",
          "offset": "-01:00:00",
          "size": "01:00:00",
          "referenceTrigger": {
            "referenceName": "PreviousHourTrigger",
            "type": "TriggerReference"
          }
        }
      ]
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "HourlyProcessingPipeline",
        "type": "PipelineReference"
      },
      "parameters": {
        "WindowStart": "@trigger().outputs.windowStartTime",
        "WindowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}
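Because tumbling window boundaries are deterministic, computing the expected windows offline makes gaps easy to spot. A minimal sketch in plain Python (simplified: it ignores the delay and dependency settings):

```python
from datetime import datetime, timedelta, timezone

def tumbling_windows(start, interval, count):
    """Contiguous, non-overlapping (windowStart, windowEnd) pairs."""
    return [(start + i * interval, start + (i + 1) * interval)
            for i in range(count)]

start = datetime(2023, 1, 1, tzinfo=timezone.utc)
for ws, we in tumbling_windows(start, timedelta(hours=1), 3):
    print(ws.isoformat(), "->", we.isoformat())
```

Comparing this list against the trigger's run history shows which windows never ran and need a backfill.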
  2. Troubleshoot dependency chains:
     • Visualize dependency chains in the monitoring view
     • Check for circular dependencies
     • Verify parent trigger execution status

  3. Implement proper error handling:
     • Configure retry policies for transient failures
     • Set up appropriate concurrency limits
     • Use activity timeout settings strategically

Event Trigger Issues

Symptoms:

  • Pipeline not triggered by storage events
  • Delayed reaction to events
  • Event trigger firing too often or for unexpected events

Solutions:

  1. Verify event source configuration:
     • Check storage account and container names
     • Validate event types and filters
     • Ensure the Event Grid subscription is active
// Example: Event trigger configuration
{
  "name": "BlobEventTrigger",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/container/blobs/input/",
      "blobPathEndsWith": ".csv",
      "ignoreEmptyBlobs": true,
      "scope": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resourcegroup/providers/Microsoft.Storage/storageAccounts/storageaccount",
      "events": ["Microsoft.Storage.BlobCreated"]
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "ProcessCSVPipeline",
        "type": "PipelineReference"
      },
      "parameters": {
        "blobPath": "@trigger().outputs.body.url"
      }
    }
  }
}
  2. Test event generation manually:
     • Upload test files to trigger events
     • Use Storage Explorer to verify file paths
     • Check event delivery with Event Grid diagnostics

  3. Monitor event processing:
     • Set up diagnostic logs for event subscriptions
     • Check for filtered or dropped events
     • Verify event delivery latency

Performance Bottlenecks

Slow Pipeline Execution

Symptoms:

  • Pipelines taking longer than expected
  • Increasing execution times over time
  • Specific activities causing delays

Solutions:

  1. Analyze pipeline monitoring data:
     • Identify slow-running activities using the monitoring view
     • Compare historical performance data
     • Look for patterns in performance degradation

  2. Optimize activity configuration:
     • For Copy activities, use parallel copies and staging
     • For Data Flows, optimize partitioning and transformations
     • For Lookups, limit the result size and use caching
// Example: Optimized lookup activity
{
  "name": "CachedLookup",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT TOP 100 * FROM ConfigTable",
      "queryTimeout": "02:00:00",
      "partitionOption": "None"
    },
    "dataset": {
      "referenceName": "AzureSqlTable",
      "type": "DatasetReference"
    },
    "firstRowOnly": false,
    "cachingOptions": {
      "enableCaching": true,
      "cacheDuration": "06:00:00"
    }
  }
}
  3. Implement parallel processing:
     • Use ForEach activities with batch size and parallel execution
     • Implement proper dependency chains between activities
     • Balance parallelism with available resources
// Example: Optimized ForEach activity
{
  "name": "ParallelProcessing",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@activity('GetFileList').output.value",
      "type": "Expression"
    },
    "batchCount": 10,
    "isSequential": false,
    "activities": [
      {
        "name": "ProcessFile",
        "type": "Copy",
        "...": "..."
      }
    ]
  }
}

Resource Constraints

Symptoms:

  • "Resource limitation" errors
  • Queue time increasing for pipeline runs
  • Throttling errors from connected services

Solutions:

  1. Monitor resource utilization:
     • Check integration runtime metrics
     • Monitor Azure service quotas and limits
     • Analyze patterns in resource consumption

  2. Optimize resource allocation:
     • Scale up the integration runtime for compute-intensive workloads
     • Configure appropriate concurrency limits for triggers
     • Schedule pipelines to avoid peak times

  3. Implement rate limiting and backoff strategies:
     • Add Wait activities between retries
     • Implement exponential backoff for API calls
     • Use circuit breaker patterns for unreliable services
// Example: Wait activity with exponential backoff
{
  "name": "ExponentialBackoff",
  "type": "Wait",
  "typeProperties": {
    "waitTimeInSeconds": {
      "value": "@mul(power(2, activity('SetRetry').output.firstRow.RetryCount), 15)",
      "type": "Expression"
    }
  },
  "dependsOn": [
    {
      "activity": "SetRetry",
      "dependencyConditions": ["Succeeded"]
    }
  ]
}
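The Wait expression above evaluates to 15 × 2^RetryCount seconds. A quick sketch in plain Python (not pipeline code) of the delay schedule it produces, assuming RetryCount starts at 1:

```python
def backoff_seconds(retry_count: int, base: int = 15) -> int:
    """Mirror of the pipeline expression @mul(power(2, RetryCount), 15)."""
    return (2 ** retry_count) * base

# Delay before retries 1 through 4
print([backoff_seconds(n) for n in range(1, 5)])  # [30, 60, 120, 240]
```

Doubling delays like this keep retries cheap for transient blips while quickly backing off from a service that is genuinely down.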

Integration Failures

Error Handling in Pipelines

Symptoms:

  • Failed pipelines without proper error information
  • Cascading failures affecting multiple pipelines
  • Inconsistent error handling across activities

Solutions:

  1. Implement comprehensive error handling:
     • Use activity failure outputs in expressions
     • Configure email notifications for failures
     • Store error details in logging tables
// Example: Error handling with IfCondition
{
  "name": "ErrorHandling",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@equals(activity('CopyData').output.executionDetails[0].status, 'Failed')",
      "type": "Expression"
    },
    "ifTrueActivities": [
      {
        "name": "LogError",
        "type": "WebActivity",
        "typeProperties": {
          "method": "POST",
          "url": "https://prod-00.westus.logic.azure.com:443/...",
          "body": {
            "value": "{ \"pipelineName\": \"@{pipeline().Pipeline}\", \"error\": \"@{activity('CopyData').error.message}\" }",
            "type": "Expression"
          }
        }
      }
    ]
  },
  "dependsOn": [
    {
      "activity": "CopyData",
      "dependencyConditions": ["Completed"]
    }
  ]
}
  2. Set up retry policies:
     • Configure appropriate retry counts and intervals
     • Use different strategies for different failure types
     • Implement a circuit breaker pattern for external services
// Example: Activity with retry policy
{
  "name": "CopyWithRetry",
  "type": "Copy",
  "typeProperties": {
    "...": "..."
  },
  "policy": {
    "retry": 3,
    "retryIntervalInSeconds": 60,
    "secureOutput": false,
    "secureInput": false,
    "timeout": "01:00:00"
  }
}
  3. Create dedicated error handling pipelines:
     • Implement reusable error handling patterns
     • Centralize error logging and notification
     • Set up automated recovery procedures

External Service Integration Problems

Symptoms:

  • Failures when connecting to REST APIs
  • Timeout errors with third-party services
  • Inconsistent responses from external endpoints

Solutions:

  1. Analyze API errors:
     • Check response status codes and bodies
     • Validate request headers and authentication
     • Test the API directly with tools like Postman

  2. Implement robust Web activities:
     • Handle authentication properly
     • Parse and validate responses
     • Configure appropriate timeouts
// Example: Web activity with authentication and error handling
{
  "name": "CallRestAPI",
  "type": "WebActivity",
  "typeProperties": {
    "method": "POST",
    "url": "https://api.example.com/data",
    "headers": {
      "Content-Type": "application/json",
      "Authorization": {
        "value": "@concat('Bearer ', activity('GetToken').output.access_token)",
        "type": "Expression"
      }
    },
    "body": {
      "value": "@{activity('PrepareRequest').output.value}",
      "type": "Expression"
    },
    "authentication": {
      "type": "MSI",
      "resource": "https://api.example.com"
    },
    "connectVia": {
      "referenceName": "AutoResolveIntegrationRuntime",
      "type": "IntegrationRuntimeReference"
    }
  },
  "policy": {
    "timeout": "00:01:00",
    "retry": 2,
    "retryIntervalInSeconds": 30
  }
}
  3. Implement circuit breaker patterns:
     • Track failure rates for external services
     • Implement fallback mechanisms
     • Use exponential backoff for retries
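The circuit breaker mentioned above is not a built-in pipeline feature; the idea can be sketched in plain Python (illustrative only, with arbitrary threshold and cooldown defaults):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures,
    reject calls while open, allow one trial call after the cooldown."""

    def __init__(self, failure_threshold=3, cooldown_seconds=60):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_seconds:
                raise RuntimeError("circuit open: skipping call")
            self.opened_at = None  # half-open: permit one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success closes the breaker
        return result
```

A pipeline-level equivalent would track consecutive failures in a control table and branch around the external call while the breaker is open.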

Monitoring and Debugging

Pipeline Monitoring Challenges

Symptoms:

  • Difficulty tracking pipeline execution
  • Missing or incomplete monitoring data
  • Challenges correlating related pipeline runs

Solutions:

  1. Set up comprehensive monitoring:
     • Configure diagnostic settings to send logs to Log Analytics
     • Create custom dashboards for pipeline monitoring
     • Implement end-to-end tracing with correlation IDs
# PowerShell: Configure diagnostic settings for Synapse workspace
$workspace = Get-AzSynapseWorkspace -Name "workspace" -ResourceGroupName "resourcegroup"
$logAnalytics = Get-AzOperationalInsightsWorkspace -ResourceGroupName "resourcegroup" -Name "logworkspace"

Set-AzDiagnosticSetting -ResourceId $workspace.Id `
                       -Name "SynapseDiagnostics" `
                       -WorkspaceId $logAnalytics.ResourceId `
                       -Category @("IntegrationPipelineRuns", "IntegrationActivityRuns", "IntegrationTriggerRuns") `
                       -Enabled $true
  2. Implement custom logging:
     • Add logging activities to pipelines
     • Store execution metadata in dedicated tables
     • Implement custom metrics for business KPIs
// Example: Custom logging activity
{
  "name": "LogPipelineExecution",
  "type": "SqlServerStoredProcedure",
  "typeProperties": {
    "storedProcedureName": "[dbo].[LogPipelineExecution]",
    "storedProcedureParameters": {
      "PipelineName": {
        "value": {
          "value": "@pipeline().Pipeline",
          "type": "Expression"
        },
        "type": "String"
      },
      "RunId": {
        "value": {
          "value": "@pipeline().RunId",
          "type": "Expression"
        },
        "type": "String"
      },
      "StartTime": {
        "value": {
          "value": "@pipeline().TriggerTime",
          "type": "Expression"
        },
        "type": "DateTime"
      },
      "Status": {
        "value": "Succeeded",
        "type": "String"
      },
      "Parameters": {
        "value": {
          "value": "@string(pipeline().parameters)",
          "type": "Expression"
        },
        "type": "String"
      }
    }
  },
  "linkedServiceName": {
    "referenceName": "AzureSqlDatabase",
    "type": "LinkedServiceReference"
  }
}
  3. Query and analyze pipeline logs:
-- Log Analytics query for pipeline performance analysis
SynapseIntegrationPipelineRuns
| where TimeGenerated > ago(7d)
| where Status == "Succeeded"
| summarize AvgDuration = avg(todouble(DurationInMs)/1000), MaxDuration = max(todouble(DurationInMs)/1000), RunCount = count() by PipelineName
| sort by AvgDuration desc

Debugging Complex Pipelines

Symptoms:

  • Difficulty identifying root cause of failures
  • Challenges with pipeline parameter passing
  • Problems with expressions and dynamic content

Solutions:

  1. Use debug mode and data preview:
     • Enable debug mode for data flows
     • Test expressions with the expression builder
     • Add Set Variable activities to inspect values

  2. Implement an incremental testing strategy:
     • Test individual activities first
     • Build up to complete pipelines
     • Use test parameters and datasets

  3. Debug dynamic content and expressions:
     • Use Set Variable activities to capture expression results
     • Output debug information to pipeline annotations
     • Log dynamic content values
// Example: Debugging expressions with Set Variable
{
  "name": "DebugExpression",
  "type": "SetVariable",
  "typeProperties": {
    "variableName": "DebugOutput",
    "value": {
      "value": "@concat('WindowStart: ', pipeline().parameters.WindowStart, ', Files: ', string(activity('GetFileList').output.childItems))",
      "type": "Expression"
    }
  },
  "dependsOn": [
    {
      "activity": "GetFileList",
      "dependencyConditions": ["Succeeded"]
    }
  ]
}

Best Practices for Reliable Pipelines

  1. Design for resiliency:
     • Implement comprehensive error handling
     • Use idempotent operations where possible
     • Design for retry and recovery scenarios

  2. Optimize performance:
     • Use parallel processing for independent operations
     • Implement appropriate batching strategies
     • Schedule pipelines to avoid resource contention

  3. Monitor and maintain:
     • Implement comprehensive logging and monitoring
     • Set up alerts for critical failures
     • Regularly review and optimize pipeline performance

  4. Implement proper testing:
     • Create test environments with reduced data volumes
     • Implement CI/CD for pipeline development
     • Maintain test datasets for validation
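As a concrete illustration of the idempotency recommendation above: key writes by a natural or run key so that re-running a load overwrites rather than duplicates (plain Python sketch, not Synapse-specific):

```python
def idempotent_upsert(store: dict, rows, key="id"):
    """Upsert keyed by a natural key: re-running the same batch
    leaves the store unchanged instead of duplicating rows."""
    for row in rows:
        store[row[key]] = row
    return store

batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
store = {}
idempotent_upsert(store, batch)
idempotent_upsert(store, batch)  # rerun after a retry: no duplicates
print(len(store))  # 2
```

The same principle applies to a SQL sink via MERGE on the key, or to file sinks by writing to a deterministic path per window so a rerun overwrites the same file.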
