Troubleshooting Pipeline Issues in Azure Synapse Analytics¶
This guide covers common pipeline issues in Azure Synapse Analytics, providing diagnostic approaches and solutions for data integration workflows, activity failures, and pipeline orchestration problems.
Common Pipeline Issue Categories¶
Pipeline issues in Azure Synapse Analytics typically fall into these categories:
- Connectivity Issues: Linked service connection failures and networking problems
- Activity Failures: Errors in specific pipeline activities like Copy, Mapping Data Flow, or custom activities
- Trigger Problems: Issues with scheduled, tumbling window, or event-based triggers
- Performance Bottlenecks: Slow-running pipelines and optimization challenges
- Integration Failures: Problems with external systems and services
- Monitoring and Debugging: Challenges with monitoring pipelines and troubleshooting failures
Connectivity Issues¶
Linked Service Connection Failures¶
Symptoms:
- "Connection timed out" or "Cannot connect to server" errors
- Authentication failures when accessing data sources
- Intermittent connection issues to specific services
Solutions:
- Verify connection string and configuration:
- Check linked service configuration for typos or incorrect parameters
- Test connection in the Synapse Studio UI
- Validate credentials, account names, and endpoint URLs
// Example: Azure SQL Database linked service configuration
{
"name": "AzureSqlDatabaseLinkedService",
"properties": {
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": "Server=tcp:server.database.windows.net,1433;Database=mydb;User ID=admin;Password=xxxx;Encrypt=true;Connection Timeout=30"
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
}
}
- Check network access and firewall rules:
- Verify IP address restrictions and firewall settings
- Check private endpoint configurations if used
- Ensure that network security groups allow required traffic
Common ports required for different services:
| Service | Port | Protocol |
|---|---|---|
| Azure SQL | 1433 | TCP |
| Azure Storage | 443 | HTTPS |
| On-premises SQL Server | 1433 | TCP |
| REST API | 443 | HTTPS |
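A quick way to confirm these ports are reachable from an integration runtime machine is Test-NetConnection; a minimal sketch for Windows PowerShell, with placeholder host names:
# PowerShell: Verify outbound connectivity from a self-hosted IR machine
# (host names are placeholders - substitute your own endpoints)
Test-NetConnection -ComputerName "server.database.windows.net" -Port 1433
Test-NetConnection -ComputerName "storageaccount.blob.core.windows.net" -Port 443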
- Validate credentials and permissions:
- Check if service account or identity has proper permissions
- For managed identity, verify role assignments
- Test authentication independently with the same credentials
# PowerShell: Check managed identity role assignments
$workspace = Get-AzSynapseWorkspace -Name "workspace" -ResourceGroupName "resourcegroup"
Get-AzRoleAssignment -ObjectId $workspace.Identity.PrincipalId
Key Vault Integration Problems¶
Symptoms:
- "Access to Azure Key Vault is forbidden" errors
- Cannot retrieve secrets from Key Vault
- Credentials stored in Key Vault not resolving
Solutions:
- Check Key Vault access policies:
- Ensure Synapse managed identity has Get and List permissions for secrets
- Verify Key Vault firewall settings allow access from Synapse
# PowerShell: Grant Key Vault permissions to Synapse managed identity
$workspace = Get-AzSynapseWorkspace -Name "workspace" -ResourceGroupName "resourcegroup"
Set-AzKeyVaultAccessPolicy -VaultName "keyvault" -ObjectId $workspace.Identity.PrincipalId -PermissionsToSecrets Get,List
- Verify Key Vault linked service:
- Test the Key Vault linked service connection
- Check correct secret names and versions
- Ensure proper URL format for Key Vault
// Example: Azure Key Vault linked service
{
"name": "AzureKeyVaultLinkedService",
"properties": {
"type": "AzureKeyVault",
"typeProperties": {
"baseUrl": "https://keyvault.vault.azure.net/"
}
}
}
- Test secret retrieval manually:
- Use Azure Portal or PowerShell to test secret access
- Verify secret value and expiration
- Check for specific errors in the activity output
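For example, secret access can be tested from PowerShell before suspecting the pipeline itself; vault and secret names below are placeholders:
# PowerShell: Test secret access and inspect metadata
# (vault and secret names are placeholders)
$secret = Get-AzKeyVaultSecret -VaultName "keyvault" -Name "SqlPassword"
$secret.Expires   # check the expiration date
$secret.Enabled   # confirm the secret version is enabled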
Integration Runtime Issues¶
Symptoms:
- "Integration runtime is not available" errors
- Self-hosted integration runtime connectivity problems
- Performance issues with specific integration runtimes
Solutions:
- Check integration runtime status:
- Verify Azure IR or self-hosted IR status in Synapse Studio
- Check for alerts or monitoring data indicating issues
- Ensure sufficient capacity for workload
- Troubleshoot self-hosted integration runtime:
- Check self-hosted IR logs in Event Viewer (Application and Services Logs > Microsoft > Integration Runtime)
- Verify outbound connectivity on port 443
- Check for machine resource constraints (CPU, memory)
# PowerShell: Restart self-hosted integration runtime service
Restart-Service -Name "DIAHostService"
- Configure high availability for critical workloads:
- Set up multiple nodes for self-hosted integration runtime
- Implement proper monitoring and alerting
- Consider auto-scaling for Azure integration runtime
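As a sketch, a second node can be joined to an existing self-hosted IR by running the registration command on the new machine; the authentication key comes from Synapse Studio, and the install path varies with the IR version:
# Register an additional node against an existing self-hosted IR
# (run on the new node; path depends on the installed IR version)
cd "C:\Program Files\Microsoft Integration Runtime\5.0\Shared"
.\dmgcmd.exe -RegisterNewNode "<AuthenticationKey>" "<NodeName>"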
Activity Failures¶
Copy Activity Issues¶
Symptoms:
- Copy activity fails with specific error messages
- Slow performance during data transfer
- Unexpected data transformation issues
Solutions:
- Analyze activity error details:
- Review the error message and stack trace in the monitoring view
- Check specific error codes and failure categories
- Identify which phase of the copy activity failed (pre-copy, copy, post-copy)
- Address common copy activity errors:
| Error | Common Cause | Solution |
|---|---|---|
| Credential issue | Invalid connection string or secret | Verify credentials and test connection |
| Source table not found | Invalid table name or permissions | Check source object existence and permissions |
| Column mapping error | Schema mismatch between source and sink | Review column mappings and data types |
| File format error | Incorrect format settings | Validate format settings match the actual data |
| Network error | Connectivity or firewall issues | Check network settings and firewall rules |
- Optimize copy performance:
- Use parallel copies and partitioning for large datasets
- Configure appropriate integration runtime
- Use staging for complex transformations
// Example: Copy activity with performance optimizations
{
"name": "OptimizedCopyActivity",
"type": "Copy",
"typeProperties": {
"source": {
"type": "AzureSqlSource",
"sqlReaderQuery": "SELECT * FROM MyTable",
"partitionOption": "PhysicalPartitionsOfTable"
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
}
},
"enableStaging": true,
"stagingSettings": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage",
"type": "LinkedServiceReference"
},
"path": "staging"
},
"parallelCopies": 32,
"dataIntegrationUnits": 128
}
}
Mapping Data Flow Problems¶
Symptoms:
- Data flow fails during execution
- Unexpected transformations or data results
- Performance issues with complex transformations
Solutions:
- Debug with data flow monitoring:
- Use the data preview feature to verify transformations
- Enable debug mode for detailed inspection
- Check row counts and data samples at each step
- Address common data flow errors:
- Data type mismatches: Validate schema and use explicit casting
- Expression errors: Test expressions in the expression builder
- Memory issues: Optimize partitioning and enable debugging with optimized mode
// Example: Explicit data type handling in data flow expression
toInteger(trim(movieId))
// Handling null values
iifNull(rating, 0.0)
- Optimize data flow performance:
- Configure appropriate TTL for debug sessions
- Use partitioning strategies for large datasets
- Adjust optimization settings for performance
// Example: Data flow activity with optimization settings
{
"name": "DataFlowActivity",
"type": "ExecuteDataFlow",
"typeProperties": {
"dataFlow": {
"referenceName": "TransformMovieRatings",
"type": "DataFlowReference"
},
"compute": {
"coreCount": 32,
"computeType": "General"
},
"staging": {
"linkedService": {
"referenceName": "AzureBlobStorage",
"type": "LinkedServiceReference"
},
"folderPath": "staging/dataflow"
}
}
}
Spark Activity Issues¶
Symptoms:
- Spark notebook or job activities failing
- Long-running Spark activities timing out
- Resource constraints during execution
Solutions:
- Review Spark application logs:
- Check Spark driver and executor logs for errors
- Look for out-of-memory exceptions or task failures
- Analyze Spark UI for performance bottlenecks
- Address common Spark issues:
- Memory problems: Adjust executor and driver memory
- Job failures: Check for code errors or data issues
- Dependency issues: Verify required libraries and versions
// Example: Spark activity with custom configuration
{
"name": "SparkActivity",
"type": "SynapseNotebook",
"typeProperties": {
"notebook": {
"referenceName": "ProcessData",
"type": "NotebookReference"
},
"parameters": {
"date": "2023-04-01"
},
"conf": {
"spark.dynamicAllocation.enabled": "true",
"spark.dynamicAllocation.minExecutors": "2",
"spark.dynamicAllocation.maxExecutors": "10"
},
"numExecutors": 4
},
"linkedServiceName": {
"referenceName": "SynapseSparkPool",
"type": "LinkedServiceReference"
}
}
- Optimize Spark configuration:
- Configure appropriate Spark pool and size
- Use dynamic allocation for variable workloads
- Implement proper partitioning strategies
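For instance, autoscale bounds on a Spark pool can be adjusted from PowerShell so variable workloads only consume extra nodes when needed; a sketch assuming the Az.Synapse module, with placeholder workspace and pool names:
# PowerShell: Enable autoscale on a Synapse Spark pool
# (workspace and pool names are placeholders)
Update-AzSynapseSparkPool -WorkspaceName "workspace" -Name "sparkpool" `
    -EnableAutoScale $true -AutoScaleMinNodeCount 3 -AutoScaleMaxNodeCount 10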
Trigger Problems¶
Schedule Trigger Issues¶
Symptoms:
- Pipeline not running at expected times
- Inconsistent schedule execution
- Missing pipeline runs
Solutions:
- Verify trigger definition:
- Check timezone configuration and DST handling
- Validate CRON expression for correctness
- Ensure pipeline reference is correct
// Example: Schedule trigger configuration
{
"name": "DailyTrigger",
"properties": {
"type": "ScheduleTrigger",
"typeProperties": {
"recurrence": {
"frequency": "Day",
"interval": 1,
"startTime": "2023-01-01T00:00:00Z",
"timeZone": "UTC",
"schedule": {
"hours": [1],
"minutes": [30]
}
}
},
"pipelines": [
{
"pipelineReference": {
"referenceName": "DailyProcessingPipeline",
"type": "PipelineReference"
},
"parameters": {
"WindowStart": "@trigger().scheduledTime",
"WindowEnd": "@trigger().scheduledTime"
}
}
]
}
}
- Check trigger activation status:
- Verify trigger is activated in Synapse Studio
- Look for overlapping schedules or conflicts
- Check resource constraints that may delay execution
- Monitor and analyze trigger history:
- Review trigger run history in monitoring view
- Check for failed trigger executions
- Analyze patterns in delayed or skipped executions
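A sketch for spotting missing scheduled runs from PowerShell, assuming the Az.Synapse module (workspace and pipeline names are placeholders):
# PowerShell: List recent pipeline runs to spot missing scheduled executions
# (workspace and pipeline names are placeholders)
Get-AzSynapsePipelineRun -WorkspaceName "workspace" `
    -PipelineName "DailyProcessingPipeline" `
    -RunStartedAfter (Get-Date).AddDays(-7) `
    -RunStartedBefore (Get-Date) |
    Sort-Object RunStart |
    Select-Object RunId, RunStart, Status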
Tumbling Window Trigger Issues¶
Symptoms:
- Gaps in tumbling window execution
- Dependency issues between window runs
- Reprocessing or backfill problems
Solutions:
- Check window configuration:
- Verify window size and delay settings
- Check dependency settings for correctness
- Validate start and end times
// Example: Tumbling window trigger with dependencies
{
"name": "TumblingWindowTrigger",
"properties": {
"type": "TumblingWindowTrigger",
"typeProperties": {
"frequency": "Hour",
"interval": 1,
"startTime": "2023-01-01T00:00:00Z",
"delay": "00:10:00",
"maxConcurrency": 3,
"retryPolicy": {
"count": 3,
"intervalInSeconds": 30
},
"dependsOn": [
{
"type": "TumblingWindowTriggerDependencyReference",
"offset": "1",
"size": "1",
"referenceTrigger": {
"referenceName": "PreviousHourTrigger",
"type": "TriggerReference"
}
}
]
},
"pipeline": {
"pipelineReference": {
"referenceName": "HourlyProcessingPipeline",
"type": "PipelineReference"
},
"parameters": {
"WindowStart": "@trigger().outputs.windowStartTime",
"WindowEnd": "@trigger().outputs.windowEndTime"
}
}
}
}
- Troubleshoot dependency chains:
- Visualize dependency chains in monitoring view
- Check for circular dependencies
- Verify parent trigger execution status
- Implement proper error handling:
- Configure retry policies for transient failures
- Set up appropriate concurrency limits
- Use activity timeout settings strategically
Event Trigger Issues¶
Symptoms:
- Pipeline not triggered by storage events
- Delayed reaction to events
- Event trigger firing too often or for unexpected events
Solutions:
- Verify event source configuration:
- Check storage account and container names
- Validate event types and filters
- Ensure event grid subscription is active
// Example: Event trigger configuration
{
"name": "BlobEventTrigger",
"properties": {
"type": "BlobEventsTrigger",
"typeProperties": {
"blobPathBeginsWith": "/container/blobs/input/",
"blobPathEndsWith": ".csv",
"ignoreEmptyBlobs": true,
"scope": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resourcegroup/providers/Microsoft.Storage/storageAccounts/storageaccount",
"events": ["Microsoft.Storage.BlobCreated"]
},
"pipeline": {
"pipelineReference": {
"referenceName": "ProcessCSVPipeline",
"type": "PipelineReference"
},
"parameters": {
"blobPath": "@trigger().outputs.body.url"
}
}
}
}
- Test event generation manually:
- Upload test files to trigger events
- Use Storage Explorer to verify file paths
- Check event delivery with Event Grid diagnostics
- Monitor event processing:
Monitor event processing:
- Set up diagnostic logs for event subscriptions
- Check for filtered or dropped events
- Verify event delivery latency
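To generate a test event without the producing system, a file can be uploaded with PowerShell; a sketch with placeholder account, container, and file paths (the blob path must match the trigger's begins-with/ends-with filters):
# PowerShell: Upload a test blob to fire the event trigger
# (storage account, container, and file paths are placeholders)
$ctx = New-AzStorageContext -StorageAccountName "storageaccount" -UseConnectedAccount
Set-AzStorageBlobContent -Context $ctx -Container "container" `
    -File "C:\temp\test.csv" -Blob "input/test.csv"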
Performance Bottlenecks¶
Slow Pipeline Execution¶
Symptoms:
- Pipelines taking longer than expected
- Increasing execution times over time
- Specific activities causing delays
Solutions:
- Analyze pipeline monitoring data:
- Identify slow-running activities using the monitoring view
- Compare historical performance data
- Look for patterns in performance degradation
- Optimize activity configuration:
- For Copy activities, use parallel copies and staging
- For Data Flows, optimize partitioning and transformations
- For Lookups, limit result size and reuse results across activities (for example, via pipeline variables) instead of re-querying
// Example: Optimized lookup activity
{
"name": "CachedLookup",
"type": "Lookup",
"typeProperties": {
"source": {
"type": "AzureSqlSource",
"sqlReaderQuery": "SELECT TOP 100 * FROM ConfigTable",
"queryTimeout": "02:00:00",
"partitionOption": "None"
},
"dataset": {
"referenceName": "AzureSqlTable",
"type": "DatasetReference"
},
"firstRowOnly": false,
"cachingOptions": {
"enableCaching": true,
"cacheDuration": "06:00:00"
}
}
}
- Implement parallel processing:
- Use ForEach activities with batch size and parallel execution
- Implement proper dependency chains between activities
- Balance parallelism with available resources
// Example: Optimized ForEach activity
{
"name": "ParallelProcessing",
"type": "ForEach",
"typeProperties": {
"items": {
"value": "@activity('GetFileList').output.value",
"type": "Expression"
},
"batchCount": 10,
"isSequential": false,
"activities": [
{
"name": "ProcessFile",
"type": "Copy",
"...": "..."
}
]
}
}
Resource Constraints¶
Symptoms:
- "Resource limitation" errors
- Queue time increasing for pipeline runs
- Throttling errors from connected services
Solutions:
- Monitor resource utilization:
- Check integration runtime metrics
- Monitor Azure service quotas and limits
- Analyze patterns in resource consumption (see the Get-AzMetric sketch after the examples below)
- Optimize resource allocation:
- Scale up integration runtime for compute-intensive workloads
- Configure appropriate concurrency limits for triggers
- Schedule pipelines to avoid peak times
- Implement rate limiting and backoff strategies:
- Add wait activities between retries
- Implement exponential backoff for API calls
- Use circuit breaker patterns for unreliable services
// Example: Wait activity with exponential backoff
// (the pipeline expression language has no power() function, so the
// 2^RetryCount factor is computed in the 'SetRetry' lookup query,
// e.g. SELECT POWER(2, RetryCount) * 15 AS BackoffSeconds)
{
"name": "ExponentialBackoff",
"type": "Wait",
"typeProperties": {
"waitTimeInSeconds": {
"value": "@mul(power(2, activity('SetRetry').output.firstRow.RetryCount), 15)",
"type": "Expression"
}
},
"dependsOn": [
{
"activity": "SetRetry",
"dependencyConditions": ["Succeeded"]
}
]
}
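For the monitoring step above, integration runtime activity can also be pulled from Azure Monitor; a sketch assuming the Az.Monitor module, and the metric name should be verified against your workspace's metric definitions before relying on it:
# PowerShell: Check workspace integration metrics over the last day
# (names are placeholders; list available metrics first)
$workspace = Get-AzSynapseWorkspace -Name "workspace" -ResourceGroupName "resourcegroup"
(Get-AzMetricDefinition -ResourceId $workspace.Id).Name.Value
Get-AzMetric -ResourceId $workspace.Id -MetricName "IntegrationActivityRunsEnded" `
    -StartTime (Get-Date).AddDays(-1) -EndTime (Get-Date) -TimeGrain 01:00:00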
Integration Failures¶
Error Handling in Pipelines¶
Symptoms:
- Failed pipelines without proper error information
- Cascading failures affecting multiple pipelines
- Inconsistent error handling across activities
Solutions:
- Implement comprehensive error handling:
- Use activity failure outputs in expressions
- Configure email notifications for failures
- Store error details in logging tables
// Example: Error handling with IfCondition
{
"name": "ErrorHandling",
"type": "IfCondition",
"typeProperties": {
"expression": {
"value": "@equals(activity('CopyData').output.executionDetails[0].status, 'Failed')",
"type": "Expression"
},
"ifTrueActivities": [
{
"name": "LogError",
"type": "WebActivity",
"typeProperties": {
"method": "POST",
"url": "https://prod-00.westus.logic.azure.com:443/...",
"body": {
"value": "{ \"pipelineName\": \"@{pipeline().Pipeline}\", \"error\": \"@{activity('CopyData').error.message}\" }",
"type": "Expression"
}
}
}
]
},
"dependsOn": [
{
"activity": "CopyData",
"dependencyConditions": ["Completed"]
}
]
}
- Set up retry policies:
- Configure appropriate retry counts and intervals
- Use different strategies for different failure types
- Implement circuit breaker pattern for external services
// Example: Activity with retry policy
{
"name": "CopyWithRetry",
"type": "Copy",
"typeProperties": {
"...": "..."
},
"policy": {
"retry": 3,
"retryIntervalInSeconds": 60,
"secureOutput": false,
"secureInput": false,
"timeout": "01:00:00"
}
}
- Create dedicated error handling pipelines:
- Implement reusable error handling patterns
- Centralize error logging and notification
- Set up automated recovery procedures
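A shared handler can be smoke-tested by invoking it directly; a sketch assuming a hypothetical ErrorHandlerPipeline that takes the failing pipeline's name and error message as parameters:
# PowerShell: Invoke a shared error-handling pipeline directly to test it
# (workspace and pipeline names, and the parameter contract, are hypothetical)
Invoke-AzSynapsePipeline -WorkspaceName "workspace" `
    -PipelineName "ErrorHandlerPipeline" `
    -Parameter @{ SourcePipeline = "CopyData"; ErrorMessage = "test error" }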
External Service Integration Problems¶
Symptoms:
- Failures when connecting to REST APIs
- Timeout errors with third-party services
- Inconsistent responses from external endpoints
Solutions:
- Analyze API errors:
- Check response status codes and bodies
- Validate request headers and authentication
- Test API directly with tools like Postman
- Implement robust Web activities:
- Handle authentication properly
- Parse and validate responses
- Configure appropriate timeouts
// Example: Web activity with authentication and error handling
{
"name": "CallRestAPI",
"type": "WebActivity",
"typeProperties": {
"method": "POST",
"url": "https://api.example.com/data",
"headers": {
"Content-Type": "application/json",
"Authorization": {
"value": "@concat('Bearer ', activity('GetToken').output.access_token)",
"type": "Expression"
}
},
"body": {
"value": "@{activity('PrepareRequest').output.value}",
"type": "Expression"
},
"authentication": {
"type": "MSI",
"resource": "https://api.example.com"
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
},
"policy": {
"timeout": "00:01:00",
"retry": 2,
"retryIntervalInSeconds": 30
}
}
- Implement circuit breaker patterns:
- Track failure rates for external services
- Implement fallback mechanisms
- Use exponential backoff for retries
Monitoring and Debugging¶
Pipeline Monitoring Challenges¶
Symptoms:
- Difficulty tracking pipeline execution
- Missing or incomplete monitoring data
- Challenges correlating related pipeline runs
Solutions:
- Set up comprehensive monitoring:
- Configure diagnostic settings to send logs to Log Analytics
- Create custom dashboards for pipeline monitoring
- Implement end-to-end tracing with correlation IDs
# PowerShell: Configure diagnostic settings for Synapse workspace
$workspace = Get-AzSynapseWorkspace -Name "workspace" -ResourceGroupName "resourcegroup"
$logAnalytics = Get-AzOperationalInsightsWorkspace -ResourceGroupName "resourcegroup" -Name "logworkspace"
Set-AzDiagnosticSetting -ResourceId $workspace.Id `
-Name "SynapseDiagnostics" `
-WorkspaceId $logAnalytics.ResourceId `
-Category @("IntegrationPipelineRuns", "IntegrationActivityRuns", "IntegrationTriggerRuns") `
-Enabled $true
- Implement custom logging:
- Add logging activities to pipelines
- Store execution metadata in dedicated tables
- Implement custom metrics for business KPIs
// Example: Custom logging activity
{
"name": "LogPipelineExecution",
"type": "SqlServerStoredProcedure",
"typeProperties": {
"storedProcedureName": "[dbo].[LogPipelineExecution]",
"storedProcedureParameters": {
"PipelineName": {
"value": {
"value": "@pipeline().Pipeline",
"type": "Expression"
},
"type": "String"
},
"RunId": {
"value": {
"value": "@pipeline().RunId",
"type": "Expression"
},
"type": "String"
},
"StartTime": {
"value": {
"value": "@pipeline().TriggerTime",
"type": "Expression"
},
"type": "DateTime"
},
"Status": {
"value": "Succeeded",
"type": "String"
},
"Parameters": {
"value": {
"value": "@string(pipeline().parameters)",
"type": "Expression"
},
"type": "String"
}
}
},
"linkedServiceName": {
"referenceName": "AzureSqlDatabase",
"type": "LinkedServiceReference"
}
}
- Query and analyze pipeline logs:
-- Log Analytics query for pipeline performance analysis
SynapseIntegrationPipelineRuns
| where TimeGenerated > ago(7d)
| where Status == "Succeeded"
| summarize AvgDuration = avg(todouble(DurationInMs)/1000), MaxDuration = max(todouble(DurationInMs)/1000), RunCount = count() by PipelineName
| sort by AvgDuration desc
Debugging Complex Pipelines¶
Symptoms:
- Difficulty identifying root cause of failures
- Challenges with pipeline parameter passing
- Problems with expressions and dynamic content
Solutions:
- Use debug mode and data preview:
- Enable debug mode for data flows
- Test expressions with the expression builder
- Add set variable activities to inspect values
- Implement incremental testing strategy:
- Test individual activities first
- Build up to complete pipelines
- Use test parameters and datasets
- Debug dynamic content and expressions:
- Use set variable activities to capture expression results
- Output debug information to pipeline annotations
- Implement logging of dynamic content values
// Example: Debugging expressions with Set Variable
{
"name": "DebugExpression",
"type": "SetVariable",
"typeProperties": {
"variableName": "DebugOutput",
"value": {
"value": "@concat('WindowStart: ', pipeline().parameters.WindowStart, ', Files: ', string(activity('GetFileList').output.childItems))",
"type": "Expression"
}
},
"dependsOn": [
{
"activity": "GetFileList",
"dependencyConditions": ["Succeeded"]
}
]
}
Best Practices for Reliable Pipelines¶
- Design for resiliency:
- Implement comprehensive error handling
- Use idempotent operations where possible
- Design for retry and recovery scenarios
- Optimize performance:
- Use parallel processing for independent operations
- Implement appropriate batching strategies
- Schedule pipelines to avoid resource contention
- Monitor and maintain:
- Implement comprehensive logging and monitoring
- Set up alerts for critical failures
- Regularly review and optimize pipeline performance
- Implement proper testing:
- Create test environments with reduced data volumes
- Implement CI/CD for pipeline development
- Maintain test datasets for validation
Related Topics¶
- Pipeline Monitoring and Alerting
- Pipeline Performance Optimization
- Pipeline Security Best Practices
- DevOps Integration for Pipelines