Skip to content

Guided Troubleshooting

🏠 Home > 🔧 Troubleshooting > 🧭 Guided Troubleshooting

🧭 Interactive Diagnostic Guide Follow this step-by-step decision tree to diagnose and resolve common Azure Synapse Analytics issues quickly and effectively.


🎯 Quick Issue Identification

Start by identifying your issue category:

Issue Type Symptoms Quick Link
🔑 Authentication Access denied, login failures, permission errors Authentication Issues
🌐 Connectivity Connection timeouts, network errors, VNet issues Connectivity Issues
Performance Slow queries, high latency, resource bottlenecks Performance Issues
🏞️ Delta Lake Table corruption, versioning issues, ACID failures Delta Lake Issues
🔄 Pipelines Pipeline failures, activity errors, orchestration issues Pipeline Issues
☁️ Serverless SQL Query failures, data access issues, quota exceeded Serverless SQL Issues
🔥 Spark Job failures, executor errors, memory issues Spark Issues

🔑 Authentication Issues

Decision Tree

graph TD
    A[Authentication Error] --> B{Error Type?}
    B -->|Access Denied| C{Using MSI?}
    B -->|Login Failed| D{Authentication Method?}
    B -->|Permission Error| E{Resource Type?}

    C -->|Yes| F[Check MSI Permissions]
    C -->|No| G[Check Service Principal]

    D -->|Azure AD| H[Verify Azure AD Setup]
    D -->|SQL Auth| I[Check SQL Credentials]
    D -->|Key/Token| J[Validate API Keys]

    E -->|Storage| K[Check RBAC Roles]
    E -->|Synapse| L[Check Synapse RBAC]
    E -->|Database| M[Check Database Permissions]

    F --> N[Solution: Grant Required Roles]
    G --> O[Solution: Update SP Credentials]
    H --> P[Solution: Azure AD Configuration]
    I --> Q[Solution: Reset SQL Password]
    J --> R[Solution: Regenerate Keys]
    K --> S[Solution: Assign Storage Roles]
    L --> T[Solution: Assign Synapse Roles]
    M --> U[Solution: Grant DB Permissions]

Diagnostic Steps

Step 1: Identify Authentication Method

Question: What authentication method are you using?

  • Azure AD (Recommended) → Go to Step 2A
  • Managed Identity → Go to Step 2B
  • Service Principal → Go to Step 2C
  • SQL Authentication → Go to Step 2D

Step 2A: Azure AD Authentication

Symptoms: AADSTS* error codes, token expiration, consent errors

Diagnostic Actions:

  1. Verify Azure AD registration:
az ad app show --id <application-id>
  1. Check token validity:
az account get-access-token --resource https://database.windows.net/
  1. Verify user permissions:
az role assignment list --assignee <user-id> --scope <resource-id>

Solutions:

  • Error AADSTS50105 (User not assigned): Assign user to application
  • Error AADSTS50126 (Invalid credentials): Reset password or credentials
  • Error AADSTS65001 (Consent required): Grant admin consent

📚 Related: Authentication Troubleshooting

Step 2B: Managed Identity

Symptoms: Access denied with MSI, identity not found

Diagnostic Actions:

  1. Verify MSI is enabled:
az synapse workspace show --name <workspace-name> --resource-group <rg-name> --query identity
  1. Check MSI role assignments:
az role assignment list --assignee <principal-id>

Solutions:

  • Enable managed identity on Synapse workspace
  • Assign required roles: Storage Blob Data Contributor, Synapse Contributor
  • Wait up to 5 minutes for role propagation

📚 Related: Security Best Practices

Step 2C: Service Principal

Symptoms: Client credential flow errors, secret expiration

Diagnostic Actions:

  1. Verify service principal exists:
az ad sp show --id <app-id>
  1. Check secret expiration:
az ad app credential list --id <app-id>

Solutions:

  • Regenerate expired secrets
  • Update application credentials in Key Vault
  • Verify client ID and tenant ID are correct

📚 Related: Reference Security

Step 2D: SQL Authentication

Symptoms: Login failed for user, password policy errors

Diagnostic Actions:

  1. Test SQL connection:
SELECT SYSTEM_USER, ORIGINAL_LOGIN();
  1. Check login exists:
SELECT name, type_desc FROM sys.server_principals WHERE name = '<username>';

Solutions:

  • Reset SQL admin password via Azure Portal
  • Ensure password meets complexity requirements
  • Check if account is locked or disabled

📚 Related: SQL Performance


🌐 Connectivity Issues

Decision Tree

graph TD
    A[Connection Error] --> B{Error Message?}
    B -->|Timeout| C{Public/Private Access?}
    B -->|Connection Refused| D{Firewall Rules?}
    B -->|Host Not Found| E{DNS Resolution?}

    C -->|Public| F[Check Public Access]
    C -->|Private| G[Check Private Endpoint]

    D -->|Configured| H[Verify IP Whitelisted]
    D -->|Not Configured| I[Add Firewall Rule]

    E -->|Working| J[Check Network Path]
    E -->|Failing| K[Check DNS Config]

    F --> L[Solution: Enable Public Access]
    G --> M[Solution: Configure VNet]
    H --> N[Solution: Update Firewall]
    I --> O[Solution: Create Rule]
    J --> P[Solution: Check NSG/Route]
    K --> Q[Solution: Fix DNS]

Diagnostic Steps

Step 1: Test Basic Connectivity

Question: Can you reach the endpoint?

# Test DNS resolution
nslookup <workspace-name>.sql.azuresynapse.net

# Test port connectivity
Test-NetConnection -ComputerName <workspace-name>.sql.azuresynapse.net -Port 1433

# Test HTTPS endpoint
curl https://<workspace-name>.dev.azuresynapse.net

Results:

DNS Issues

Diagnostic Actions:

  1. Verify DNS settings:
ipconfig /all
  1. Check private DNS zone:
az network private-dns zone show --name privatelink.sql.azuresynapse.net --resource-group <rg-name>
  1. Verify DNS record:
az network private-dns record-set a list --zone-name privatelink.sql.azuresynapse.net --resource-group <rg-name>

Solutions:

  • Create private DNS zone if using Private Link
  • Link DNS zone to VNet
  • Flush DNS cache: ipconfig /flushdns

📚 Related: Connectivity Troubleshooting

Firewall Issues

Diagnostic Actions:

  1. Check workspace firewall rules:
az synapse workspace firewall-rule list --workspace-name <workspace-name> --resource-group <rg-name>
  1. Verify client IP:
curl ifconfig.me

Solutions:

  • Add client IP to firewall rules
  • Enable "Allow Azure services and resources to access this workspace"
  • For private endpoints, disable public network access

📚 Related: Network Security

Certificate Issues

Symptoms: SSL/TLS errors, certificate validation failures

Solutions:

  • Update root certificates on client machine
  • Add connection string parameter: TrustServerCertificate=True (development only)
  • Verify server certificate is valid

⚡ Performance Issues

Decision Tree

graph TD
    A[Slow Performance] --> B{Component?}
    B -->|Query| C{Query Type?}
    B -->|Pipeline| D{Pipeline Stage?}
    B -->|Spark Job| E{Job Phase?}

    C -->|SELECT| F[Check Query Plan]
    C -->|INSERT/UPDATE| G[Check Table Design]
    C -->|JOIN| H[Check Statistics]

    D -->|Copy Activity| I[Check Throughput]
    D -->|Data Flow| J[Check Partitioning]
    D -->|Lookup| K[Check Query Opt]

    E -->|Stage Time| L[Check Shuffle]
    E -->|Task Skew| M[Check Partitioning]
    E -->|Memory| N[Check Config]

    F --> O[Solution: Optimize Query]
    G --> P[Solution: Add Indexes]
    H --> Q[Solution: Update Stats]
    I --> R[Solution: Increase DIUs]
    J --> S[Solution: Repartition]
    K --> T[Solution: Use Parameters]
    L --> U[Solution: Broadcast Join]
    M --> V[Solution: Repartition Data]
    N --> W[Solution: Tune Memory]

Diagnostic Steps

Step 1: Identify Performance Bottleneck

Question: What type of operation is slow?

SQL Performance Diagnostics

Collect Query Metrics:

-- Get query execution plan
SET STATISTICS PROFILE ON;
<your-query>
SET STATISTICS PROFILE OFF;

-- Check query statistics
SELECT
    query_id,
    start_time,
    end_time,
    total_elapsed_time_ms,
    row_count
FROM sys.dm_exec_requests
WHERE session_id = @@SPID;

-- Identify expensive operations
SELECT TOP 10
    text,
    total_worker_time/execution_count AS avg_cpu_time,
    total_elapsed_time/execution_count AS avg_elapsed_time
FROM sys.dm_exec_query_stats
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
ORDER BY avg_elapsed_time DESC;

Common Issues:

Symptom Likely Cause Solution
Full table scan Missing statistics Update statistics
High CPU time Complex joins Simplify query, add indexes
Memory spills Insufficient memory grant Increase DWU, optimize query
Lock waits Concurrent operations Use snapshot isolation

📚 Related: SQL Performance Best Practices

Spark Performance Diagnostics

Collect Job Metrics:

  1. Access Spark UI: https://<workspace-name>.dev.azuresynapse.net/sparkui
  2. Review job timeline and stages
  3. Check executor metrics

Common Issues:

Symptom Likely Cause Solution
Long stage time Data shuffle Use broadcast join
Task skew Uneven partitioning Repartition data
OOM errors Insufficient memory Increase executor memory
Spill to disk Memory pressure Optimize transformations

📚 Related: Spark Performance Best Practices

Pipeline Performance Diagnostics

Check Pipeline Metrics:

  1. Navigate to Monitor → Pipeline runs
  2. Review activity durations
  3. Check data integration units (DIUs)

Common Issues:

Symptom Likely Cause Solution
Slow copy activity Low DIUs Increase DIU count
Mapping data flow slow Small cluster Scale up integration runtime
Lookup timeout Large result set Use query optimization

📚 Related: Pipeline Optimization


🏞️ Delta Lake Issues

Common Scenarios

Scenario 1: Table Not Found

Symptoms: Table or view not found errors

Diagnostic Steps:

# Check if table exists in metastore
spark.sql("SHOW TABLES IN <database>").display()

# Check Delta Lake path
dbutils.fs.ls("/mnt/delta/<table-path>")

# Verify Delta log
dbutils.fs.ls("/mnt/delta/<table-path>/_delta_log")

Solutions:

  • Refresh table metadata: REFRESH TABLE <table_name>
  • Repair table: MSCK REPAIR TABLE <table_name>
  • Recreate table if corrupted

📚 Related: Delta Lake Troubleshooting

Scenario 2: Concurrent Write Conflicts

Symptoms: ConcurrentAppendException, ConcurrentDeleteException

Solutions:

  • Implement optimistic concurrency control
  • Use merge operations instead of separate update/insert
  • Enable auto-optimize for Delta tables
# Enable optimized writes
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")

# Use merge for upserts
deltaTable.alias("target").merge(
    source.alias("source"),
    "target.id = source.id"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

Scenario 3: Performance Degradation

Symptoms: Slow reads, increasing query time

Diagnostic Steps:

# Check table statistics
spark.sql("DESCRIBE DETAIL <table_name>").display()

# Check file sizes
spark.sql("DESCRIBE DETAIL <table_name>").select("numFiles", "sizeInBytes").display()

Solutions:

  • Run OPTIMIZE: OPTIMIZE <table_name>
  • Z-ORDER on frequently filtered columns: OPTIMIZE <table_name> ZORDER BY (column1, column2)
  • Vacuum old files: VACUUM <table_name> RETAIN 168 HOURS

📚 Related: Delta Lake Optimization


🔄 Pipeline Issues

Common Failure Patterns

Pattern 1: Copy Activity Failures

Diagnostic Questions:

  1. What's the error message?
  2. Is it source or sink failure?
  3. Is it data-related or connectivity-related?

Common Errors:

Error Code Description Solution
UserErrorFailedToReadFromSqlSource SQL query error Validate SQL syntax
UserErrorInvalidColumnMapping Schema mismatch Update column mappings
UserErrorSinkPathNotFound Destination path missing Create target container
UserErrorQuotaExceeded DIU limit reached Request quota increase

Pattern 2: Data Flow Failures

Diagnostic Steps:

  1. Enable debug mode in Data Flow
  2. Review data preview at each transformation
  3. Check cluster logs

Common Issues:

  • Schema drift not handled → Enable "Allow schema drift"
  • Null values causing errors → Add derived column with null handling
  • Memory errors → Increase cluster size or optimize transformations

📚 Related: Pipeline Troubleshooting


☁️ Serverless SQL Issues

Common Problems

Problem 1: File Format Errors

Error: Failed to read parquet file or CSV parsing error

Solutions:

-- Test file format with OPENROWSET
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://<storage>.dfs.core.windows.net/container/file.parquet',
    FORMAT = 'PARQUET'
) AS [result];

-- Handle CSV with custom settings
SELECT *
FROM OPENROWSET(
    BULK 'https://<storage>.dfs.core.windows.net/container/file.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
) AS [result];

Problem 2: Performance Issues

Symptoms: Slow queries on data lake files

Solutions:

  • Create external tables instead of OPENROWSET
  • Use partitioned data sources
  • Create statistics on external tables
  • Use CETAS to cache results

📚 Related: Serverless SQL Troubleshooting


🔥 Spark Issues

Common Job Failures

Failure Type 1: Out of Memory

Symptoms: java.lang.OutOfMemoryError, executor failures

Diagnostic Steps:

# Check executor memory configuration
spark.conf.get("spark.executor.memory")
spark.conf.get("spark.driver.memory")

# Monitor memory usage
spark.sparkContext.statusTracker().getExecutorInfos()

Solutions:

  • Increase executor memory: spark.conf.set("spark.executor.memory", "8g")
  • Increase driver memory: spark.conf.set("spark.driver.memory", "4g")
  • Optimize data structures (use arrays instead of lists)
  • Process data in smaller batches

Failure Type 2: Shuffle Failures

Symptoms: Fetch failed, shuffle read errors

Solutions:

  • Increase shuffle partitions: spark.conf.set("spark.sql.shuffle.partitions", "200")
  • Enable adaptive query execution: spark.conf.set("spark.sql.adaptive.enabled", "true")
  • Use broadcast joins for small tables

📚 Related: Spark Troubleshooting


🆘 Emergency Checklist

When all else fails, work through this checklist:

  • Check Azure Service Health
  • Review Azure Monitor alerts
  • Check recent changes in deployment history
  • Verify quota limits haven't been reached
  • Review activity logs for error patterns
  • Test from different network location
  • Compare working vs. non-working configuration
  • Create minimal reproduction case
  • Gather diagnostic information
  • Open Azure support ticket

📚 Additional Resources

Resource Description Link
🔧 Troubleshooting Hub All troubleshooting guides Troubleshooting
📖 Best Practices Prevention strategies Best Practices
📊 Monitoring Setup Proactive monitoring Monitoring
💬 FAQ Common questions FAQ

💡 Pro Tip: Keep a troubleshooting journal documenting issues and resolutions. Patterns often emerge that help prevent future problems.

Last Updated: January 2025