Troubleshooting Serverless SQL Pool in Azure Synapse Analytics

This guide provides solutions for common issues encountered when working with Serverless SQL Pools in Azure Synapse Analytics, including query performance problems, error patterns, and optimization techniques.

Common Serverless SQL Issues

When working with Serverless SQL Pools, these are the most common categories of issues:

  1. Query Performance Issues: Slow query execution, timeout errors
  2. Data Format Problems: Parsing errors, schema inference issues
  3. Resource Limitations: Query timeouts, memory constraints
  4. File Access Issues: Permission problems, file not found errors
  5. Metadata Challenges: Statistics issues, partitioning problems

Query Performance Issues

Slow Query Execution

Symptoms:

  • Queries taking longer than expected

  • Timeouts during query execution

  • Performance degradation compared to previous runs

Solutions:

  1. Optimize file format and compression:
     • Use columnar formats such as Parquet or ORC
     • Use appropriate compression (Snappy for query performance, Gzip for storage density)
   -- Convert CSV to Parquet for better performance
   CREATE EXTERNAL TABLE [ParquetTable]
   WITH (
       LOCATION = 'abfss://container@account.dfs.core.windows.net/path/to/folder/',
       DATA_SOURCE = [DataSource],
       FILE_FORMAT = [ParquetFileFormat]
   )
   AS SELECT * FROM [CsvTable];
  2. Use partitioning effectively:
     • Query only the partitions you need
     • Write predicates that enable partition pruning
   -- Using partition pruning
   SELECT *
   FROM [dbo].[PartitionedTable]
   WHERE Year = 2023 AND Month = 8;
  3. Optimize predicate pushdown:
     • Structure queries so filters are pushed down to the storage layer
     • Use WHERE clauses that can be pushed down

  4. Check execution plans:
     • Use EXPLAIN to understand query execution
     • Look for full scans or inefficient operations
   EXPLAIN
   SELECT *
   FROM [dbo].[LargeTable]
   WHERE [Column1] = 'Value';

Query Timeout Errors

Symptoms:

  • Error messages about query execution timeout

  • Queries failing after running for several minutes

  • Consistent failures with large datasets

Solutions:

  1. Break down complex queries:
     • Split them into smaller, manageable queries
     • Persist intermediate results (for example, with CETAS) and query those instead

  2. Increase timeout settings (for client tools):
     • Adjust the connection timeout in SQL clients
     • Set the command timeout in applications

  3. Optimize join operations:
     • Ensure the smaller table is used as the build input for hash joins
     • Use appropriate join types (hash joins for large tables)
     • Consider denormalizing data where appropriate

  4. Implement query hints:
     • Use OPTION hints to guide the query optimizer
     • Apply join hints (e.g. HASH JOIN) or FORCE ORDER where the default plan is poor
   SELECT t1.*, t2.*
   FROM [LargeTable] AS t1
   JOIN [SmallTable] AS t2
   ON t1.key = t2.key
   OPTION(HASH JOIN);
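
   Materializing an intermediate result is often the simplest way to break down a long-running query. A sketch using CETAS (CREATE EXTERNAL TABLE AS SELECT), which serverless SQL pool supports; the table, data source, and file format names below are placeholders:

   -- Persist an intermediate aggregate with CETAS
   CREATE EXTERNAL TABLE [IntermediateResult]
   WITH (
       LOCATION = 'intermediate/results/',
       DATA_SOURCE = [DataSource],
       FILE_FORMAT = [ParquetFileFormat]
   )
   AS
   SELECT [Key], SUM([Amount]) AS [Total]
   FROM [LargeTable]
   GROUP BY [Key];

   -- Later queries read the pre-aggregated data instead of re-scanning the source
   SELECT TOP 10 * FROM [IntermediateResult] ORDER BY [Total] DESC;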

Data Format Problems

CSV Parsing Errors

Symptoms:

  • Error messages about malformed CSV records

  • Unexpected NULL values in query results

  • Data type conversion errors

Solutions:

  1. Adjust CSV parsing options:
   -- Specify CSV format options
   CREATE EXTERNAL FILE FORMAT [CustomCsvFormat]
   WITH (
       FORMAT_TYPE = DELIMITEDTEXT,
       FORMAT_OPTIONS (
           FIELD_TERMINATOR = ',',
           STRING_DELIMITER = '"',
           FIRST_ROW = 2,
           USE_TYPE_DEFAULT = TRUE,
           ENCODING = 'UTF8'
       )
   );
  2. Pre-validate CSV data:
     • Use validation queries to identify problematic rows
     • Fix the source data or handle the exceptions
   -- Find problematic rows
   SELECT
       *,
       LEN([Column]) AS [Length],
       CHARINDEX(',', [RawColumn]) AS [CommaPosition]
   FROM [CsvTable]
   WHERE TRY_CAST([NumericColumn] AS DECIMAL(18,2)) IS NULL
   AND [NumericColumn] IS NOT NULL;
  3. Use an explicit schema definition:
     • Define column types explicitly instead of relying on inference
     • Use OPENROWSET with an explicit WITH clause
   SELECT *
   FROM OPENROWSET(
       BULK 'abfss://container@account.dfs.core.windows.net/path/file.csv',
       FORMAT = 'CSV',
       PARSER_VERSION = '2.0',
       HEADER_ROW = TRUE
   ) WITH (
       [Column1] VARCHAR(100),
       [Column2] INT,
       [Column3] DECIMAL(18,2)
   ) AS [r];

JSON Parsing Challenges

Symptoms:

  • JSON path errors

  • Missing or NULL values from JSON documents

  • Array handling issues

Solutions:

  1. Use proper JSON functions:
   SELECT
       JSON_VALUE(jsonColumn, '$.property') AS PropertyValue,
       JSON_QUERY(jsonColumn, '$.array') AS ArrayValue
   FROM [JsonTable];
  2. Handle nested structures properly:
   -- Extract nested JSON properties
   SELECT
       JSON_VALUE(jsonColumn, '$.person.firstName') AS FirstName,
       JSON_VALUE(jsonColumn, '$.person.lastName') AS LastName,
       JSON_VALUE(jsonColumn, '$.person.address.city') AS City
   FROM [JsonTable];
  3. Check for malformed JSON:
   SELECT *
   FROM [JsonTable]
   WHERE ISJSON(jsonColumn) = 0;
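
   For the array handling issues listed above, OPENJSON is usually the right tool: JSON_VALUE returns NULL for arrays, while OPENJSON expands them into rows. A sketch with illustrative column and path names:

   -- Flatten a JSON array into one row per element
   SELECT
       JSON_VALUE(jsonColumn, '$.orderId') AS OrderId,
       item.[value] AS ItemJson,
       JSON_VALUE(item.[value], '$.sku') AS Sku
   FROM [JsonTable]
   CROSS APPLY OPENJSON(jsonColumn, '$.items') AS item;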

Resource Limitations

Memory Pressure

Symptoms:

  • Queries failing with memory-related errors

  • Inconsistent performance with large result sets

  • Failures during complex aggregations

Solutions:

  1. Reduce result set size:
     • Select only the columns you need
     • Apply filtering early in queries
     • Use TOP for initial testing (T-SQL has no LIMIT keyword)
   -- Instead of SELECT *
   SELECT [Key], [ImportantColumn1], [ImportantColumn2]
   FROM [LargeTable]
   WHERE [FilterColumn] = 'Value';
  2. Implement pagination:
     • Use ORDER BY with OFFSET-FETCH for pagination
     • Split queries into smaller result sets
   -- Paginated query
   SELECT *
   FROM [LargeTable]
   ORDER BY [SortColumn]
   OFFSET 1000 ROWS FETCH NEXT 1000 ROWS ONLY;
  3. Optimize memory-intensive operations:
     • Avoid excessive sorting or grouping
     • Use window functions carefully
     • Consider materializing intermediate results

Concurrency Limitations

Symptoms:

  • Query failures during peak usage times

  • Errors about exceeding concurrency limits

  • Queries queued for execution

Solutions:

  1. Implement request management:
     • Throttle concurrent queries from applications
     • Use connection pooling effectively

  2. Schedule heavy workloads appropriately:
     • Distribute load across time periods
     • Run batch operations during off-peak hours

  3. Monitor resource utilization:
     • Track concurrency usage patterns
     • Set alerts for approaching limits
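
   To gauge concurrency pressure directly, you can count executing requests with the standard sys.dm_exec_requests DMV, which serverless SQL pool exposes:

   -- Count requests currently executing against the pool
   SELECT COUNT(*) AS active_requests
   FROM sys.dm_exec_requests
   WHERE status IN ('running', 'runnable', 'suspended');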

File Access Issues

Permission Problems

Symptoms:

  • "Access denied" errors when querying data

  • Authentication failures

  • Queries working for some users but not others

Solutions:

  1. Check storage permissions:
     • Verify Storage Blob Data Reader role assignments
     • Check ACL settings when hierarchical namespace is enabled
     • Ensure the Synapse workspace identity has the required access

  2. Use managed identity authentication:

   -- Create credential using managed identity
   CREATE DATABASE SCOPED CREDENTIAL MSICredential
   WITH IDENTITY = 'Managed Identity';

   -- Create data source using credential
   CREATE EXTERNAL DATA SOURCE SecureDataSource
   WITH (
       LOCATION = 'abfss://container@account.dfs.core.windows.net',
       CREDENTIAL = MSICredential
   );
  3. Verify network access:
     • Check storage firewall settings
     • Verify private endpoint configuration
     • Test connectivity with Azure Storage Explorer

File Not Found Errors

Symptoms:

  • "File not found" errors when querying

  • Unexpected empty result sets

  • Path resolution failures

Solutions:

  1. Check path specifications:
     • Verify path case sensitivity
     • Use the correct URL scheme (abfss://, wasbs://)
     • Check for typos in container or folder names

  2. Verify file existence:
     • Use Storage Explorer to confirm the file exists
     • Check the folder structure and naming

  3. Test with explicit paths:

   -- Test file access with explicit path
   SELECT TOP 10 *
   FROM OPENROWSET(
       BULK 'abfss://container@account.dfs.core.windows.net/path/file.csv',
       FORMAT = 'CSV',
       PARSER_VERSION = '2.0',
       HEADER_ROW = TRUE
   ) AS [r];

Metadata Challenges

Statistics Issues

Symptoms:

  • Suboptimal query plans

  • Inconsistent performance

  • Incorrect cardinality estimates

Solutions:

  1. Create statistics on external tables:
   -- Create statistics on important columns
   CREATE STATISTICS [Stats_Column1]
   ON [ExternalTable] ([Column1]);
  2. Update statistics regularly (serverless SQL pool does not support UPDATE STATISTICS on external tables; drop and re-create them instead):
   -- Refresh statistics by dropping and re-creating them
   DROP STATISTICS [ExternalTable].[Stats_Column1];
   CREATE STATISTICS [Stats_Column1] ON [ExternalTable] ([Column1]) WITH FULLSCAN;
  3. Use query hints when necessary:
   -- Example: force pushdown and the legacy cardinality estimator
   SELECT *
   FROM [ExternalTable]
   WHERE [Column1] = 'Value'
   OPTION (FORCE_EXTERNALPUSHDOWN, QUERYTRACEON 9481);

Schema Drift Handling

Symptoms:

  • Queries failing after source schema changes

  • Missing columns in query results

  • Data type mismatches

Solutions:

  1. Implement schema flexibility:
   -- Read JSON files as raw text (CSV parser with unused terminators), then shred with OPENJSON
   SELECT *
   FROM OPENROWSET(
       BULK 'abfss://container@account.dfs.core.windows.net/path/*.json',
       FORMAT = 'CSV',
       FIELDTERMINATOR = '0x0b',
       FIELDQUOTE = '0x0b',
       ROWTERMINATOR = '0x0b'
   ) WITH (
       jsonContent VARCHAR(MAX)
   ) AS [rows]
   CROSS APPLY OPENJSON(jsonContent)
   WITH (
       [Column1] VARCHAR(100) '$.field1',
       [Column2] VARCHAR(100) '$.field2'
       -- Add only required fields
   );
  2. Use schema discovery tools:
   -- Discover schema
   EXEC sp_describe_first_result_set N'
       SELECT *
       FROM OPENROWSET(
           BULK ''abfss://container@account.dfs.core.windows.net/path/file.csv'',
           FORMAT = ''CSV'',
           PARSER_VERSION = ''2.0'',
           HEADER_ROW = TRUE
       ) AS [r]
   ';
  3. Implement schema validation queries:
     • Create validation queries that run before the main processing
     • Generate schema comparison reports
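
   A validation query can simply compare the discovered schema against an expected column list before the main job runs. This sketch uses sys.dm_exec_describe_first_result_set; the expected column names are illustrative:

   -- Flag columns that are not in the expected list
   SELECT name, system_type_name
   FROM sys.dm_exec_describe_first_result_set(N'
       SELECT *
       FROM OPENROWSET(
           BULK ''abfss://container@account.dfs.core.windows.net/path/file.csv'',
           FORMAT = ''CSV'',
           PARSER_VERSION = ''2.0'',
           HEADER_ROW = TRUE
       ) AS [r]', NULL, 0)
   WHERE name NOT IN ('Column1', 'Column2', 'Column3');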

Advanced Troubleshooting

Query Monitoring

Monitor Serverless SQL Pool queries to identify issues:

  1. Check DMVs for active queries (note: the sys.dm_pdw_* DMVs exist only in dedicated SQL pools; serverless SQL pool exposes the standard SQL Server DMVs):
   SELECT
       r.session_id,
       r.status,
       r.start_time,
       r.total_elapsed_time,
       r.request_id,
       r.command,
       t.text
   FROM sys.dm_exec_requests r
   CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
   WHERE r.status NOT IN ('sleeping', 'background')
   ORDER BY r.start_time DESC;
  2. Monitor resource usage (serverless SQL pool bills per data processed; sys.dm_exec_requests_history is specific to serverless):
   SELECT TOP 100
       status,
       start_time,
       end_time,
       total_elapsed_time_ms,
       data_processed_mb,
       query_text
   FROM sys.dm_exec_requests_history
   ORDER BY data_processed_mb DESC;
  3. Track query history:
   SELECT TOP 100
       login_name,
       status,
       start_time,
       end_time,
       total_elapsed_time_ms,
       query_text
   FROM sys.dm_exec_requests_history
   ORDER BY start_time DESC;

Diagnostic Queries

Use these diagnostic queries to identify Serverless SQL Pool issues:

  1. Check for failed queries (in serverless SQL pool, sys.dm_exec_requests_history records the outcome of each completed request; sys.dm_pdw_request_steps applies only to dedicated SQL pools):
   SELECT TOP 100
       start_time,
       end_time,
       status,
       error_code,
       error,
       query_text
   FROM sys.dm_exec_requests_history
   WHERE status = 'Failed'
   ORDER BY start_time DESC;
  2. Get error details:
     • The error and error_code columns in sys.dm_exec_requests_history carry the failure message for each request
     • For longer retention, enable diagnostic settings on the workspace and query the built-in pool request logs in Log Analytics

Best Practices for Avoiding Issues

  1. Use optimal file formats:
  2. Parquet or ORC for analytical queries
  3. Proper partitioning for large datasets

  4. Implement appropriate data organization:

  5. Partition by frequently filtered columns
  6. Use folder structures that align with query patterns

  7. Follow query optimization guidelines:

  8. Filter data early
  9. Project only necessary columns
  10. Use appropriate join strategies

  11. Set up monitoring:

  12. Configure diagnostic settings
  13. Create alerts for query failures
  14. Track performance patterns
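
   When folder names encode partitions (for example year=2023/month=08), serverless SQL pool can prune them with the filepath() function over wildcard paths; the folder layout below is an assumption:

   -- Prune partition folders using filepath() over wildcards
   SELECT TOP 10 r.*
   FROM OPENROWSET(
       BULK 'abfss://container@account.dfs.core.windows.net/data/year=*/month=*/*.parquet',
       FORMAT = 'PARQUET'
   ) AS [r]
   WHERE r.filepath(1) = '2023'   -- value of the first wildcard (year)
     AND r.filepath(2) = '08';    -- value of the second wildcard (month)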

External Resources