Troubleshooting Serverless SQL Pool in Azure Synapse Analytics¶
This guide provides solutions for common issues encountered when working with Serverless SQL Pools in Azure Synapse Analytics, including query performance problems, error patterns, and optimization techniques.
Common Serverless SQL Issues¶
When working with Serverless SQL Pools, these are the most common categories of issues:
- Query Performance Issues: Slow query execution, timeout errors
- Data Format Problems: Parsing errors, schema inference issues
- Resource Limitations: Query timeouts, memory constraints
- File Access Issues: Permission problems, file not found errors
- Metadata Challenges: Statistics issues, partitioning problems
Query Performance Issues¶
Slow Query Execution¶
Symptoms:
- Queries taking longer than expected
- Timeouts during query execution
- Performance degradation compared to previous runs
Solutions:
- Optimize file format and compression:
- Use columnar formats like Parquet or ORC
- Use appropriate compression (Snappy for performance, Gzip for storage)
-- Convert CSV to Parquet for better performance
CREATE EXTERNAL TABLE [ParquetTable]
WITH (
LOCATION = 'abfss://container@account.dfs.core.windows.net/path/to/folder/',
DATA_SOURCE = [DataSource],
FILE_FORMAT = [ParquetFileFormat]
)
AS SELECT * FROM [CsvTable];
- Use partitioning effectively:
- Query only needed partitions
- Implement partition pruning in queries
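For folder-partitioned data, partition pruning can be sketched with the serverless `filepath()` function; the storage account, container, and `year=`/`month=` folder layout below are assumptions:

```sql
-- Hedged sketch: filepath(n) returns the value matched by the n-th wildcard
-- in the BULK path, so filtering on it prunes whole folders before any data
-- is read. The account, container, and folder layout are illustrative.
SELECT COUNT(*) AS [RowCount]
FROM OPENROWSET(
    BULK 'abfss://container@account.dfs.core.windows.net/sales/year=*/month=*/*.parquet',
    FORMAT = 'PARQUET'
) AS [r]
WHERE [r].filepath(1) = '2023'          -- first wildcard (year folder)
  AND [r].filepath(2) IN ('01', '02');  -- second wildcard (month folders)
```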
- Optimize predicate pushdown:
- Structure queries to push filters to storage layer
- Use WHERE clauses that can be pushed down
- Check execution plans:
- Use EXPLAIN to understand query execution
- Look for full scans or inefficient operations
Query Timeout Errors¶
Symptoms:
- Error messages about query execution timeout
- Queries failing after running for several minutes
- Consistent failures with large datasets
Solutions:
- Break down complex queries:
- Split into smaller, manageable queries
- Use temporary results or materialized views
- Increase timeout settings (for client tools):
- Adjust connection timeout in SQL clients
- Set command timeout in applications
- Optimize join operations:
- Place the smaller table on the build side of hash joins
- Use appropriate join types (hash joins for large tables)
- Consider denormalizing data where appropriate
- Implement query hints:
- Use OPTION hints to guide the query optimizer
- Apply ORDER hints for join operations
SELECT t1.*, t2.*
FROM [LargeTable] AS t1
JOIN [SmallTable] AS t2
ON t1.key = t2.key
OPTION(HASH JOIN);
Data Format Problems¶
CSV Parsing Errors¶
Symptoms:
- Error messages about malformed CSV records
- Unexpected NULL values in query results
- Data type conversion errors
Solutions:
- Adjust CSV parsing options:
-- Specify CSV format options
CREATE EXTERNAL FILE FORMAT [CustomCsvFormat]
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = TRUE,
ENCODING = 'UTF8'
)
);
- Pre-validate CSV data:
- Use validation queries to identify problematic rows
- Fix source data or handle exceptions
-- Find problematic rows
SELECT
*,
LEN([Column]) AS [Length],
CHARINDEX(',', [RawColumn]) AS [CommaPosition]
FROM [CsvTable]
WHERE TRY_CAST([NumericColumn] AS DECIMAL(18,2)) IS NULL
AND [NumericColumn] IS NOT NULL;
- Use explicit schema definition:
- Define column types explicitly instead of relying on inference
- Use OPENROWSET with explicit schema
SELECT *
FROM OPENROWSET(
BULK 'abfss://container@account.dfs.core.windows.net/path/file.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = TRUE
) WITH (
[Column1] VARCHAR(100),
[Column2] INT,
[Column3] DECIMAL(18,2)
) AS [r];
JSON Parsing Challenges¶
Symptoms:
- JSON path errors
- Missing or NULL values from JSON documents
- Array handling issues
Solutions:
- Use proper JSON functions:
SELECT
JSON_VALUE(jsonColumn, '$.property') AS PropertyValue,
JSON_QUERY(jsonColumn, '$.array') AS ArrayValue
FROM [JsonTable];
- Handle nested structures properly:
-- Extract nested JSON properties
SELECT
JSON_VALUE(jsonColumn, '$.person.firstName') AS FirstName,
JSON_VALUE(jsonColumn, '$.person.lastName') AS LastName,
JSON_VALUE(jsonColumn, '$.person.address.city') AS City
FROM [JsonTable];
- Check for malformed JSON:
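A minimal validation query, reusing the [JsonTable] and jsonColumn names from the examples above, filters on ISJSON:

```sql
-- Surface rows whose payload is not valid JSON before extracting properties
SELECT TOP 100 jsonColumn
FROM [JsonTable]
WHERE ISJSON(jsonColumn) = 0;
```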
Resource Limitations¶
Memory Pressure¶
Symptoms:
- Queries failing with memory-related errors
- Inconsistent performance with large result sets
- Failures during complex aggregations
Solutions:
- Reduce result set size:
- Select only needed columns
- Apply filtering early in queries
- Use TOP or LIMIT for initial testing
-- Instead of SELECT *
SELECT [Key], [ImportantColumn1], [ImportantColumn2]
FROM [LargeTable]
WHERE [FilterColumn] = 'Value';
- Implement pagination:
- Use ORDER BY with OFFSET-FETCH for pagination
- Split queries into smaller result sets
-- Paginated query
SELECT *
FROM [LargeTable]
ORDER BY [SortColumn]
OFFSET 1000 ROWS FETCH NEXT 1000 ROWS ONLY;
- Optimize memory-intensive operations:
- Avoid excessive sorting or grouping
- Use windowing functions carefully
- Consider materialization of intermediate results
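One way to materialize intermediate results is CETAS. This sketch reuses the [DataSource] and [ParquetFileFormat] objects from the earlier example; the table and column names are assumptions:

```sql
-- Hedged sketch: aggregate once into Parquet, then query the smaller
-- materialized result instead of re-scanning the raw files each time.
CREATE EXTERNAL TABLE [DailySales]
WITH (
    LOCATION = 'aggregates/daily_sales/',
    DATA_SOURCE = [DataSource],
    FILE_FORMAT = [ParquetFileFormat]
)
AS
SELECT [SaleDate], SUM([Amount]) AS [TotalAmount]
FROM [LargeTable]
GROUP BY [SaleDate];
```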
Concurrency Limitations¶
Symptoms:
- Query failures during peak usage times
- Errors about exceeding concurrency limits
- Queries queued for execution
Solutions:
- Implement request management:
- Throttle concurrent queries from applications
- Use connection pooling effectively
- Schedule heavy workloads appropriately:
- Distribute load across time periods
- Schedule batch operations during off-peak hours
- Monitor resource utilization:
- Track concurrency usage patterns
- Set alerts for approaching limits
File Access Issues¶
Permission Problems¶
Symptoms:
- "Access denied" errors when querying data
- Authentication failures
- Queries working for some users but not others
Solutions:
- Check storage permissions:
- Verify Storage Blob Data Reader role assignments
- Check ACL settings for hierarchical namespace
- Ensure the Synapse workspace has proper access
- Use managed identity authentication:
-- Create credential using managed identity
CREATE DATABASE SCOPED CREDENTIAL MSICredential
WITH IDENTITY = 'Managed Identity';
-- Create data source using credential
CREATE EXTERNAL DATA SOURCE SecureDataSource
WITH (
LOCATION = 'abfss://container@account.dfs.core.windows.net',
CREDENTIAL = MSICredential
);
- Verify network access:
- Check firewall settings
- Verify private endpoints configuration
- Test with Azure Storage Explorer
File Not Found Errors¶
Symptoms:
- "File not found" errors when querying
- Unexpected empty result sets
- Path resolution failures
Solutions:
- Check path specifications:
- Verify path case sensitivity
- Use correct URL format (abfss://, wasbs://)
- Check for typos in container or folder names
- Verify file existence:
- Use Storage Explorer to confirm file existence
- Check folder structure and naming
- Test with explicit paths:
-- Test file access with explicit path
SELECT TOP 10 *
FROM OPENROWSET(
BULK 'abfss://container@account.dfs.core.windows.net/path/file.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
HEADER_ROW = TRUE
) AS [r];
Metadata Challenges¶
Statistics Issues¶
Symptoms:
- Suboptimal query plans
- Inconsistent performance
- Incorrect cardinality estimates
Solutions:
- Create statistics on external tables:
-- Create statistics on important columns
CREATE STATISTICS [Stats_Column1]
ON [ExternalTable] ([Column1]);
- Update statistics regularly: statistics on external tables are not refreshed automatically, so recreate them after significant data changes
- Use query hints when necessary:
-- Force a specific cardinality estimate
SELECT *
FROM [ExternalTable]
WHERE [Column1] = 'Value'
OPTION (FORCE_EXTERNALPUSHDOWN, QUERYTRACEON 9481);
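For the "update statistics regularly" item above, serverless SQL pools do not support UPDATE STATISTICS on external tables, so a hedged approach is to drop and recreate them (for OPENROWSET paths, sp_drop_openrowset_statistics and sp_create_openrowset_statistics serve the same purpose); the table and statistics names reuse the example above:

```sql
-- Drop and recreate statistics after significant data changes
DROP STATISTICS [ExternalTable].[Stats_Column1];
CREATE STATISTICS [Stats_Column1]
ON [ExternalTable] ([Column1]) WITH FULLSCAN;
```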
Schema Drift Handling¶
Symptoms:
-
Queries failing after source schema changes
-
Missing columns in query results
-
Data type mismatches
Solutions:
- Implement schema flexibility:
-- Read each JSON document as a single text column, then project only required fields
SELECT *
FROM OPENROWSET(
BULK 'abfss://container@account.dfs.core.windows.net/path/*.json',
FORMAT = 'CSV',
FIELDTERMINATOR = '0x0b',
FIELDQUOTE = '0x0b',
ROWTERMINATOR = '0x0b'
) WITH (
jsonContent VARCHAR(MAX)
) AS [rows]
CROSS APPLY OPENJSON(jsonContent)
WITH (
[Column1] VARCHAR(100) '$.field1',
[Column2] VARCHAR(100) '$.field2'
-- Add only required fields
);
- Use schema discovery tools:
-- Discover schema
EXEC sp_describe_first_result_set N'
SELECT *
FROM OPENROWSET(
BULK ''abfss://container@account.dfs.core.windows.net/path/file.csv'',
FORMAT = ''CSV'',
PARSER_VERSION = ''2.0'',
HEADER_ROW = TRUE
) AS [r]
';
- Implement schema validation queries:
- Create validation queries that run before main processing
- Generate schema comparison reports
Advanced Troubleshooting¶
Query Monitoring¶
Monitor Serverless SQL Pool queries to identify issues:
- Check DMVs for active queries:
SELECT
  r.session_id,
  r.status,
  r.submit_time,
  r.total_elapsed_time,
  r.request_id,
  r.command
FROM sys.dm_pdw_exec_requests r
WHERE r.status NOT IN ('Completed', 'Failed', 'Cancelled')
ORDER BY r.submit_time DESC;
- Monitor resource usage:
SELECT
r.request_id,
r.status,
r.total_elapsed_time,
s.step_index,
s.operation_type,
s.location_type,
s.row_count,
s.command
FROM sys.dm_pdw_exec_requests r
JOIN sys.dm_pdw_request_steps s ON r.request_id = s.request_id
WHERE r.session_id = @@SPID
ORDER BY r.request_id, s.step_index;
- Track query history:
SELECT TOP 100
  r.session_id,
  r.status,
  r.submit_time,
  r.end_time,
  r.total_elapsed_time,
  r.command
FROM sys.dm_pdw_exec_requests r
ORDER BY r.submit_time DESC;
Diagnostic Queries¶
Use these diagnostic queries to identify Serverless SQL Pool issues:
- Check for errors:
SELECT
request_id,
step_index,
status,
error_id,
start_time,
end_time,
total_elapsed_time,
row_count,
command
FROM sys.dm_pdw_request_steps
WHERE request_id IN (
SELECT request_id
FROM sys.dm_pdw_exec_requests
WHERE session_id = @@SPID
AND status = 'Failed'
)
ORDER BY request_id, step_index;
- Get error details:
SELECT
error_id,
severity,
[state],
[message],
pdw_node_id
FROM sys.dm_pdw_errors
WHERE error_id = '<error_id_from_previous_query>';
Best Practices for Avoiding Issues¶
- Use optimal file formats:
- Parquet or ORC for analytical queries
- Proper partitioning for large datasets
- Implement appropriate data organization:
- Partition by frequently filtered columns
- Use folder structures that align with query patterns
- Follow query optimization guidelines:
- Filter data early
- Project only necessary columns
- Use appropriate join strategies
- Set up monitoring:
- Configure diagnostic settings
- Create alerts for query failures
- Track performance patterns