# Serverless SQL Architecture Video Script

## Overview
Video script covering Azure Synapse Serverless SQL Pool architecture, query optimization, and design patterns for cost-effective data exploration.
## Script Content

### Opening (0:00 - 1:00)
NARRATOR: "Query petabytes of data without provisioning infrastructure. Azure Synapse Serverless SQL Pool enables on-demand analytics with pay-per-query pricing. Let's explore its architecture and optimization strategies."
### Section 1: Architecture Overview (1:00 - 5:00)

#### Serverless Engine

```text
Query Request
    ↓
Query Optimizer
    ↓
Distributed Query Execution
    ↓
Data Lake Storage (Parquet/Delta/CSV)
    ↓
Results (pay for data scanned)
```
**Key Characteristics**:
- No servers to manage
- Automatic scaling
- Pay per TB scanned
- Instant availability
### Section 2: Query Patterns (5:00 - 11:00)
#### OPENROWSET Basics
```sql
-- Query Parquet files
SELECT *
FROM OPENROWSET(
    BULK 'https://datalake.dfs.core.windows.net/data/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales
WHERE sale_date >= '2024-01-01';

-- Query CSV with an explicit schema (the WITH clause overrides inference)
SELECT *
FROM OPENROWSET(
    BULK 'https://datalake.dfs.core.windows.net/data/customers.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) WITH (
    customer_id INT,
    name VARCHAR(100),
    email VARCHAR(100)
) AS customers;
```
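JSON files can also be read through OPENROWSET by treating each document as a single text column and extracting fields with JSON_VALUE. A minimal sketch, assuming `customers.json` holds one JSON document per line (the field paths are illustrative):

```sql
-- Field/quote terminators are set to characters that never occur in the data,
-- so each JSON line arrives intact as a single NVARCHAR column
SELECT
    JSON_VALUE(doc, '$.customer_id') AS customer_id,
    JSON_VALUE(doc, '$.name')        AS name,
    JSON_VALUE(doc, '$.email')       AS email
FROM OPENROWSET(
    BULK 'https://datalake.dfs.core.windows.net/data/customers.json',
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b'
) WITH (doc NVARCHAR(MAX)) AS rows;
```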
#### External Tables

```sql
-- Create external data source
CREATE EXTERNAL DATA SOURCE DataLake
WITH (
    LOCATION = 'https://datalake.dfs.core.windows.net/',
    CREDENTIAL = ManagedIdentity
);

-- Create external file format
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

-- Create external table
CREATE EXTERNAL TABLE Sales
(
    sale_id INT,
    sale_date DATE,
    amount DECIMAL(19,2)
)
WITH (
    LOCATION = 'data/sales/',
    DATA_SOURCE = DataLake,
    FILE_FORMAT = ParquetFormat
);
```
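The `CREDENTIAL = ManagedIdentity` reference above assumes a database scoped credential with that name already exists. A minimal setup sketch (the master key password is a placeholder):

```sql
-- One-time setup: a master key, then a credential that authenticates
-- to the data lake using the workspace's managed identity
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';

CREATE DATABASE SCOPED CREDENTIAL ManagedIdentity
WITH IDENTITY = 'Managed Identity';
```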
### Section 3: Optimization Techniques (11:00 - 14:00)

#### Minimize Data Scanned
```sql
-- BAD: Scans all data
SELECT *
FROM OPENROWSET(
    BULK 'https://datalake.dfs.core.windows.net/data/**/*.parquet',
    FORMAT = 'PARQUET'
) AS data;

-- GOOD: Uses partition pruning
SELECT *
FROM OPENROWSET(
    BULK 'https://datalake.dfs.core.windows.net/data/year=2024/month=01/*.parquet',
    FORMAT = 'PARQUET'
) AS data;

-- BEST: Specific columns + partition pruning
-- (assumes year and month are exposed as columns, e.g. via a partitioned view)
SELECT customer_id, sale_date, amount
FROM Sales
WHERE year = 2024 AND month = 1;
```
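When querying with OPENROWSET directly, partition pruning is expressed with the `filepath()` function, which returns the value matched by the Nth wildcard in the BULK path. A sketch against the same `year=`/`month=` layout:

```sql
-- filepath(1) matches the first wildcard (year), filepath(2) the second (month);
-- folders that fail these predicates are never read
SELECT r.customer_id, r.sale_date, r.amount
FROM OPENROWSET(
    BULK 'https://datalake.dfs.core.windows.net/data/year=*/month=*/*.parquet',
    FORMAT = 'PARQUET'
) AS r
WHERE r.filepath(1) = '2024'
  AND r.filepath(2) = '01';
```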
#### File Size Optimization

**Recommendations**:

- Optimal file size: 100 MB - 1 GB
- Too small: overhead increases
- Too large: cannot parallelize
- Use compaction for small files (see the CETAS sketch below)
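One way to compact small files in serverless SQL is CETAS (CREATE EXTERNAL TABLE AS SELECT), which rewrites the source data as a new set of larger Parquet files. A sketch reusing the `DataLake` data source and `ParquetFormat` defined earlier; the `SalesCompacted` name and target location are illustrative:

```sql
-- Rewrite many small source files into fewer, larger Parquet files
CREATE EXTERNAL TABLE SalesCompacted
WITH (
    LOCATION = 'data/sales_compacted/',
    DATA_SOURCE = DataLake,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT sale_id, sale_date, amount
FROM Sales;
```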
### Section 4: Cost Management (14:00 - 16:00)

#### Cost Calculation

```text
Cost = Data Scanned (TB) × $5.00 per TB

Example:
- Query scans 100 GB
- Cost = 0.1 TB × $5.00 = $0.50
```
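Serverless SQL pools also track how much data your queries have processed; the `sys.dm_external_data_processed` DMV reports running totals that plug straight into the formula above:

```sql
-- Data processed (MB) in the current day, week, and month
SELECT type, data_processed_mb
FROM sys.dm_external_data_processed;
```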
#### Cost Optimization Strategies

- **Partition Data**: Reduce data scanned
- **Use Parquet**: Columnar and compressed, often scanning an order of magnitude less data than CSV
- **Select Specific Columns**: Avoid SELECT *
- **Compress Files**: Snappy compression
- **Create External Tables**: Reuse metadata
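Beyond these strategies, a hard spending cap can be enforced with `sp_set_data_processed_limit`, which sets a daily, weekly, or monthly ceiling on data processed:

```sql
-- Cap processing at 1 TB per day; once the limit is reached,
-- further queries in the period are not executed
EXEC sp_set_data_processed_limit
    @type = N'daily',
    @limit_tb = 1;
```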
## Conclusion

**Best Practices**:

- Use partition elimination
- Choose optimal file formats
- Create external tables for reuse
- Monitor query costs
- Compress data appropriately
## Related Resources
**Last Updated**: January 2025