Skip to content
Learn — Azure analytics reference library covering services, architecture patterns, tutorials, solutions, monitoring, DevOps

Cost Optimization Best Practices for Azure Synapse Analytics

Home > Best Practices > Cost Optimization

Understanding Synapse Analytics Cost Model

Cost Components

Serverless SQL Pool Costs

  • Data Scanning: Charged per TB of data processed
  • Result Caching: No charge for subsequent queries using cached results
  • Resource Management: No charge when idle (pay-per-query model)

Dedicated SQL Pool Costs

  • Compute Costs: Based on Data Warehouse Units (DWU) and time running
  • Storage Costs: Based on volume of data stored in the dedicated pool
  • Data Movement: Included in compute costs

Spark Pool Costs

  • Compute Costs: Based on vCore-hours consumed
  • Autoscale Impact: Costs vary based on actual usage with autoscale
  • Node Types: Different costs for different node types (memory-optimized vs. compute-optimized)

Storage Costs

  • ADLS Gen2 Storage: Based on volume of data and storage tier (hot/cool/archive)
  • Transaction Costs: Based on number and type of storage operations

Compute Optimization Strategies

Serverless SQL Pool Optimization

Query Optimization

  • Minimize Data Scanning: Prune data aggressively
-- Good: Scans less data with partition filtering
SELECT * FROM external_table
WHERE year_partition = 2025 AND month_partition = 1

-- Avoid: Full scan across all partitions
SELECT * FROM external_table
WHERE YEAR(transaction_date) = 2025 AND MONTH(transaction_date) = 1
  • Use Appropriate File Formats: Prefer columnar formats (Parquet, ORC) over row-based formats (CSV, JSON)
-- Create external file format for Parquet
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'SNAPPY'
);
  • Statistics: Create statistics on frequently filtered columns
-- Create statistics for better query plans
CREATE STATISTICS stats_year ON external_table(year_column);

Result Set Caching

  • Enable Result Set Caching: Reuse query results for identical queries
-- Enable result set caching at database level
ALTER DATABASE MyDatabase
SET RESULT_SET_CACHING ON;
  • Parameterize Queries: Use parameterized queries to maximize cache hits

Dedicated SQL Pool Optimization

Scale Management

  • Implement Automated Scaling: Scale up/down based on workload patterns
# Scale DW based on schedule
$startTime = (Get-Date).AddHours(1)
$timeZone = [System.TimeZoneInfo]::Local.Id
$schedule = New-AzSynapseWorkspaceManagedSchedule -DayOfWeek Monday, Tuesday, Wednesday, Thursday, Friday -Time "08:00" -TimeZone $timeZone
New-AzSynapseSqlPoolWorkloadManagement -WorkspaceName $workspaceName -SqlPoolName $sqlPoolName -DwuValue 1000 -Schedule $schedule
  • Pause During Inactivity: Automatically pause during non-business hours
# Pause SQL pool
Suspend-AzSynapseSqlPool -WorkspaceName $workspaceName -Name $sqlPoolName

Resource Classes

  • Optimize Resource Classes: Use smaller resource classes for simple queries
-- Assign smaller resource class for simple queries
EXEC sp_addrolemember 'smallrc', 'username';

-- Assign larger resource class for complex queries
EXEC sp_addrolemember 'largerc', 'username';

Spark Pool Optimization

Autoscale Configuration

  • Right-Size Min/Max Nodes: Configure appropriate autoscale range
{
  "name": "optimizedSparkPool",
  "properties": {
    "nodeSize": "Small",
    "nodeSizeFamily": "MemoryOptimized",
    "autoScale": {
      "enabled": true,
      "minNodeCount": 3,
      "maxNodeCount": 10
    }
  }
}
  • Session-Level Configuration: Only request resources needed for each job
# Configure Spark session with appropriate resources
spark.conf.set("spark.executor.instances", "4")
spark.conf.set("spark.executor.memory", "4g")
spark.conf.set("spark.executor.cores", "2")

Node Selection

  • Use Appropriate Node Types: Select based on workload characteristics
  • Memory-optimized for ML and large joins
  • Compute-optimized for ETL and data processing

  • Consider Job Requirements: Match node size to job requirements

Session Management

  • Session Timeout: Configure appropriate timeout to release resources
{
  "name": "optimizedSparkPool",
  "properties": {
    "sessionLevelPackages": [],
    "sparkConfigProperties": {},
    "nodeSize": "Small",
    "nodeSizeFamily": "MemoryOptimized",
    "sessionLevelPackages": [],
    "customLibraries": [],
    "sparkEventsFolder": "/events",
    "autoScale": {
      "enabled": true,
      "minNodeCount": 3,
      "maxNodeCount": 10
    },
    "isComputeIsolationEnabled": false,
    "sessionProperties": {
      "driverSize": "Small",
      "executorSize": "Small",
      "executorCount": 2
    },
    "defaultSparkLogFolder": "/logs",
    "nodeCount": 0,
    "dynamicExecutorAllocation": {
      "enabled": true,
      "minExecutors": 1,
      "maxExecutors": 5
    },
    "coordinatorSize": "Small",
    "provisioningState": "Succeeded"
  }
}

Storage Optimization Strategies

Data Lifecycle Management

Storage Tiering

  • Hot Storage: Use for frequently accessed data (last 30-90 days)
  • Cool Storage: Use for infrequently accessed data (older than 90 days)
  • Archive Storage: Use for rarely accessed data (compliance/historical)

Automated Tiering

  • Lifecycle Management Policies: Configure to automatically move data between tiers
{
  "rules": [
    {
      "enabled": true,
      "name": "MoveToCoolTier",
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "data/historical/" ]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 90 }
          }
        }
      }
    }
  ]
}

Data Storage Optimization

Compression and File Formats

  • Use Compression: Prefer columnar formats with compression
# Write with compression
df.write.format("parquet") \
    .option("compression", "snappy") \
    .save("/path/to/data")
  • Optimize File Sizes: Target 100MB-1GB per file
# Control Parquet file size
spark.conf.set("spark.sql.files.maxPartitionBytes", 134217728)  # 128 MB

Data Cleanup

  • Remove Duplicate Data: Deduplicate data where possible
  • Regular Vacuum: Clean up stale files in Delta tables
-- Remove files no longer needed by the table
VACUUM delta_table RETAIN 7 DAYS
  • Temporary Data Management: Remove temporary datasets after use

Pipeline Optimization

Integration Pipeline Costs

Activity Optimization

  • Combine Activities: Reduce activity runs by combining related operations
  • Use Appropriate Integration Runtime: Match the IR to the workload requirements
  • Optimize Copy Activity: Configure appropriate compute size for data movement

Monitoring and Debugging

  • Limit Debug Runs: Use debug runs sparingly
  • Optimize Logging: Implement appropriate logging levels
  • Use Activity Constraints: Set appropriate timeouts and retry policies

Orchestration Patterns

Trigger Optimization

  • Batch Related Activities: Trigger multiple related activities together
  • Use Event-Based Triggers: Trigger only when needed, rather than on schedule

Monitoring and Analysis

Cost Monitoring

Azure Cost Management

  • Budget Alerts: Set up alerts for cost thresholds
# Create budget with alert
New-AzConsumptionBudget -Name "SynapseMonthlyBudget" `
    -Amount 1000 `
    -Category "Cost" `
    -TimeGrain "Monthly" `
    -StartDate (Get-Date) `
    -EndDate (Get-Date).AddYears(1) `
    -ContactEmail @("user@contoso.com")
  • Cost Analysis: Regularly analyze costs by service, resource, and tag
  • Tag Resources: Implement consistent tagging for cost allocation
{
  "tags": {
    "Environment": "Production",
    "Department": "Finance",
    "Project": "DataWarehouse"
  }
}

Resource Utilization Analysis

  • Monitor Usage Patterns: Track usage to identify optimization opportunities
  • Identify Idle Resources: Find and address underutilized resources
  • Workload Analysis: Understand peak vs. average requirements

Cost Optimization Workflow

Regular Review Process

  • Monthly Cost Review: Schedule regular cost review meetings
  • Cost Optimization Backlog: Maintain a backlog of optimization opportunities
  • ROI Analysis: Prioritize optimization efforts by potential savings

Enterprise Strategies

Reserved Instances

Azure Reservations

  • Reserved Capacity: Consider 1-year or 3-year reservations for stable workloads
  • Reservation Scope: Choose appropriate scope (subscription or resource group)
  • Mixed Approach: Use reserved instances for baseline and pay-as-you-go for variable workloads

Enterprise Agreement Benefits

EA Optimization

  • Leverage EA Pricing: Utilize enterprise agreement discounts
  • Azure Hybrid Benefit: Apply for eligible workloads
  • Enterprise Dev/Test Subscription: Use for non-production environments

Conclusion

Cost optimization in Azure Synapse Analytics requires a multi-faceted approach across compute, storage, and operational aspects. By implementing these best practices, organizations can achieve significant cost savings while maintaining performance and meeting business requirements.

Remember that cost optimization is an ongoing process that should be integrated into your regular operational rhythms. Regular monitoring, analysis, and adjustment of your optimization strategies will ensure continued cost efficiency as your workloads evolve.