# Spark Pools
Azure Synapse Spark Pools provide scalable Apache Spark compute for big data analytics and machine learning workloads.
## Overview
Spark Pools in Azure Synapse Analytics enable you to:
- Process large-scale data using Apache Spark
- Run machine learning workloads with built-in libraries
- Integrate with Delta Lake for ACID transactions
- Scale compute resources on-demand
## Key Features
- Auto-scaling: Automatically scale nodes based on workload
- Built-in Libraries: Pre-installed Spark, Python, and ML libraries
- Notebook Integration: Interactive development with Synapse notebooks
- Delta Lake Support: ACID transactions and time travel
## Sections
- Delta Lakehouse - Delta Lake implementation patterns
## Getting Started
To create a Spark Pool:
1. Navigate to your Synapse workspace
2. Select "Apache Spark pools" from the left menu
3. Click "+ New" to create a pool
4. Configure node size and auto-scaling settings
5. Review and create
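The portal steps above can also be scripted. The sketch below uses the Azure CLI's `az synapse spark pool create` command with placeholder resource names; parameter names can vary across CLI versions, so verify them with `az synapse spark pool create --help` before running.

```shell
# Assumes an existing Synapse workspace; all names are placeholders.
az synapse spark pool create \
  --name demopool \
  --workspace-name my-synapse-workspace \
  --resource-group my-resource-group \
  --spark-version 3.4 \
  --node-size Medium \
  --node-count 3 \
  --enable-auto-scale true \
  --min-node-count 3 \
  --max-node-count 10
```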
## Best Practices
- Use auto-pause to save costs when pools are idle
- Right-size your nodes based on workload requirements
- Enable dynamic allocation for variable workloads
- Use Delta Lake for production data pipelines
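To make the right-sizing bullet concrete, here is a rough, illustrative sketch, not official Azure guidance: it assumes a 2x in-memory working-set factor (an assumption, tune for your workload) and the typical Synapse node memory sizes (Small 32 GB, Medium 64 GB, Large 128 GB), with the pool minimum of three nodes.

```python
import math

# Typical Azure Synapse node sizes (memory in GB).
NODE_MEMORY_GB = {"Small": 32, "Medium": 64, "Large": 128}


def estimate_node_count(dataset_gb: float, node_size: str,
                        overhead: float = 2.0, min_nodes: int = 3) -> int:
    """Rough estimate of nodes needed to hold ~overhead x the dataset in memory.

    The overhead factor is an assumption for shuffle/caching headroom;
    Synapse Spark pools require at least three nodes.
    """
    needed = math.ceil(dataset_gb * overhead / NODE_MEMORY_GB[node_size])
    return max(needed, min_nodes)
```

For example, a 500 GB dataset on Medium nodes comes out at 16 nodes under these assumptions, while a small 10 GB job still gets the three-node minimum.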
## Related Documentation
Back to Azure Synapse | Documentation Home