Databricks Best Practices

Best practices index for Azure Databricks.


Overview

This section covers best practices for:

  • Cluster configuration and management
  • Delta Lake optimization
  • MLOps and model management
  • Cost optimization
  • Security and governance

Documentation Index

Document                     Description
Delta Lake Best Practices    Delta Lake optimization and patterns
MLOps Best Practices         Machine learning operations
Cluster Management           Compute configuration
Cost Optimization            Cost management strategies

Quick Wins

Cluster Configuration

# Recommended cluster settings
{
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {
        "min_workers": 2,
        "max_workers": 8
    },
    "spark_conf": {
        "spark.sql.adaptive.enabled": "true",
        "spark.databricks.delta.optimizeWrite.enabled": "true",
        "spark.databricks.delta.autoCompact.enabled": "true"
    }
}
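The settings above can also be built and sanity-checked in Python before submitting them to the Databricks Clusters API. The sketch below is illustrative, not official tooling: `validate_cluster_config` is a hypothetical helper, and the actual submission step (e.g. a POST to the workspace's `clusters/create` endpoint) is left out.

```python
# Recommended cluster settings from above, expressed as a Python dict so
# they can be validated programmatically before being sent to the API.
CLUSTER_CONFIG = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",  # Azure VM type; adjust per workload
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "spark_conf": {
        "spark.sql.adaptive.enabled": "true",
        "spark.databricks.delta.optimizeWrite.enabled": "true",
        "spark.databricks.delta.autoCompact.enabled": "true",
    },
}


def validate_cluster_config(cfg: dict) -> list:
    """Return a list of problems found in a cluster config (empty = OK).

    Hypothetical helper for illustration; checks a few basic invariants.
    """
    problems = []
    auto = cfg.get("autoscale", {})
    if auto.get("min_workers", 0) < 1:
        problems.append("min_workers should be at least 1")
    if auto.get("max_workers", 0) < auto.get("min_workers", 0):
        problems.append("max_workers must be >= min_workers")
    if not cfg.get("spark_version"):
        problems.append("spark_version is required")
    return problems


print(validate_cluster_config(CLUSTER_CONFIG))  # []
```

Validating the payload up front catches typos (such as a max_workers below min_workers) before the cluster request fails remotely.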

Notebook Best Practices

  1. Use modular notebooks with %run
  2. Parameterize notebooks with widgets
  3. Handle exceptions and log errors
  4. Use version control (Repos)
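Point 2 (parameterizing with widgets) pairs well with point 3 (handling exceptions): reading a widget defensively lets the same notebook run in Databricks and in local tests. A minimal sketch, assuming the standard `dbutils.widgets.get` call; `get_param` itself is a hypothetical helper, not part of the Databricks API.

```python
def get_param(name: str, default: str, dbutils=None) -> str:
    """Read a notebook parameter from a Databricks widget, falling back to
    a default when running outside Databricks (e.g. in unit tests).

    Hypothetical helper: in a real notebook, pass the ambient `dbutils`.
    """
    if dbutils is not None:
        try:
            # dbutils.widgets.get raises if the widget was never defined
            return dbutils.widgets.get(name)
        except Exception:
            return default
    return default


# Outside Databricks there is no dbutils, so the default is returned:
print(get_param("env", "dev"))  # dev
```

Inside a notebook this would typically follow a `dbutils.widgets.text("env", "dev")` declaration near the top, keeping all parameters visible in one place.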

Performance Guidelines

Area            Recommendation           Impact
Cluster sizing  Start small, scale up    Cost
Caching         Cache hot DataFrames     Performance
Partitioning    Match query patterns     Performance
File size       Target 128MB-1GB         Performance
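The file-size guideline translates into a back-of-envelope partition count: before writing, repartition so each output file lands near a target size inside the 128MB-1GB band. The function below is an illustrative sketch (the 256MB target and the `target_partitions` name are assumptions, not Databricks defaults).

```python
# Assumed target: 256 MB per file, inside the recommended 128MB-1GB band.
TARGET_FILE_BYTES = 256 * 1024 * 1024


def target_partitions(dataset_bytes: int, target: int = TARGET_FILE_BYTES) -> int:
    """Estimate how many output partitions keep each file near `target` bytes."""
    # Ceiling division so the last partial file still gets its own partition.
    return max(1, -(-dataset_bytes // target))


# A 10 GB dataset at ~256 MB per file needs 40 partitions:
print(target_partitions(10 * 1024**3))  # 40
```

In a notebook this estimate would feed a call like `df.repartition(n).write.format("delta").save(path)`; with `optimizeWrite` and `autoCompact` enabled (as in the cluster config above), Delta Lake also adjusts file sizes automatically.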


Last Updated: January 2025