Databricks Best Practices

Best practices index for Azure Databricks.


Overview

This section covers best practices for:

  • Cluster configuration and management
  • Delta Lake optimization
  • MLOps and model management
  • Cost optimization
  • Security and governance

Documentation Index

Document                     Description
Delta Lake Best Practices    Delta Lake optimization and patterns
MLOps Best Practices         Machine learning operations
Cluster Management           Compute configuration
Cost Optimization            Cost management strategies

Quick Wins

Cluster Configuration

# Recommended cluster settings
{
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {
        "min_workers": 2,
        "max_workers": 8
    },
    "spark_conf": {
        "spark.sql.adaptive.enabled": "true",
        "spark.databricks.delta.optimizeWrite.enabled": "true",
        "spark.databricks.delta.autoCompact.enabled": "true"
    }
}
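The settings above can also be built and sanity-checked in Python before submitting them to the Databricks Clusters API. The sketch below is illustrative, not official tooling: `validate_cluster_config` is a hypothetical helper, and the actual submission step (e.g. a POST to the workspace's `clusters/create` endpoint) is left out.

```python
# Recommended cluster settings from above, expressed as a Python dict so
# they can be validated programmatically before being sent to the API.
CLUSTER_CONFIG = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",  # Azure VM type; adjust per workload
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "spark_conf": {
        "spark.sql.adaptive.enabled": "true",
        "spark.databricks.delta.optimizeWrite.enabled": "true",
        "spark.databricks.delta.autoCompact.enabled": "true",
    },
}


def validate_cluster_config(cfg: dict) -> list:
    """Return a list of problems found in a cluster config (empty = OK).

    Hypothetical helper for illustration; checks a few basic invariants.
    """
    problems = []
    auto = cfg.get("autoscale", {})
    if auto.get("min_workers", 0) < 1:
        problems.append("min_workers should be at least 1")
    if auto.get("max_workers", 0) < auto.get("min_workers", 0):
        problems.append("max_workers must be >= min_workers")
    if not cfg.get("spark_version"):
        problems.append("spark_version is required")
    return problems


print(validate_cluster_config(CLUSTER_CONFIG))  # []
```

Validating the payload up front catches typos (such as a max_workers below min_workers) before the cluster request fails remotely.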

Notebook Best Practices

  1. Use modular notebooks with %run
  2. Parameterize notebooks with widgets
  3. Handle exceptions and log errors
  4. Use version control (Repos)
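Point 2 (parameterizing with widgets) pairs well with point 3 (handling exceptions): reading a widget defensively lets the same notebook run in Databricks and in local tests. A minimal sketch, assuming the standard `dbutils.widgets.get` call; `get_param` itself is a hypothetical helper, not part of the Databricks API.

```python
def get_param(name: str, default: str, dbutils=None) -> str:
    """Read a notebook parameter from a Databricks widget, falling back to
    a default when running outside Databricks (e.g. in unit tests).

    Hypothetical helper: in a real notebook, pass the ambient `dbutils`.
    """
    if dbutils is not None:
        try:
            # dbutils.widgets.get raises if the widget was never defined
            return dbutils.widgets.get(name)
        except Exception:
            return default
    return default


# Outside Databricks there is no dbutils, so the default is returned:
print(get_param("env", "dev"))  # dev
```

Inside a notebook this would typically follow a `dbutils.widgets.text("env", "dev")` declaration near the top, keeping all parameters visible in one place.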

Performance Guidelines

Area            Recommendation           Impact
Cluster sizing  Start small, scale up    Cost
Caching         Cache hot DataFrames     Performance
Partitioning    Match query patterns     Performance
File size       Target 128MB-1GB         Performance
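The file-size guideline translates into a back-of-envelope partition count: before writing, repartition so each output file lands near a target size inside the 128MB-1GB band. The function below is an illustrative sketch (the 256MB target and the `target_partitions` name are assumptions, not Databricks defaults).

```python
# Assumed target: 256 MB per file, inside the recommended 128MB-1GB band.
TARGET_FILE_BYTES = 256 * 1024 * 1024


def target_partitions(dataset_bytes: int, target: int = TARGET_FILE_BYTES) -> int:
    """Estimate how many output partitions keep each file near `target` bytes."""
    # Ceiling division so the last partial file still gets its own partition.
    return max(1, -(-dataset_bytes // target))


# A 10 GB dataset at ~256 MB per file needs 40 partitions:
print(target_partitions(10 * 1024**3))  # 40
```

In a notebook this estimate would feed a call like `df.repartition(n).write.format("delta").save(path)`; with `optimizeWrite` and `autoCompact` enabled (as in the cluster config above), Delta Lake also adjusts file sizes automatically.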


Last Updated: January 2025