Migration Strategies¶
Best practices for migrating to Cloud Scale Analytics.
Migration Approaches¶
| Approach | Use Case | Risk | Duration |
|---|---|---|---|
| Lift & Shift | Quick migration, minimal changes | Low | Short |
| Re-platform | Optimize for cloud services | Medium | Medium |
| Re-architect | Full modernization | High | Long |
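The trade-offs in the table above can be sketched as a rough decision helper. The inputs and thresholds below are illustrative assumptions, not fixed rules:

```python
def choose_approach(deadline_weeks: int, needs_cloud_native: bool,
                    risk_tolerance: str) -> str:
    """Pick a migration approach from the table; thresholds are assumptions."""
    # Tight deadlines or low risk tolerance point to the lowest-change option.
    if deadline_weeks <= 8 or risk_tolerance == "low":
        return "Lift & Shift"
    # Full modernization only pays off when cloud-native features are required
    # and the organization can absorb the higher risk and longer duration.
    if needs_cloud_native and risk_tolerance == "high":
        return "Re-architect"
    return "Re-platform"
```

In practice the choice is rarely this mechanical, but encoding it makes the criteria explicit and reviewable.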
Migration Phases¶
```mermaid
flowchart LR
    A[Assess] --> B[Plan]
    B --> C[Migrate]
    C --> D[Optimize]
    D --> E[Operate]
```

Phase 1: Assessment¶
```python
# Database assessment script
# (get_database_size, get_table_count, analyze_stored_procedures, and
# check_compatibility are helper functions assumed to be defined elsewhere)
def assess_database(connection_string):
    """Assess database for migration readiness."""
    assessment = {
        "size_gb": get_database_size(connection_string),
        "table_count": get_table_count(connection_string),
        "complexity_score": analyze_stored_procedures(connection_string),
        "compatibility_issues": check_compatibility(connection_string),
    }

    # Recommend a target tier based on size and complexity
    if assessment["size_gb"] < 100 and assessment["complexity_score"] < 50:
        assessment["recommendation"] = "Serverless SQL"
    elif assessment["size_gb"] < 1000:
        assessment["recommendation"] = "Dedicated SQL Pool (DW1000c)"
    else:
        assessment["recommendation"] = "Dedicated SQL Pool (DW3000c+)"

    return assessment
```
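The script above assumes an `analyze_stored_procedures` helper. One minimal way to sketch its scoring is a keyword scan over procedure text; the patterns and weights below are illustrative assumptions, not a standard metric:

```python
import re

# Constructs that commonly complicate migration, with assumed weights.
MIGRATION_BLOCKERS = {
    r"\bCURSOR\b": 10,          # row-by-row logic usually needs a set-based rewrite
    r"\bEXEC(?:UTE)?\s*\(": 8,  # dynamic SQL resists static analysis
    r"\bOPENQUERY\b": 15,       # linked-server calls have no direct cloud equivalent
    r"#\w+": 5,                 # temp tables behave differently in Synapse SQL
}

def score_procedure(proc_text: str) -> int:
    """Return an additive complexity score for one stored-procedure body."""
    return sum(
        weight * len(re.findall(pattern, proc_text, re.IGNORECASE))
        for pattern, weight in MIGRATION_BLOCKERS.items()
    )
```

Summing `score_procedure` over every procedure in the database gives a comparable complexity figure for the recommendation thresholds above.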
Phase 2: Planning¶
Migration Checklist:
- Inventory all data sources
- Map source to target schemas
- Identify dependencies
- Plan downtime windows
- Create rollback procedures
- Define success criteria
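The checklist can also be captured as data, so gaps are caught programmatically before migration day. The `PlanItem` shape below is an assumption, not a standard format:

```python
from dataclasses import dataclass, field

@dataclass
class PlanItem:
    """One inventoried data source with its mapping, dependencies, and rollback."""
    source: str
    target: str = ""
    dependencies: list = field(default_factory=list)
    rollback: str = ""

def validate_plan(items):
    """Return checklist violations; an empty list means the plan is complete."""
    known_sources = {item.source for item in items}
    issues = []
    for item in items:
        if not item.target:
            issues.append(f"{item.source}: no target schema mapped")
        if not item.rollback:
            issues.append(f"{item.source}: no rollback procedure")
        for dep in item.dependencies:
            if dep not in known_sources:
                issues.append(f"{item.source}: depends on unplanned source {dep}")
    return issues
```

Running the validator in CI keeps the plan honest as sources are added during assessment.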
Phase 3: Migration¶
```bash
# Data Factory migration pipeline
# (requires the Azure CLI "datafactory" extension)
az datafactory pipeline create \
  --factory-name adf-migration \
  --resource-group rg-analytics \
  --name migrate-sales-data \
  --pipeline '{
    "activities": [
      {
        "name": "CopyFromOnPrem",
        "type": "Copy",
        "inputs": [{"referenceName": "OnPremSQL", "type": "DatasetReference"}],
        "outputs": [{"referenceName": "DataLake", "type": "DatasetReference"}],
        "typeProperties": {
          "source": {"type": "SqlSource"},
          "sink": {"type": "ParquetSink"},
          "enableStaging": true
        }
      }
    ]
  }'
```
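Once `az datafactory pipeline create-run` starts the copy, something has to wait for it to reach a terminal state. A minimal polling sketch, with the status source injected so the loop can be tested offline:

```python
import time

def wait_for_pipeline(get_status, poll_seconds=30, timeout_seconds=3600,
                      sleep=time.sleep):
    """Poll an ADF run until Succeeded; raise on Failed/Cancelled or timeout.

    get_status is a zero-argument callable returning the current run status
    (e.g. a wrapper around `az datafactory pipeline-run show`).
    """
    waited = 0
    while waited <= timeout_seconds:
        status = get_status()
        if status == "Succeeded":
            return status
        if status in ("Failed", "Cancelled"):
            raise RuntimeError(f"pipeline run ended in state {status}")
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError("pipeline run did not finish within the timeout")
```

Injecting `sleep` and `get_status` keeps the retry logic unit-testable without touching Azure.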
Phase 4: Validation¶
```sql
-- Data validation query: compare row counts and totals across systems
WITH SourceCounts AS (
    SELECT
        'Source' AS System,
        COUNT(*) AS RowCount,
        SUM(Amount) AS TotalAmount
    FROM SourceDatabase.dbo.Sales
),
TargetCounts AS (
    SELECT
        'Target' AS System,
        COUNT(*) AS RowCount,
        SUM(Amount) AS TotalAmount
    FROM OPENROWSET(
        BULK 'https://datalake.dfs.core.windows.net/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales
)
SELECT * FROM SourceCounts
UNION ALL
SELECT * FROM TargetCounts;
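The two rows the query returns can then be compared in code. A small sketch, assuming `row_count`/`total_amount` keys and a tolerance for floating-point drift introduced by the Parquet conversion:

```python
def compare_systems(source: dict, target: dict,
                    amount_tolerance: float = 0.01) -> list:
    """Return mismatch descriptions; an empty list means validation passed."""
    mismatches = []
    if source["row_count"] != target["row_count"]:
        mismatches.append(
            f"row count differs: source={source['row_count']} "
            f"target={target['row_count']}"
        )
    # Allow a small tolerance on the monetary total rather than exact equality.
    if abs(source["total_amount"] - target["total_amount"]) > amount_tolerance:
        mismatches.append(
            f"total amount differs: source={source['total_amount']} "
            f"target={target['total_amount']}"
        )
    return mismatches
```

A non-empty result should block cutover and trigger the rollback procedures defined in the planning phase.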
Common Migration Patterns¶
On-Premises SQL Server to Synapse¶
- Export to Parquet using ADF
- Load to Data Lake bronze layer
- Transform to silver/gold layers
- Create views in Serverless SQL
- Migrate heavy queries to Dedicated SQL
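The bronze/silver/gold steps above imply a path convention in the lake. A minimal sketch of one such convention; the container URL and layout are assumptions, not a Synapse default:

```python
from datetime import date

LAYERS = ("bronze", "silver", "gold")

def lake_path(layer: str, table: str, load_date: date,
              root: str = "abfss://data@datalake.dfs.core.windows.net") -> str:
    """Build a date-partitioned path for one table in one medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{root}/{layer}/{table}/load_date={load_date.isoformat()}"
```

Fixing the convention in one helper keeps ADF sinks, Spark transforms, and Serverless SQL views pointed at the same locations.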
Hadoop to Databricks¶
- Migrate HDFS data to ADLS Gen2
- Convert Hive metastore to Unity Catalog
- Refactor Spark jobs for Databricks
- Implement Delta Lake format
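Moving HDFS data to ADLS Gen2 largely comes down to rewriting paths. A hedged sketch of that mapping, with placeholder account and container names:

```python
def hdfs_to_adls(hdfs_path: str, account: str = "datalake",
                 container: str = "migrated") -> str:
    """Map hdfs://namenode[:port]/path to its ADLS Gen2 abfss:// equivalent."""
    prefix = "hdfs://"
    if not hdfs_path.startswith(prefix):
        raise ValueError("expected an hdfs:// path")
    # Drop the scheme and the namenode authority, keep only the path.
    _, _, path = hdfs_path[len(prefix):].partition("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"
```

The same mapping can drive both the bulk copy and the rewrite of path literals in refactored Spark jobs.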
Last Updated: January 2025