MySQL to Azure Migration -- Best Practices¶
Version compatibility checklist, parameter tuning for Azure MySQL Flexible Server, connection pooling, monitoring with slow query log and Performance Insights, and CSA-in-a-Box integration patterns using Fabric Mirroring for MySQL and ADF MySQL connector.
1. Assessment methodology¶
1.1 Discovery phase¶
Before any migration work begins, conduct a thorough MySQL estate discovery:
-- Inventory all databases and sizes
SELECT
table_schema AS db_name,
ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb,
COUNT(*) AS table_count,
SUM(table_rows) AS approx_rows
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY table_schema
ORDER BY SUM(data_length + index_length) DESC;
-- Check MySQL version
SELECT VERSION();
-- Check storage engines in use
SELECT engine, COUNT(*) AS table_count,
ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS total_size_mb
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY engine;
-- Check for stored procedures, functions, triggers, events
SELECT routine_schema, routine_type, COUNT(*)
FROM information_schema.routines
WHERE routine_schema NOT IN ('mysql', 'sys')
GROUP BY routine_schema, routine_type;
SELECT trigger_schema, COUNT(*)
FROM information_schema.triggers
WHERE trigger_schema NOT IN ('mysql', 'sys')
GROUP BY trigger_schema;
SELECT event_schema, COUNT(*)
FROM information_schema.events
GROUP BY event_schema;
-- Check character sets and collations
SELECT character_set_name, collation_name, COUNT(*) AS column_count
FROM information_schema.columns
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
AND character_set_name IS NOT NULL
GROUP BY character_set_name, collation_name
ORDER BY column_count DESC;
1.2 Complexity scoring¶
| Dimension | Low complexity | Medium complexity | High complexity |
|---|---|---|---|
| Database size | < 10 GB | 10-500 GB | > 500 GB |
| Table count | < 50 | 50-500 | > 500 |
| Storage engines | InnoDB only | InnoDB + MyISAM | Multiple engines (Aria, ColumnStore, FEDERATED) |
| Stored procedures | None | < 50 | > 50 or complex logic |
| Triggers | None | < 20 | > 20 or cross-table triggers |
| Character sets | utf8mb4 only | Mixed (utf8, latin1) | Multiple non-UTF8 charsets |
| Replication | None or simple async | Semi-sync | Group Replication / Galera |
| MySQL version | 8.0 | 5.7 | 5.6 or earlier |
| MariaDB features | None | Sequences, CHECK constraints | System versioning, ColumnStore |
| Cross-database queries | None | Simple JOINs | Complex cross-DB dependencies |
2. Version compatibility checklist¶
2.1 Pre-migration compatibility check¶
- MySQL version: Confirm source is 5.7 or 8.0 (required for Azure MySQL Flexible Server)
- sql_mode: Document current
sql_modeand set identically on target - lower_case_table_names: Must be set during target server creation (cannot change later)
- character_set_server: Document and match on target (recommend utf8mb4)
- collation_server: Document and match on target
- time_zone: Document and configure on target
- innodb_strict_mode: Verify and match
- explicit_defaults_for_timestamp: Verify and match
- default_authentication_plugin: caching_sha2_password on MySQL 8.0
- All tables InnoDB: Convert MyISAM, Aria, MEMORY tables before migration
2.2 Breaking changes by version¶
| Upgrade path | Breaking changes | Mitigation |
|---|---|---|
| 5.7 to 8.0 | Query cache removed, GROUP BY implicit sort removed, utf8mb4 default, caching_sha2_password default | Test queries, update GROUP BY with ORDER BY, update client drivers |
| 8.0 to 8.4 | Minor behavioral changes, deprecations | Review MySQL 8.4 release notes |
| MariaDB to MySQL | Sequences removed, system versioning removed, Aria engine not supported | Convert to MySQL-compatible equivalents |
3. Parameter tuning for Azure MySQL Flexible Server¶
3.1 Essential parameters to configure¶
-- Buffer pool: Set to maximum available (auto-sized but verify)
-- Azure auto-sizes to 50-80% of RAM
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- Connection limits: Size based on expected concurrency
-- Rule of thumb: 4x number of application server threads
SET GLOBAL max_connections = 500; -- Adjust based on tier
-- Slow query logging: Enable for monitoring
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2; -- Log queries > 2 seconds
-- Wait timeout: Reduce from default 28800 to prevent stale connections
SET GLOBAL wait_timeout = 600; -- 10 minutes
SET GLOBAL interactive_timeout = 600;
-- Temporary tables: Size based on query complexity
SET GLOBAL tmp_table_size = 67108864; -- 64 MB
SET GLOBAL max_heap_table_size = 67108864; -- 64 MB
-- Sort buffer: Increase for ORDER BY / GROUP BY heavy workloads
SET GLOBAL sort_buffer_size = 4194304; -- 4 MB (per connection)
-- Join buffer: Increase for JOIN-heavy queries without indexes
SET GLOBAL join_buffer_size = 4194304; -- 4 MB (per connection)
-- Thread cache: Reduce connection overhead
SET GLOBAL thread_cache_size = 50; -- Reuse threads
3.2 Parameters by workload type¶
| Workload | Key parameters | Recommended values |
|---|---|---|
| Web application (OLTP) | innodb_flush_log_at_trx_commit, max_connections, wait_timeout | 1 (full durability), 200-500, 300 |
| Reporting / Analytics | tmp_table_size, sort_buffer_size, join_buffer_size, read_buffer_size | 128 MB, 8 MB, 8 MB, 1 MB |
| High-write ingestion | innodb_flush_log_at_trx_commit, innodb_io_capacity, pre-provisioned IOPS | 2 (reduced durability OK), 2000+, 10000+ |
| WordPress / CMS | query_cache_size (5.7 only), max_connections, table_open_cache | 64 MB, 100, 2000 |
| Connection-heavy (microservices) | max_connections, wait_timeout, thread_cache_size | 1000+, 60, 100 |
3.3 Parameters to avoid changing¶
| Parameter | Why not to change |
|---|---|
innodb_buffer_pool_size | Auto-sized by Azure; changing may cause issues |
innodb_log_file_size | Managed by service |
innodb_flush_method | Optimized for Azure storage |
innodb_file_per_table | Always ON on Flexible Server |
4. Connection pooling¶
4.1 Why connection pooling matters on Azure MySQL¶
Each MySQL connection consumes ~10 MB of RAM and a server thread. Without pooling, applications that create/destroy connections frequently waste resources and hit max_connections limits.
4.2 Application-level pooling recommendations¶
Python (SQLAlchemy):
from sqlalchemy import create_engine
engine = create_engine(
"mysql+pymysql://admin:password@server.mysql.database.azure.com/mydb",
pool_size=20, # Persistent connections
max_overflow=10, # Additional connections under load
pool_timeout=30, # Wait time for available connection
pool_recycle=1800, # Recycle connections every 30 min
pool_pre_ping=True, # Verify connection before use
connect_args={
"ssl": {"ca": "/path/to/DigiCertGlobalRootCA.crt.pem"}
}
)
Java (HikariCP):
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:mysql://server.mysql.database.azure.com:3306/mydb?useSSL=true&requireSSL=true");
config.setUsername("admin");
config.setPassword("password");
config.setMaximumPoolSize(20);
config.setMinimumIdle(5);
config.setConnectionTimeout(30000);
config.setIdleTimeout(600000);
config.setMaxLifetime(1800000);
config.setLeakDetectionThreshold(60000);
Node.js (mysql2):
const mysql = require("mysql2");
const pool = mysql.createPool({
host: "server.mysql.database.azure.com",
user: "admin",
password: "password",
database: "mydb",
waitForConnections: true,
connectionLimit: 20,
queueLimit: 0,
ssl: { ca: fs.readFileSync("/path/to/DigiCertGlobalRootCA.crt.pem") },
});
4.3 Connection pool sizing formula¶
Example: 4 app servers with 10 threads each = 4 _ 10 _ 0.5 = 20 connections per pool.
Total connections = 4 servers * 20 pool connections = 80 active connections.
Ensure max_connections on Azure MySQL is set higher than total expected connections (add 20% buffer).
5. Monitoring¶
5.1 Slow query log¶
Enable and configure the slow query log for ongoing performance monitoring:
# Enable via Azure CLI
az mysql flexible-server parameter set \
--resource-group myResourceGroup \
--server-name myMySQLServer \
--name slow_query_log --value ON
az mysql flexible-server parameter set \
--resource-group myResourceGroup \
--server-name myMySQLServer \
--name long_query_time --value 2
az mysql flexible-server parameter set \
--resource-group myResourceGroup \
--server-name myMySQLServer \
--name log_slow_admin_statements --value ON
az mysql flexible-server parameter set \
--resource-group myResourceGroup \
--server-name myMySQLServer \
--name log_queries_not_using_indexes --value ON
5.2 Performance Insights¶
Azure MySQL Flexible Server provides Query Performance Insights in the Azure Portal:
| Feature | What it shows | How to use |
|---|---|---|
| Long-running queries | Top queries by duration | Identify queries needing optimization |
| Wait statistics | What queries are waiting on (I/O, locks, CPU) | Identify bottlenecks |
| Query store | Execution plan history | Detect plan regressions |
| Active queries | Currently executing queries | Real-time troubleshooting |
5.3 Azure Monitor metrics¶
Key metrics to monitor and alert on:
| Metric | Warning threshold | Critical threshold | Action |
|---|---|---|---|
| CPU percentage | > 70% | > 90% | Scale up compute tier |
| Memory percentage | > 80% | > 95% | Scale up or optimize queries |
| Storage percentage | > 75% | > 90% | Enable auto-grow or provision more |
| IOPS percentage | > 70% | > 90% | Add pre-provisioned IOPS |
| Active connections | > 70% of max | > 90% of max | Increase max_connections or scale up |
| Replication lag (seconds) | > 5 seconds | > 30 seconds | Scale up replica or reduce write load |
| Aborted connections | > 10/min | > 50/min | Fix client SSL config or connection pool |
| Slow queries | > 10/min | > 50/min | Review slow query log; optimize queries |
# Create alert rule for high CPU
az monitor metrics alert create \
--name "MySQL-High-CPU" \
--resource-group myResourceGroup \
--scopes "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.DBforMySQL/flexibleServers/myMySQLServer" \
--condition "avg cpu_percent > 90" \
--window-size 5m \
--evaluation-frequency 1m \
--action "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Insights/actionGroups/myActionGroup"
5.4 Diagnostic logging¶
# Enable diagnostic settings
az monitor diagnostic-settings create \
--resource "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.DBforMySQL/flexibleServers/myMySQLServer" \
--name "MySQLDiagnostics" \
--workspace "<log-analytics-workspace-id>" \
--logs '[
{"category": "MySqlSlowLogs", "enabled": true},
{"category": "MySqlAuditLogs", "enabled": true}
]' \
--metrics '[{"category": "AllMetrics", "enabled": true}]'
6. CSA-in-a-Box integration¶
6.1 Fabric Mirroring for Azure MySQL¶
Fabric Mirroring for Azure MySQL enables near-real-time replication of MySQL tables into Microsoft Fabric's OneLake as Delta Lake tables. This is the recommended integration pattern for analytics on MySQL data within the CSA-in-a-Box platform.
How it works:
- Fabric Mirroring reads the MySQL binlog (similar to a replica)
- Changes are streamed to OneLake as Delta Lake tables
- dbt models in CSA-in-a-Box transform data through medallion layers (Bronze/Silver/Gold)
- Power BI with Direct Lake serves analytics with zero data movement
- Microsoft Purview catalogs and classifies the mirrored data
Setup steps:
- Ensure binlog is enabled on Azure MySQL Flexible Server (default for HA-enabled servers)
- In Microsoft Fabric portal, create a new Mirrored Database
- Select "Azure Database for MySQL" as the source type
- Enter connection details (server name, database, credentials)
- Select tables to mirror (start with a subset, expand later)
- Configure mirroring schedule (continuous or periodic)
- Start mirroring and verify data appears in OneLake
Best practices for Fabric Mirroring:
- Start with the most analytically valuable tables (e.g., orders, customers, products)
- Avoid mirroring large BLOB/TEXT columns unless needed for analytics
- Monitor mirroring lag in Fabric portal
- Use dbt models to transform mirrored data (do not query raw mirrored tables directly for analytics)
6.2 Azure Data Factory MySQL connector¶
For batch data integration or when Fabric Mirroring is not suitable, ADF provides a MySQL connector:
{
"name": "MySQLToOneLake",
"properties": {
"activities": [
{
"name": "CopyMySQLToLakehouse",
"type": "Copy",
"typeProperties": {
"source": {
"type": "MySqlSource",
"query": "SELECT * FROM orders WHERE updated_at > '@{pipeline().parameters.lastWatermark}'"
},
"sink": {
"type": "LakehouseTableSink",
"tableActionOption": "Append"
}
}
}
]
}
}
When to use ADF vs Fabric Mirroring:
| Scenario | Use Fabric Mirroring | Use ADF |
|---|---|---|
| Near-real-time analytics | Yes | No (batch only) |
| Complex data transformations during ingestion | No | Yes (with data flows) |
| Multiple source types in same pipeline | No | Yes |
| On-premises MySQL (not on Azure) | No | Yes (with self-hosted IR) |
| MariaDB sources | No | Yes (MariaDB connector) |
| Selective column projection | Limited | Yes |
| Data quality checks during ingestion | No | Yes (with data flows) |
6.3 Microsoft Purview integration¶
After migrating MySQL to Azure and connecting to CSA-in-a-Box:
- Register Azure MySQL Flexible Server as a data source in Microsoft Purview
- Configure scanning to discover metadata (tables, columns, data types)
- Apply classifications for sensitive data (PII, CUI, PHI)
- Enable lineage tracking from MySQL through Fabric to Power BI
- Set up access policies based on data sensitivity
# Register MySQL source in Purview (via API or Portal)
# Portal: Purview Studio > Data Map > Register > Azure Database for MySQL
# Provide: server name, database name, collection assignment
7. Operational best practices¶
7.1 Maintenance windows¶
# Set custom maintenance window (e.g., Sunday 2:00 AM - 3:00 AM UTC)
az mysql flexible-server update \
--resource-group myResourceGroup \
--name myMySQLServer \
--maintenance-window "Sun:02:00"
7.2 Backup strategy¶
| Backup type | Configuration | Recovery scenario |
|---|---|---|
| Automated daily | Default (1-35 day retention) | Point-in-time restore for accidental changes |
| Geo-redundant | Enable for DR | Cross-region recovery |
| Logical backup | mysqldump to Azure Blob Storage (scheduled) | Application-consistent backup for compliance |
| Long-term | ADF pipeline to ADLS Gen2 (archive tier) | Regulatory retention (7+ years) |
7.3 Scaling guidelines¶
| Trigger | Action | How |
|---|---|---|
| CPU consistently > 80% | Scale up compute tier | az mysql flexible-server update --sku-name Standard_D8ds_v4 |
| Memory consistently > 85% | Scale to Memory Optimized tier | az mysql flexible-server update --sku-name Standard_E8ds_v4 |
| IOPS consistently > 80% | Add pre-provisioned IOPS | az mysql flexible-server update --iops <value> |
| Storage > 80% | Enable auto-grow or provision more | az mysql flexible-server update --storage-size <value> |
| Connections > 70% of max | Implement connection pooling or scale up | Application-level pooling configuration |
7.4 Security hardening post-migration¶
- Remove all public firewall rules (use Private Link / VNet integration)
- Migrate all users to Entra ID authentication
- Enable customer-managed keys for encryption at rest
- Enable audit logging with at least CONNECTION and DDL/DCL events
- Enable Azure Defender for MySQL
- Review and restrict database privileges (least privilege)
- Disable
local_infileunless specifically needed - Set
require_secure_transport = ON(default) - Configure password expiration policies via Entra ID
8. Common pitfalls and how to avoid them¶
| Pitfall | Impact | Prevention |
|---|---|---|
| Not converting MyISAM to InnoDB | HA cannot be enabled; data integrity risk | Convert all tables before migration |
| Ignoring lower_case_table_names | Cannot be changed after server creation | Set correctly during az mysql flexible-server create |
| Using root/admin for application | Security risk, over-privileged | Create dedicated application users with least privilege |
| No connection pooling | Connection exhaustion, poor performance | Implement application-level connection pooling |
| Ignoring slow query log | Undetected performance degradation | Enable slow query log from day one |
| Over-provisioning compute | Wasted spend | Start small, scale up based on monitoring data |
| Under-provisioning IOPS | Write bottleneck, high latency | Monitor IOPS usage; add pre-provisioned IOPS for write-heavy workloads |
| Not testing cutover | Extended downtime during production cutover | Practice cutover in staging at least twice |
| Skipping DEFINER cleanup | Stored procedures/views fail on target | Remove or replace DEFINER clauses in dump files |
| Not updating SSL certificates | Connection failures after migration | Download and configure Azure CA certificate in all applications |
9. Post-migration optimization timeline¶
| Timeframe | Activities |
|---|---|
| Day 1-7 | Monitor slow query log, CPU, memory, IOPS; tune top slow queries |
| Week 2-4 | Review connection pooling effectiveness; adjust max_connections |
| Month 1-2 | Evaluate tier sizing (scale down if over-provisioned, scale up if needed) |
| Month 2-3 | Configure Fabric Mirroring or ADF pipelines for CSA-in-a-Box |
| Month 3-6 | Enable reserved capacity (1-year) after confirming stable workload |
| Month 6+ | Evaluate 3-year reserved capacity; review for cost optimization |
Next: Migration Playbook | Index | Benchmarks
Maintainers: csa-inabox core team Last updated: 2026-04-30