SAP on Azure: Best Practices
Landing zone design, HA/DR architecture, monitoring, backup, cost optimization, and CSA-in-a-Box integration for SAP workloads on Azure.
Overview
This document consolidates best practices for running SAP on Azure, drawn from Microsoft's SAP Center of Excellence, SAP-certified architecture patterns, and CSA-in-a-Box deployment experience. These practices apply to all SAP deployment models: SAP on Azure VMs (self-managed), RISE with SAP on Azure, and HANA Large Instances.
1. Landing zone design
Subscription topology
Follow the Azure Landing Zone (ALZ) enterprise-scale pattern with SAP-specific subscriptions:
Management Group Hierarchy
├── Tenant Root Group
│   ├── Platform
│   │   ├── Management (Log Analytics, Sentinel, Monitor)
│   │   ├── Identity (Entra ID, DNS)
│   │   └── Connectivity (Hub VNet, Firewall, ExpressRoute)
│   ├── Landing Zones
│   │   ├── SAP Production (sub-sap-prd)
│   │   │   ├── rg-sap-hana-prd
│   │   │   ├── rg-sap-app-prd
│   │   │   └── rg-sap-network-prd
│   │   ├── SAP Non-Production (sub-sap-nonprod)
│   │   │   ├── rg-sap-hana-qas
│   │   │   ├── rg-sap-hana-dev
│   │   │   └── rg-sap-app-nonprod
│   │   └── CSA-in-a-Box Data (sub-csa-data)
│   │       ├── rg-fabric-workspace
│   │       ├── rg-purview-governance
│   │       └── rg-ai-integration
│   └── Sandbox (sub-sap-sandbox)
Best practices
| Practice | Rationale |
| Separate SAP PRD from non-PRD subscriptions | Cost isolation, RBAC boundary, policy enforcement |
| SAP-specific Azure Policy initiative | Enforce VM sizes, encryption, NSG rules, tagging |
| Hub-spoke network with Azure Firewall | Centralized egress, traffic inspection, DNS resolution |
| Proximity placement groups for SAP | Sub-millisecond latency between app and DB tiers |
| No public IPs on SAP VMs | Azure Bastion for management access; Private Link for services |
| Resource naming convention | {resourcetype}-{workload}-{environment}-{region} |
| Tagging standard | CostCenter, Environment, Application, Owner, SAPSystemID |
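The naming convention and tagging standard in the table can be applied mechanically. The following is a minimal sketch, assuming bash; the function names `sap_resource_name` and `sap_tags` and the sample values are hypothetical and not part of any Azure tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names, argument order, and sample values are illustrative.

# Build a name following {resourcetype}-{workload}-{environment}-{region}, lowercased.
sap_resource_name() {
  local type="$1" workload="$2" env="$3" region="$4"
  printf '%s-%s-%s-%s\n' "$type" "$workload" "$env" "$region" | tr '[:upper:]' '[:lower:]'
}

# Emit the five tags from the tagging standard as space-separated key=value pairs.
sap_tags() {
  local cost_center="$1" env="$2" app="$3" owner="$4" sid="$5"
  printf 'CostCenter=%s Environment=%s Application=%s Owner=%s SAPSystemID=%s\n' \
    "$cost_center" "$env" "$app" "$owner" "$sid"
}

sap_resource_name vm hana PRD eastus2
sap_tags CC1234 Production S4HANA sap-basis PRD
```

The key=value output is the shape that the `--tags` argument of most `az` commands accepts, so the same helper can feed both resource creation and a tagging audit.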
2. High availability architecture
HANA HA with availability zones
        Zone 1                      Zone 2
┌─────────────────────┐     ┌─────────────────────┐
│ HANA Primary (M128s)│     │ HANA Secondary      │
│ ASCS                │     │ ERS                 │
│ App Server 1        │     │ App Server 2        │
│ ANF data/log (zone1)│     │ ANF data/log (zone2)│
└─────────────────────┘     └─────────────────────┘
           │                           │
           └────── HSR (syncmem) ──────┘
           │                           │
           └─── Azure Load Balancer ───┘
            (Standard, zone-redundant)
HA best practices
| Component | Best practice | Rationale |
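| HANA | HSR in synchronous in-memory mode (syncmem) | No data loss when a single node fails; a read-enabled secondary additionally needs the logreplay_readaccess operation mode |
| ASCS/ERS | ENSA2 with Pacemaker (SLES or RHEL) or WSFC (Windows) | Enqueue lock table survives an ASCS failover |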
| Load balancer | Standard SKU, zone-redundant, HA ports | Health probe on custom port (e.g., 62503) |
| Pacemaker | STONITH via Azure Fence Agent | Prevent split-brain scenarios |
| Storage | ANF with zonal placement; replicate to both zones | Storage follows compute zone |
| App servers | Minimum 2, spread across availability zones | Active-active for dialog processing |
| Web Dispatcher | 2 instances behind Azure Load Balancer | Load distribution + failover |
| Availability set vs zones | Prefer availability zones | 99.99% SLA vs 99.95% for availability sets |
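The example probe port 62503 in the load balancer row matches a commonly used convention of a three-digit prefix followed by the two-digit SAP instance number. The sketch below assumes that convention (625NN for HANA, 620NN for ASCS, 621NN for ERS); the function name `probe_port` is hypothetical, and the ports actually configured in the cluster resources take precedence.

```shell
#!/usr/bin/env bash
# Assumed convention: <prefix><two-digit instance number>.
probe_port() {
  local role="$1" instance="$2" prefix
  case "$role" in
    hana) prefix=625 ;;
    ascs) prefix=620 ;;
    ers)  prefix=621 ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  # Treat the instance number as a string so values like 08 are not parsed as octal.
  case "$instance" in
    [0-9][0-9]) ;;
    *) echo "instance number must be two digits" >&2; return 1 ;;
  esac
  printf '%s%s\n' "$prefix" "$instance"
}

probe_port hana 03   # 62503
probe_port ascs 00   # 62000
```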
3. Disaster recovery
DR architecture
| Component | HA (same region, cross-zone) | DR (cross-region) |
| HANA database | HSR synchronous (RPO: 0) | HSR asynchronous (RPO: < 15 min) |
| Application servers | Availability zones | Azure Site Recovery (RPO: < 15 min) |
| Shared file systems | ANF cross-zone replication | ANF cross-region replication |
| Configuration | Real-time (clustered) | Azure Backup (daily) |
| DNS | Automatic (LB health probe) | Azure Traffic Manager or manual DNS update |
DR best practices
| Practice | Details |
| Define RPO/RTO per system | PRD: RPO < 15 min, RTO < 4 hours; QAS: RPO < 24 hours, RTO < 8 hours |
| Test DR quarterly | Full DR drill including HANA takeover, app restart, DNS cutover |
| Automate DR failover | Azure Automation runbooks for HANA takeover + app server start |
| Pre-provision DR VMs | Keep application-server VMs deallocated in the DR region and start them on failover (saves cost); the HANA DR node must stay running to receive HSR |
| Document DR runbook | Step-by-step for HANA takeover, DNS update, Fabric Mirroring redirect |
| Validate HSR lag | Monitor M_SERVICE_REPLICATION (per-service log shipping delay) alongside M_SYSTEM_REPLICATION (overall status) |
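The RPO targets and the replication-lag check from the table can be combined into a simple gate for DR drills or scheduled checks. This is a sketch only: the function names are hypothetical, and it assumes the observed lag in seconds has already been obtained elsewhere (for example from a query against the HANA replication monitoring views).

```shell
#!/usr/bin/env bash
# RPO targets in seconds, taken from the "Define RPO/RTO per system" row.
rpo_limit_seconds() {
  case "$1" in
    PRD) echo 900 ;;     # RPO < 15 minutes
    QAS) echo 86400 ;;   # RPO < 24 hours
    *) echo "no RPO defined for tier: $1" >&2; return 1 ;;
  esac
}

# Compare an observed replication delay (whole seconds) with the tier's RPO.
check_hsr_lag() {
  local tier="$1" lag="$2" limit
  limit=$(rpo_limit_seconds "$tier") || return 2
  if [ "$lag" -lt "$limit" ]; then
    echo "OK: ${tier} lag ${lag}s is within RPO ${limit}s"
  else
    echo "BREACH: ${tier} lag ${lag}s is not below RPO ${limit}s"
    return 1
  fi
}

check_hsr_lag PRD 42
```

The non-zero exit status on a breach lets a scheduler or runbook treat the result as a failed step without parsing the message.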
4. Monitoring with Azure Monitor for SAP Solutions
What to monitor
| Component | Metric | Alert threshold | Action |
| HANA memory | Used memory % | > 80% | Investigate memory consumers; consider VM resize |
| HANA CPU | CPU utilization | > 90% sustained | Identify expensive queries; scale up |
| HANA disk IO | Data volume latency | > 3 ms | Check ANF performance tier; review IO patterns |
| HANA log | Log write latency | > 1 ms | Check ANF Ultra tier; review log volume sizing |
| HANA replication | HSR replication status | != ACTIVE | Investigate HSR break; reconnect |
| HANA backup | Last successful backup age | > 24 hours | Check Azure Backup job status |
| NetWeaver | Enqueue lock count | > 10,000 | Investigate lock contention |
| NetWeaver | Dialog response time | > 2 seconds | Check HANA query performance; app server load |
| NetWeaver | Batch job failures | > 0 | Review SM37 job log |
| OS | Disk space /usr/sap | < 20% free | Clean log files; expand disk |
| OS | Swap usage | > 0 | Investigate memory pressure |
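The thresholds in the table translate directly into comparisons. The sketch below encodes several of them in one function; the metric keys are hypothetical labels rather than Azure Monitor metric names, and latencies are expressed in microseconds because shell arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Returns 0 when the value crosses the alert threshold from the table, 1 otherwise.
should_alert() {
  local metric="$1" value="$2"
  case "$metric" in
    hana_memory_pct)      [ "$value" -gt 80 ] ;;
    hana_cpu_pct)         [ "$value" -gt 90 ] ;;
    data_latency_us)      [ "$value" -gt 3000 ] ;;   # > 3 ms
    log_latency_us)       [ "$value" -gt 1000 ] ;;   # > 1 ms
    backup_age_hours)     [ "$value" -gt 24 ] ;;
    enqueue_locks)        [ "$value" -gt 10000 ] ;;
    batch_job_failures)   [ "$value" -gt 0 ] ;;
    usr_sap_free_pct)     [ "$value" -lt 20 ] ;;     # alert on low free space
    swap_used_mb)         [ "$value" -gt 0 ] ;;
    *) echo "unknown metric: $metric" >&2; return 2 ;;
  esac
}

should_alert hana_memory_pct 85 && echo "ALERT hana_memory_pct=85"
should_alert usr_sap_free_pct 35 || echo "ok usr_sap_free_pct=35"
```

The "sustained" qualifier on CPU is not modelled here; in practice that belongs in the alert rule's evaluation window rather than in a point-in-time check.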
# Create the Azure Monitor for SAP Solutions resource
az workloads monitor create \
  --resource-group rg-sap-monitoring \
  --name monitor-sap-prd \
  --location eastus2 \
  --app-location eastus2 \
  --managed-rg-name rg-sap-monitor-managed \
  --routing-preference Default
# Add the HANA provider. Port convention (NN = instance number, 00 here):
# 3NN13 is the SYSTEMDB SQL port, 3NN15 the SQL port of the first tenant database.
# The password is referenced from Key Vault instead of being passed inline.
az workloads monitor provider-instance create \
  --resource-group rg-sap-monitoring \
  --monitor-name monitor-sap-prd \
  --provider-instance-name hana-prd \
  --provider-settings '{
    "providerType": "SapHana",
    "hanaHostname": "10.10.1.10",
    "hanaPort": "30015",
    "hanaDatabase": "SYSTEMDB",
    "sqlPort": "30013",
    "hanaDbUsername": "MONITOR_USER",
    "hanaDbPasswordUri": "https://kv-sap-secrets.vault.azure.net/secrets/hana-monitor-password"
  }'
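The two port values in the provider settings (30013 and 30015) follow HANA's instance-number convention for multi-tenant systems. As a small sketch, assuming that convention, a hypothetical helper can derive the SQL port from the instance number instead of hard-coding it:

```shell
#!/usr/bin/env bash
# Assumed convention: 3<NN>13 for SYSTEMDB, 3<NN>15 for the first tenant database.
hana_sql_port() {
  local instance="$1" target="$2"
  case "$instance" in
    [0-9][0-9]) ;;
    *) echo "instance number must be two digits" >&2; return 1 ;;
  esac
  case "$target" in
    systemdb) printf '3%s13\n' "$instance" ;;
    tenant1)  printf '3%s15\n' "$instance" ;;
    *) echo "unknown target: $target" >&2; return 1 ;;
  esac
}

hana_sql_port 00 systemdb   # 30013
hana_sql_port 00 tenant1    # 30015
```

Additional tenant databases are assigned ports outside this simple pattern, so those are better looked up on the system than computed.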
5. Backup strategy
Backup matrix
| Component | Backup method | Frequency | Retention | Storage tier |
| HANA database (full) | Azure Backup (BACKINT) | Weekly (Sunday) | 12 weeks | Hot |
| HANA database (differential) | Azure Backup (BACKINT) | Daily | 2 weeks | Hot |
| HANA log backup | Azure Backup (BACKINT) | Every 15 minutes | 2 weeks | Hot |
| HANA (long-term) | Azure Backup | Monthly | 12 months | Archive |
| SAP application config | Azure Backup (VM) | Daily | 4 weeks | Hot |
| OS disks | Azure Backup (VM) | Weekly | 4 weeks | Hot |
| ANF snapshots | ANF snapshot policy | Every 4 hours | 48 hours | ANF |
| Transport directory | Azure Files backup | Daily | 4 weeks | Cool |
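Frequency and retention together determine how many recovery points each policy holds, which is useful when sizing backup storage or setting a snapshot policy's keep count. A minimal arithmetic sketch, with a hypothetical function name, using values from the matrix:

```shell
#!/usr/bin/env bash
# Recovery points retained = retention window / backup interval (integer division).
recovery_points() {
  local retention_hours="$1" interval_minutes="$2"
  echo $(( retention_hours * 60 / interval_minutes ))
}

recovery_points 48 240                              # ANF snapshots: every 4 h, kept 48 h
recovery_points $((14 * 24)) 15                     # log backups: every 15 min, kept 2 weeks
recovery_points $((12 * 7 * 24)) $((7 * 24 * 60))   # weekly fulls, kept 12 weeks
```

For the three rows shown this yields 12 snapshots, 1344 log backups, and 12 full backups respectively.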
Backup best practices
| Practice | Rationale |
| Use Azure Backup for SAP HANA (BACKINT-certified) | SAP-certified, no third-party needed |
| Test restore monthly | Validate backup integrity; measure restore time |
| Encrypt backups | Azure Backup encryption with CMK (Key Vault) |
| Monitor backup jobs | Alert on failed backups within 4 hours |
| Use ANF snapshots for rapid recovery | Sub-second snapshot; restore in minutes |
| Separate backup storage from production | Prevent ransomware from encrypting backups |
6. Cost optimization
Reserved Instances
| Resource | RI term | Savings vs PAYG | Notes |
| HANA VMs (PRD) | 3-year | 55–63% | SAP HANA runs 24/7; always use RI |
| HANA VMs (QAS) | 1-year | 35–45% | QAS runs during most business hours |
| App server VMs (PRD) | 3-year | 50–60% | Production app servers run 24/7 |
| App server VMs (non-PRD) | None (snooze) | 30–40% | Stop DEV/SBX outside business hours |
| ANF capacity | Reserved | 10–15% | Predictable capacity requirement |
Cost optimization best practices
| Practice | Savings | Implementation |
| Snooze non-production VMs | 30–40% | Azure Automation schedule: stop at 7 PM, start at 7 AM |
| Right-size after 90 days | 10–20% | Review Azure Advisor sizing recommendations |
| Use dev/test pricing | 40–55% | Dev/test subscription for SAP sandbox/development |
| Consolidate non-production | 15–25% | Run DEV + SBX on same VM (smaller sizes) |
| Azure Hybrid Benefit | 40–50% (Windows) | Apply existing Windows Server licenses |
| Spot VMs for batch/test | 60–80% | SAP performance testing on Spot VMs |
| Archive old SAP data | 20–30% on storage | Move historical data to cool/archive; reduce HANA memory |
| Fabric capacity reservation | 10–25% | Reserved Fabric capacity for SAP analytics |
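# Snooze non-production SAP VMs (Azure Automation)
# Schedule: stop at 19:00, start at 07:00 (weekdays)
# Evening job: stop SAP and HANA cleanly inside the guest first, then deallocate.
az vm deallocate --resource-group rg-sap-nonprod --name vm-hana-dev --no-wait
az vm deallocate --resource-group rg-sap-nonprod --name vm-hana-sbx --no-wait
az vm deallocate --resource-group rg-sap-nonprod --name vm-app-dev --no-wait
# Morning job: start the same VMs again.
az vm start --resource-group rg-sap-nonprod --name vm-hana-dev --no-wait
az vm start --resource-group rg-sap-nonprod --name vm-hana-sbx --no-wait
az vm start --resource-group rg-sap-nonprod --name vm-app-dev --no-wait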
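To reason about what a snooze schedule can save, it helps to compute the share of weekly compute hours it removes. The sketch below does only that arithmetic; the function name is hypothetical, and the result is an upper bound on compute-hour reduction, not a cost figure, since disks and licenses keep billing while a VM is deallocated.

```shell
#!/usr/bin/env bash
# Percentage of the 168 weekly hours during which the VM is deallocated.
snooze_hours_saved_pct() {
  local start_hour="$1" stop_hour="$2" days_per_week="$3"
  local on_hours=$(( (stop_hour - start_hour) * days_per_week ))
  echo $(( (168 - on_hours) * 100 / 168 ))
}

snooze_hours_saved_pct 7 19 5   # 07:00 to 19:00 on weekdays
```

With the 07:00 to 19:00 weekday schedule the VM runs 60 of 168 hours, so the function prints 64. Realized savings on the total bill will be lower than this compute-hour figure.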
7. CSA-in-a-Box integration best practices
Data integration patterns
| Pattern | Use case | Implementation |
| Fabric Mirroring (real-time) | Operational analytics, near-real-time dashboards | Configure mirrored database for high-change SAP tables |
| ADF batch extraction | Historical data migration, data warehouse loads | SAP Table/BW/ODP connectors with partitioned extraction |
| dbt medallion architecture | SAP data transformation (bronze/silver/gold) | dbt models for SAP-specific business logic |
| Event-driven integration | SAP business events → Azure actions | Logic Apps + Service Bus for SAP event processing |
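Partitioned extraction in the ADF batch pattern means splitting a key range (a fiscal year, a document number) into contiguous slices that can be copied in parallel. The connector handles this itself when partitioning is configured; the sketch below only illustrates the idea with a hypothetical helper and an illustrative year range.

```shell
#!/usr/bin/env bash
# Split an inclusive integer key range into at most N contiguous partitions.
partition_ranges() {
  local low="$1" high="$2" parts="$3"
  local total=$(( high - low + 1 ))
  local size=$(( (total + parts - 1) / parts ))   # ceiling division
  local start="$low" end
  while [ "$start" -le "$high" ]; do
    end=$(( start + size - 1 ))
    if [ "$end" -gt "$high" ]; then end="$high"; fi
    echo "$start $end"
    start=$(( end + 1 ))
  done
}

# Example: fiscal years 2019 through 2026 in four slices of two years each.
partition_ranges 2019 2026 4
```

Each output line is a lower and upper bound that could parameterize one extraction run; uneven ranges produce a shorter final slice, and some inputs yield fewer than N partitions.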
Analytics best practices
| Practice | Details |
| Start with Fabric Mirroring | Configure mirroring for key SAP tables before building analytics |
| Use Direct Lake mode | Power BI Direct Lake on OneLake for near-import performance without data duplication |
| Build domain-specific semantic models | Separate Power BI semantic models for Finance, Supply Chain, HR, Procurement |
| Implement RLS on SAP data | Row-level security in Power BI aligned with SAP authorization objects |
| Enable Copilot for Power BI | Natural-language queries on SAP data for business users |
Governance best practices
| Practice | Details |
| Scan SAP HANA with Purview | Discover and classify SAP data fields (PII, financial, HR-sensitive) |
| Apply sensitivity labels | Microsoft Information Protection labels on SAP data in OneLake |
| Define data ownership | Assign SAP domain data stewards in Purview |
| Implement data quality rules | dbt tests for SAP data quality (uniqueness, referential integrity, not null) |
| Enable lineage tracking | Purview automatic lineage from SAP HANA → Fabric → Power BI |
AI best practices
| Practice | Details |
| Start with descriptive analytics | Build Power BI dashboards before investing in ML |
| Use Azure OpenAI for quick wins | Prompt-based analytics on SAP data (summarize, detect, explain) |
| Build ML models in Databricks | Demand forecasting, anomaly detection, churn prediction on SAP data |
| Register models in Azure ML | Model versioning, deployment, monitoring |
| Implement feedback loops | Business user validation of AI-generated insights |
8. Security best practices
| Practice | Implementation |
| No public IPs on SAP VMs | Azure Bastion for SSH/RDP; Private Link for all services |
| Entra ID SSO for SAP | SAML 2.0 SSO for Fiori, Web GUI, HANA Studio |
| MFA for SAP access | Entra ID Conditional Access with MFA requirement |
| Network segmentation | Separate subnets for DB, app, web, management tiers |
| Encrypt everything | HANA TDE (Key Vault), disk encryption, TLS in transit |
| Defender for Cloud | Enable Defender for SAP workloads; stream to Sentinel |
| Audit logging | SAP Security Audit Log + Azure Monitor + Sentinel |
| Regular patching | Azure Update Manager for OS; SAP kernel patches per SAP schedule |
9. Operations runbook checklist
Daily operations
Weekly operations
Monthly operations
Quarterly operations
Last updated: 2026-04-30
Maintainers: CSA-in-a-Box core team
Related: Infrastructure Migration | Benchmarks | Federal Migration Guide | SAP to Azure Migration Center