🔧 Platform Administrator Learning Path
Master the administration, security, and operations of Azure analytics platforms. Build expertise in governance, monitoring, cost management, and ensuring enterprise-grade reliability and compliance.
🎯 Learning Objectives
After completing this learning path, you will be able to:
- Configure and manage Azure Synapse Analytics workspaces at enterprise scale
- Implement comprehensive security including network isolation, identity management, and data protection
- Establish governance frameworks for data access, quality, and compliance
- Monitor and optimize platform performance and costs
- Automate operational tasks using PowerShell, CLI, and Azure DevOps
- Ensure business continuity with backup, disaster recovery, and high availability
- Support data teams with troubleshooting and performance tuning
📋 Prerequisites Checklist
Before starting this learning path, ensure you have:
Required Knowledge
- Azure fundamentals - Strong understanding of Azure resource management
- Networking basics - VNets, subnets, NSGs, private endpoints
- Security concepts - Identity management, RBAC, encryption
- PowerShell or CLI - Basic scripting and automation skills
- Windows/Linux administration - System administration experience
Required Access
- Azure subscription with the Owner role, or Contributor plus User Access Administrator
- Microsoft Entra ID (formerly Azure AD) privileges to create service principals and manage identities
- Sufficient budget for a production-like environment (~$300-500)
Recommended Experience
- IT infrastructure management - 1-2 years of experience
- Cloud administration - Azure or other cloud platforms
- SQL Server administration - Helpful for dedicated SQL pools
- DevOps practices - CI/CD, Infrastructure as Code
🗺️ Learning Path Structure
This path consists of four progressive phases covering security and governance, operations and monitoring, cost management, and advanced administration:
graph LR
A[Phase 1:<br/>Security &<br/>Governance] --> B[Phase 2:<br/>Operations &<br/>Monitoring]
B --> C[Phase 3:<br/>Cost Management] --> D[Phase 4:<br/>Advanced Admin]
style A fill:#FF6B6B
style B fill:#4ECDC4
style C fill:#FFD93D
style D fill:#95E1D3
Time Investment
- Full-Time (40 hrs/week): 8-10 weeks
- Part-Time (15 hrs/week): 14-18 weeks
- Casual (8 hrs/week): 20-24 weeks
📚 Phase 1: Security & Governance (3-4 weeks)
Goal: Implement enterprise-grade security and governance for analytics platforms
Module 1.1: Identity and Access Management (16 hours)
Learning Objectives:
- Configure Microsoft Entra ID (formerly Azure AD) integration and authentication
- Implement role-based access control (RBAC)
- Manage service principals and managed identities
- Establish least-privilege access principles
Hands-on Exercises:
- Lab 1.1.1: Configure Microsoft Entra ID (formerly Azure AD) authentication for Synapse workspace
- Lab 1.1.2: Create custom RBAC roles for data access
- Lab 1.1.3: Implement managed identities for automated processes
- Lab 1.1.4: Set up conditional access policies
Security Best Practices:
- Use managed identities instead of service principals when possible
- Implement just-in-time (JIT) privileged access
- Enable MFA for all administrative accounts
- Conduct regular access reviews and permission audits
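The access-review practice above can be partly scripted. The following is a minimal offline sketch, not an Azure API client: it assumes role assignments have already been exported into simple records, and the field names (`principal`, `role`, `scope`) and the set of "broad" roles are illustrative assumptions. It flags broad roles granted across a whole subscription.

```python
from dataclasses import dataclass

# Roles treated as "broad" for this illustration; adjust to your own policy.
BROAD_ROLES = {"Owner", "Contributor", "User Access Administrator"}


@dataclass(frozen=True)
class RoleAssignment:
    principal: str  # user, group, or identity name (hypothetical field)
    role: str       # built-in or custom role name
    scope: str      # resource ID the role is assigned at


def is_subscription_scope(scope: str) -> bool:
    """True when the scope is a whole subscription rather than a resource group or resource."""
    parts = [p for p in scope.split("/") if p]
    return len(parts) == 2 and parts[0].lower() == "subscriptions"


def flag_broad_assignments(assignments):
    """Return assignments that grant a broad role across an entire subscription."""
    return [a for a in assignments
            if a.role in BROAD_ROLES and is_subscription_scope(a.scope)]


if __name__ == "__main__":
    sample = [
        RoleAssignment("alice", "Owner", "/subscriptions/0000"),
        RoleAssignment("bob", "Contributor",
                       "/subscriptions/0000/resourceGroups/rg-data"),
    ]
    for finding in flag_broad_assignments(sample):
        print(f"Review: {finding.principal} has {finding.role} at {finding.scope}")
```

Assignments scoped to a resource group or a single resource pass this particular check; a real review would also look at custom roles and group membership.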
Assessment Questions:
- What is the difference between RBAC and resource-level permissions?
- When should you use managed identities vs service principals?
- How do you implement the principle of least privilege?
- What are the implications of owner-level access?
Module 1.2: Network Security and Isolation (16 hours)
Learning Objectives:
- Design network architecture with VNet integration
- Configure private endpoints and managed VNets
- Implement firewall rules and IP whitelisting
- Set up Azure Private Link for secure connectivity
Hands-on Exercises:
- Lab 1.2.1: Create managed VNet for Synapse workspace
- Lab 1.2.2: Configure private endpoints for all services
- Lab 1.2.3: Set up Azure Firewall for outbound traffic control
- Lab 1.2.4: Implement VNet peering for cross-region access
Network Architecture Example:
┌──────────────────────────────────────────────────┐
│ Azure Virtual Network (10.0.0.0/16)              │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │ Synapse Subnet (10.0.1.0/24)               │  │
│  │ - Managed VNet                             │  │
│  │ - Private Endpoints                        │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │ Data Services Subnet (10.0.2.0/24)         │  │
│  │ - Storage Private Endpoints                │  │
│  │ - Key Vault Private Endpoints              │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘
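An address plan like the one in the diagram can be sanity-checked before deployment. The sketch below uses only Python's standard `ipaddress` module; the function name `validate_subnets` and the subnet labels are illustrative assumptions. It checks that each subnet sits inside the VNet range and that no two subnets overlap.

```python
import ipaddress


def validate_subnets(vnet_cidr, subnet_cidrs):
    """Return a list of problems found in a VNet address plan (empty list means OK)."""
    vnet = ipaddress.ip_network(vnet_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []

    # Every subnet must sit inside the VNet address space.
    for name, net in subnets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} ({net}) is outside {vnet}")

    # No two subnets may overlap.
    names = list(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    plan = {"synapse": "10.0.1.0/24", "data-services": "10.0.2.0/24"}
    print(validate_subnets("10.0.0.0/16", plan) or "Address plan looks consistent")
```

Running the same check in a CI pipeline before an IaC deployment is one way to catch overlapping ranges early, especially when VNet peering is planned.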
Assessment Questions:
- What are the benefits of a managed VNet for Synapse?
- How do private endpoints improve security?
- What is the difference between service endpoints and private endpoints?
- How do you troubleshoot private endpoint connectivity issues?
Module 1.3: Data Protection and Encryption (12 hours)
Learning Objectives:
- Implement encryption at rest and in transit
- Configure Azure Key Vault for secrets management
- Enable Transparent Data Encryption (TDE)
- Implement dynamic data masking and column-level security
Hands-on Exercises:
- Lab 1.3.1: Configure customer-managed keys with Key Vault
- Lab 1.3.2: Enable TDE for dedicated SQL pools
- Lab 1.3.3: Implement dynamic data masking for sensitive columns
- Lab 1.3.4: Configure column-level and row-level security
Encryption Strategy:
| Data State | Encryption Method | Key Management |
|---|---|---|
| At Rest | AES-256 encryption | Azure-managed or customer-managed keys |
| In Transit | TLS 1.2+ | Azure-managed certificates |
| In Use | Always Encrypted (SQL) | Client-held column master key (for example, in Azure Key Vault) |
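The "In Transit" row can also be enforced from the client side. As a minimal sketch using Python's standard `ssl` module (the helper name `make_tls12_context` is an assumption for illustration), the following builds a client context that verifies server certificates and refuses protocol versions older than TLS 1.2.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and rejects anything older than TLS 1.2."""
    context = ssl.create_default_context()            # certificate and hostname checks enabled
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0 and 1.1
    return context


if __name__ == "__main__":
    ctx = make_tls12_context()
    print("Minimum TLS version:", ctx.minimum_version.name)
```

The returned context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=ctx)`, so that automation scripts fail fast against endpoints that only offer older protocol versions.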
Assessment Questions:
- What are the differences between Azure-managed and customer-managed keys?
- How does TDE work in Azure SQL?
- When should you use dynamic data masking vs encryption?
- What are the performance implications of encryption?
Module 1.4: Compliance and Governance (12 hours)
Learning Objectives:
- Implement data classification and sensitivity labels
- Configure Microsoft Purview (formerly Azure Purview) for data governance
- Establish data retention and lifecycle policies
- Ensure compliance with regulations (GDPR, HIPAA, SOC 2)
Hands-on Exercises:
- Lab 1.4.1: Configure Microsoft Purview (formerly Azure Purview) and scan data sources
- Lab 1.4.2: Implement Microsoft Information Protection labels
- Lab 1.4.3: Set up data retention policies
- Lab 1.4.4: Create compliance reports and audits
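A retention policy ultimately reduces to a date comparison per classification. The sketch below is illustrative only: the labels and retention periods in `RETENTION_DAYS` are hypothetical values, not regulatory guidance, and `is_expired` is a helper name chosen for this example.

```python
from datetime import date, timedelta

# Hypothetical retention periods per classification label, in days.
RETENTION_DAYS = {"public": 365, "internal": 3 * 365, "confidential": 7 * 365}


def is_expired(created: date, label: str, today: date) -> bool:
    """True when an item has outlived the retention period for its classification."""
    return today - created > timedelta(days=RETENTION_DAYS[label])


if __name__ == "__main__":
    today = date(2025, 1, 1)
    for created, label in [(date(2023, 1, 1), "public"), (date(2023, 1, 1), "internal")]:
        state = "expired" if is_expired(created, label, today) else "retain"
        print(f"{label} item created {created}: {state}")
```

Keeping the periods in one table makes the policy easy to review alongside the classification scheme, whichever service ends up enforcing it.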
Governance Framework:
graph TD
A[Data Discovery] --> B[Classification]
B --> C[Policy Enforcement]
C --> D[Monitoring & Audit]
D --> E[Compliance Reporting]
E --> A
style A fill:#90EE90
style C fill:#FFD93D
style E fill:#87CEEB
Assessment Questions:
- How does Microsoft Purview (formerly Azure Purview) help with data governance?
- What are the key requirements for GDPR compliance?
- How do you implement data retention policies across services?
- What audit logs should be enabled for compliance?
📚 Phase 2: Operations & Monitoring (2-3 weeks)
Goal: Establish operational excellence with comprehensive monitoring and automated management
Module 2.1: Monitoring and Alerting (16 hours)
Learning Objectives:
- Configure Azure Monitor for Synapse workloads
- Create custom metrics and log queries
- Implement actionable alerting strategies
- Build operational dashboards
Hands-on Exercises:
- Lab 2.1.1: Configure diagnostic settings for all services
- Lab 2.1.2: Create Log Analytics workspace and queries
- Lab 2.1.3: Set up action groups and alert rules
- Lab 2.1.4: Build Azure Monitor dashboard for operations
Critical Metrics to Monitor:
| Category | Key Metrics | Alert Threshold |
|---|---|---|
| Compute | DWU usage, Spark job failures | >80% utilization, any failure |
| Storage | IOPS, throughput, capacity | >75% capacity, throttling |
| Pipelines | Run duration, failure rate | >5% failure rate |
| SQL Pools | Active queries, blocked queries | >50 concurrent, any blocking >5min |
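The thresholds in the table can be expressed as simple rules. The sketch below evaluates a snapshot of metrics against them; the metric key names are hypothetical, and in practice the values would come from Azure Monitor or Log Analytics rather than a hand-built dictionary.

```python
def evaluate_alerts(metrics):
    """Compare a snapshot of metrics against the thresholds from the table.

    `metrics` is a plain dict with hypothetical keys; missing keys are skipped.
    """
    rules = [
        ("dwu_utilization_pct", lambda v: v > 80, "Compute utilization above 80%"),
        ("spark_job_failures", lambda v: v > 0, "Spark job failure detected"),
        ("storage_capacity_pct", lambda v: v > 75, "Storage capacity above 75%"),
        ("pipeline_failure_rate_pct", lambda v: v > 5, "Pipeline failure rate above 5%"),
        ("concurrent_queries", lambda v: v > 50, "More than 50 concurrent queries"),
        ("longest_block_minutes", lambda v: v > 5, "Query blocked for more than 5 minutes"),
    ]
    return [message for key, breached, message in rules
            if key in metrics and breached(metrics[key])]


if __name__ == "__main__":
    snapshot = {"dwu_utilization_pct": 85, "storage_capacity_pct": 60, "spark_job_failures": 1}
    for alert in evaluate_alerts(snapshot):
        print("ALERT:", alert)
```

Keeping the rules in one list makes it easy to review thresholds with data teams before turning them into alert rules.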
Assessment Questions:
- What diagnostic logs should be enabled for Synapse workspaces?
- How do you reduce alert fatigue while maintaining visibility?
- What is the difference between metric alerts and log alerts?
- How do you correlate metrics across multiple services?
Module 2.2: Performance Management (16 hours)
Learning Objectives:
- Monitor and optimize query performance
- Tune Spark pool configurations
- Manage dedicated SQL pool scaling
- Implement performance baselines and SLAs
Hands-on Exercises:
- Lab 2.2.1: Analyze slow-running queries using Query Store
- Lab 2.2.2: Optimize Spark pool configurations
- Lab 2.2.3: Implement auto-pause and auto-scale policies
- Lab 2.2.4: Create performance baseline reports
Performance Tuning Checklist:
- Statistics: Ensure up-to-date statistics on all tables
- Indexing: Implement appropriate indexes (clustered columnstore)
- Partitioning: Partition large tables appropriately
- Resource Classes: Assign appropriate resource classes to workloads
- Caching: Enable result set caching where beneficial
- Materialized Views: Use for frequently accessed aggregations
Assessment Questions:
- How do you identify the root cause of slow queries?
- What factors affect Spark job performance?
- When should you scale up vs scale out compute resources?
- How do you establish performance SLAs?
Module 2.3: Backup and Disaster Recovery (12 hours)
Learning Objectives:
- Implement backup strategies for all data assets
- Configure geo-redundancy and replication
- Test disaster recovery procedures
- Implement business continuity plans
Hands-on Exercises:
- Lab 2.3.1: Configure automated backups for SQL pools
- Lab 2.3.2: Implement geo-redundant storage (GRS)
- Lab 2.3.3: Perform disaster recovery drill
- Lab 2.3.4: Document recovery time objectives (RTO) and recovery point objectives (RPO)
Backup Strategy Matrix:
| Asset Type | Backup Frequency | Retention | Recovery Method |
|---|---|---|---|
| Dedicated SQL Pool | Automated snapshots | 7 days | Restore from snapshot |
| Data Lake Files | Continuous asynchronous replication (GRS) | Indefinite | Geo-failover |
| Spark Metadata | Daily | 30 days | Export/import |
| Workspace Config | Version control | Indefinite | IaC redeploy |
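Whether a backup strategy meets its recovery point objective can be checked by comparing the age of the newest restore point with the target. The sketch below assumes the restore point timestamp has already been retrieved; `rpo_status` is a hypothetical helper, and the 8-hour target in the usage example is an illustrative value to be replaced by the workload's documented RPO.

```python
from datetime import datetime, timedelta, timezone


def rpo_status(last_restore_point, rpo, now=None):
    """Return (compliant, age) for the newest restore point against an RPO target."""
    now = now or datetime.now(timezone.utc)
    age = now - last_restore_point
    return age <= rpo, age


if __name__ == "__main__":
    newest = datetime.now(timezone.utc) - timedelta(hours=3)
    compliant, age = rpo_status(newest, rpo=timedelta(hours=8))
    print(f"Newest restore point is {age} old; within RPO: {compliant}")
```

The `now` parameter exists so the check can be replayed against a fixed point in time during a disaster recovery drill.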
Assessment Questions:
- What are the differences between LRS, GRS, and RA-GRS?
- How do you calculate appropriate RPO and RTO?
- What should be included in disaster recovery testing?
- How do you handle data sovereignty requirements?
Module 2.4: Automation and Infrastructure as Code (12 hours)
Learning Objectives:
- Automate workspace deployment with ARM/Bicep templates
- Use PowerShell and Azure CLI for operational tasks
- Implement automated maintenance routines
- Version control infrastructure configurations
Hands-on Exercises:
- Lab 2.4.1: Create Bicep templates for workspace deployment
- Lab 2.4.2: Build PowerShell scripts for routine maintenance
- Lab 2.4.3: Automate statistics updates and index maintenance
- Lab 2.4.4: Implement CI/CD for infrastructure changes
Sample Automation Script:
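# Automated SQL pool maintenance script (sketch)
# Assumptions: Invoke-Sqlcmd comes from the SqlServer module, and $AccessToken holds an
# Azure AD token for https://database.windows.net/ that the caller has already obtained.
param(
    [Parameter(Mandatory)][string]$WorkspaceName,
    [Parameter(Mandatory)][string]$SQLPoolName,
    [Parameter(Mandatory)][string]$AccessToken,
    [string]$TableName = '[Schema].[LargeTable]'  # placeholder table name
)

# Dedicated SQL pools are reached through the workspace SQL endpoint
$sqlParams = @{
    ServerInstance = "$WorkspaceName.sql.azuresynapse.net"
    Database       = $SQLPoolName
    AccessToken    = $AccessToken
    ErrorAction    = 'Stop'   # stop at the first failed statement
}

# Update statistics (dedicated SQL pools update statistics per table with UPDATE STATISTICS)
Write-Host "Updating statistics on $TableName in $SQLPoolName..."
Invoke-Sqlcmd @sqlParams -Query "UPDATE STATISTICS $TableName;"

# Rebuild indexes
Write-Host "Rebuilding indexes on $TableName..."
Invoke-Sqlcmd @sqlParams -Query "ALTER INDEX ALL ON $TableName REBUILD;"

Write-Host "Maintenance complete."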
Assessment Questions:
- What are the benefits of Infrastructure as Code?
- When should you use ARM templates vs Bicep vs Terraform?
- How do you manage secrets in automated deployments?
- What tasks should be automated vs manual?
📚 Phase 3: Cost Management (1-2 weeks)
Goal: Optimize costs while maintaining performance and reliability
Module 3.1: Cost Analysis and Optimization (12 hours)
Learning Objectives:
- Analyze cost patterns using Azure Cost Management
- Identify cost optimization opportunities
- Implement cost allocation and chargeback
- Right-size resources for workload requirements
Hands-on Exercises:
- Lab 3.1.1: Analyze costs with Azure Cost Management
- Lab 3.1.2: Create cost allocation reports by department
- Lab 3.1.3: Implement resource tagging strategy
- Lab 3.1.4: Right-size Spark and SQL pools
Cost Optimization Strategies:
| Strategy | Potential Savings | Implementation Effort |
|---|---|---|
| Auto-pause SQL pools | 40-60% | Low |
| Right-size Spark pools | 20-40% | Medium |
| Reserved capacity | 30-50% | Low |
| Storage lifecycle policies | 10-30% | Low |
| Query optimization | 15-40% | High |
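The auto-pause estimate in the table follows from simple arithmetic: a pool that only runs part of the week avoids compute charges for the remaining hours. The sketch below shows that calculation; the function name is an assumption, and the result covers compute only, since storage continues to be billed while a dedicated SQL pool is paused.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def auto_pause_savings(active_hours_per_week: float) -> float:
    """Fraction of always-on compute cost avoided by pausing outside active hours."""
    if not 0 <= active_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("active hours must be between 0 and 168")
    return 1 - active_hours_per_week / HOURS_PER_WEEK


if __name__ == "__main__":
    # Example: pool active 12 hours a day, 7 days a week (84 of 168 hours).
    print(f"Estimated compute savings: {auto_pause_savings(84):.0%}")
```

A pool needed 12 hours a day falls inside the 40-60% range quoted above; pools used only during business hours on weekdays would save more.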
Assessment Questions:
- What are the primary cost drivers for Synapse workloads?
- How do you balance cost optimization with performance?
- When should you use reserved capacity pricing?
- How do you implement cost accountability across teams?
Module 3.2: Resource Governance and Budgets (8 hours)
Learning Objectives:
- Set up budget alerts and limits
- Implement Azure Policy for resource governance
- Configure resource locks for critical resources
- Establish approval workflows for expensive operations
Hands-on Exercises:
- Lab 3.2.1: Create department budgets with alerts
- Lab 3.2.2: Implement Azure Policies for compliance
- Lab 3.2.3: Configure resource locks on production resources
- Lab 3.2.4: Set up cost anomaly detection
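Budget alerts are more useful when they fire on projected spend, not only on actual spend. The sketch below is a deliberately simple linear projection, not how Azure Cost Management forecasts; the function names and the 50%, 80%, and 100% thresholds are assumptions for illustration.

```python
import calendar
from datetime import date


def forecast_month_end_spend(spend_to_date: float, today: date) -> float:
    """Linear projection: assume the average daily spend so far continues all month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def crossed_thresholds(amount: float, budget: float, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget fractions that `amount` has reached or exceeded."""
    return [t for t in thresholds if amount >= budget * t]


if __name__ == "__main__":
    projected = forecast_month_end_spend(300.0, date(2025, 1, 10))
    print(f"Projected month-end spend: {projected:.2f}")
    print("Thresholds crossed:", crossed_thresholds(projected, budget=1000.0))
```

Comparing both actual and projected spend against the same thresholds gives teams earlier warning than waiting for the budget to be exhausted.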
Assessment Questions:
- How do Azure Policies differ from RBAC?
- What actions should trigger budget alerts?
- When should you use read-only vs delete locks?
- How do you prevent accidental deletion of resources?
📚 Phase 4: Advanced Administration (2 weeks)
Goal: Master advanced administrative scenarios and become a subject matter expert
Module 4.1: Advanced Troubleshooting (12 hours)
Learning Objectives:
- Diagnose complex performance issues
- Troubleshoot connectivity and authentication problems
- Resolve resource contention and blocking
- Use advanced diagnostic tools
Hands-on Exercises:
- Lab 4.1.1: Troubleshoot Spark job OOM errors
- Lab 4.1.2: Resolve SQL pool blocking scenarios
- Lab 4.1.3: Debug private endpoint connectivity issues
- Lab 4.1.4: Analyze query execution plans for optimization
Troubleshooting Methodology:
- Identify symptoms - What is the observed problem?
- Gather data - Logs, metrics, traces, error messages
- Isolate root cause - Use systematic elimination
- Implement fix - Test in non-production first
- Verify resolution - Confirm problem is resolved
- Document - Create runbook for future reference
Assessment Questions:
- What are the most common causes of Spark job failures?
- How do you troubleshoot private endpoint connectivity?
- What tools are available for SQL performance diagnostics?
- How do you prioritize issues during incidents?
Module 4.2: Multi-Region and High Availability (12 hours)
Learning Objectives:
- Design multi-region architectures
- Implement failover strategies
- Configure availability zones
- Test failover procedures
Hands-on Exercises:
- Lab 4.2.1: Deploy multi-region Synapse architecture
- Lab 4.2.2: Configure Traffic Manager for failover
- Lab 4.2.3: Implement data replication across regions
- Lab 4.2.4: Conduct failover testing
High Availability Architecture:
Primary Region (East US)         Secondary Region (West US)
┌─────────────────────────┐      ┌─────────────────────────┐
│ Synapse Workspace       │      │ Synapse Workspace       │
│ (Active)                │◄────►│ (Standby)               │
│                         │      │                         │
│ Data Lake (GRS)         │──────┤ Data Lake (GRS)         │
│ Auto-replicated         │      │ Read access             │
└─────────────────────────┘      └─────────────────────────┘
Assessment Questions:
- What are the trade-offs between different multi-region architectures?
- How do you handle data consistency across regions?
- What is the difference between active-passive and active-active?
- How do you minimize failover time (RTO)?
Module 4.3: Capacity Planning and Scaling (8 hours)
Learning Objectives:
- Forecast resource requirements
- Plan for growth and scalability
- Conduct load testing
- Establish scaling policies
Hands-on Exercises:
- Lab 4.3.1: Conduct capacity planning analysis
- Lab 4.3.2: Perform load testing on SQL and Spark pools
- Lab 4.3.3: Configure auto-scaling rules
- Lab 4.3.4: Create capacity forecast reports
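A first-pass capacity forecast can be made with a compound growth model. The sketch below assumes a constant monthly growth rate, which is a simplification; the function name and the figures in the usage example are illustrative, not measured values.

```python
import math


def months_until_capacity(current: float, capacity: float, monthly_growth: float) -> int:
    """Whole months until `current` reaches `capacity` at a constant compound growth rate."""
    if current <= 0 or capacity <= 0 or monthly_growth <= 0:
        raise ValueError("current, capacity, and monthly_growth must all be positive")
    if current >= capacity:
        return 0
    return math.ceil(math.log(capacity / current) / math.log(1 + monthly_growth))


if __name__ == "__main__":
    # Example: 50 TB used of 100 TB, growing 10% per month.
    print("Months until capacity is reached:", months_until_capacity(50, 100, 0.10))
```

Real workloads rarely grow at a fixed rate, so a forecast like this is best treated as a trigger for review and load testing, not as a commitment date.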
Assessment Questions:
- How do you forecast future capacity requirements?
- What factors influence scaling decisions?
- How do you conduct effective load testing?
- What metrics indicate the need for a capacity increase?
Module 4.4: Platform Security Hardening (8 hours)
Learning Objectives:
- Implement defense-in-depth strategies
- Conduct security assessments
- Respond to security incidents
- Maintain security posture over time
Hands-on Exercises:
- Lab 4.4.1: Run Microsoft Defender for Cloud (formerly Azure Security Center) assessments
- Lab 4.4.2: Implement security recommendations
- Lab 4.4.3: Conduct security incident simulation
- Lab 4.4.4: Create security runbooks
Assessment Questions:
- What is defense-in-depth and how does it apply to Synapse?
- How do you respond to a suspected data breach?
- What security logs should be retained for compliance?
- How do you maintain security posture in evolving environments?
🎯 Capstone Project
Duration: 2 weeks
Design and implement a complete enterprise-grade analytics platform:
Project Requirements:
- Security: Implement comprehensive security controls
- Networking: Deploy with private endpoints and VNet integration
- Monitoring: Configure full observability stack
- Automation: Implement IaC and operational automation
- Cost Management: Establish cost controls and optimization
- DR/BC: Implement backup and disaster recovery
- Documentation: Provide complete operational runbooks
Evaluation Criteria:
| Category | Weight | Criteria |
|---|---|---|
| Security | 25% | Comprehensive controls, compliance |
| Operations | 20% | Monitoring, automation, reliability |
| Cost Management | 15% | Optimization, accountability |
| Documentation | 20% | Runbooks, diagrams, procedures |
| Best Practices | 20% | Architecture, design decisions |
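The weights in the table combine into a single weighted average. The sketch below shows one way to compute it; the 0-100 scoring scale per category is an assumption, since the table only specifies the weights.

```python
WEIGHTS = {
    "Security": 0.25,
    "Operations": 0.20,
    "Cost Management": 0.15,
    "Documentation": 0.20,
    "Best Practices": 0.20,
}


def capstone_score(scores):
    """Weighted average of per-category scores (each assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"Security": 90, "Operations": 80, "Cost Management": 70,
               "Documentation": 85, "Best Practices": 75}
    print(f"Overall score: {capstone_score(example):.1f}")
```

Because Security carries the largest weight, a weak security score lowers the overall result more than an equally weak score in any other category.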
🎓 Certification Preparation
AZ-104: Azure Administrator Associate
Foundational certification for Azure administration.
Relevant Skills:
- Manage Azure identities and governance
- Implement and manage storage
- Deploy and manage Azure compute resources
- Configure and manage virtual networking
- Monitor and maintain Azure resources
DP-203: Azure Data Engineer Associate
Covers data platform administration from an engineering perspective.
Relevant Skills:
- Design and implement data storage
- Design and develop data processing
- Design and implement data security
- Monitor and optimize data solutions
💡 Learning Tips
Best Practices
- Hands-on Focus: Build and break things in sandbox environments
- Document Everything: Create runbooks as you learn
- Automate Early: Practice automation from day one
- Think Security First: Always consider security implications
- Monitor Proactively: Set up monitoring before issues occur
📞 Support and Resources
Getting Help
- Azure Support: Open support tickets for platform issues
- Community Forums: Engage with Azure administrator community
- Microsoft Docs: Comprehensive Azure documentation
- Technical Support: Lab assistance and troubleshooting help
Learning Path Version: 1.0
Last Updated: January 2025
Estimated Completion: 8-10 weeks full-time