Tutorial 14: CI/CD Setup¶
Overview¶
This tutorial covers implementing CI/CD (Continuous Integration/Continuous Deployment) for Azure Synapse Analytics, including Git integration, Azure DevOps pipelines, and deployment automation for notebooks, pipelines, and SQL scripts.
Prerequisites¶
- Completed Tutorial 13: Monitoring and Diagnostics
- Azure DevOps or GitHub account
- Git fundamentals
- Understanding of deployment concepts
Learning Objectives¶
By the end of this tutorial, you will be able to:
- Configure Git integration for Synapse
- Set up Azure DevOps pipelines
- Implement deployment strategies
- Automate testing and validation
- Manage environment promotions
Section 1: Git Integration¶
Connecting Synapse to Git¶
```text
┌─────────────────────────────────────────────────────────────────┐
│                     Synapse Git Integration                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Development Workspace                                          │
│  └── Connected to: feature/xyz branch                           │
│      ├── notebooks/                                             │
│      ├── pipelines/                                             │
│      ├── dataflows/                                             │
│      ├── linkedServices/                                        │
│      └── sqlscripts/                                            │
│                                                                 │
│  ┌────────────────┐                                             │
│  │  Collaborate   │ ─── feature/xyz ───▶ Pull Request           │
│  └────────────────┘                           │                 │
│                                               ▼                 │
│                                             main                │
│                                               │                 │
│                                               ▼                 │
│                                       ┌─────────────┐           │
│                                       │ Publish to  │           │
│                                       │ Production  │           │
│                                       └─────────────┘           │
└─────────────────────────────────────────────────────────────────┘
```
Repository Structure¶
```text
synapse-workspace/
├── .gitignore
├── workspace.json
├── publish_config.json
├── notebook/
│ ├── DataProcessing.json
│ ├── ETL_Pipeline.json
│ └── Reporting.json
├── pipeline/
│ ├── DailyIngestion.json
│ ├── WeeklyAggregation.json
│   └── DataQuality.json
├── dataset/
│   └── ...
├── dataflow/
│ ├── TransformSales.json
│ └── CleanseCustomer.json
├── linkedService/
│ ├── AzureDataLakeStorage.json
│ ├── AzureSQLDatabase.json
│ └── PowerBI.json
├── integrationRuntime/
│ └── SelfHostedIR.json
├── sqlscript/
│ ├── CreateTables.json
│ ├── StoredProcedures.json
│ └── Views.json
├── credential/
│ └── ManagedIdentity.json
└── trigger/
    ├── DailyTrigger.json
    └── EventTrigger.json
```
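The `publish_config.json` file tells Synapse Studio which branch receives the generated templates on publish; it typically contains a single setting such as `{"publishBranch": "workspace_publish"}`.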
Branch Strategy¶
# Branch naming convention
main          # Production-ready code
release/*     # Release branches for staging (cut from develop)
develop       # Integration branch
feature/*     # Feature development (branched from develop)
bugfix/*      # Bug fixes (branched from develop)
hotfix/*      # Production hotfixes (branched from main)
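To keep the convention enforceable rather than aspirational, a step early in the CI pipeline (Section 2) can fail fast on misnamed branches. Below is a minimal sketch; the script name `scripts/check_branch_name.py` is hypothetical, and it reads the branch from `BUILD_SOURCEBRANCH`, the environment variable Azure DevOps sets from `Build.SourceBranch`:

```python
# scripts/check_branch_name.py -- hypothetical CI guard for the branch convention
import os
import re
import sys

# Patterns matching the branch naming convention above
ALLOWED = re.compile(r"^refs/heads/(main|develop|release/.+|feature/.+|bugfix/.+|hotfix/.+)$")

def main() -> int:
    branch = os.environ.get("BUILD_SOURCEBRANCH", "")
    if not ALLOWED.match(branch):
        print(f"Branch '{branch}' does not follow the naming convention")
        return 1
    print(f"Branch '{branch}' OK")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```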
Section 2: Azure DevOps Pipeline Setup¶
Service Connections¶
# Create service principal for deployment (Azure RBAC)
az ad sp create-for-rbac \
  --name "synapse-cicd-sp" \
  --role "Contributor" \
  --scopes "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Synapse/workspaces/<workspace>"

# Grant Synapse workspace RBAC. "Synapse Administrator" is a Synapse role,
# not an Azure role, so it is assigned through the Synapse API:
az synapse role assignment create \
  --workspace-name <workspace> \
  --role "Synapse Administrator" \
  --assignee <sp-client-id>

# Grant additional permissions
az role assignment create \
  --assignee <sp-client-id> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"
CI Pipeline (azure-pipelines-ci.yml)¶
# CI Pipeline - Validate and Build
trigger:
branches:
include:
- develop
- feature/*
paths:
include:
- synapse-workspace/*
pool:
vmImage: 'ubuntu-latest'
variables:
workspaceName: 'synapse-dev'
resourceGroup: 'rg-synapse-dev'
subscriptionId: '$(AZURE_SUBSCRIPTION_ID)'
stages:
- stage: Validate
displayName: 'Validate Synapse Artifacts'
jobs:
- job: ValidateArtifacts
displayName: 'Validate Workspace Artifacts'
steps:
- checkout: self
- task: AzureCLI@2
displayName: 'Install Synapse CLI Extension'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
az extension add --name synapse --yes
- task: AzureCLI@2
displayName: 'Validate Workspace'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
cd synapse-workspace
# Validate JSON syntax; fail the build on the first invalid file
for file in $(find . -name "*.json"); do
echo "Validating: $file"
python -m json.tool "$file" > /dev/null || { echo "Invalid JSON: $file"; exit 1; }
done
# Validate that every pipeline defines at least one activity
for pipeline in pipeline/*.json; do
echo "Checking pipeline: $pipeline"
jq -e '.properties.activities | length > 0' "$pipeline" > /dev/null || { echo "Pipeline $pipeline has no activities"; exit 1; }
done
- stage: Test
displayName: 'Run Tests'
dependsOn: Validate
jobs:
- job: UnitTests
displayName: 'Run Unit Tests'
steps:
- task: UsePythonVersion@0
inputs:
versionSpec: '3.9'
- script: |
pip install pytest pytest-cov pyspark
pytest tests/ -v --junitxml=test-results.xml
displayName: 'Run Python Tests'
- task: PublishTestResults@2
inputs:
testResultsFiles: '**/test-results.xml'
testRunTitle: 'Synapse Unit Tests'
- stage: Build
displayName: 'Build Artifacts'
dependsOn: Test
jobs:
- job: BuildArtifacts
displayName: 'Package Artifacts'
steps:
- task: CopyFiles@2
displayName: 'Copy Synapse Artifacts'
inputs:
SourceFolder: 'synapse-workspace'
Contents: '**'
TargetFolder: '$(Build.ArtifactStagingDirectory)/synapse'
- task: CopyFiles@2
displayName: 'Copy Deployment Scripts'
inputs:
SourceFolder: 'deployment'
Contents: '**'
TargetFolder: '$(Build.ArtifactStagingDirectory)/deployment'
- task: PublishBuildArtifacts@1
displayName: 'Publish Artifacts'
inputs:
PathtoPublish: '$(Build.ArtifactStagingDirectory)'
ArtifactName: 'synapse-artifacts'
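The Validate stage's checks can also run locally before a push. Below is a minimal Python equivalent; the script name `scripts/validate_artifacts.py` is hypothetical, and the checks mirror the CI stage exactly, nothing more:

```python
# scripts/validate_artifacts.py -- hypothetical local mirror of the CI Validate stage
import json
import sys
from pathlib import Path

def validate(workspace: Path) -> list:
    """Return a list of problems found in the workspace artifacts."""
    errors = []
    for file in workspace.rglob("*.json"):
        # Same first check as CI: the file must parse as JSON
        try:
            artifact = json.loads(file.read_text())
        except json.JSONDecodeError as exc:
            errors.append(f"{file}: invalid JSON ({exc})")
            continue
        # Same second check as CI: pipelines must define at least one activity
        if file.parent.name == "pipeline":
            activities = artifact.get("properties", {}).get("activities", [])
            if not activities:
                errors.append(f"{file}: pipeline has no activities")
    return errors

if __name__ == "__main__":
    problems = validate(Path("synapse-workspace"))
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```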
CD Pipeline (azure-pipelines-cd.yml)¶
# CD Pipeline - Deploy to Environments
trigger: none
resources:
pipelines:
- pipeline: ci-pipeline
source: 'Synapse-CI'
trigger:
branches:
include:
- develop
- main
pool:
vmImage: 'ubuntu-latest'
variables:
- group: synapse-variables
stages:
- stage: DeployDev
displayName: 'Deploy to Development'
condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/develop'))
variables:
environment: 'dev'
workspaceName: 'synapse-dev'
resourceGroup: 'rg-synapse-dev'
jobs:
- deployment: DeployToDevWorkspace
displayName: 'Deploy to Dev Workspace'
environment: 'synapse-dev'
strategy:
runOnce:
deploy:
steps:
- template: templates/deploy-synapse.yml
parameters:
workspaceName: $(workspaceName)
resourceGroup: $(resourceGroup)
environment: $(environment)
- stage: DeployStaging
displayName: 'Deploy to Staging'
dependsOn: DeployDev
condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/develop'))
variables:
environment: 'staging'
workspaceName: 'synapse-staging'
resourceGroup: 'rg-synapse-staging'
jobs:
- deployment: DeployToStagingWorkspace
displayName: 'Deploy to Staging Workspace'
environment: 'synapse-staging'
strategy:
runOnce:
deploy:
steps:
- template: templates/deploy-synapse.yml
parameters:
workspaceName: $(workspaceName)
resourceGroup: $(resourceGroup)
environment: $(environment)
- stage: DeployProd
displayName: 'Deploy to Production'
dependsOn: DeployStaging
condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
variables:
environment: 'prod'
workspaceName: 'synapse-prod'
resourceGroup: 'rg-synapse-prod'
jobs:
- deployment: DeployToProdWorkspace
displayName: 'Deploy to Production Workspace'
environment: 'synapse-prod'
strategy:
runOnce:
deploy:
steps:
- template: templates/deploy-synapse.yml
parameters:
workspaceName: $(workspaceName)
resourceGroup: $(resourceGroup)
environment: $(environment)
Deployment Template (templates/deploy-synapse.yml)¶
parameters:
- name: workspaceName
type: string
- name: resourceGroup
type: string
- name: environment
type: string
steps:
- download: ci-pipeline
artifact: synapse-artifacts
- task: AzureCLI@2
displayName: 'Install Dependencies'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
az extension add --name synapse --yes
pip install azure-synapse-artifacts
- task: AzureCLI@2
displayName: 'Stop Triggers'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
# Stop all triggers before deployment
triggers=$(az synapse trigger list \
--workspace-name ${{ parameters.workspaceName }} \
--query "[?properties.runtimeState=='Started'].name" -o tsv)
for trigger in $triggers; do
echo "Stopping trigger: $trigger"
az synapse trigger stop \
--workspace-name ${{ parameters.workspaceName }} \
--name "$trigger"
done
- task: AzureCLI@2
displayName: 'Deploy Linked Services'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
cd $(Pipeline.Workspace)/ci-pipeline/synapse-artifacts/synapse/linkedService
for file in *.json; do
name="${file%.json}"
echo "Deploying linked service: $name"
# Apply environment-specific overrides
cat "$file" | envsubst > "${file}.temp"
az synapse linked-service create \
--workspace-name ${{ parameters.workspaceName }} \
--name "$name" \
--file "@${file}.temp"
done
- task: AzureCLI@2
displayName: 'Deploy Datasets'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
cd $(Pipeline.Workspace)/ci-pipeline/synapse-artifacts/synapse/dataset
for file in *.json; do
name="${file%.json}"
echo "Deploying dataset: $name"
az synapse dataset create \
--workspace-name ${{ parameters.workspaceName }} \
--name "$name" \
--file "@$file"
done
- task: AzureCLI@2
displayName: 'Deploy Notebooks'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
cd $(Pipeline.Workspace)/ci-pipeline/synapse-artifacts/synapse/notebook
for file in *.json; do
name="${file%.json}"
echo "Deploying notebook: $name"
az synapse notebook import \
--workspace-name ${{ parameters.workspaceName }} \
--name "$name" \
--file "@$file"
done
- task: AzureCLI@2
displayName: 'Deploy Pipelines'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
cd $(Pipeline.Workspace)/ci-pipeline/synapse-artifacts/synapse/pipeline
for file in *.json; do
name="${file%.json}"
echo "Deploying pipeline: $name"
az synapse pipeline create \
--workspace-name ${{ parameters.workspaceName }} \
--name "$name" \
--file "@$file"
done
- task: AzureCLI@2
displayName: 'Deploy SQL Scripts'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
cd $(Pipeline.Workspace)/ci-pipeline/synapse-artifacts/synapse/sqlscript
for file in *.json; do
name="${file%.json}"
echo "Deploying SQL script: $name"
az synapse sql-script import \
--workspace-name ${{ parameters.workspaceName }} \
--name "$name" \
--file "@$file"
done
- task: AzureCLI@2
displayName: 'Deploy Triggers'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
cd $(Pipeline.Workspace)/ci-pipeline/synapse-artifacts/synapse/trigger
for file in *.json; do
name="${file%.json}"
echo "Deploying trigger: $name"
az synapse trigger create \
--workspace-name ${{ parameters.workspaceName }} \
--name "$name" \
--file "@$file"
done
- task: AzureCLI@2
displayName: 'Start Triggers'
inputs:
azureSubscription: 'synapse-service-connection'
scriptType: 'bash'
scriptLocation: 'inlineScript'
inlineScript: |
# Start triggers after deployment
triggers=$(az synapse trigger list \
--workspace-name ${{ parameters.workspaceName }} \
--query "[?properties.runtimeState=='Stopped'].name" -o tsv)
for trigger in $triggers; do
echo "Starting trigger: $trigger"
az synapse trigger start \
--workspace-name ${{ parameters.workspaceName }} \
--name "$trigger"
done
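One limitation of the Stop/Start tasks above: the final step starts every trigger in the Stopped state, including triggers that were deliberately stopped before the deployment. A safer pattern is to record which triggers the deployment itself stopped and restart only those. Here is a sketch using the `azure-synapse-artifacts` package installed in the Install Dependencies step; the endpoint is a placeholder and the function names are illustrative:

```python
# Sketch: stop running triggers, deploy, then restart only what this run stopped.
# Assumes azure-identity and azure-synapse-artifacts; trigger operations follow
# the ArtifactsClient API (get_triggers_by_workspace, begin_stop/start_trigger).
from azure.identity import DefaultAzureCredential
from azure.synapse.artifacts import ArtifactsClient

def stop_started_triggers(client: ArtifactsClient) -> list:
    """Stop every trigger currently in the Started state; return their names."""
    stopped = []
    for trigger in client.trigger.get_triggers_by_workspace():
        if trigger.properties.runtime_state == "Started":
            client.trigger.begin_stop_trigger(trigger.name).result()
            stopped.append(trigger.name)
    return stopped

def restart_triggers(client: ArtifactsClient, names: list) -> None:
    """Restart only the triggers this deployment stopped."""
    for name in names:
        client.trigger.begin_start_trigger(name).result()

if __name__ == "__main__":
    client = ArtifactsClient(
        DefaultAzureCredential(),
        "https://<workspace>.dev.azuresynapse.net",  # placeholder endpoint
    )
    was_running = stop_started_triggers(client)
    # ... deploy linked services, datasets, notebooks, pipelines, triggers ...
    restart_triggers(client, was_running)
```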
Section 3: SQL Pool Deployment¶
Database Project Structure¶
```text
sql-database/
├── dbo/
│   ├── Tables/
│   │   ├── fact.Sales.sql
│   │   ├── dim.Product.sql
│   │   └── dim.Customer.sql
│   ├── Views/
│   │   ├── reporting.vw_SalesSummary.sql
│   │   └── reporting.vw_CustomerAnalytics.sql
│   ├── StoredProcedures/
│   │   ├── etl.usp_LoadDailySales.sql
│   │   └── etl.usp_RefreshMaterializedViews.sql
│   └── Functions/
│       └── security.fn_RegionFilter.sql
├── Security/
│   ├── Roles/
│   │   ├── DataAnalyst.sql
│   │   └── DataEngineer.sql
│   └── Users/
│       └── ServiceAccounts.sql
├── Migrations/
│   ├── V001__InitialSchema.sql
│   ├── V002__AddIndexes.sql
│   └── V003__AddPartitioning.sql
└── deploy.ps1
```
Migration Script Example¶
```sql
-- Migrations/V001__InitialSchema.sql
-- Migration: Initial Schema
-- Version: 001
-- Author: DataTeam
-- Date: 2024-01-15
-- Check if migration already applied
IF NOT EXISTS (
SELECT 1 FROM sys.tables WHERE name = '__SchemaVersion'
)
BEGIN
CREATE TABLE dbo.__SchemaVersion (
VersionId INT NOT NULL,
ScriptName VARCHAR(200) NOT NULL,
AppliedOn DATETIME NOT NULL DEFAULT GETDATE(),
AppliedBy VARCHAR(100) NOT NULL DEFAULT SYSTEM_USER
)
WITH (DISTRIBUTION = REPLICATE);
END
GO
IF NOT EXISTS (
SELECT 1 FROM dbo.__SchemaVersion WHERE VersionId = 1
)
BEGIN
-- Create schemas
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'fact')
EXEC('CREATE SCHEMA fact');
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'dim')
EXEC('CREATE SCHEMA dim');
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'staging')
EXEC('CREATE SCHEMA staging');
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'etl')
EXEC('CREATE SCHEMA etl');
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'reporting')
EXEC('CREATE SCHEMA reporting');
-- Create dimension tables
CREATE TABLE dim.Product (
ProductKey INT NOT NULL,
ProductID VARCHAR(20) NOT NULL,
ProductName VARCHAR(100),
Category VARCHAR(50),
SubCategory VARCHAR(50)
)
WITH (
DISTRIBUTION = REPLICATE,
CLUSTERED COLUMNSTORE INDEX
);
CREATE TABLE dim.Customer (
CustomerKey INT NOT NULL,
CustomerID VARCHAR(20) NOT NULL,
CustomerName VARCHAR(100),
Segment VARCHAR(50),
Region VARCHAR(50)
)
WITH (
DISTRIBUTION = REPLICATE,
CLUSTERED COLUMNSTORE INDEX
);
-- Create fact table
CREATE TABLE fact.Sales (
SaleID BIGINT NOT NULL,
DateKey INT NOT NULL,
ProductKey INT NOT NULL,
CustomerKey INT NOT NULL,
Quantity INT,
Amount DECIMAL(12,2)
)
WITH (
DISTRIBUTION = HASH(CustomerKey),
CLUSTERED COLUMNSTORE INDEX,
PARTITION (DateKey RANGE RIGHT FOR VALUES (20240101, 20240401, 20240701, 20241001))
);
-- Record migration
INSERT INTO dbo.__SchemaVersion (VersionId, ScriptName)
VALUES (1, 'V001__InitialSchema.sql');
END
GO
```
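Migration frameworks like this one depend on version numbers being unique and well-formed. A small guard could run in CI to catch numbering mistakes before deployment; the script name `scripts/check_migrations.py` below is hypothetical, and it uses the same `V<number>__<name>.sql` pattern that deploy.ps1 parses:

```python
# scripts/check_migrations.py -- hypothetical guard for migration numbering
import re
import sys
from pathlib import Path

PATTERN = re.compile(r"^V(\d+)__.+\.sql$")

def check(migrations_dir: Path) -> list:
    """Return numbering problems found in the migrations folder."""
    errors = []
    seen = {}
    for file in sorted(migrations_dir.glob("*.sql")):
        match = PATTERN.match(file.name)
        if not match:
            errors.append(f"{file.name}: does not match V<number>__<name>.sql")
            continue
        version = int(match.group(1))
        if version in seen:
            errors.append(f"{file.name}: duplicate version {version} (also used by {seen[version]})")
        seen[version] = file.name
    return errors

if __name__ == "__main__":
    problems = check(Path("sql-database/Migrations"))
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```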
SQL Deployment Script¶
# deploy.ps1
param(
[Parameter(Mandatory=$true)]
[string]$ServerName,
[Parameter(Mandatory=$true)]
[string]$DatabaseName,
[Parameter(Mandatory=$true)]
[string]$Environment
)
# Get current schema version (0 on a fresh database where the version table does not exist yet)
$currentVersion = Invoke-Sqlcmd -ServerInstance $ServerName -Database $DatabaseName -Query @"
IF OBJECT_ID('dbo.__SchemaVersion') IS NOT NULL
SELECT ISNULL(MAX(VersionId), 0) AS CurrentVersion FROM dbo.__SchemaVersion
ELSE
SELECT 0 AS CurrentVersion
"@
Write-Host "Current schema version: $($currentVersion.CurrentVersion)"
# Get migration scripts
$migrationFiles = Get-ChildItem -Path "Migrations" -Filter "V*.sql" |
Sort-Object Name |
Where-Object {
[int]($_.Name -replace 'V(\d+)__.*\.sql', '$1') -gt $currentVersion.CurrentVersion
}
foreach ($file in $migrationFiles) {
Write-Host "Applying migration: $($file.Name)"
$script = Get-Content $file.FullName -Raw
# Replace environment variables
$script = $script -replace '\$\{ENVIRONMENT\}', $Environment
try {
Invoke-Sqlcmd -ServerInstance $ServerName -Database $DatabaseName -Query $script -ErrorAction Stop
Write-Host " Successfully applied: $($file.Name)" -ForegroundColor Green
}
catch {
Write-Host " Failed to apply: $($file.Name)" -ForegroundColor Red
Write-Host " Error: $_"
exit 1
}
}
Write-Host "Database deployment completed successfully!" -ForegroundColor Green
Section 4: Testing Framework¶
Pipeline Testing¶
# tests/test_pipelines.py
import pytest
import json
import os
from pathlib import Path
WORKSPACE_PATH = Path("synapse-workspace")
class TestPipelineDefinitions:
"""Test Synapse pipeline definitions."""
@pytest.fixture
def pipeline_files(self):
"""Get all pipeline definition files."""
pipeline_dir = WORKSPACE_PATH / "pipeline"
return list(pipeline_dir.glob("*.json"))
def test_pipeline_files_exist(self, pipeline_files):
"""Verify pipeline files exist."""
assert len(pipeline_files) > 0, "No pipeline files found"
def test_pipeline_json_valid(self, pipeline_files):
"""Verify all pipeline files have valid JSON."""
for file in pipeline_files:
with open(file) as f:
try:
json.load(f)
except json.JSONDecodeError as e:
pytest.fail(f"Invalid JSON in {file.name}: {e}")
def test_pipeline_has_activities(self, pipeline_files):
"""Verify all pipelines have at least one activity."""
for file in pipeline_files:
with open(file) as f:
pipeline = json.load(f)
activities = pipeline.get("properties", {}).get("activities", [])
assert len(activities) > 0, f"Pipeline {file.name} has no activities"
def test_pipeline_activity_names_unique(self, pipeline_files):
"""Verify activity names are unique within each pipeline."""
for file in pipeline_files:
with open(file) as f:
pipeline = json.load(f)
activities = pipeline.get("properties", {}).get("activities", [])
names = [a.get("name") for a in activities]
assert len(names) == len(set(names)), f"Duplicate activity names in {file.name}"
class TestNotebookDefinitions:
"""Test Synapse notebook definitions."""
@pytest.fixture
def notebook_files(self):
"""Get all notebook definition files."""
notebook_dir = WORKSPACE_PATH / "notebook"
return list(notebook_dir.glob("*.json"))
def test_notebook_has_cells(self, notebook_files):
"""Verify notebooks have cells."""
for file in notebook_files:
with open(file) as f:
notebook = json.load(f)
cells = notebook.get("properties", {}).get("cells", [])
assert len(cells) > 0, f"Notebook {file.name} has no cells"
def test_notebook_spark_pool_configured(self, notebook_files):
"""Verify notebooks have Spark pool configured."""
for file in notebook_files:
with open(file) as f:
notebook = json.load(f)
big_data_pool = notebook.get("properties", {}).get("bigDataPool", {})
assert big_data_pool, f"Notebook {file.name} has no Spark pool configured"
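The Best Practices table at the end of this tutorial recommends never committing secrets. A test in the same style, added to `tests/test_pipelines.py`, can scan linked service definitions for plain-text secret values. This is a sketch: the property names below are common examples and should be adjusted to your artifacts:

```python
class TestLinkedServiceSecrets:
    """Guard against secrets committed in linked service definitions (sketch)."""

    # Property names that usually indicate an inline secret; adjust as needed
    SUSPECT_KEYS = {"accountKey", "password", "sasToken", "connectionString"}

    def test_no_inline_secrets(self):
        linked_service_dir = WORKSPACE_PATH / "linkedService"
        for file in linked_service_dir.glob("*.json"):
            with open(file) as f:
                definition = json.load(f)

            def walk(node, path=""):
                # Plain string values under suspect keys are flagged;
                # SecureString/Key Vault reference objects are dicts and pass through.
                if isinstance(node, dict):
                    for key, value in node.items():
                        if key in self.SUSPECT_KEYS and isinstance(value, str):
                            pytest.fail(f"{file.name}: plain-text secret at {path}/{key}")
                        walk(value, f"{path}/{key}")
                elif isinstance(node, list):
                    for index, item in enumerate(node):
                        walk(item, f"{path}[{index}]")

            walk(definition)
```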
Integration Testing¶
# tests/integration/test_data_pipelines.py
import pytest
from azure.identity import DefaultAzureCredential
from azure.synapse.artifacts import ArtifactsClient
import time
@pytest.fixture(scope="module")
def synapse_client():
"""Create Synapse client."""
credential = DefaultAzureCredential()
endpoint = "https://synapse-dev.dev.azuresynapse.net"
return ArtifactsClient(credential, endpoint)
class TestPipelineExecution:
"""Integration tests for pipeline execution."""
def test_daily_ingestion_pipeline(self, synapse_client):
"""Test daily ingestion pipeline runs successfully."""
pipeline_name = "DailyIngestion"
# Trigger pipeline run
run_response = synapse_client.pipeline.create_pipeline_run(
pipeline_name,
parameters={"runDate": "2024-01-15"}
)
run_id = run_response.run_id
# Wait for completion (timeout: 30 minutes)
timeout = 1800
start_time = time.time()
while time.time() - start_time < timeout:
run = synapse_client.pipeline_run.get_pipeline_run(run_id)
if run.status in ["Succeeded"]:
break
elif run.status in ["Failed", "Cancelled"]:
pytest.fail(f"Pipeline failed with status: {run.status}")
time.sleep(30)
else:
pytest.fail("Pipeline execution timed out")
assert run.status == "Succeeded"
    def test_data_quality_checks(self, synapse_client):
        """Test data quality pipeline."""
        pipeline_name = "DataQuality"
        run_response = synapse_client.pipeline.create_pipeline_run(pipeline_name)
        run_id = run_response.run_id
        # Poll for completion instead of relying on a fixed sleep (timeout: 10 minutes)
        timeout = 600
        start_time = time.time()
        while time.time() - start_time < timeout:
            run = synapse_client.pipeline_run.get_pipeline_run(run_id)
            if run.status not in ["Queued", "InProgress"]:
                break
            time.sleep(30)
        assert run.status == "Succeeded", f"Pipeline ended with status: {run.status}"
Section 5: Environment Configuration¶
Parameter Files¶
// config/dev.parameters.json
{
"environment": "dev",
"synapse": {
"workspaceName": "synapse-dev",
"resourceGroup": "rg-synapse-dev",
"sqlPoolName": "sqlpool-dev",
"sparkPoolName": "sparkpool-dev"
},
"storage": {
"accountName": "datalakedev",
"containerName": "data"
},
"linkedServices": {
"dataLakeUrl": "https://datalakedev.dfs.core.windows.net",
"sqlServerUrl": "synapse-dev.sql.azuresynapse.net"
}
}
// config/prod.parameters.json
{
"environment": "prod",
"synapse": {
"workspaceName": "synapse-prod",
"resourceGroup": "rg-synapse-prod",
"sqlPoolName": "sqlpool-prod",
"sparkPoolName": "sparkpool-prod"
},
"storage": {
"accountName": "datalakeprod",
"containerName": "data"
},
"linkedServices": {
"dataLakeUrl": "https://datalakeprod.dfs.core.windows.net",
"sqlServerUrl": "synapse-prod.sql.azuresynapse.net"
}
}
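The Best Practices table below calls for parity between environments, and drift between parameter files is an easy way to lose it. A minimal sketch that compares the key structure of two or more parameter files follows; the script name `scripts/check_parity.py` is hypothetical:

```python
# scripts/check_parity.py -- hypothetical structural diff of parameter files
import json
import sys
from pathlib import Path

def key_paths(node, prefix=""):
    """Yield dotted paths for every leaf key in a nested dict."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from key_paths(value, f"{prefix}{key}.")
    else:
        yield prefix.rstrip(".")

if __name__ == "__main__":
    # e.g. python scripts/check_parity.py config/dev.parameters.json config/prod.parameters.json
    files = [Path(p) for p in sys.argv[1:]]
    key_sets = {f: set(key_paths(json.loads(f.read_text()))) for f in files}
    baseline_file, baseline = next(iter(key_sets.items()))
    ok = True
    for file, keys in key_sets.items():
        missing = baseline - keys
        extra = keys - baseline
        if missing or extra:
            ok = False
            print(f"{file}: missing={sorted(missing)} extra={sorted(extra)} (vs {baseline_file})")
    sys.exit(0 if ok else 1)
```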
Environment Variable Substitution¶
# scripts/apply_parameters.py
import json
import os
import sys
from pathlib import Path
import re
def apply_parameters(artifact_path: str, params_file: str, output_path: str):
"""Apply environment parameters to Synapse artifacts."""
# Load parameters
with open(params_file) as f:
params = json.load(f)
# Flatten parameters for substitution
def flatten_dict(d, parent_key='', sep='.'):
items = []
for k, v in d.items():
new_key = f"{parent_key}{sep}{k}" if parent_key else k
if isinstance(v, dict):
items.extend(flatten_dict(v, new_key, sep=sep).items())
else:
items.append((new_key, v))
return dict(items)
flat_params = flatten_dict(params)
# Process each artifact file
artifact_dir = Path(artifact_path)
output_dir = Path(output_path)
output_dir.mkdir(parents=True, exist_ok=True)
for file in artifact_dir.rglob("*.json"):
with open(file) as f:
content = f.read()
# Replace placeholders
for key, value in flat_params.items():
placeholder = f"${{{key}}}"
content = content.replace(placeholder, str(value))
# Write to output
relative_path = file.relative_to(artifact_dir)
output_file = output_dir / relative_path
output_file.parent.mkdir(parents=True, exist_ok=True)
with open(output_file, 'w') as f:
f.write(content)
print(f"Processed: {relative_path}")
if __name__ == "__main__":
apply_parameters(sys.argv[1], sys.argv[2], sys.argv[3])
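A CD stage can run this before the deployment steps, for example `python scripts/apply_parameters.py synapse-workspace config/dev.parameters.json build/synapse-dev` (the output folder name here is illustrative), and then deploy the substituted artifacts from the output folder instead of the raw repository copies.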
Exercises¶
Exercise 1: Set Up CI/CD Pipeline¶
Create a complete CI/CD pipeline for your Synapse workspace.
Exercise 2: Implement Database Migrations¶
Create a migration framework for your dedicated SQL pool.
Exercise 3: Add Testing¶
Implement unit and integration tests for pipelines and notebooks.
Best Practices Summary¶
| Area | Recommendation |
|---|---|
| Branching | Use GitFlow or trunk-based development |
| Testing | Automate validation and integration tests |
| Secrets | Use Key Vault, never commit secrets |
| Environments | Maintain parity between dev/staging/prod |
| Deployments | Use incremental deployments when possible |
| Rollback | Always have a rollback strategy |
Summary¶
Congratulations! You have completed the Azure Synapse Analytics tutorial series. You now have the knowledge to:
- Set up and configure Synapse workspaces
- Build data pipelines and transformations
- Implement security and monitoring
- Deploy using CI/CD best practices
Additional Resources¶
- Azure Synapse Documentation
- Best Practices Guide
- Troubleshooting Guide
- Code Examples