Azure Machine Learning Integration with Azure Synapse Analytics¶

Home > Code Examples > Integration > Azure ML Integration

This guide provides examples and best practices for integrating Azure Synapse Analytics with Azure Machine Learning (Azure ML).

Prerequisites¶

Azure Synapse Analytics workspace
Azure Machine Learning workspace
Appropriate permissions on both services
Azure Storage account accessible by both services

Linked Service Setup¶

First, create a linked service between Azure Synapse and Azure Machine Learning:

# PySpark code to configure Azure ML integration in Synapse
from notebookutils import mssparkutils

# Set up Azure ML workspace connection
synapse_workspace_name = "your-synapse-workspace"
ml_workspace_name = "your-ml-workspace"
resource_group = "your-resource-group"
subscription_id = "your-subscription-id"

# Create linked service connection
linked_service_name = "AzureMLService"
mssparkutils.notebook.run("./setup_linked_service.py", 
                         {"workspace_name": ml_workspace_name,
                          "resource_group": resource_group,
                          "subscription_id": subscription_id})

Training a Machine Learning Model with Synapse Data¶

Example of training an ML model using data from Synapse:

# Import necessary libraries
from azureml.core import Workspace, Experiment, Dataset
from azureml.core.compute import ComputeTarget, SynapseCompute
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import SynapseSparkStep, PythonScriptStep
from azureml.core.runconfig import RunConfiguration

# Connect to Azure ML workspace
ws = Workspace.get(name=ml_workspace_name,
                  subscription_id=subscription_id,
                  resource_group=resource_group)

# Set up Synapse as compute target in Azure ML
synapse_compute = SynapseCompute(ws, synapse_workspace_name)

# Create a Synapse Spark step to process data
processed_data = PipelineData("processed_data", datastore=ws.get_default_datastore())
data_prep_step = SynapseSparkStep(
    name="data_preparation",
    synapse_compute=synapse_compute,
    spark_pool_name="your-spark-pool",
    file_name="data_prep.py",
    inputs=[Dataset.get_by_name(ws, "raw_data")],
    outputs=[processed_data],
    arguments=["--output_path", processed_data]
)

# Create a training step using the processed data
model_training_step = PythonScriptStep(
    name="model_training",
    script_name="train.py",
    compute_target=ws.compute_targets["training-cluster"],
    inputs=[processed_data],
    arguments=["--data_path", processed_data,
               "--model_name", "customer_churn_model"]
)

# Create and run the pipeline
pipeline = Pipeline(workspace=ws, steps=[data_prep_step, model_training_step])
experiment = Experiment(ws, "synapse_ml_integration")
pipeline_run = experiment.submit(pipeline)
pipeline_run.wait_for_completion(show_output=True)

Batch Inference with Trained Models¶

Example of performing batch scoring using a trained model:

# Batch scoring using Azure ML and Synapse
from azureml.core import Model
from azureml.core.dataset import Dataset
from azureml.pipeline.core import PipelineEndpoint

# Get the deployed model endpoint
model = Model(ws, "customer_churn_model")
scoring_endpoint = PipelineEndpoint.get(workspace=ws, name="batch_scoring_pipeline")

# Prepare parameters for batch scoring
parameters = {
    "input_data": Dataset.get_by_name(ws, "new_customers"),
    "model_name": "customer_churn_model",
    "output_path": "abfs://container@account.dfs.core.windows.net/scores/"
}

# Submit batch scoring job
pipeline_run = scoring_endpoint.submit("Batch_scoring_run", parameters=parameters)
pipeline_run.wait_for_completion()

MLOps with Azure Synapse and Azure ML¶

Example of implementing MLOps practices:

# Example Azure DevOps pipeline for MLOps with Synapse and Azure ML
trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.8'
    addToPath: true

- script: |
    python -m pip install --upgrade pip
    pip install -r requirements.txt
  displayName: 'Install dependencies'

- script: |
    python src/validate_data.py
  displayName: 'Validate data quality'

- task: AzureCLI@2
  inputs:
    azureSubscription: 'your-azure-connection'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      az ml run submit-pipeline --resource-group $(resourceGroup) \
                             --workspace-name $(workspaceName) \
                             --experiment-name $(experimentName) \
                             --pipeline-id $(pipelineId)
  displayName: 'Run ML pipeline'

Real-time Model Deployment and Scoring¶

Example of deploying a model for real-time inference:

# Deploy model to AKS for real-time scoring
from azureml.core.webservice import AksWebservice
from azureml.core.compute import AksCompute
from azureml.core.model import InferenceConfig, Model

# Get the registered model
model = Model(ws, "customer_churn_model")

# Define inference configuration
inference_config = InferenceConfig(
    environment=ws.environments["scoring_env"],
    source_directory="./scoring_scripts",
    entry_script="score.py"
)

# Get or create AKS cluster
try:
    aks_target = AksCompute(ws, "aks-cluster")
except:
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_D3_v2", 
                                                       agent_count=3)
    aks_target = ComputeTarget.create(ws, "aks-cluster", prov_config)
    aks_target.wait_for_completion(show_output=True)

# Deploy the model
deployment_config = AksWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    enable_app_insights=True,
    auth_enabled=True
)

service = Model.deploy(ws,
                      name="churn-prediction-service",
                      models=[model],
                      inference_config=inference_config,
                      deployment_config=deployment_config,
                      deployment_target=aks_target)

service.wait_for_deployment(show_output=True)
print(service.get_logs())

Best Practices for Synapse and Azure ML Integration¶

Data Preparation: Use Synapse Spark pools for data preparation and feature engineering tasks that require distributed computing.
Model Training: For complex model training, use Azure ML's dedicated compute clusters, which are optimized for machine learning workloads.
Feature Stores: Implement feature stores using Delta tables in Synapse to ensure consistent features across training and inference.
Pipeline Orchestration: Use Azure ML pipelines for end-to-end orchestration, with Synapse steps for data processing.
Model Monitoring: Implement model monitoring using Azure ML and integrate with Azure Monitor for comprehensive observability.
Security Best Practices:
Use managed identities when possible
Implement least privilege access
Encrypt data at rest and in transit
Use private endpoints for service connectivity
Cost Optimization:
Auto-scale compute resources based on workload
Schedule pipelines during off-peak hours
Use serverless SQL pools for ad-hoc data exploration
Monitor and optimize resource usage