
AI/ML Migration: SageMaker and Bedrock to Azure AI

A deep-dive guide for ML engineers and AI developers migrating Amazon SageMaker and Bedrock workloads to Azure Machine Learning, Azure OpenAI, and AI Foundry.


Executive summary

AWS AI/ML is split across two primary services: SageMaker for custom model training, deployment, and MLOps, and Bedrock for managed foundation model access. Azure provides equivalent capabilities through Azure Machine Learning (custom ML), Azure OpenAI Service (foundation models), AI Foundry (unified AI development), and Databricks ML (Spark-native ML). The Azure AI ecosystem is broader, with deeper integration into the Microsoft productivity suite via Copilot and tighter governance through Purview and Entra ID.

This guide covers model training migration, endpoint deployment, pipeline orchestration, foundation model access (Bedrock to Azure OpenAI), agent architectures (Bedrock Agents to Azure AI Agents), and RAG pattern migration (Bedrock Knowledge Bases to Azure AI Search).


Service mapping overview

AWS AI/ML service Azure equivalent Migration complexity (S = straightforward, M = moderate) Notes
SageMaker Studio Azure ML Studio / AI Foundry M Notebook + experiment + deployment IDE
SageMaker Training Azure ML Compute / Databricks ML M GPU and CPU cluster training
SageMaker Processing Azure ML Pipeline steps / Databricks Jobs M Data processing for ML
SageMaker Endpoints (real-time) Azure ML Managed Endpoints M Managed inference hosting
SageMaker Batch Transform Azure ML Batch Endpoints S Batch inference
SageMaker Pipelines Azure ML Pipelines / Prompt Flow M ML workflow orchestration
SageMaker Feature Store Databricks Feature Store / Azure ML Feature Store M Online and offline feature serving
SageMaker Model Registry Azure ML Model Registry / MLflow S Model versioning and lifecycle
SageMaker Experiments Azure ML Experiments / MLflow S Experiment tracking
SageMaker Ground Truth Azure ML Data Labeling M Human-in-the-loop labeling
SageMaker Clarify Azure ML Responsible AI M Fairness, explainability
SageMaker Model Monitor Azure ML Model Monitor M Drift detection, data quality
Bedrock Azure OpenAI Service S Foundation model API access
Bedrock Agents Azure AI Agents / Copilot Studio M Autonomous AI agents
Bedrock Knowledge Bases Azure AI Search (RAG) M Retrieval-augmented generation
Bedrock Guardrails Azure AI Content Safety S Content filtering and moderation
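For phase 1 inventory work, the mapping above can be encoded as a simple lookup so discovered AWS resources are tagged with their Azure target automatically. A sketch (the table excerpt and helper name are illustrative, not a shipped tool):

```python
# Illustrative lookup built from the service-mapping table above.
# Entries map an AWS AI/ML service to (Azure equivalent, migration complexity).
SERVICE_MAP = {
    "SageMaker Training": ("Azure ML Compute / Databricks ML", "M"),
    "SageMaker Batch Transform": ("Azure ML Batch Endpoints", "S"),
    "Bedrock": ("Azure OpenAI Service", "S"),
    "Bedrock Agents": ("Azure AI Agents / Copilot Studio", "M"),
}

def azure_target(aws_service: str) -> str:
    """Return the Azure equivalent for an AWS AI/ML service, or flag it for review."""
    target, complexity = SERVICE_MAP.get(aws_service, ("needs manual review", "?"))
    return f"{target} (complexity: {complexity})"
```

Services outside the table (SageMaker Neo, for example) fall through to manual review rather than silently mapping to a wrong target.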

Part 1: SageMaker Studio to Azure ML Studio and AI Foundry

Environment comparison

SageMaker Studio feature Azure ML Studio AI Foundry
JupyterLab notebooks Azure ML Notebooks (JupyterLab) AI Foundry Notebooks
Kernel gateway Compute instances (various VM sizes) Serverless compute
Git integration Native Git integration Native Git integration
Experiment tracking MLflow integration Built-in experiment tracking
Model registry Azure ML Model Registry AI Foundry Model Catalog
Endpoint deployment Managed Endpoints Model-as-a-Service
Studio IDE VS Code for the Web / JupyterLab AI Foundry portal

Migration approach

Step 1: Move notebooks and code

# Export SageMaker notebooks
# SageMaker stores notebooks in the EFS volume or S3
aws s3 sync s3://sagemaker-us-gov-west-1-123456789012/notebooks/ ./sm_notebooks/

# Push to Git repository (Azure DevOps or GitHub)
cd sm_notebooks
git init
git add .
git commit -m "Import SageMaker notebooks"
git remote add origin https://github.com/agency/ml-notebooks.git
git push -u origin main

Step 2: Adapt SageMaker SDK calls to Azure ML SDK

# SageMaker training job
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point='train.py',
    role='arn:aws:iam::123456789012:role/SageMakerRole',
    instance_count=2,
    instance_type='ml.p3.8xlarge',
    framework_version='2.1',
    py_version='py310',
    hyperparameters={'epochs': 10, 'batch_size': 64}
)
estimator.fit({'training': 's3://bucket/train/', 'validation': 's3://bucket/val/'})

# Azure ML equivalent
from azure.ai.ml import MLClient, command, Input
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<sub-id>",
    resource_group_name="<rg>",
    workspace_name="<ws>"
)

command_job = command(
    code="./src",
    command="python train.py --epochs 10 --batch_size 64",
    environment="pytorch-2.1-gpu:latest",
    compute="gpu-cluster",  # Pre-created compute cluster
    inputs={
        "training": Input(type="uri_folder", path="azureml://datastores/training/paths/train/"),
        "validation": Input(type="uri_folder", path="azureml://datastores/training/paths/val/")
    },
    instance_count=2
)

returned_job = ml_client.jobs.create_or_update(command_job)

Step 3: Adapt the training script

The training script (train.py) typically requires minimal changes. The main adaptation is data path resolution:

# SageMaker: data paths come from environment variables
import os
train_dir = os.environ.get('SM_CHANNEL_TRAINING', '/opt/ml/input/data/training')
model_dir = os.environ.get('SM_MODEL_DIR', '/opt/ml/model')

# Azure ML: data paths come from command-line arguments or mounted paths
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--training', type=str)
parser.add_argument('--model_output', type=str, default='./outputs/model')
args = parser.parse_args()
train_dir = args.training
model_dir = args.model_output
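During a phased migration the same train.py often needs to run on both platforms for a while. One approach (a sketch, not a required pattern) is to prefer SageMaker's SM_* environment variables and fall back to the Azure ML-style command-line arguments shown above:

```python
import argparse
import os

def resolve_paths(argv=None):
    """Resolve data and model paths on either platform.

    Prefers SageMaker's SM_* environment variables when present; otherwise
    falls back to the arguments passed by the Azure ML command job.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--training', type=str, default=None)
    parser.add_argument('--model_output', type=str, default='./outputs/model')
    # parse_known_args tolerates extra platform-specific flags
    args, _ = parser.parse_known_args(argv)

    train_dir = os.environ.get('SM_CHANNEL_TRAINING') or args.training
    model_dir = os.environ.get('SM_MODEL_DIR') or args.model_output
    return train_dir, model_dir
```

Once the SageMaker environment is retired, the environment-variable branch can be deleted without touching the Azure ML job definition.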

Part 2: SageMaker Endpoints to Azure ML Managed Endpoints

Endpoint comparison

SageMaker endpoint type Azure ML equivalent Notes
Real-time endpoint Managed Online Endpoint Auto-scaling, blue/green deployment
Serverless endpoint Serverless Online Endpoint Scale to zero; pay per invocation
Multi-model endpoint Multiple deployments under one endpoint Traffic splitting for A/B testing
Batch Transform Batch Endpoint Async batch inference
Inference Recommender Azure ML profiling Right-size compute for inference

Deployment example

SageMaker endpoint:

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data='s3://bucket/model/model.tar.gz',
    role='arn:aws:iam::123456789012:role/SageMakerRole',
    framework_version='2.1',
    py_version='py310',
    entry_point='inference.py'
)

predictor = model.deploy(
    initial_instance_count=2,
    instance_type='ml.g4dn.xlarge',
    endpoint_name='sales-forecast-prod'
)

Azure ML managed endpoint:

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration
)

# Create endpoint
endpoint = ManagedOnlineEndpoint(
    name="sales-forecast-prod",
    auth_mode="key"
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create deployment
model = Model(path="./model/", type="custom_model")
env = Environment(
    image="mcr.microsoft.com/azureml/pytorch-2.1-cuda11.8-cudnn8-runtime:latest",
    conda_file="./environment/conda.yml"
)

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="sales-forecast-prod",
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./src",
        scoring_script="inference.py"
    ),
    instance_type="Standard_NC4as_T4_v3",
    instance_count=2
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route 100% traffic to the deployment
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
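The CodeConfiguration above points at a scoring script. Azure ML's contract for managed online endpoints is an init() called once at container start and a run() called per request. A minimal inference.py skeleton (the stand-in model below is illustrative; a real script would load the registered PyTorch model):

```python
# inference.py -- minimal Azure ML scoring script skeleton (illustrative).
# Azure ML calls init() once when the container starts and run() per request.
import json
import os

model = None

def init():
    global model
    # Real scripts load the registered model from AZUREML_MODEL_DIR,
    # e.g. torch.load(...); a placeholder function is used here.
    model_dir = os.environ.get("AZUREML_MODEL_DIR", "./model")
    model = lambda rows: [sum(r) for r in rows]

def run(raw_data: str) -> str:
    """Score a JSON request of the form {"data": [[...], ...]}."""
    data = json.loads(raw_data)["data"]
    predictions = model(data)
    return json.dumps({"predictions": predictions})
```

This plays the same role as the entry_point='inference.py' handed to PyTorchModel on SageMaker, though the function names differ (SageMaker uses model_fn/predict_fn for its framework containers).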

Part 3: SageMaker Pipelines to Azure ML Pipelines

Pipeline comparison

SageMaker Pipeline step Azure ML Pipeline equivalent Notes
ProcessingStep Command component Data processing
TrainingStep Command component (with GPU) Model training
TransformStep Batch endpoint invocation Batch inference
RegisterModel Model registration component Register in registry
ConditionStep Conditional pipeline step Branching logic
FailStep Pipeline error handling Error paths
TuningStep Sweep job Hyperparameter tuning
CallbackStep Custom component External service integration

Pipeline migration example

SageMaker Pipeline:

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

pipeline = Pipeline(
    name="sales-forecast-pipeline",
    steps=[preprocess_step, train_step, evaluate_step, register_step],
    parameters=[input_data, model_approval_status]
)
pipeline.upsert(role_arn=role)
pipeline.start()

Azure ML Pipeline:

from azure.ai.ml import dsl, Input, Output

# preprocess_component, train_component, evaluate_component, and
# register_component are assumed to be loaded or defined elsewhere
@dsl.pipeline(
    description="Sales forecast training pipeline",
    compute="cpu-cluster"
)
def sales_forecast_pipeline(input_data: Input, model_approval: str = "pending"):
    preprocess = preprocess_component(raw_data=input_data)
    train = train_component(training_data=preprocess.outputs.processed_data)
    train.compute = "gpu-cluster"  # override the pipeline-level compute for this step
    evaluate = evaluate_component(
        model=train.outputs.model,
        test_data=preprocess.outputs.test_data
    )
    register = register_component(
        model=train.outputs.model,
        metrics=evaluate.outputs.metrics,
        approval_status=model_approval
    )
    return {"model": register.outputs.registered_model}

pipeline_job = sales_forecast_pipeline(
    input_data=Input(type="uri_folder", path="azureml://datastores/training/paths/sales/")
)
returned_pipeline = ml_client.jobs.create_or_update(pipeline_job)
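A practical difference worth noting: SageMaker Pipelines takes an explicit steps list, while Azure ML infers the execution DAG from the outputs-to-inputs wiring in the @dsl.pipeline function. Conceptually, each step runs after every step whose output it consumes, as a pure-Python topological sort illustrates (hypothetical step names mirroring the pipeline above):

```python
from graphlib import TopologicalSorter

# Hypothetical data dependencies mirroring the pipeline above:
# a step depends on every step whose output it consumes.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train", "preprocess"},
    "register": {"train", "evaluate"},
}

# Azure ML derives an ordering like this from the wiring automatically.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

The upshot for migration: SageMaker ConditionStep-style explicit ordering sometimes has to be re-expressed as data dependencies (or conditional components) rather than list position.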

Part 4: Bedrock to Azure OpenAI Service

Model availability comparison

Bedrock model Azure OpenAI equivalent Notes
Anthropic Claude 3.5 Sonnet Claude 3.5 Sonnet (via Azure AI Foundry) Available as model-as-a-service
Amazon Titan Text No direct equivalent Use GPT-4o or open-source models
Amazon Titan Embeddings text-embedding-3-large OpenAI embedding model
Meta Llama 3 Llama 3 (via Azure AI Foundry) Model-as-a-service deployment
Mistral Large Mistral Large (via Azure AI Foundry) Model-as-a-service deployment
Cohere Command R+ Cohere Command R+ (via Azure AI Foundry) Model-as-a-service deployment
AI21 Jurassic No direct equivalent Use GPT-4o
Stability AI SDXL DALL-E 3 (Azure OpenAI) Image generation
(not on Bedrock) GPT-4o (Azure OpenAI) Azure-exclusive model family
(not on Bedrock) GPT-4.1 (Azure OpenAI) Latest generation
(not on Bedrock) o3 / o4-mini (Azure OpenAI) Reasoning models

API migration

Bedrock API (Python/boto3):

import boto3
import json

bedrock = boto3.client('bedrock-runtime', region_name='us-gov-west-1')

response = bedrock.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "Summarize federal procurement regulations"}
        ]
    })
)
result = json.loads(response['body'].read())
answer = result['content'][0]['text']

Azure OpenAI API (Python/openai):

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://acme-ai.openai.azure.us",
    azure_ad_token_provider=token_provider,
    api_version="2024-12-01-preview"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a federal procurement expert."},
        {"role": "user", "content": "Summarize federal procurement regulations"}
    ],
    max_tokens=1024
)
answer = response.choices[0].message.content

Key differences:

  • Bedrock uses boto3 with model-specific request/response formats. Azure OpenAI uses the standard OpenAI SDK with consistent request/response format across all models.
  • Bedrock authentication is IAM-based. Azure OpenAI uses Entra ID (managed identity or token provider).
  • Azure OpenAI is available in Azure Government regions for federal workloads.
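Teams with many Bedrock call sites sometimes introduce a thin adapter during cutover rather than rewriting every caller at once. A sketch that translates an Anthropic-on-Bedrock request body into keyword arguments for client.chat.completions.create, using only the field names shown in the two examples above (richer Anthropic content blocks would need extra handling):

```python
import json

def bedrock_body_to_openai_kwargs(body: str, deployment: str = "gpt-4o") -> dict:
    """Translate an Anthropic-on-Bedrock request body into Azure OpenAI
    chat.completions.create keyword arguments (simple text messages only)."""
    req = json.loads(body)
    return {
        "model": deployment,            # Azure OpenAI deployment name
        "messages": req["messages"],    # both APIs use role/content messages
        "max_tokens": req.get("max_tokens", 1024),
        # anthropic_version has no Azure OpenAI counterpart and is dropped
    }
```

Callers then switch from bedrock.invoke_model(body=...) to client.chat.completions.create(**bedrock_body_to_openai_kwargs(body)) and can be cleaned up individually later.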

Part 5: Bedrock Agents to Azure AI Agents and Copilot Studio

Agent architecture comparison

Bedrock Agents concept Azure equivalent Notes
Agent Azure AI Agent / Copilot Studio agent Autonomous task execution
Action group Tool / Function calling Define callable tools
Knowledge base Azure AI Search (RAG) Document retrieval
Guardrails Azure AI Content Safety Input/output filtering
Agent executor Azure AI Agent SDK / Semantic Kernel Orchestration framework
Session management Thread management (Agent SDK) Conversation state

Code-first agent migration (Bedrock Agent to Azure AI Agent)

Bedrock Agent invocation:

bedrock_agent = boto3.client('bedrock-agent-runtime')

response = bedrock_agent.invoke_agent(
    agentId='AGENT123',
    agentAliasId='ALIAS456',
    sessionId='session-789',
    inputText='Find all overdue invoices for Q1 2026'
)

# The agent reply arrives as an event stream of chunks
completion = ""
for event in response['completion']:
    if 'chunk' in event:
        completion += event['chunk']['bytes'].decode('utf-8')

Azure AI Agent (using Azure AI Agent SDK):

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<project-connection-string>"
)

agent = project_client.agents.create_agent(
    model="gpt-4o",
    name="invoice-analyst",
    instructions="You are a federal financial analyst. Find and analyze invoices.",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "query_invoices",
                "description": "Query the invoice database",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "status": {"type": "string", "enum": ["overdue", "paid", "pending"]},
                        "quarter": {"type": "string"}
                    }
                }
            }
        }
    ]
)

thread = project_client.agents.create_thread()
message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content="Find all overdue invoices for Q1 2026"
)
run = project_client.agents.create_and_process_run(
    thread_id=thread.id,
    assistant_id=agent.id
)

No-code agent migration (to Copilot Studio)

For agents that do not require custom code, Copilot Studio provides a visual agent builder that integrates with:

  • Dataverse (structured data)
  • SharePoint (documents)
  • Azure AI Search (RAG)
  • Power Automate (actions)
  • Microsoft 365 (email, calendar, Teams)

Part 6: Bedrock Knowledge Bases to Azure AI Search (RAG)

RAG architecture comparison

Bedrock Knowledge Bases Azure AI Search RAG Notes
S3 data source ADLS Gen2 / Blob Storage Document source
Document chunking Azure AI Document Intelligence + chunking Built-in or custom chunking
Embedding model (Titan) text-embedding-3-large (OpenAI) Higher-quality embeddings
Vector store (OpenSearch) Azure AI Search (vector + hybrid) Hybrid search (vector + keyword)
Retrieval API AI Search REST API / SDK More control over retrieval
Foundation model Azure OpenAI (GPT-4o) Generation step

RAG pipeline migration

# Azure AI Search + Azure OpenAI RAG pattern
from azure.search.documents import SearchClient
from azure.identity import DefaultAzureCredential
from openai import AzureOpenAI

# 1. Search for relevant documents
search_client = SearchClient(
    endpoint="https://acme-search.search.windows.us",
    index_name="federal-docs",
    credential=DefaultAzureCredential()
)

results = search_client.search(
    search_text="federal procurement regulations",
    vector_queries=[{
        "kind": "text",
        "text": "federal procurement regulations",
        "fields": "content_vector",
        "k": 5
    }],
    select=["title", "content", "source_url"],
    top=5
)

# 2. Build context from search results
context = "\n\n".join([
    f"Source: {r['title']}\n{r['content']}"
    for r in results
])

# 3. Generate answer with Azure OpenAI (token_provider as defined in Part 4)
client = AzureOpenAI(
    azure_endpoint="https://acme-ai.openai.azure.us",
    azure_ad_token_provider=token_provider,
    api_version="2024-12-01-preview"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": "Summarize federal procurement regulations"}
    ]
)
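The retrieval code above assumes a populated index. Bedrock Knowledge Bases chunks and embeds documents automatically; with Azure AI Search the ingestion side is yours to build (or to delegate to integrated vectorization). A minimal fixed-size chunker with overlap, the sizes being illustrative defaults rather than recommendations:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a document into overlapping chunks for embedding and indexing.

    Overlap keeps passages that straddle a boundary retrievable from either
    side. The 1000/200 values are illustrative; tune per corpus.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # advance by the non-overlapping portion
    return chunks
```

Each chunk would then be embedded with text-embedding-3-large and uploaded to the content and content_vector fields queried above.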

Cross-reference: csa_platform/ai_integration/ for AI Foundry and Azure OpenAI integration patterns.


Model registry and lifecycle comparison

SageMaker Model Registry Azure ML Model Registry MLflow (Databricks)
Model groups Model names Registered models
Model versions Model versions Model versions
Approval status (Pending/Approved/Rejected) Model stages (None/Staging/Production/Archived) Model stages
Model metrics Model metrics + tags Logged metrics + tags
Lineage (data → model → endpoint) Lineage (data → experiment → model → endpoint) MLflow lineage
Model cards Responsible AI dashboard MLflow model cards

Migration approach for model registry

# Export SageMaker model registry
import boto3
sm = boto3.client('sagemaker')

# List all model package groups and their packages
# (paginate with NextToken for large registries)
model_groups = sm.list_model_package_groups()
for group in model_groups['ModelPackageGroupSummaryList']:
    packages = sm.list_model_packages(ModelPackageGroupName=group['ModelPackageGroupName'])
    for pkg in packages['ModelPackageSummaryList']:
        details = sm.describe_model_package(ModelPackageName=pkg['ModelPackageArn'])
        # Export the model artifact (ModelDataUrl), metrics, and metadata here

# Register in Azure ML
from azure.ai.ml.entities import Model
model = Model(
    name="sales-forecast",
    version="1",
    path="./exported_model/",
    type="custom_model",
    description="Sales forecast model migrated from SageMaker",
    tags={"source": "sagemaker", "original_arn": "arn:aws:sagemaker:..."}
)
ml_client.models.create_or_update(model)

Migration sequence

Phase Duration Activities
1. Inventory 1-2 weeks Catalog all SageMaker models, endpoints, pipelines; list Bedrock usage
2. Environment setup 2-3 weeks Create Azure ML workspace, AI Foundry project, Azure OpenAI deployment
3. Training migration 3-4 weeks Adapt training scripts; replicate experiments on Azure ML
4. Model deployment 2-3 weeks Deploy models to Azure ML managed endpoints; validate inference
5. Pipeline migration 3-4 weeks Convert SageMaker Pipelines to Azure ML Pipelines
6. LLM/RAG migration 2-3 weeks Switch Bedrock calls to Azure OpenAI; migrate Knowledge Bases to AI Search
7. Agent migration 2-4 weeks Rebuild Bedrock Agents as Azure AI Agents or Copilot Studio
8. Validation 2-3 weeks Dual-run inference; compare model outputs; validate RAG quality
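For phase 8, dual-run validation reduces to comparing paired predictions from the old and new endpoints. A simple element-wise comparator (the 1% tolerance is an example; set it per model):

```python
def compare_predictions(old: list[float], new: list[float], rel_tol: float = 0.01) -> dict:
    """Compare dual-run outputs element-wise and report the worst relative drift."""
    if len(old) != len(new):
        raise ValueError("prediction lists must be the same length")
    worst = 0.0
    for a, b in zip(old, new):
        denom = max(abs(a), 1e-9)  # guard against division by zero
        worst = max(worst, abs(a - b) / denom)
    return {"max_relative_diff": worst, "within_tolerance": worst <= rel_tol}
```

Small drift is expected (different GPU hardware, library versions, numeric kernels); the tolerance should reflect what downstream consumers of the model can absorb.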

Last updated: 2026-04-30 · Maintainers: CSA-in-a-Box core team
Related: Migration Center | Compute Migration | Security Migration | Migration Playbook