AI/ML Migration: SageMaker and Bedrock to Azure AI¶
A deep-dive guide for ML engineers and AI developers migrating Amazon SageMaker and Bedrock workloads to Azure Machine Learning, Azure OpenAI, and AI Foundry.
Executive summary¶
AWS AI/ML is split across two primary services: SageMaker for custom model training, deployment, and MLOps, and Bedrock for managed foundation model access. Azure provides equivalent capabilities through Azure Machine Learning (custom ML), Azure OpenAI Service (foundation models), AI Foundry (unified AI development), and Databricks ML (Spark-native ML). The Azure AI ecosystem is broader, with deeper integration into the Microsoft productivity suite via Copilot and tighter governance through Purview and Entra ID.
This guide covers model training migration, endpoint deployment, pipeline orchestration, foundation model access (Bedrock to Azure OpenAI), agent architectures (Bedrock Agents to Azure AI Agents), and RAG pattern migration (Bedrock Knowledge Bases to Azure AI Search).
Service mapping overview¶
| AWS AI/ML service | Azure equivalent | Migration complexity | Notes |
|---|---|---|---|
| SageMaker Studio | Azure ML Studio / AI Foundry | M | Notebook + experiment + deployment IDE |
| SageMaker Training | Azure ML Compute / Databricks ML | M | GPU and CPU cluster training |
| SageMaker Processing | Azure ML Pipeline steps / Databricks Jobs | M | Data processing for ML |
| SageMaker Endpoints (real-time) | Azure ML Managed Endpoints | M | Managed inference hosting |
| SageMaker Batch Transform | Azure ML Batch Endpoints | S | Batch inference |
| SageMaker Pipelines | Azure ML Pipelines / Prompt Flow | M | ML workflow orchestration |
| SageMaker Feature Store | Databricks Feature Store / Azure ML Feature Store | M | Online and offline feature serving |
| SageMaker Model Registry | Azure ML Model Registry / MLflow | S | Model versioning and lifecycle |
| SageMaker Experiments | Azure ML Experiments / MLflow | S | Experiment tracking |
| SageMaker Ground Truth | Azure ML Data Labeling | M | Human-in-the-loop labeling |
| SageMaker Clarify | Azure ML Responsible AI | M | Fairness, explainability |
| SageMaker Model Monitor | Azure ML Model Monitor | M | Drift detection, data quality |
| Bedrock | Azure OpenAI Service | S | Foundation model API access |
| Bedrock Agents | Azure AI Agents / Copilot Studio | M | Autonomous AI agents |
| Bedrock Knowledge Bases | Azure AI Search (RAG) | M | Retrieval-augmented generation |
| Bedrock Guardrails | Azure AI Content Safety | S | Content filtering and moderation |
Part 1: SageMaker Studio to Azure ML Studio and AI Foundry¶
Environment comparison¶
| SageMaker Studio feature | Azure ML Studio | AI Foundry |
|---|---|---|
| JupyterLab notebooks | Azure ML Notebooks (JupyterLab) | AI Foundry Notebooks |
| Kernel gateway | Compute instances (various VM sizes) | Serverless compute |
| Git integration | Native Git integration | Native Git integration |
| Experiment tracking | MLflow integration | Built-in experiment tracking |
| Model registry | Azure ML Model Registry | AI Foundry Model Catalog |
| Endpoint deployment | Managed Endpoints | Model-as-a-Service |
| Studio IDE | VS Code for the Web / JupyterLab | AI Foundry portal |
Migration approach¶
Step 1: Move notebooks and code
# Export SageMaker notebooks
# SageMaker stores notebooks in the EFS volume or S3
aws s3 sync s3://sagemaker-us-gov-west-1-123456789012/notebooks/ ./sm_notebooks/
# Push to Git repository (Azure DevOps or GitHub)
cd sm_notebooks
git init
git add .
git commit -m "Import SageMaker notebooks"
git remote add origin https://github.com/agency/ml-notebooks.git
git push -u origin main
Step 2: Adapt SageMaker SDK calls to Azure ML SDK
# SageMaker training job
import sagemaker
from sagemaker.pytorch import PyTorch
estimator = PyTorch(
entry_point='train.py',
role='arn:aws:iam::123456789012:role/SageMakerRole',
instance_count=2,
instance_type='ml.p3.8xlarge',
framework_version='2.1',
py_version='py310',
hyperparameters={'epochs': 10, 'batch_size': 64}
)
estimator.fit({'training': 's3://bucket/train/', 'validation': 's3://bucket/val/'})
# Azure ML equivalent
from azure.ai.ml import MLClient, command, Input
from azure.identity import DefaultAzureCredential
ml_client = MLClient(
DefaultAzureCredential(),
subscription_id="<sub-id>",
resource_group_name="<rg>",
workspace_name="<ws>"
)
command_job = command(
code="./src",
command="python train.py --epochs 10 --batch_size 64",
environment="pytorch-2.1-gpu:latest",
compute="gpu-cluster", # Pre-created compute cluster
inputs={
"training": Input(type="uri_folder", path="azureml://datastores/training/paths/train/"),
"validation": Input(type="uri_folder", path="azureml://datastores/training/paths/val/")
},
instance_count=2
)
returned_job = ml_client.jobs.create_or_update(command_job)
Step 3: Adapt the training script
The training script (train.py) typically requires minimal changes. The main adaptation is data path resolution:
# SageMaker: data paths come from environment variables
import os
train_dir = os.environ.get('SM_CHANNEL_TRAINING', '/opt/ml/input/data/training')
model_dir = os.environ.get('SM_MODEL_DIR', '/opt/ml/model')
# Azure ML: data paths come from command-line arguments or mounted paths
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--training', type=str)
parser.add_argument('--model_output', type=str, default='./outputs/model')
args = parser.parse_args()
train_dir = args.training
model_dir = args.model_output
Part 2: SageMaker Endpoints to Azure ML Managed Endpoints¶
Endpoint comparison¶
| SageMaker endpoint type | Azure ML equivalent | Notes |
|---|---|---|
| Real-time endpoint | Managed Online Endpoint | Auto-scaling, blue/green deployment |
| Serverless endpoint | Serverless Online Endpoint | Scale to zero; pay per invocation |
| Multi-model endpoint | Multiple deployments under one endpoint | Traffic splitting for A/B testing |
| Batch Transform | Batch Endpoint | Async batch inference |
| Inference Recommender | Azure ML profiling | Right-size compute for inference |
Deployment example¶
SageMaker endpoint:
from sagemaker.pytorch import PyTorchModel
model = PyTorchModel(
model_data='s3://bucket/model/model.tar.gz',
role='arn:aws:iam::123456789012:role/SageMakerRole',
framework_version='2.1',
py_version='py310',
entry_point='inference.py'
)
predictor = model.deploy(
initial_instance_count=2,
instance_type='ml.g4dn.xlarge',
endpoint_name='sales-forecast-prod'
)
Azure ML managed endpoint:
from azure.ai.ml.entities import (
ManagedOnlineEndpoint,
ManagedOnlineDeployment,
Model,
Environment,
CodeConfiguration
)
# Create endpoint
endpoint = ManagedOnlineEndpoint(
name="sales-forecast-prod",
auth_mode="key"
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
# Create deployment
model = Model(path="./model/", type="custom_model")
env = Environment(
image="mcr.microsoft.com/azureml/pytorch-2.1-cuda11.8-cudnn8-runtime:latest",
conda_file="./environment/conda.yml"
)
deployment = ManagedOnlineDeployment(
name="blue",
endpoint_name="sales-forecast-prod",
model=model,
environment=env,
code_configuration=CodeConfiguration(
code="./src",
scoring_script="inference.py"
),
instance_type="Standard_NC4as_T4_v3",
instance_count=2
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
# Route 100% traffic to the deployment
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
Part 3: SageMaker Pipelines to Azure ML Pipelines¶
Pipeline comparison¶
| SageMaker Pipeline step | Azure ML Pipeline equivalent | Notes |
|---|---|---|
| ProcessingStep | Command component | Data processing |
| TrainingStep | Command component (with GPU) | Model training |
| TransformStep | Batch endpoint invocation | Batch inference |
| RegisterModel | Model registration component | Register in registry |
| ConditionStep | Conditional pipeline step | Branching logic |
| FailStep | Pipeline error handling | Error paths |
| TuningStep | Sweep job | Hyperparameter tuning |
| CallbackStep | Custom component | External service integration |
Pipeline migration example¶
SageMaker Pipeline:
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
pipeline = Pipeline(
name="sales-forecast-pipeline",
steps=[preprocess_step, train_step, evaluate_step, register_step],
parameters=[input_data, model_approval_status]
)
pipeline.upsert(role_arn=role)
pipeline.start()
Azure ML Pipeline:
from azure.ai.ml import dsl, Input, Output
from azure.ai.ml.entities import Pipeline
@dsl.pipeline(
description="Sales forecast training pipeline",
compute="cpu-cluster"
)
def sales_forecast_pipeline(input_data: Input, model_approval: str = "pending"):
preprocess = preprocess_component(raw_data=input_data)
train = train_component(
training_data=preprocess.outputs.processed_data,
compute="gpu-cluster"
)
evaluate = evaluate_component(
model=train.outputs.model,
test_data=preprocess.outputs.test_data
)
register = register_component(
model=train.outputs.model,
metrics=evaluate.outputs.metrics,
approval_status=model_approval
)
return {"model": register.outputs.registered_model}
pipeline_job = sales_forecast_pipeline(
input_data=Input(type="uri_folder", path="azureml://datastores/training/paths/sales/")
)
returned_pipeline = ml_client.jobs.create_or_update(pipeline_job)
Part 4: Bedrock to Azure OpenAI Service¶
Model availability comparison¶
| Bedrock model | Azure OpenAI equivalent | Notes |
|---|---|---|
| Anthropic Claude 3.5 Sonnet | Claude 3.5 Sonnet (via Azure AI Foundry) | Available as model-as-a-service |
| Amazon Titan Text | No direct equivalent | Use GPT-4o or open-source models |
| Amazon Titan Embeddings | text-embedding-3-large | OpenAI embedding model |
| Meta Llama 3 | Llama 3 (via Azure AI Foundry) | Model-as-a-service deployment |
| Mistral Large | Mistral Large (via Azure AI Foundry) | Model-as-a-service deployment |
| Cohere Command R+ | Cohere Command R+ (via Azure AI Foundry) | Model-as-a-service deployment |
| AI21 Jurassic | No direct equivalent | Use GPT-4o |
| Stability AI SDXL | DALL-E 3 (Azure OpenAI) | Image generation |
| GPT-4o | GPT-4o (Azure OpenAI) | Azure-exclusive model family |
| GPT-4.1 | GPT-4.1 (Azure OpenAI) | Latest generation |
| o3 / o4-mini | o3 / o4-mini (Azure OpenAI) | Reasoning models |
API migration¶
Bedrock API (Python/boto3):
import boto3
import json
bedrock = boto3.client('bedrock-runtime', region_name='us-gov-west-1')
response = bedrock.invoke_model(
modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
body=json.dumps({
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Summarize federal procurement regulations"}
]
})
)
result = json.loads(response['body'].read())
answer = result['content'][0]['text']
Azure OpenAI API (Python/openai):
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
token_provider = get_bearer_token_provider(
DefaultAzureCredential(),
"https://cognitiveservices.azure.com/.default"
)
client = AzureOpenAI(
azure_endpoint="https://acme-ai.openai.azure.us",
azure_ad_token_provider=token_provider,
api_version="2024-12-01-preview"
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a federal procurement expert."},
{"role": "user", "content": "Summarize federal procurement regulations"}
],
max_tokens=1024
)
answer = response.choices[0].message.content
Key differences:
- Bedrock uses boto3 with model-specific request/response formats. Azure OpenAI uses the standard OpenAI SDK with consistent request/response format across all models.
- Bedrock authentication is IAM-based. Azure OpenAI uses Entra ID (managed identity or token provider).
- Azure OpenAI is available in Azure Government regions for federal workloads.
Part 5: Bedrock Agents to Azure AI Agents and Copilot Studio¶
Agent architecture comparison¶
| Bedrock Agents concept | Azure equivalent | Notes |
|---|---|---|
| Agent | Azure AI Agent / Copilot Studio agent | Autonomous task execution |
| Action group | Tool / Function calling | Define callable tools |
| Knowledge base | Azure AI Search (RAG) | Document retrieval |
| Guardrails | Azure AI Content Safety | Input/output filtering |
| Agent executor | Azure AI Agent SDK / Semantic Kernel | Orchestration framework |
| Session management | Thread management (Agent SDK) | Conversation state |
Code-first agent migration (Bedrock Agent to Azure AI Agent)¶
Bedrock Agent invocation:
bedrock_agent = boto3.client('bedrock-agent-runtime')
response = bedrock_agent.invoke_agent(
agentId='AGENT123',
agentAliasId='ALIAS456',
sessionId='session-789',
inputText='Find all overdue invoices for Q1 2026'
)
Azure AI Agent (using Azure AI Agent SDK):
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
project_client = AIProjectClient.from_connection_string(
credential=DefaultAzureCredential(),
conn_str="<project-connection-string>"
)
agent = project_client.agents.create_agent(
model="gpt-4o",
name="invoice-analyst",
instructions="You are a federal financial analyst. Find and analyze invoices.",
tools=[
{
"type": "function",
"function": {
"name": "query_invoices",
"description": "Query the invoice database",
"parameters": {
"type": "object",
"properties": {
"status": {"type": "string", "enum": ["overdue", "paid", "pending"]},
"quarter": {"type": "string"}
}
}
}
}
]
)
thread = project_client.agents.create_thread()
message = project_client.agents.create_message(
thread_id=thread.id,
role="user",
content="Find all overdue invoices for Q1 2026"
)
run = project_client.agents.create_and_process_run(
thread_id=thread.id,
assistant_id=agent.id
)
No-code agent migration (to Copilot Studio)¶
For agents that do not require custom code, Copilot Studio provides a visual agent builder that integrates with:
- Dataverse (structured data)
- SharePoint (documents)
- Azure AI Search (RAG)
- Power Automate (actions)
- Microsoft 365 (email, calendar, Teams)
Part 6: Bedrock Knowledge Bases to Azure AI Search (RAG)¶
RAG architecture comparison¶
| Bedrock Knowledge Bases | Azure AI Search RAG | Notes |
|---|---|---|
| S3 data source | ADLS Gen2 / Blob Storage | Document source |
| Document chunking | Azure AI Document Intelligence + chunking | Built-in or custom chunking |
| Embedding model (Titan) | text-embedding-3-large (OpenAI) | Higher-quality embeddings |
| Vector store (OpenSearch) | Azure AI Search (vector + hybrid) | Hybrid search (vector + keyword) |
| Retrieval API | AI Search REST API / SDK | More control over retrieval |
| Foundation model | Azure OpenAI (GPT-4o) | Generation step |
RAG pipeline migration¶
# Azure AI Search + Azure OpenAI RAG pattern
from azure.search.documents import SearchClient
from azure.identity import DefaultAzureCredential
from openai import AzureOpenAI
# 1. Search for relevant documents
search_client = SearchClient(
endpoint="https://acme-search.search.windows.us",
index_name="federal-docs",
credential=DefaultAzureCredential()
)
results = search_client.search(
search_text="federal procurement regulations",
vector_queries=[{
"kind": "text",
"text": "federal procurement regulations",
"fields": "content_vector",
"k": 5
}],
select=["title", "content", "source_url"],
top=5
)
# 2. Build context from search results
context = "\n\n".join([
f"Source: {r['title']}\n{r['content']}"
for r in results
])
# 3. Generate answer with Azure OpenAI
client = AzureOpenAI(
azure_endpoint="https://acme-ai.openai.azure.us",
azure_ad_token_provider=token_provider,
api_version="2024-12-01-preview"
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": f"Answer using only this context:\n{context}"},
{"role": "user", "content": "Summarize federal procurement regulations"}
]
)
Cross-reference: csa_platform/ai_integration/ for AI Foundry and Azure OpenAI integration patterns.
Model registry and lifecycle comparison¶
| SageMaker Model Registry | Azure ML Model Registry | MLflow (Databricks) |
|---|---|---|
| Model groups | Model names | Registered models |
| Model versions | Model versions | Model versions |
| Approval status (Pending/Approved/Rejected) | Model stages (None/Staging/Production/Archived) | Model stages |
| Model metrics | Model metrics + tags | Logged metrics + tags |
| Lineage (data → model → endpoint) | Lineage (data → experiment → model → endpoint) | MLflow lineage |
| Model cards | Responsible AI dashboard | MLflow model cards |
Migration approach for model registry¶
# Export SageMaker model registry
import boto3
sm = boto3.client('sagemaker')
# List all model packages
model_groups = sm.list_model_package_groups()
for group in model_groups['ModelPackageGroupSummaryList']:
packages = sm.list_model_packages(ModelPackageGroupName=group['ModelPackageGroupName'])
for pkg in packages['ModelPackageSummaryList']:
details = sm.describe_model_package(ModelPackageName=pkg['ModelPackageArn'])
# Export model artifact, metrics, and metadata
# Register in Azure ML
from azure.ai.ml.entities import Model
model = Model(
name="sales-forecast",
version="1",
path="./exported_model/",
type="custom_model",
description="Sales forecast model migrated from SageMaker",
tags={"source": "sagemaker", "original_arn": "arn:aws:sagemaker:..."}
)
ml_client.models.create_or_update(model)
Migration sequence¶
| Phase | Duration | Activities |
|---|---|---|
| 1. Inventory | 1-2 weeks | Catalog all SageMaker models, endpoints, pipelines; list Bedrock usage |
| 2. Environment setup | 2-3 weeks | Create Azure ML workspace, AI Foundry project, Azure OpenAI deployment |
| 3. Training migration | 3-4 weeks | Adapt training scripts; replicate experiments on Azure ML |
| 4. Model deployment | 2-3 weeks | Deploy models to Azure ML managed endpoints; validate inference |
| 5. Pipeline migration | 3-4 weeks | Convert SageMaker Pipelines to Azure ML Pipelines |
| 6. LLM/RAG migration | 2-3 weeks | Switch Bedrock calls to Azure OpenAI; migrate Knowledge Bases to AI Search |
| 7. Agent migration | 2-4 weeks | Rebuild Bedrock Agents as Azure AI Agents or Copilot Studio |
| 8. Validation | 2-3 weeks | Dual-run inference; compare model outputs; validate RAG quality |
Last updated: 2026-04-30 Maintainers: CSA-in-a-Box core team Related: Migration Center | Compute Migration | Security Migration | Migration Playbook