
🚀 Azure Data Factory Environment Setup


Create and configure your Azure Data Factory instance with proper security, networking, and governance settings for production-ready data integration.

✅ Prerequisites

Before creating your Data Factory instance, ensure you have:

  • Azure subscription with appropriate permissions
  • Resource group for Data Factory resources
  • Azure CLI or PowerShell installed locally
  • Basic understanding of Azure networking concepts
  • Completed Module 01: Fundamentals

Required Azure Permissions

Resource        Required Role              Purpose
--------------  -------------------------  ---------------------------
Subscription    Contributor or Owner       Create resources
Resource Group  Contributor                Deploy ADF and dependencies
Azure AD        Application Administrator  Create service principals
Key Vault       Key Vault Administrator    Manage secrets

🏗️ Create Data Factory Instance

Option 1: Azure Portal

Step 1: Navigate to Data Factory

  1. Sign in to Azure Portal
  2. Click Create a resource
  3. Search for "Data Factory"
  4. Click Create

Step 2: Configure Basics

Project Details:
├── Subscription: [Your Subscription]
├── Resource Group: rg-adf-tutorial-dev
└── Region: East US 2

Instance Details:
├── Name: adf-tutorial-dev-001
├── Version: V2
└── Enable public network access: Yes (for now)

💡 Tip: Use a naming convention that includes environment, purpose, and instance number.
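The convention in the tip can be captured in a small helper. The parts below (service prefix, workload, environment, instance) are one possible scheme, not an Azure requirement:

```shell
# Compose a resource name from convention parts (illustrative scheme)
SERVICE="adf"          # service prefix
WORKLOAD="tutorial"    # purpose / workload
ENVIRONMENT="dev"      # dev | test | prod
INSTANCE="001"         # zero-padded instance number

ADF_NAME="${SERVICE}-${WORKLOAD}-${ENVIRONMENT}-${INSTANCE}"
echo "$ADF_NAME"   # adf-tutorial-dev-001
```

Keeping the parts in variables also makes it easy to derive related names (resource group, Key Vault) from the same convention.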

Step 3: Git Configuration (Optional for now)

  • Skip Git configuration initially
  • We'll configure this in a later step

Step 4: Configure Networking

For this tutorial:

  • Enable public access: Yes
  • Managed Virtual Network: Disabled (we'll enable later)

⚠️ Warning: In production, always use private endpoints and managed virtual networks.

Step 5: Review and Create

  1. Review all settings
  2. Click Create
  3. Wait for deployment (2-3 minutes)

Option 2: Azure CLI

# Set variables
SUBSCRIPTION_ID="your-subscription-id"
RESOURCE_GROUP="rg-adf-tutorial-dev"
LOCATION="eastus2"
ADF_NAME="adf-tutorial-dev-001"

# Login to Azure
az login

# Set subscription
az account set --subscription $SUBSCRIPTION_ID

# Create resource group
az group create \
  --name $RESOURCE_GROUP \
  --location $LOCATION

# Create Data Factory
az datafactory create \
  --resource-group $RESOURCE_GROUP \
  --factory-name $ADF_NAME \
  --location $LOCATION

# Verify creation
az datafactory show \
  --resource-group $RESOURCE_GROUP \
  --factory-name $ADF_NAME \
  --output table

Option 3: PowerShell

# Set variables
$SubscriptionId = "your-subscription-id"
$ResourceGroupName = "rg-adf-tutorial-dev"
$Location = "East US 2"
$DataFactoryName = "adf-tutorial-dev-001"

# Connect to Azure
Connect-AzAccount

# Set subscription context
Set-AzContext -SubscriptionId $SubscriptionId

# Create resource group
New-AzResourceGroup `
  -Name $ResourceGroupName `
  -Location $Location

# Create Data Factory
Set-AzDataFactoryV2 `
  -ResourceGroupName $ResourceGroupName `
  -Location $Location `
  -Name $DataFactoryName

# Verify creation
Get-AzDataFactoryV2 `
  -ResourceGroupName $ResourceGroupName `
  -Name $DataFactoryName

Option 4: ARM Template

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "dataFactoryName": {
      "type": "string",
      "defaultValue": "adf-tutorial-dev-001",
      "metadata": {
        "description": "Name of the Data Factory"
      }
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]",
      "metadata": {
        "description": "Location for all resources"
      }
    }
  },
  "resources": [
    {
      "type": "Microsoft.DataFactory/factories",
      "apiVersion": "2018-06-01",
      "name": "[parameters('dataFactoryName')]",
      "location": "[parameters('location')]",
      "identity": {
        "type": "SystemAssigned"
      },
      "properties": {
        "publicNetworkAccess": "Enabled"
      }
    }
  ],
  "outputs": {
    "dataFactoryId": {
      "type": "string",
      "value": "[resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName'))]"
    },
    "dataFactoryIdentityPrincipalId": {
      "type": "string",
      "value": "[reference(resourceId('Microsoft.DataFactory/factories', parameters('dataFactoryName')), '2018-06-01', 'Full').identity.principalId]"
    }
  }
}

Deploy the template:

az deployment group create \
  --resource-group rg-adf-tutorial-dev \
  --template-file adf-template.json \
  --parameters dataFactoryName=adf-tutorial-dev-001

🔒 Configure Security

Enable System-Assigned Managed Identity

Managed identities provide automatic credential management for Azure resources.

Azure Portal Method

  1. Navigate to your Data Factory
  2. Click Managed Identity under Settings
  3. Verify the status shows Enabled
  4. Note the Object (principal) ID for later use

Azure CLI Method

# Enable system-assigned managed identity
az datafactory update \
  --resource-group $RESOURCE_GROUP \
  --factory-name $ADF_NAME \
  --set identity.type=SystemAssigned

# Get the managed identity principal ID
PRINCIPAL_ID=$(az datafactory show \
  --resource-group $RESOURCE_GROUP \
  --factory-name $ADF_NAME \
  --query identity.principalId \
  --output tsv)

echo "Managed Identity Principal ID: $PRINCIPAL_ID"

Create Azure Key Vault

Store secrets and connection strings securely.

# Set Key Vault name
KEY_VAULT_NAME="kv-adf-tutorial-dev"

# Create Key Vault
az keyvault create \
  --name $KEY_VAULT_NAME \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --enable-rbac-authorization true

# Grant ADF managed identity access to Key Vault
az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee $PRINCIPAL_ID \
  --scope $(az keyvault show --name $KEY_VAULT_NAME --query id --output tsv)

Grant Your User Account Access

Because the vault was created with --enable-rbac-authorization, classic access policies are ignored; grant your signed-in user a data-plane role instead:

# Grant your user account rights to manage secrets (RBAC vault)
USER_OBJECT_ID=$(az ad signed-in-user show --query id --output tsv)

az role assignment create \
  --role "Key Vault Secrets Officer" \
  --assignee $USER_OBJECT_ID \
  --scope $(az keyvault show --name $KEY_VAULT_NAME --query id --output tsv)

Store Sample Secrets

# Store sample database connection string
az keyvault secret set \
  --vault-name $KEY_VAULT_NAME \
  --name "sql-connection-string" \
  --value "Server=tcp:myserver.database.windows.net,1433;Database=mydb;"

# Store sample API key
az keyvault secret set \
  --vault-name $KEY_VAULT_NAME \
  --name "api-key" \
  --value "sample-api-key-value"
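A stored secret is consumed by referencing it from a linked service rather than pasting the value into the factory. The sketch below writes an illustrative linked-service definition; the names `AzureSqlDatabase1` and `AzureKeyVault1` are placeholders, and `AzureKeyVault1` would need to exist as a Key Vault linked service in the factory:

```shell
# Write a linked service that pulls its connection string from Key Vault
# (AzureKeyVault1 is a placeholder for a Key Vault linked service)
cat > AzureSqlDatabase1.json <<'EOF'
{
  "name": "AzureSqlDatabase1",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "AzureKeyVault1",
          "type": "LinkedServiceReference"
        },
        "secretName": "sql-connection-string"
      }
    }
  }
}
EOF
```

A definition like this could then be deployed with `az datafactory linked-service create` (check `az datafactory linked-service create --help` for the exact argument shape).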

🌐 Set Up Networking

Configure Public Network Access

For development environments:

# Enable public network access
az datafactory update \
  --resource-group $RESOURCE_GROUP \
  --factory-name $ADF_NAME \
  --public-network-access Enabled

Restrict Public Access (Optional)

Data Factory itself has no IP-based firewall: public network access is simply Enabled or Disabled. To lock down inbound access in production, disable public access and reach the factory through private endpoints instead:

# Disable public network access (connect via private endpoints instead)
az datafactory update \
  --resource-group $RESOURCE_GROUP \
  --factory-name $ADF_NAME \
  --public-network-access Disabled

Enable Managed Virtual Network (Production)

For production environments, enable managed virtual network:

  1. Navigate to Data Factory in Azure Portal
  2. Click Managed Virtual Network under Manage
  3. Click Enable
  4. Configure private endpoints for data sources

🔧 Configure Git Integration

Azure DevOps Repository

Prerequisites

  • Azure DevOps organization
  • Project with Git repository
  • Personal Access Token (PAT)

Configuration Steps

  1. In Azure Portal, navigate to your Data Factory
  2. Click Launch studio to open ADF Studio
  3. Click Set up code repository
  4. Select Azure DevOps Git
  5. Configure settings:
Azure DevOps Git Configuration:
├── Repository type: Azure DevOps Git
├── Azure DevOps organization: your-org
├── Project name: adf-tutorial
├── Repository name: adf-tutorial-repo
├── Collaboration branch: main
├── Publish branch: adf_publish
├── Root folder: /
└── Import existing resources: Yes
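If you want a publish branch other than the default adf_publish, ADF reads it from a publish_config.json file in the root of the collaboration branch. A minimal sketch (the branch name `factory/adf_publish` is an example):

```shell
# publish_config.json in the collaboration-branch root overrides the
# default adf_publish publish branch (example branch name shown)
cat > publish_config.json <<'EOF'
{
  "publishBranch": "factory/adf_publish"
}
EOF
```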

GitHub Repository

Configuration Steps

  1. Navigate to ADF Studio
  2. Click Set up code repository
  3. Select GitHub
  4. Configure settings:
GitHub Configuration:
├── Repository type: GitHub
├── GitHub account: your-account
├── Repository name: adf-tutorial
├── Collaboration branch: main
├── Publish branch: adf_publish
└── Root folder: /

💡 Tip: Use separate branches for dev, test, and production environments.
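One way to realize the tip above is a branch per environment in a single repository, with dev work merging to the collaboration branch and release branches feeding the test and production factories. A local sketch (branch names are illustrative, and git identity is set inline only so the commit succeeds anywhere):

```shell
# Illustrative branch layout for dev/test/prod promotion
git init -q adf-tutorial-repo
git -C adf-tutorial-repo -c user.name=tutorial -c user.email=tutorial@example.com \
    commit --allow-empty -q -m "initial commit"
git -C adf-tutorial-repo branch -M main          # collaboration branch (dev)
git -C adf-tutorial-repo branch release/test     # deployed to the test factory
git -C adf-tutorial-repo branch release/prod     # deployed to the production factory
git -C adf-tutorial-repo branch --list
```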

🛠️ Install Development Tools

Azure Data Factory Extension for VS Code

# Install VS Code extension
code --install-extension ms-azuretools.vscode-azuredatafactory

Features:

  • Syntax highlighting for ADF JSON
  • IntelliSense for ADF properties
  • Validation of pipeline definitions
  • Integration with Azure DevOps

Azure PowerShell Module

# Install Az.DataFactory module
Install-Module -Name Az.DataFactory -Scope CurrentUser -Force

# Verify installation
Get-Module -Name Az.DataFactory -ListAvailable

Azure CLI Data Factory Extension

# Data Factory commands come from the 'datafactory' CLI extension,
# which Azure CLI offers to install on first use. Verify:
az datafactory --help

✅ Validation

Verify Data Factory Deployment

# Check Data Factory status
az datafactory show \
  --resource-group $RESOURCE_GROUP \
  --factory-name $ADF_NAME \
  --output table

# List all Data Factories in resource group
az datafactory list \
  --resource-group $RESOURCE_GROUP \
  --output table

Expected output:

Name                  Location    ResourceGroup         ProvisioningState
--------------------  ----------  --------------------  -------------------
adf-tutorial-dev-001  eastus2     rg-adf-tutorial-dev   Succeeded

Test Key Vault Integration

# Confirm the secret exists (this runs as your signed-in user;
# the ADF identity's access is exercised when a linked service reads the secret)
az keyvault secret show \
  --vault-name $KEY_VAULT_NAME \
  --name "sql-connection-string" \
  --output table

Access ADF Studio

  1. Navigate to your Data Factory in Azure Portal
  2. Click Launch studio
  3. Verify you can access the ADF Studio interface
  4. Check that Author, Monitor, and Manage tabs are accessible

Create Test Pipeline

Create a simple pipeline to verify everything works:

  1. In ADF Studio, click Author (pencil icon)
  2. Click + and select Pipeline
  3. Name it "TestPipeline"
  4. Drag a Wait activity to the canvas
  5. Configure wait duration: 5 seconds
  6. Click Debug to test
  7. Verify the pipeline runs successfully
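The same test pipeline can also be expressed as JSON. The sketch below writes the definition locally, mirroring the steps above (a single Wait activity with a 5-second duration):

```shell
# Minimal pipeline definition: one 5-second Wait activity
cat > TestPipeline.json <<'EOF'
{
  "activities": [
    {
      "name": "Wait1",
      "type": "Wait",
      "typeProperties": {
        "waitTimeInSeconds": 5
      }
    }
  ]
}
EOF
```

A definition like this could then be created with `az datafactory pipeline create --resource-group $RESOURCE_GROUP --factory-name $ADF_NAME --name TestPipeline --pipeline @TestPipeline.json` (sketch; confirm the argument shape with `az datafactory pipeline create --help`).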

🎯 Configuration Checklist

Before proceeding to the next module:

  • Data Factory instance created and accessible
  • Managed identity enabled
  • Azure Key Vault configured with access granted
  • Networking configured appropriately
  • Git integration configured (optional but recommended)
  • Development tools installed
  • Test pipeline created and executed successfully
  • ADF Studio accessible and responsive

📊 Resource Summary

After completing this module, you should have:

Resource           Purpose                           Configuration
-----------------  --------------------------------  ------------------------------------
Data Factory       Core orchestration service        System-assigned identity enabled
Key Vault          Secrets management                RBAC-based access control
Managed Identity   Authentication to Azure services  Key Vault access granted
Git Repository     Source control                    Collaboration and publish branches
Development Tools  Local development                 VS Code extension, PowerShell module

🚨 Troubleshooting

Issue: Cannot Access ADF Studio

Symptoms: Error when clicking "Launch studio"

Solutions:

# Verify you have proper role assignments
az role assignment list \
  --assignee $(az ad signed-in-user show --query id --output tsv) \
  --scope $(az datafactory show \
    --resource-group $RESOURCE_GROUP \
    --factory-name $ADF_NAME \
    --query id --output tsv)

# Grant Data Factory Contributor role if missing
az role assignment create \
  --role "Data Factory Contributor" \
  --assignee $(az ad signed-in-user show --query id --output tsv) \
  --scope $(az datafactory show \
    --resource-group $RESOURCE_GROUP \
    --factory-name $ADF_NAME \
    --query id --output tsv)

Issue: Managed Identity Not Working

Symptoms: Cannot access Key Vault from ADF

Solutions:

# Verify managed identity is enabled
az datafactory show \
  --resource-group $RESOURCE_GROUP \
  --factory-name $ADF_NAME \
  --query identity

# Re-grant Key Vault access
az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee $(az datafactory show \
    --resource-group $RESOURCE_GROUP \
    --factory-name $ADF_NAME \
    --query identity.principalId --output tsv) \
  --scope $(az keyvault show \
    --name $KEY_VAULT_NAME \
    --query id --output tsv)

🚀 Next Steps

Environment setup complete! Continue to:

03. Integration Runtime Configuration - Set up compute infrastructure for data movement


Module Progress: 2 of 18 complete

Tutorial Version: 1.0 Last Updated: January 2025