Skip to content

🔗 Linked Services & Datasets

Tutorial Duration Level

Master the creation and management of linked services and datasets for seamless data source connectivity in Azure Data Factory.

📋 Table of Contents

🔌 Understanding Linked Services

Linked services define connection information to external resources. Think of them as connection strings that ADF uses to connect to your data stores.

Linked Service Components

graph LR
    A[Linked Service] --> B[Connection Info]
    A --> C[Authentication]
    A --> D[Integration Runtime]

    B --> E[Server/Endpoint]
    B --> F[Database/Container]

    C --> G[Key-based]
    C --> H[Managed Identity]
    C --> I[Service Principal]

Linked Service Categories

Category Purpose Examples
Data Stores Connect to data sources Azure SQL, Blob Storage, Cosmos DB
Compute Services Execute transformations Databricks, HDInsight, Synapse
File Systems Access files SFTP, HTTP, File Share
Databases Connect to databases SQL Server, Oracle, PostgreSQL
SaaS Applications Integrate with apps Salesforce, Dynamics 365, SAP

🗂️ Common Linked Service Types

Azure SQL Database

{
  "name": "AzureSqlDatabase_LinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "annotations": [],
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "AzureKeyVault_LinkedService",
          "type": "LinkedServiceReference"
        },
        "secretName": "sql-connection-string"
      }
    },
    "connectVia": {
      "referenceName": "AutoResolveIntegrationRuntime",
      "type": "IntegrationRuntimeReference"
    }
  }
}

Create Using Portal

  1. Navigate to ADF Studio > Manage > Linked services
  2. Click + New
  3. Search for "Azure SQL Database"
  4. Configure:
  5. Name: AzureSqlDatabase_LinkedService
  6. Integration Runtime: AutoResolveIntegrationRuntime
  7. Server name: your-server.database.windows.net
  8. Database name: your-database
  9. Authentication: Managed Identity
  10. Click Test connection
  11. Click Create

Azure Blob Storage

{
  "name": "AzureBlobStorage_LinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "annotations": [],
    "type": "AzureBlobStorage",
    "typeProperties": {
      "serviceEndpoint": "https://youraccount.blob.core.windows.net/",
      "accountKind": "StorageV2",
      "authenticationType": "ManagedIdentity"
    },
    "connectVia": {
      "referenceName": "AutoResolveIntegrationRuntime",
      "type": "IntegrationRuntimeReference"
    }
  }
}

Azure Data Lake Storage Gen2

{
  "name": "AzureDataLakeGen2_LinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://youraccount.dfs.core.windows.net/",
      "authenticationType": "ManagedIdentity"
    }
  }
}

Azure Key Vault

{
  "name": "AzureKeyVault_LinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "type": "AzureKeyVault",
    "typeProperties": {
      "baseUrl": "https://your-keyvault.vault.azure.net/"
    }
  }
}

REST API

{
  "name": "RestAPI_LinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "type": "RestService",
    "typeProperties": {
      "url": "https://api.example.com",
      "enableServerCertificateValidation": true,
      "authenticationType": "Anonymous"
    }
  }
}

On-Premises SQL Server

{
  "name": "OnPremSqlServer_LinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Server=myserver;Database=mydb;Integrated Security=False;User ID=username;",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "AzureKeyVault_LinkedService",
          "type": "LinkedServiceReference"
        },
        "secretName": "onprem-sql-password"
      }
    },
    "connectVia": {
      "referenceName": "SelfHostedIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}

🔐 Authentication Methods

Benefits: - No credential management - Azure AD integration - Automatic rotation

Configuration:

# Grant ADF managed identity access to Azure SQL
PRINCIPAL_ID=$(az datafactory show \
  --resource-group rg-adf-tutorial-dev \
  --factory-name adf-tutorial-dev-001 \
  --query identity.principalId \
  --output tsv)

# Add to SQL Database
az sql server ad-admin create \
  --resource-group rg-adf-tutorial-dev \
  --server-name your-sql-server \
  --display-name adf-tutorial-dev-001 \
  --object-id $PRINCIPAL_ID

Service Principal

{
  "typeProperties": {
    "servicePrincipalId": "your-app-id",
    "servicePrincipalKey": {
      "type": "AzureKeyVaultSecret",
      "store": {
        "referenceName": "AzureKeyVault_LinkedService",
        "type": "LinkedServiceReference"
      },
      "secretName": "service-principal-key"
    },
    "tenant": "your-tenant-id"
  }
}

Key-Based Authentication

{
  "typeProperties": {
    "accountKey": {
      "type": "AzureKeyVaultSecret",
      "store": {
        "referenceName": "AzureKeyVault_LinkedService",
        "type": "LinkedServiceReference"
      },
      "secretName": "storage-account-key"
    }
  }
}

SQL Authentication

{
  "typeProperties": {
    "connectionString": "Server=myserver;Database=mydb;User ID=username;",
    "password": {
      "type": "AzureKeyVaultSecret",
      "store": {
        "referenceName": "AzureKeyVault_LinkedService",
        "type": "LinkedServiceReference"
      },
      "secretName": "sql-password"
    }
  }
}

📊 Creating Datasets

Datasets represent data structures within linked services.

Azure SQL Table Dataset

{
  "name": "AzureSqlTable_Dataset",
  "type": "Microsoft.DataFactory/factories/datasets",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureSqlDatabase_LinkedService",
      "type": "LinkedServiceReference"
    },
    "annotations": [],
    "type": "AzureSqlTable",
    "schema": [],
    "typeProperties": {
      "schema": "dbo",
      "table": "Customer"
    }
  }
}

Parameterized Dataset

{
  "name": "ParameterizedSqlTable_Dataset",
  "type": "Microsoft.DataFactory/factories/datasets",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureSqlDatabase_LinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "schemaName": {
        "type": "string",
        "defaultValue": "dbo"
      },
      "tableName": {
        "type": "string"
      }
    },
    "type": "AzureSqlTable",
    "typeProperties": {
      "schema": {
        "value": "@dataset().schemaName",
        "type": "Expression"
      },
      "table": {
        "value": "@dataset().tableName",
        "type": "Expression"
      }
    }
  }
}

Blob CSV Dataset

{
  "name": "BlobCSV_Dataset",
  "type": "Microsoft.DataFactory/factories/datasets",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorage_LinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "containerName": {
        "type": "string"
      },
      "fileName": {
        "type": "string"
      }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "fileName": {
          "value": "@dataset().fileName",
          "type": "Expression"
        },
        "container": {
          "value": "@dataset().containerName",
          "type": "Expression"
        }
      },
      "columnDelimiter": ",",
      "escapeChar": "\\",
      "firstRowAsHeader": true,
      "quoteChar": "\""
    },
    "schema": []
  }
}

Parquet Dataset

{
  "name": "ParquetDataLake_Dataset",
  "type": "Microsoft.DataFactory/factories/datasets",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureDataLakeGen2_LinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "folderPath": {
        "type": "string"
      }
    },
    "type": "Parquet",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "folderPath": {
          "value": "@dataset().folderPath",
          "type": "Expression"
        },
        "fileSystem": "data"
      },
      "compressionCodec": "snappy"
    }
  }
}

🎯 Best Practices

Linked Services

  1. Use Managed Identity Where Possible
  2. Eliminates credential management
  3. More secure than key-based auth
  4. Integrates with Azure RBAC

  5. Store Secrets in Key Vault

    {
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "AzureKeyVault_LinkedService",
          "type": "LinkedServiceReference"
        },
        "secretName": "my-secret"
      }
    }
    

  6. Use Descriptive Naming

  7. Include data source type
  8. Include environment
  9. Example: AzureSql_SalesDB_Prod

  10. Test Connections

  11. Always test before saving
  12. Verify network connectivity
  13. Confirm permissions

Datasets

  1. Parameterize for Reusability

    {
      "parameters": {
        "tableName": {"type": "string"},
        "schemaName": {"type": "string"}
      }
    }
    

  2. Define Schema When Possible

  3. Improves performance
  4. Enables better validation
  5. Helps with data lineage

  6. Use Appropriate File Formats

  7. Parquet for big data scenarios
  8. CSV for interoperability
  9. Avro for schema evolution

  10. Organize by Purpose

  11. Separate source and sink datasets
  12. Group by system or domain
  13. Use consistent naming

✅ Hands-On Exercise

Exercise 1: Create Linked Services

Create the following linked services:

  1. Azure SQL Database
  2. Use managed identity
  3. Test connection
  4. Document connection details

  5. Azure Blob Storage

  6. Use storage account key from Key Vault
  7. Test connection

  8. Azure Key Vault

  9. For storing secrets
  10. Verify ADF has access

Exercise 2: Create Datasets

  1. SQL Table Dataset
  2. Points to a specific table
  3. Define schema

  4. Parameterized CSV Dataset

  5. Parameters: container, folder, file
  6. Enable first row as header

  7. Parquet Dataset

  8. For data lake storage
  9. Enable compression

🚨 Troubleshooting

Connection Test Fails

Error: Cannot connect to server

Solutions:
1. Check firewall rules
2. Verify credentials in Key Vault
3. Confirm Integration Runtime is running
4. Check network connectivity

Managed Identity Access Denied

# Grant managed identity access to SQL Database
# Connect to SQL Database and run:
CREATE USER [adf-tutorial-dev-001] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [adf-tutorial-dev-001];
ALTER ROLE db_datawriter ADD MEMBER [adf-tutorial-dev-001];

Key Vault Access Issues

# Verify ADF has Key Vault access
az role assignment list \
  --scope $(az keyvault show --name your-keyvault --query id --output tsv) \
  --assignee $(az datafactory show \
    --resource-group rg-adf-tutorial-dev \
    --factory-name adf-tutorial-dev-001 \
    --query identity.principalId --output tsv)

📚 Additional Resources

🚀 Next Steps

Linked services and datasets configured! Continue to:

05. Multi-Source Integration - Connect multiple data sources


Module Progress: 4 of 18 complete

Tutorial Version: 1.0 Last Updated: January 2025