🔗 Linked Services & Datasets¶
Master the creation and management of linked services and datasets for seamless data source connectivity in Azure Data Factory.
📋 Table of Contents¶
- Understanding Linked Services
- Common Linked Service Types
- Authentication Methods
- Creating Datasets
- Best Practices
- Next Steps
🔌 Understanding Linked Services¶
Linked services define connection information to external resources. Think of them as connection strings that ADF uses to connect to your data stores.
Linked Service Components¶
graph LR
A[Linked Service] --> B[Connection Info]
A --> C[Authentication]
A --> D[Integration Runtime]
B --> E[Server/Endpoint]
B --> F[Database/Container]
C --> G[Key-based]
C --> H[Managed Identity]
C --> I[Service Principal] Linked Service Categories¶
| Category | Purpose | Examples |
|---|---|---|
| Data Stores | Connect to data sources | Azure SQL, Blob Storage, Cosmos DB |
| Compute Services | Execute transformations | Databricks, HDInsight, Synapse |
| File Systems | Access files | SFTP, HTTP, File Share |
| Databases | Connect to databases | SQL Server, Oracle, PostgreSQL |
| SaaS Applications | Integrate with apps | Salesforce, Dynamics 365, SAP |
🗂️ Common Linked Service Types¶
Azure SQL Database¶
{
"name": "AzureSqlDatabase_LinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"annotations": [],
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault_LinkedService",
"type": "LinkedServiceReference"
},
"secretName": "sql-connection-string"
}
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
}
}
Create Using Portal¶
- Navigate to ADF Studio > Manage > Linked services
- Click + New
- Search for "Azure SQL Database"
- Configure:
- Name:
AzureSqlDatabase_LinkedService - Integration Runtime:
AutoResolveIntegrationRuntime - Server name:
your-server.database.windows.net - Database name:
your-database - Authentication: Managed Identity
- Click Test connection
- Click Create
Azure Blob Storage¶
{
"name": "AzureBlobStorage_LinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"annotations": [],
"type": "AzureBlobStorage",
"typeProperties": {
"serviceEndpoint": "https://youraccount.blob.core.windows.net/",
"accountKind": "StorageV2",
"authenticationType": "ManagedIdentity"
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
}
}
Azure Data Lake Storage Gen2¶
{
"name": "AzureDataLakeGen2_LinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "AzureBlobFS",
"typeProperties": {
"url": "https://youraccount.dfs.core.windows.net/",
"authenticationType": "ManagedIdentity"
}
}
}
Azure Key Vault¶
{
"name": "AzureKeyVault_LinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "AzureKeyVault",
"typeProperties": {
"baseUrl": "https://your-keyvault.vault.azure.net/"
}
}
}
REST API¶
{
"name": "RestAPI_LinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "RestService",
"typeProperties": {
"url": "https://api.example.com",
"enableServerCertificateValidation": true,
"authenticationType": "Anonymous"
}
}
}
On-Premises SQL Server¶
{
"name": "OnPremSqlServer_LinkedService",
"type": "Microsoft.DataFactory/factories/linkedservices",
"properties": {
"type": "SqlServer",
"typeProperties": {
"connectionString": "Server=myserver;Database=mydb;Integrated Security=False;User ID=username;",
"password": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault_LinkedService",
"type": "LinkedServiceReference"
},
"secretName": "onprem-sql-password"
}
},
"connectVia": {
"referenceName": "SelfHostedIR",
"type": "IntegrationRuntimeReference"
}
}
}
🔐 Authentication Methods¶
Managed Identity (Recommended)¶
Benefits: - No credential management - Azure AD integration - Automatic rotation
Configuration:
# Grant ADF managed identity access to Azure SQL
PRINCIPAL_ID=$(az datafactory show \
--resource-group rg-adf-tutorial-dev \
--factory-name adf-tutorial-dev-001 \
--query identity.principalId \
--output tsv)
# Add to SQL Database
az sql server ad-admin create \
--resource-group rg-adf-tutorial-dev \
--server-name your-sql-server \
--display-name adf-tutorial-dev-001 \
--object-id $PRINCIPAL_ID
Service Principal¶
{
"typeProperties": {
"servicePrincipalId": "your-app-id",
"servicePrincipalKey": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault_LinkedService",
"type": "LinkedServiceReference"
},
"secretName": "service-principal-key"
},
"tenant": "your-tenant-id"
}
}
Key-Based Authentication¶
{
"typeProperties": {
"accountKey": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault_LinkedService",
"type": "LinkedServiceReference"
},
"secretName": "storage-account-key"
}
}
}
SQL Authentication¶
{
"typeProperties": {
"connectionString": "Server=myserver;Database=mydb;User ID=username;",
"password": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "AzureKeyVault_LinkedService",
"type": "LinkedServiceReference"
},
"secretName": "sql-password"
}
}
}
📊 Creating Datasets¶
Datasets represent data structures within linked services.
Azure SQL Table Dataset¶
{
"name": "AzureSqlTable_Dataset",
"type": "Microsoft.DataFactory/factories/datasets",
"properties": {
"linkedServiceName": {
"referenceName": "AzureSqlDatabase_LinkedService",
"type": "LinkedServiceReference"
},
"annotations": [],
"type": "AzureSqlTable",
"schema": [],
"typeProperties": {
"schema": "dbo",
"table": "Customer"
}
}
}
Parameterized Dataset¶
{
"name": "ParameterizedSqlTable_Dataset",
"type": "Microsoft.DataFactory/factories/datasets",
"properties": {
"linkedServiceName": {
"referenceName": "AzureSqlDatabase_LinkedService",
"type": "LinkedServiceReference"
},
"parameters": {
"schemaName": {
"type": "string",
"defaultValue": "dbo"
},
"tableName": {
"type": "string"
}
},
"type": "AzureSqlTable",
"typeProperties": {
"schema": {
"value": "@dataset().schemaName",
"type": "Expression"
},
"table": {
"value": "@dataset().tableName",
"type": "Expression"
}
}
}
}
Blob CSV Dataset¶
{
"name": "BlobCSV_Dataset",
"type": "Microsoft.DataFactory/factories/datasets",
"properties": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage_LinkedService",
"type": "LinkedServiceReference"
},
"parameters": {
"containerName": {
"type": "string"
},
"fileName": {
"type": "string"
}
},
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"fileName": {
"value": "@dataset().fileName",
"type": "Expression"
},
"container": {
"value": "@dataset().containerName",
"type": "Expression"
}
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
},
"schema": []
}
}
Parquet Dataset¶
{
"name": "ParquetDataLake_Dataset",
"type": "Microsoft.DataFactory/factories/datasets",
"properties": {
"linkedServiceName": {
"referenceName": "AzureDataLakeGen2_LinkedService",
"type": "LinkedServiceReference"
},
"parameters": {
"folderPath": {
"type": "string"
}
},
"type": "Parquet",
"typeProperties": {
"location": {
"type": "AzureBlobFSLocation",
"folderPath": {
"value": "@dataset().folderPath",
"type": "Expression"
},
"fileSystem": "data"
},
"compressionCodec": "snappy"
}
}
}
🎯 Best Practices¶
Linked Services¶
- Use Managed Identity Where Possible
- Eliminates credential management
- More secure than key-based auth
-
Integrates with Azure RBAC
-
Store Secrets in Key Vault
-
Use Descriptive Naming
- Include data source type
- Include environment
-
Example:
AzureSql_SalesDB_Prod -
Test Connections
- Always test before saving
- Verify network connectivity
- Confirm permissions
Datasets¶
-
Parameterize for Reusability
-
Define Schema When Possible
- Improves performance
- Enables better validation
-
Helps with data lineage
-
Use Appropriate File Formats
- Parquet for big data scenarios
- CSV for interoperability
-
Avro for schema evolution
-
Organize by Purpose
- Separate source and sink datasets
- Group by system or domain
- Use consistent naming
✅ Hands-On Exercise¶
Exercise 1: Create Linked Services¶
Create the following linked services:
- Azure SQL Database
- Use managed identity
- Test connection
-
Document connection details
-
Azure Blob Storage
- Use storage account key from Key Vault
-
Test connection
-
Azure Key Vault
- For storing secrets
- Verify ADF has access
Exercise 2: Create Datasets¶
- SQL Table Dataset
- Points to a specific table
-
Define schema
-
Parameterized CSV Dataset
- Parameters: container, folder, file
-
Enable first row as header
-
Parquet Dataset
- For data lake storage
- Enable compression
🚨 Troubleshooting¶
Connection Test Fails¶
Error: Cannot connect to server
Solutions:
1. Check firewall rules
2. Verify credentials in Key Vault
3. Confirm Integration Runtime is running
4. Check network connectivity
Managed Identity Access Denied¶
# Grant managed identity access to SQL Database
# Connect to SQL Database and run:
CREATE USER [adf-tutorial-dev-001] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [adf-tutorial-dev-001];
ALTER ROLE db_datawriter ADD MEMBER [adf-tutorial-dev-001];
Key Vault Access Issues¶
# Verify ADF has Key Vault access
az role assignment list \
--scope $(az keyvault show --name your-keyvault --query id --output tsv) \
--assignee $(az datafactory show \
--resource-group rg-adf-tutorial-dev \
--factory-name adf-tutorial-dev-001 \
--query identity.principalId --output tsv)
📚 Additional Resources¶
🚀 Next Steps¶
Linked services and datasets configured! Continue to:
→ 05. Multi-Source Integration - Connect multiple data sources
Module Progress: 4 of 18 complete
Tutorial Version: 1.0 Last Updated: January 2025