
Copy multiple tables in bulk by using Azure Data Factory:

Note: You can apply the same pattern in other copy scenarios as well, for example copying
tables from SQL Server/Oracle to Azure SQL Database/Data Warehouse/Azure Blob, or copying
different paths from Blob to Azure SQL Database tables.

Pipelines: Pipeline-1-GetTableListAndTriggerCopyData & Pipeline-2-IterateAndCopySQLTables

End-to-end workflow

In this scenario, you have a number of tables in Azure SQL Database that you want to copy
to SQL Data Warehouse. Here is the logical sequence of steps in the workflow that happens
in the pipelines:

• The first pipeline looks up the list of tables that need to be copied over to the
sink data store. Alternatively, you can maintain a metadata table that lists all the
tables to be copied. The pipeline then triggers another pipeline, which iterates over
each table in the database and performs the data copy operation.
• The second pipeline performs the actual copy. It takes the list of tables as a
parameter. For each table in the list, it copies the table in Azure SQL Database to
the corresponding table in SQL Data Warehouse using a staged copy via Blob storage
and PolyBase for best performance.
• In this example, the first pipeline passes the list of tables as the value for that
parameter, as sketched below.
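
For reference, a minimal sketch (assuming the lookup query shown later in this document)
of what that parameter value looks like at run time: @activity('LookupTableList').output.value
is an array with one object per table row returned by the lookup.

[
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "demo" },
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "employee" }
]

The ForEach activity in the second pipeline iterates over this array, so expressions like
item().TABLE_SCHEMA and item().TABLE_NAME resolve once per table.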
Prerequisites:
Create an Azure SQL Database server with a blank database:

Create tables “demo” and “employee”:

create table dbo.employee
(
    eid int null,
    ename varchar(20) null,
    sal bigint null,
    deptno int null
);

insert into employee values (1, 'Amit', 10000, 10);
insert into employee values (2, 'Alice', 20000, 20);
insert into employee values (3, 'Bob', 12000, 15);

create table dbo.demo
(
    c1 varchar(20) null,
    c2 varchar(20) null
);

insert into demo values ('hi', 'Bye');
insert into demo values ('hello', 'hi');


Create a blank SQL Data Warehouse: create the corresponding schema in the DW, or migrate it using the migration utility tool.

Create tables “demo” and “employee” in the DW:

create table dbo.employee
(
    eid int null,
    ename varchar(20) null,
    sal bigint null,
    deptno int null
);

create table dbo.demo
(
    c1 varchar(20) null,
    c2 varchar(20) null
);

Azure Blob storage (staging): create a Blob storage account.

Create Linked Service for Blob Storage:

Create SQL DB linked Service:

Create SQL DW linked Service:
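
For reference, a minimal sketch of what such linked-service definitions look like in JSON.
The names (AzureStorageLinkedService, AzureSqlDatabaseLinkedService, AzureSqlDWLinkedService)
and the <account>/<server>-style placeholders are assumptions; substitute your own values.

{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        }
    }
}

{
    "name": "AzureSqlDatabaseLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<db>;User ID=<user>;Password=<password>;Encrypt=True"
        }
    }
}

The SQL DW linked service follows the same shape, with type AzureSqlDW and the DW connection string.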


Create SQL DB dataset:

Name the dataset as “Sqldb_Source_Dataset”

NOTE: Select any table for Table. This table is a dummy table; you specify a query on the
source dataset when creating the pipeline, and that query is used to extract data from the
Azure SQL database. Alternatively, you can select the Edit check box and enter dummyName as
the table name.
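
A sketch of the resulting dataset JSON, assuming the SQL DB linked service above is named
AzureSqlDatabaseLinkedService:

{
    "name": "Sqldb_Source_Dataset",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabaseLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "dummyName"
        }
    }
}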
Create SQL DW dataset: name it “Sql_DW_Target_Dataset”.
Go to the Parameters tab → New → add a parameter “DWTableName”.

Note: If you copy/paste this name from the file, ensure that there is no trailing space
character at the end of DWTableName.
Go to the Connection tab → select the linked service → click Edit, then the table-name text
box → Add Dynamic Content → click the “DWTableName” parameter → Finish.
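
A sketch of the parameterized dataset JSON these steps produce (AzureSqlDWLinkedService is
the assumed name of the DW linked service from above):

{
    "name": "Sql_DW_Target_Dataset",
    "properties": {
        "type": "AzureSqlDWTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlDWLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "DWTableName": { "type": "String" }
        },
        "typeProperties": {
            "tableName": {
                "value": "@dataset().DWTableName",
                "type": "Expression"
            }
        }
    }
}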
Create Pipeline:

Create the Pipeline-2-IterateAndCopySQLTables:

In the General tab, specify Pipeline-2-IterateAndCopySQLTables for the name.

Go to the Parameters tab → New → create a tableList parameter of Array type.

In the Activities toolbox, expand Iteration & Conditionals, and drag-drop the ForEach activity onto the designer surface.
Go to the Settings tab → Items → click the text box and open “Add dynamic content”.
Collapse the System Variables and Functions sections, click tableList under Parameters, which
automatically populates the expression text box with @pipeline().parameters.tableList, then
click Finish.
Go to the Activities tab → Add activity.
Once you click Add activity, drag and drop a Copy activity inside the ForEach activity:


Go to the Source tab → select the source dataset, i.e. Sqldb_Source_Dataset.

Select Query and write → SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]

Go to the Sink tab → select the sink dataset, i.e. Sql_DW_Target_Dataset.

Dataset properties → set the “DWTableName” parameter value to
[@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]

Expand PolyBase settings, and select Allow PolyBase.

Uncheck Use type default.

Pre-copy script → TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]

Go to the Settings tab → Enable staging → specify the storage linked service.

Note: If the path is blank, it will take the root path.
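
Putting these settings together, a sketch of the ForEach activity JSON inside Pipeline-2.
Property names follow the ADF v2 copy-activity schema; the activity names IterateSQLTables
and CopyData, and the AzureStorageLinkedService staging reference, are illustrative:

{
    "name": "IterateSQLTables",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@pipeline().parameters.tableList",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "CopyData",
                "type": "Copy",
                "inputs": [
                    { "referenceName": "Sqldb_Source_Dataset", "type": "DatasetReference" }
                ],
                "outputs": [
                    {
                        "referenceName": "Sql_DW_Target_Dataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "DWTableName": "[@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
                        }
                    }
                ],
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": "SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
                    },
                    "sink": {
                        "type": "SqlDWSink",
                        "preCopyScript": "TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]",
                        "allowPolyBase": true,
                        "polyBaseSettings": { "useTypeDefault": false }
                    },
                    "enableStaging": true,
                    "stagingSettings": {
                        "linkedServiceName": {
                            "referenceName": "AzureStorageLinkedService",
                            "type": "LinkedServiceReference"
                        }
                    }
                }
            }
        ]
    }
}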


Create the Pipeline-1-GetTableListAndTriggerCopyData pipeline, which performs two steps:

• Looks up the Azure SQL Database system table to get the list of tables to be
copied.
• Triggers the pipeline Pipeline-2-IterateAndCopySQLTables to do the actual
data copy.

Create a pipeline and name it Pipeline-1-GetTableListAndTriggerCopyData.


Drag and drop a Lookup activity and name it LookupTableList.
Go to the Settings tab → select the dataset “Sqldb_Source_Dataset” → select the Query
option.

Write the query:

SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' and TABLE_SCHEMA = 'dbo'
and TABLE_NAME IN ('demo','employee')

Uncheck the First row only option.
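
For reference, a sketch of the resulting Lookup activity JSON ("firstRowOnly": false
corresponds to unchecking First row only, so the activity returns every matching row):

{
    "name": "LookupTableList",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "SqlSource",
            "sqlReaderQuery": "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = 'BASE TABLE' and TABLE_SCHEMA = 'dbo' and TABLE_NAME IN ('demo','employee')"
        },
        "dataset": {
            "referenceName": "Sqldb_Source_Dataset",
            "type": "DatasetReference"
        },
        "firstRowOnly": false
    }
}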

Drag and drop an Execute Pipeline activity from the Activities toolbox onto the pipeline
designer surface, and set the name to TriggerCopy.
Go to the Settings tab, and do the following steps:

Select Pipeline-2-IterateAndCopySQLTables for Invoked pipeline.

Expand the Advanced section.

Click + New in the Parameters section.

Enter tableList for the parameter name.

Click the VALUE input box → select Add dynamic content → enter
@activity('LookupTableList').output.value as the parameter value → select Finish. You are
setting the result list from the Lookup activity as an input to the second pipeline. The
result list contains the tables whose data needs to be copied to the destination.
Connect the Lookup activity and the Execute Pipeline activity:
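
That connection shows up in JSON as a dependsOn entry on the Execute Pipeline activity; a
sketch of the resulting activity, with the parameter wiring from the steps above:

{
    "name": "TriggerCopy",
    "type": "ExecutePipeline",
    "dependsOn": [
        {
            "activity": "LookupTableList",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "pipeline": {
            "referenceName": "Pipeline-2-IterateAndCopySQLTables",
            "type": "PipelineReference"
        },
        "parameters": {
            "tableList": {
                "value": "@activity('LookupTableList').output.value",
                "type": "Expression"
            }
        }
    }
}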

Validate → Publish All → Trigger Now


Go to the Monitor tab and check the pipeline and activity runs.
