MS Azure Data Factory Lab Overview

This document provides an overview of a lab on Azure Data Factory. The lab consists of 9 modules that set up ADF and related resources, lift and shift an existing SSIS package to Azure, rebuild ETL processes using ADF, enhance data by retrieving from web APIs, transform and merge data using Hive on HDInsight, load data into a data warehouse, schedule pipelines, and monitor ADF pipelines and triggers. A variety of Azure services are leveraged including Azure SQL Database, Blob Storage, Data Factory, HDInsight, and Logic Apps.

Uploaded by

vida adf
Azure Data Factory
Lab Overview

PREREQUISITES
[Diagram labels: Microsoft Azure, Resource group, Storage (Azure), Data Factory, HDInsight, Azure SQL database, SQL Data Warehouse, PowerShell script file, Script file]

CANDIDATE DATA SET
Lab Overview
Module 1 – Setting Up ADF and Resources
Module 2 – Lift and Shift of SSIS to Azure
Module 3 – Rebuilding the Extract and Load with ADF
Module 4 – Enhancing Data with Cloud Services
Module 5 – Transform and Merge Data with ADF and HDInsight
Module 6 – Load Data into DW with ADF
Module 7 – Scheduling your ADF
Module 8 – Monitoring your ADF
Module 9 – Bringing it all Together
Module 1 – Setting Up ADF and Resources
Module 1 Goal

Deploy and configure all the resources needed for upcoming labs.
Lab Task Overview
• Configure and deploy PowerShell script for Azure Services

• Configure Office365 API Connection for sending email notifications

• Create Azure Data Factory 


Prerequisites
• Deployment files for this Lab downloaded to a local folder 

• Azure Subscription with rights to use/deploy Azure services 

• Azure PowerShell 

• SQL Server Management Studio

• Microsoft Azure Storage Explorer (Optional)

• Web browser (Edge/Chrome recommended) 


Technologies Leveraged
• PowerShell

• Azure SQL Database

• Azure Blob Storage

• Azure Data Factory

• Azure SQL Data Warehouse

• Azure Logic App

• Office 365
Module 2 – Lift and Shift of SSIS to Azure
Module 2 Goal

Use the Azure Data Factory Integration Runtime to schedule and then execute an SSIS package, simulating a typical Data Warehouse Extract, Transform, and Load cycle.
Prerequisites
• Azure Subscription with rights to use/deploy Azure services 

• SQL Server Management Studio

• Azure Resources created in Module 1

• SSIS Package located in Lab Module folder


Lab Task Overview
• Create Azure SSIS Integration Runtime

• Upload SSIS Package to Integration Services Catalog

• Manually Execute and Monitor Package Execution

• Create Pipeline and Trigger based Execution


Technologies Leveraged
• Azure SQL Database

• Azure Blob Storage

• Azure Data Factory

• Azure SQL Data Warehouse


Module 3 – Rebuilding the Extract and Load with ADF
Module 3 Goal

Create a pipeline copy activity to copy a file from an S3 storage location to an Azure blob storage container in preparation for later transformations.
Lab Task Overview
• Show the graphical user interface for creating a pipeline

• Copy CSV file via a Copy Activity

• Create branching success and failure paths to send an email

• Use parameters to make the pipeline easy to change and more reusable

• Call an Azure Logic app to send an email via a Web Activity
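
The tasks above (a Copy activity, success and failure branches, a pipeline parameter, and a Web activity that calls a Logic App) can be sketched as a pipeline definition built in Python. This is a minimal illustration, not the lab's actual pipeline: the activity names, dataset names, the `LogicAppUrl` parameter, and the email message bodies are all hypothetical, and the field layout follows the general shape of ADF pipeline JSON, so check it against the code view in the ADF authoring UI before relying on it.

```python
import json


def build_copy_pipeline(name, source_dataset, sink_dataset, url_param="LogicAppUrl"):
    """Return a dict shaped like an ADF pipeline definition (illustrative only)."""

    def email_activity(activity_name, condition, message):
        # Web activity that runs only when the copy ends with `condition`.
        return {
            "name": activity_name,
            "type": "WebActivity",
            "dependsOn": [
                {"activity": "CopyCsvFile", "dependencyConditions": [condition]}
            ],
            "typeProperties": {
                # The Logic App URL comes from a pipeline parameter (assumed name).
                "url": {
                    "value": "@pipeline().parameters." + url_param,
                    "type": "Expression",
                },
                "method": "POST",
                "body": {"message": message},
            },
        }

    copy_activity = {
        "name": "CopyCsvFile",
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "DelimitedTextSource"},
            "sink": {"type": "DelimitedTextSink"},
        },
    }

    return {
        "name": name,
        "properties": {
            "parameters": {url_param: {"type": "String"}},
            "activities": [
                copy_activity,
                email_activity("EmailOnSuccess", "Succeeded", "Copy succeeded"),
                email_activity("EmailOnFailure", "Failed", "Copy failed"),
            ],
        },
    }


# Hypothetical names, used only to show the call.
pipeline = build_copy_pipeline("CopyS3ToBlob", "S3CsvFile", "BlobCsvFile")
print(json.dumps(pipeline, indent=2))
```

Keeping the Logic App URL in a parameter rather than in the activity itself is what makes the pipeline reusable across environments.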


Prerequisites
• Azure Subscription with rights to use/deploy Azure services 

• Azure Data Factory created in Module 1

• Visual Studio Team Services Git project (optional)


Technologies Leveraged
• AWS S3 (as data source)

• Azure Blob Storage

• Azure Data Factory

• Azure Logic App


Module 4 – Enhancing Data with Cloud Services
Module 4 Goal

Create a pipeline copy activity to copy weather data from a web REST API to a file in Azure blob storage for later transformations.
Prerequisites
• Azure Subscription with rights to use/deploy Azure services 

• Azure Data Factory created in Module 1

• Azure Blob storage container from Module 3

• RESTful API configured for GET access with key


Lab Task Overview
• Show the Copy Data wizard to configure the pipeline 

• Configure the HTTP Source 

• Chain one pipeline to another using the Execute Pipeline activity
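
Chaining with the Execute Pipeline activity can be sketched as follows. This is a minimal illustration under assumptions: the pipeline and activity names are hypothetical, and the structure follows the general shape of ADF activity JSON rather than the lab's exact definition.

```python
def execute_pipeline_activity(name, target_pipeline, after=None, wait=True):
    """Return a dict shaped like an ADF Execute Pipeline activity (illustrative)."""
    activity = {
        "name": name,
        "type": "ExecutePipeline",
        "typeProperties": {
            "pipeline": {
                "referenceName": target_pipeline,
                "type": "PipelineReference",
            },
            # When True, the parent waits for the child pipeline to finish.
            "waitOnCompletion": wait,
        },
    }
    if after is not None:
        # Run only after the named upstream activity succeeds.
        activity["dependsOn"] = [
            {"activity": after, "dependencyConditions": ["Succeeded"]}
        ]
    return activity


# Hypothetical names: run a downstream pipeline once the weather copy succeeds.
chain = execute_pipeline_activity(
    "RunNextPipeline", "NextPipeline", after="CopyWeatherData"
)
print(chain["typeProperties"]["pipeline"]["referenceName"])
```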


Technologies Leveraged
• Web data source

• Azure Blob Storage

• Azure Data Factory


Module 5 – Transform and Merge Data with ADF and HDInsight
Module 5 Goal

Create a pipeline Hive activity to merge the FAAmaster and FAAaircraft data together into one file, leveraging Hive for transformation activities.
Prerequisites
• Azure Subscription with rights to use/deploy Azure services 

• Azure Data Factory created in Module 1

• FAA Master and FAA Aircraft Hive Script files in Azure Storage from Module 1
• Azure Blob storage container from Module 3
Lab Task Overview
• Show the Hive activity to run Hive scripts against an HDInsight cluster

• Configure the Hive activity

• Chain one pipeline to another using the Execute Pipeline activity
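
A Hive activity of the kind configured in this module can be sketched as a Python dict. Treat this as an assumption-laden illustration: the linked service names, script path, and the `defines` variables are hypothetical, and the layout follows the general shape of an ADF HDInsight Hive activity rather than the lab's exact settings.

```python
def hive_activity(name, cluster_service, storage_service, script_path, defines=None):
    """Return a dict shaped like an ADF HDInsight Hive activity (illustrative)."""
    return {
        "name": name,
        "type": "HDInsightHive",
        # Linked service for the HDInsight cluster that runs the script.
        "linkedServiceName": {
            "referenceName": cluster_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # Location of the .hql script inside the storage account.
            "scriptPath": script_path,
            "scriptLinkedService": {
                "referenceName": storage_service,
                "type": "LinkedServiceReference",
            },
            # Key/value pairs passed to the script as Hive variables.
            "defines": dict(defines or {}),
        },
    }


# All names and paths below are hypothetical placeholders.
merge_step = hive_activity(
    "MergeFaaData",
    cluster_service="HDInsightCluster",
    storage_service="BlobStorage",
    script_path="scripts/merge_faa.hql",
    defines={"outputFolder": "merged"},
)
print(merge_step["typeProperties"]["scriptPath"])
```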


Technologies Leveraged
• Azure Blob Storage

• Azure Data Factory

• Hive

• Azure HDInsight Clusters


Module 6 – Load Data into DW with ADF
Module 6 Goal

Create a pipeline to load the Azure SQL Data Warehouse dimension and fact tables from Azure SQL Database tables and flat files.
Prerequisites
• Azure Subscription with rights to use/deploy Azure services 

• Azure Data Factory created in Module 1

• Azure Linked Service created in Module 3


Lab Task Overview
• Create a Stored Procedure activity to truncate our staging tables

• Create Copy activities to copy Azure DB and Azure Blob files to the staging schema

• Create Stored Procedure activities to call the load-dimensions and load-fact stored procedures on the Azure DW database
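
The ordering implied by these tasks (truncate staging, copy into staging, then load the warehouse tables) can be illustrated with a small dependency graph. This is only a sketch: the activity names are hypothetical, and loading dimensions before facts is an assumption based on common warehouse practice, not something the lab states. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each activity maps to the set of activities that must succeed first.
# Names are hypothetical; in ADF these edges would be "dependsOn" entries.
dependencies = {
    "TruncateStaging": set(),
    "CopyAzureDbToStaging": {"TruncateStaging"},
    "CopyBlobToStaging": {"TruncateStaging"},
    "LoadDimensions": {"CopyAzureDbToStaging", "CopyBlobToStaging"},
    "LoadFacts": {"LoadDimensions"},  # assumption: facts load after dimensions
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

The two copy activities have no dependency on each other, so a pipeline built this way can run them in parallel once the staging tables are truncated.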


Technologies Leveraged
• Azure Blob Storage

• Azure SQL Database

• Azure Data Factory

• Azure SQL Data Warehouse


Module 7 – Scheduling your ADF
Module 7 Goal

Schedule a pipeline run from the Azure Data Factory GUI using a time-based Schedule trigger.
Prerequisites
• Azure Subscription with rights to use/deploy Azure services 

• Azure Data Factory created in Module 1


Lab Task Overview
• Rename the Pipeline

• Schedule the Pipeline
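
The lab creates the trigger through the GUI; the definition behind it looks roughly like the dict below. This is a hedged sketch: the trigger name, pipeline name, and start time are hypothetical, and the layout follows the general shape of an ADF schedule trigger.

```python
def schedule_trigger(name, pipeline_name, start_time,
                     frequency="Day", interval=1, time_zone="UTC"):
    """Return a dict shaped like an ADF schedule trigger (illustrative)."""
    allowed = {"Minute", "Hour", "Day", "Week", "Month"}
    if frequency not in allowed:
        raise ValueError(f"frequency must be one of {sorted(allowed)}")
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": frequency,
                    "interval": interval,
                    "startTime": start_time,
                    "timeZone": time_zone,
                }
            },
            # Pipelines started each time the trigger fires.
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


# Hypothetical values: run a pipeline once a day starting at 06:00 UTC.
trigger = schedule_trigger("DailyTrigger", "LoadWarehouse", "2024-01-01T06:00:00Z")
print(trigger["properties"]["typeProperties"]["recurrence"])
```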


Technologies Leveraged
• Azure Data Factory

• Azure Data Factory Pipeline

• Azure Data Factory Pipeline Trigger


Module 8 – Monitoring your ADF
Module 8 Goal

Use Azure Data Factory monitoring tools to view information about your triggers, pipelines, and integration runtimes.
Prerequisites
• Azure Subscription with rights to use/deploy Azure services 

• Azure Data Factory created in Module 1

• Azure Data Factory Pipeline with a fired trigger from Module 7


Lab Task Overview
• Monitor Pipeline execution including drilling down to activities executed

• Monitor the status of our trigger event

• View the status of the integration runtimes


Technologies Leveraged
• Azure Data Factory

• Azure Data Factory Pipeline

• Azure Data Factory Pipeline Trigger


Module 9 – Bringing it all Together
Module 9 Goal
Verify and explore the results of our loaded data warehouse using SQL queries.
Prerequisites
• Azure Subscription with rights to use/deploy Azure services 

• Azure Data Factory created in Module 1

• Previous lab modules 3–7 completed, so that data is loaded in Azure SQL Data Warehouse

• SQL Server Management Studio


Lab Task Overview
• Run queries via SQL Server Management Studio
• Explore Data
Get started with Azure Data Factory
https://azure.microsoft.com/en-us/services/data-factory/

View pricing
https://azure.microsoft.com/en-us/pricing/details/data-factory/

Documentation
https://docs.microsoft.com/en-us/azure/data-factory/
