
Q1: Briefly describe the purpose of the ADF Service

ADF is mainly used to orchestrate the copying of data between different relational and non-relational data sources, hosted in the cloud or locally in your datacenters. ADF can also be used to transform the ingested data to meet your business requirements. It serves as the ETL or ELT tool for data ingestion in most Big Data solutions.

• For more information, check Starting your journey with Microsoft Azure Data Factory

Q2: Data Factory consists of a number of components. Mention these components briefly

• Pipeline: The logical container of the activities
• Activity: An execution step in the Data Factory pipeline that can be used for data ingestion and transformation
• Mapping Data Flow: A visually designed data transformation logic
• Dataset: A pointer to the data used in the pipeline activities
• Linked Service: A descriptive connection string for the data sources used in the pipeline activities
• Trigger: Specifies when the pipeline will be executed
• Control flow: Controls the execution flow of the pipeline activities

• For more information, check Starting your journey with Microsoft Azure Data Factory
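To make the relationship between these components concrete, below is a minimal sketch of a pipeline containing a single Copy activity, written as a Python dict that mirrors the JSON shown in the ADF Code view. All names (CopyFilesPipeline, SourceDataset, SinkDataset) are placeholders, not objects defined in this article.

```python
# Minimal sketch of a pipeline definition; mirrors the JSON ADF shows in the Code view.
copy_pipeline = {
    "name": "CopyFilesPipeline",  # placeholder pipeline name
    "properties": {
        "activities": [
            {
                "name": "CopySourceToSink",
                "type": "Copy",
                # the activity works against datasets, which in turn use linked services
                "inputs": [{"referenceName": "SourceDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "BlobSink"},
                },
            }
        ]
    },
}
```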

Q3: What is the difference between the Dataset and Linked Service in Data Factory?

A Linked Service is a description of the connection string that is used to connect to the data stores. For example, when ingesting data from a SQL Server instance, the linked service contains the name of the SQL Server instance and the credentials used to connect to that instance.

A Dataset is a reference to the data store that is described by the linked service. When ingesting data from a SQL Server instance, the dataset points to the name of the table that contains the target data, or to the query that returns data from different tables.

• For more information, check Starting your journey with Microsoft Azure Data Factory
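As an illustration, here is a minimal sketch of the two definitions for an Azure SQL Database source, written as Python dicts mirroring the JSON that ADF stores. The names, connection string, and table are placeholders.

```python
# The linked service describes HOW to connect: server, database, credentials.
linked_service = {
    "name": "AzureSqlLinkedService",  # placeholder
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<server>.database.windows.net;Database=<db>;...",
        },
    },
}

# The dataset describes WHICH data to use inside that store (here, one table).
dataset = {
    "name": "SalesOrdersTable",  # placeholder
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {"referenceName": "AzureSqlLinkedService", "type": "LinkedServiceReference"},
        "typeProperties": {"schema": "dbo", "table": "SalesOrders"},
    },
}
```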

Q4: What is Data Factory Integration Runtime?

Integration Runtime is the secure compute infrastructure used by Data Factory to provide data integration capabilities across different network environments, while making sure that these activities are executed in the region closest to the data store.

• For more information, check Copy data between Azure data stores using Azure Data
Factory

Q5: Data Factory supports three types of Integration Runtimes. Mention these supported
types with a brief description for each

• Azure Integration Runtime: used for copying data from or to data stores accessed
publicly via the internet
• Self-Hosted Integration Runtime: used for copying data from or to an on-premises data
store or networks with access control
• Azure SSIS Integration Runtime: used to run SSIS packages in the Data Factory

• For more information, check Copy data between Azure data stores using Azure Data
Factory

Q6: When copying data from or to an Azure SQL Database using Data Factory, what is the
firewall option that we should enable to allow the Data Factory to access that database?

The Allow Azure services and resources to access this server firewall option.

Q7: If we need to copy data from an on-premises SQL Server instance using Data Factory,
which Integration Runtime should be used and where should it be installed?

The Self-Hosted Integration Runtime should be used, and it should be installed on the on-premises machine where the SQL Server instance is hosted.

• For more information, check Copy data from On-premises data store to an Azure data store
using Azure Data Factory.
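For illustration, here is a minimal sketch of an on-premises SQL Server linked service that is routed through a Self-Hosted IR via the connectVia property, written as a Python dict mirroring the linked service JSON. The IR name, server, and credentials are placeholders.

```python
onprem_sql_linked_service = {
    "name": "OnPremSqlServerLinkedService",  # placeholder
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "Server=<machine>;Database=<db>;Integrated Security=False;...",
            "userName": "<user>",  # secrets are often kept in Azure Key Vault instead
            "password": {"type": "SecureString", "value": "<password>"},
        },
        # connectVia ties the linked service to the Self-Hosted Integration Runtime
        "connectVia": {"referenceName": "OnPremSelfHostedIR", "type": "IntegrationRuntimeReference"},
    },
}
```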

Q8: After installing the Self-Hosted Integration Runtime on the machine where the SQL Server instance is hosted, how can we associate it with the Self-Hosted IR created in the Data Factory portal?

We need to register it using the authentication key provided by the ADF portal.

• For more information, check Copy data from On-premises data store to an Azure data store
using Azure Data Factory

Q9: What is the difference between the Mapping data flow and Wrangling data flow
transformation activities in Data Factory?

The Mapping data flow activity is a visually designed data transformation activity that allows us to build graphical data transformation logic without the need to be an expert developer. It is executed as an activity within the ADF pipeline on an ADF fully managed, scaled-out Spark cluster.

The Wrangling data flow activity is a code-free data preparation activity that integrates with Power Query Online in order to make the Power Query M functions available for data wrangling using Spark execution.

• For more information, check Transform data using a Mapping Data Flow in Azure Data
Factory
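For illustration, here is a minimal sketch of how a Mapping Data Flow is invoked from a pipeline: an Execute Data Flow activity that references the flow and sizes the managed Spark compute. The flow name and compute values are placeholders.

```python
execute_data_flow_activity = {
    "name": "RunMappingDataFlow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        # reference to the Mapping Data Flow designed in the UI
        "dataFlow": {"referenceName": "CleanSalesDataFlow", "type": "DataFlowReference"},
        # size of the managed, scaled-out Spark cluster that runs the flow
        "compute": {"computeType": "General", "coreCount": 8},
    },
}
```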

Q10: Data Factory supports two types of compute environments to execute the transform activities. Mention these two types briefly

On-demand compute environment, using a computing environment fully managed by ADF. In this compute type, the cluster is created to execute the transform activity and is removed automatically when the activity is completed.

Bring Your Own environment, in which the compute environment is managed by you and registered in ADF as a linked service.

• For more information, check Transform data using a Mapping Data Flow in Azure Data
Factory

Q11: What is Azure SSIS Integration Runtime?

A fully managed cluster of virtual machines hosted in Azure and dedicated to running SSIS packages in the Data Factory. The SSIS IR nodes can be scaled up by configuring the node size, or scaled out by configuring the number of nodes in the VM cluster.

• For more information, check Run SSIS packages in Azure Data Factory
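For illustration, here is a rough sketch of where these scale settings live; the shape approximates the Azure-SSIS IR JSON definition, and the name, region, and values are placeholders.

```python
azure_ssis_ir = {
    "name": "MyAzureSsisIR",  # placeholder
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "West Europe",
                "nodeSize": "Standard_D4_v3",       # scale up: bigger VMs
                "numberOfNodes": 2,                  # scale out: more VMs in the cluster
                "maxParallelExecutionsPerNode": 4,   # packages run concurrently per node
            }
        },
    },
}
```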

Q12: What is required to execute an SSIS package in Data Factory?

We need to create an SSIS IR and an SSISDB catalog hosted in Azure SQL Database or Azure SQL
Managed Instance.

• For more information, check Run SSIS packages in Azure Data Factory

Q13: Which Data Factory activity is used to run an SSIS package in Azure?

Execute SSIS Package activity.

• For more information, check Run SSIS packages in Azure Data Factory
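For illustration, here is a minimal sketch of that activity as a Python dict mirroring its JSON definition; the package path, environment, and IR name are placeholders.

```python
execute_ssis_activity = {
    "name": "RunDailyLoadPackage",
    "type": "ExecuteSSISPackage",
    "typeProperties": {
        # package stored in the SSISDB catalog: folder/project/package.dtsx
        "packageLocation": {
            "type": "SSISDB",
            "packagePath": "SalesProject/DailyLoad/LoadFacts.dtsx",
        },
        "environmentPath": "SalesProject/Prod",  # optional SSISDB environment
        # the Azure-SSIS IR that executes the package
        "connectVia": {"referenceName": "MyAzureSsisIR", "type": "IntegrationRuntimeReference"},
    },
}
```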

Q14: Which Data Factory activity can be used to get the list of all source files in a specific
storage account and the properties of each file located in that storage?

Get Metadata activity.

• For more information, check How to use iterations and conditions activities in Azure Data
Factory

Q15: Which Data Factory activities can be used to iterate through all files stored in a
specific storage account, making sure that the files smaller than 1KB will be deleted from
the source storage account?

• ForEach activity for iteration
• Get Metadata activity to get the size of all files in the source storage
• If Condition activity to check the size of the files
• Delete activity to delete all files smaller than 1KB

• For more information, check How to use iterations and conditions activities in Azure Data Factory
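To show how these activities fit together, here is a minimal sketch of such a pipeline, written as a Python dict that mirrors the pipeline JSON. Dataset and activity names are placeholders; file sizes in ADF expressions are in bytes, so "smaller than 1KB" becomes @less(..., 1024).

```python
cleanup_pipeline = {
    "name": "DeleteSmallFilesPipeline",
    "properties": {
        "activities": [
            {   # 1) list all files in the source folder
                "name": "GetFileList",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": {"referenceName": "SourceFolderDataset", "type": "DatasetReference"},
                    "fieldList": ["childItems"],
                },
            },
            {   # 2) iterate over the returned files
                "name": "ForEachFile",
                "type": "ForEach",
                "dependsOn": [{"activity": "GetFileList", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "items": {"value": "@activity('GetFileList').output.childItems", "type": "Expression"},
                    "activities": [
                        {   # 3) get the size of the current file
                            "name": "GetFileSize",
                            "type": "GetMetadata",
                            "typeProperties": {
                                "dataset": {"referenceName": "SourceFileDataset", "type": "DatasetReference"},
                                "fieldList": ["size"],
                            },
                        },
                        {   # 4) delete it only when it is smaller than 1 KB (1024 bytes)
                            "name": "IfSmallerThan1KB",
                            "type": "IfCondition",
                            "dependsOn": [{"activity": "GetFileSize", "dependencyConditions": ["Succeeded"]}],
                            "typeProperties": {
                                "expression": {"value": "@less(activity('GetFileSize').output.size, 1024)", "type": "Expression"},
                                "ifTrueActivities": [
                                    {
                                        "name": "DeleteSmallFile",
                                        "type": "Delete",
                                        "typeProperties": {
                                            "dataset": {"referenceName": "SourceFileDataset", "type": "DatasetReference"},
                                        },
                                    }
                                ],
                            },
                        },
                    ],
                },
            },
        ]
    },
}
```

In practice, SourceFileDataset would be parameterized with the current file name (@item().name) so that each iteration checks and deletes the right file.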

Q16: Data Factory supports three types of triggers. Mention these types briefly

• The Schedule trigger, which is used to execute the ADF pipeline on a wall-clock schedule
• The Tumbling window trigger, which is used to execute the ADF pipeline on a periodic interval and retains the pipeline state
• The Event-based trigger, which responds to a blob-related event, such as adding or deleting a blob from an Azure storage account

• For more information, check How to schedule Azure Data Factory pipeline executions using
Triggers
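For illustration, here is a minimal sketch of a Schedule trigger definition that runs a pipeline once a day, written as a Python dict mirroring the trigger JSON; the trigger name, pipeline name, and start time are placeholders.

```python
schedule_trigger = {
    "name": "DailyTrigger",  # placeholder
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",   # wall-clock schedule: once per day
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        # the pipeline(s) this trigger executes
        "pipelines": [
            {"pipelineReference": {"referenceName": "CopyFilesPipeline", "type": "PipelineReference"}}
        ],
    },
}
```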

Q17: Any Data Factory pipeline can be executed using three methods. Mention these
methods

• Under Debug mode
• Manual execution using Trigger now
• Using an added schedule, tumbling window, or event trigger
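For illustration, here is a minimal sketch of a manual, on-demand run started outside the portal, using the azure-identity and azure-mgmt-datafactory Python packages (an assumption about tooling, not something covered by this article); all resource names are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# authenticate and point the client at the subscription hosting the factory
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# start a single on-demand run of the pipeline, similar to "Trigger now" in the portal
run = adf_client.pipelines.create_run(
    resource_group_name="my-rg",        # placeholder
    factory_name="my-data-factory",     # placeholder
    pipeline_name="CopyFilesPipeline",  # placeholder
    parameters={},                      # pipeline parameters, if any
)
print(run.run_id)  # use this id to find the run under Monitor > Pipeline runs
```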

Q18: Data Factory supports four types of execution dependencies between the ADF
activities. Which dependency guarantees that the next activity will be executed regardless
of the status of the previous activity?

Completion dependency.

Q19: Data Factory supports four types of execution dependencies between the ADF
activities. Which dependency guarantees that the next activity will be executed only if the
previous activity is not executed?

Skipped dependency.

• For more information, check Dependencies in ADF
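For illustration, here is a minimal sketch of where the dependency condition is declared on an activity; activity names are placeholders, and the valid conditions are Succeeded, Failed, Completed, and Skipped.

```python
next_activity = {
    "name": "ArchiveFiles",
    "type": "Copy",
    "dependsOn": [
        {
            "activity": "LoadStagingTable",
            # "Completed" -> run regardless of the previous activity's status (Q18)
            # "Skipped"   -> run only if the previous activity did not execute (Q19)
            "dependencyConditions": ["Completed"],
        }
    ],
    # inputs/outputs/typeProperties omitted from this sketch for brevity
}
```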

Q20: From where can we monitor the execution of a pipeline that is executed under the Debug mode?

From the Output tab of the pipeline, without the ability to use the Pipeline runs or Trigger runs pages under the ADF Monitor window to monitor it.
