Is there a way that we can build our Data Factory all with parameters, all based on MetaData? Yes there is, and I will show you how. During this session I will show how you can load incremental or full datasets from your SQL database to your Azure Data Lake. The next step is that we want to track the history of these extracted tables; we will do this with Azure Databricks using Delta Lake. The last step is that we want to make this data available in Azure SQL Database or Azure Synapse Analytics. Oh, and we want to have some logging from our processes as well. A lot to talk about and to demo during this session.
The document discusses Azure Data Factory v2. It provides an agenda that includes topics like triggers, control flow, and executing SSIS packages in ADFv2. It then introduces the speaker, Stefan Kirner, who has over 15 years of experience with Microsoft BI tools. The rest of the document consists of slides on ADFv2 topics like the pipeline model, triggers, activities, integration runtimes, scaling SSIS packages, and notes from the field on using SSIS packages in ADFv2.
Analyzing StackExchange data with Azure Data Lake - BizTalk360
Big data is the new big thing, where storing the data is the easy part; gaining insights from your pile of data is something different. Based on a data dump of the well-known StackExchange websites, we will store & analyse 150+ GB of data with Azure Data Lake Store & Analytics to gain some insights about their users. After that we will use Power BI to give an at-a-glance overview of our learnings.
If you are a developer who is interested in big data, this is your time to shine! We will use our existing SQL & C# skills to analyse everything without having to worry about running clusters.
J1 T1 4 - Azure Data Factory vs SSIS - Regis Baccaro - MS Cloud Summit
This document compares Azure Data Factory (ADF) and SQL Server Integration Services (SSIS) for data integration tasks. It outlines the core concepts and architecture of ADF, including datasets, pipelines, activities, scheduling and execution. It then provides an overview of what SSIS is used for and its benefits. The document proceeds to compare ADF and SSIS in terms of development, administration, deployment, monitoring, supported sources and destinations, security, and pricing. It concludes that while both tools are not meant for the same purposes, organizations can benefit from using them together in a hybrid approach for different tasks.
Building Advanced Analytics Pipelines with Azure Databricks - Lace Lofranco
Participants will get a deep dive into one of Azure's newest offerings: Azure Databricks, a fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure. In this session, we start with a technical overview of Spark and quickly jump into Azure Databricks' key collaboration features, cluster management, and tight data integration with Azure data sources. Concepts are made concrete via a detailed walk-through of an advanced analytics pipeline built using Spark and Azure Databricks.
Full video of the presentation: https://www.youtube.com/watch?v=14D9VzI152o
Presentation demo: https://github.com/devlace/azure-databricks-anomaly
Slide deck related to the talk presented at the Manila Data Day event, March 2020. The demo covers Azure services like Data Lake Storage (Gen 2), Azure Data Factory, Azure Databricks, Azure Synapse, Key Vault and Active Directory to build a modern data warehouse.
Azure Data Factory is a data integration service that allows for data movement and transformation between both on-premises and cloud data stores. It uses datasets to represent data structures, activities to define actions on data with pipelines grouping related activities, and linked services to connect to external resources. Key concepts include datasets representing input/output data, activities performing actions like copy, and pipelines logically grouping activities.
Part 3 - Modern Data Warehouse with Azure Synapse - Nilesh Gule
Slide deck of the third part of building a Modern Data Warehouse using Azure. This session covered Azure Synapse, formerly SQL Data Warehouse. We look at the Azure Synapse architecture, external files, and integration with Azure Data Factory.
The recording of the session is available on YouTube
https://www.youtube.com/watch?v=LZlu6_rFzm8&WT.mc_id=DP-MVP-5003170
You may know Google for search, YouTube, Android, Chrome, and Gmail, but that's only as an end-user of OUR apps. Did you know you can also integrate Google technologies into YOUR apps? We have many APIs and open source libraries that help you do that! If you have tried and found it challenging, didn't find enough examples, ran into roadblocks, got confused, or are just curious about what Google APIs can offer, join us to resolve any blockers. Code samples will be in Python and/or Node.js/JavaScript. This session focuses on showing you how to access Google Cloud APIs from one of Google Cloud's compute platforms, whether serverless or otherwise.
Modern DW Architecture
- The document discusses modern data warehouse architectures using Azure cloud services like Azure Data Lake, Azure Databricks, and Azure Synapse. It covers storage options like ADLS Gen 1 and Gen 2 and data processing tools like Databricks and Synapse. It highlights how to optimize architectures for cost and performance using features like auto-scaling, shutdown, and lifecycle management policies. Finally, it provides a demo of a sample end-to-end data pipeline.
Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform optimized for Azure. Designed in collaboration with the founders of Apache Spark, Azure Databricks combines the best of Databricks and Azure to help customers accelerate innovation with one-click set up, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. As an Azure service, customers automatically benefit from the native integration with other Azure services such as Power BI, SQL Data Warehouse, and Cosmos DB, as well as from enterprise-grade Azure security, including Active Directory integration, compliance, and enterprise-grade SLAs.
Azure Databricks—Apache Spark as a Service with Sascha Dittmann - Databricks
The driving force behind Apache Spark (Databricks Inc.) and Microsoft have designed a joint service to quickly and easily create Big Data and Advanced Analytics solutions. The combination of the comprehensive Databricks Unified Analytics platform and the powerful capabilities of Microsoft Azure makes it easy to analyse data streams or large amounts of data, as well as the training of AI models. Sascha Dittmann shows in this session how the new Azure service can be set up and used in various real-world scenarios. He also shows how to connect the various Azure services to the Azure Databricks service.
This document provides an overview of Azure Databricks, including:
- Azure Databricks is an Apache Spark-based analytics platform optimized for Microsoft Azure cloud services. It includes Spark SQL, streaming, machine learning libraries, and integrates fully with Azure services.
- Clusters in Azure Databricks provide a unified platform for various analytics use cases. The workspace stores notebooks, libraries, dashboards, and folders. Notebooks provide a code environment with visualizations. Jobs and alerts can run and notify on notebooks.
- The Databricks File System (DBFS) stores files in Azure Blob storage in a distributed file system accessible from notebooks. Business intelligence tools can connect to Databricks clusters via JDBC
The document discusses Azure Data Factory V2 data flows. It will provide an introduction to Azure Data Factory, discuss data flows, and have attendees build a simple data flow to demonstrate how they work. The speaker will introduce Azure Data Factory and data flows, explain concepts like pipelines, linked services, and data flows, and guide a hands-on demo where attendees build a data flow to join customer data to postal district data to add matching postal towns.
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen - MS Cloud Summit
This document provides an overview and demonstration of Azure Data Lake Store and Azure Data Lake Analytics. The presenter discusses how Azure Data Lake can store and analyze large amounts of data in its native format. Key capabilities of Azure Data Lake Store like unlimited storage, security features, and support for any data type are highlighted. Azure Data Lake Analytics is presented as an elastic analytics service built on Apache YARN that can process large amounts of data. The U-SQL language for big data analytics is demonstrated, along with using Visual Studio and PowerShell for interacting with Azure Data Lake. The presentation concludes with a question and answer section.
Running cost effective big data workloads with Azure Synapse and Azure Data L... - Michael Rys
The presentation discusses how to migrate expensive open source big data workloads to Azure and leverage the latest compute and storage innovations within Azure Synapse with Azure Data Lake Storage to develop powerful and cost-effective analytics solutions. It shows how you can bring your .NET expertise to bear with .NET for Apache Spark, and how the shared metadata experience in Synapse makes it easy to create a table in Spark and query it from T-SQL.
Azure Databricks - An Introduction (by Kris Bock) - Daniel Toomey
Azure Databricks is a fast, easy to use, and collaborative Apache Spark-based analytics platform optimized for Azure. It allows for interactive collaboration through a unified workspace, enables sharing of insights through integration with Power BI, and provides native integration with other Azure services. It also offers enterprise-grade security through integration with Azure Active Directory and compliance features.
This document discusses Microsoft Azure and its capabilities. It highlights that Azure has over 100 datacenters globally, with 19 regions currently online. It also notes that Azure has one of the top 3 networks in the world and offers larger VM sizes than AWS or Google Cloud. The document then summarizes some of Azure's core capabilities like compute, storage, databases, analytics and more. It provides examples of how customers can use Azure's tools and services.
Antonios Chatzipavlis presented on migrating SQL workloads to Azure. He discussed modernizing data platforms by discovering, assessing, planning, transforming, optimizing, testing and remediating. Key migration considerations include remaining, rehosting, refactoring, rearchitecting, rebuilding or replacing workloads. Tools for migrating data include Microsoft Assessment and Planning Toolkit, Data Migration Assistant, Database Experimentation Assistant, SQL Server Migration Assistant, and Azure Database Migration Service. Workloads can be migrated to Azure VMs, Azure SQL Databases or Azure SQL Managed Instances.
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines - DATAVERSITY
With the aid of any number of data management and processing tools, data flows through multiple on-prem and cloud storage locations before it’s delivered to business users. As a result, IT teams — including IT Ops, DataOps, and DevOps — are often overwhelmed by the complexity of creating a reliable data pipeline that includes the automation and observability they require.
The answer to this widespread problem is a centralized data pipeline orchestration solution.
Join Stonebranch’s Scott Davis, Global Vice President and Ravi Murugesan, Sr. Solution Engineer to learn how DataOps teams orchestrate their end-to-end data pipelines with a platform approach to managing automation.
Key Learnings:
- Discover how to orchestrate data pipelines across a hybrid IT environment (on-prem and cloud)
- Find out how DataOps teams are empowered with event-based triggers for real-time data flow
- See examples of reports, dashboards, and proactive alerts designed to help you reliably keep data flowing through your business — with the observability you require
- Discover how to replace clunky legacy approaches to streaming data in a multi-cloud environment
- See what’s possible with the Stonebranch Universal Automation Center (UAC)
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop... - DataKitchen
The main objective of this workshop is to give the audience hands-on experience with several Hadoop technologies and jump-start their Hadoop journey. In this workshop, you will load data and submit queries using Hadoop! Before jumping into the technology, the founders of DataKitchen review Hadoop and some of its technologies (MapReduce, Hive, Pig, Impala and Spark), look at performance, and present a rubric for choosing which technology to use when.
NOTE: To complete the hands-on portion in the time allotted, attendees should come with a newly created AWS (Amazon Web Services) account and complete the other prerequisites found in the DataKitchen blog.
Database Performance monitoring tool for Microsoft SQL Server 2005 & 2008 (included in "SQL Server 2008 R2 Unleashed" best-selling book), Sybase ASE 11.5 to 15.5 and Oracle 8i to 11g.
Cloud-Native Patterns for Data-Intensive Applications - VMware Tanzu
Are you interested in learning how to schedule batch jobs in container runtimes?
Maybe you’re wondering how to apply continuous delivery in practice for data-intensive applications? Perhaps you’re looking for an orchestration tool for data pipelines?
Questions like these are common, so rest assured that you’re not alone.
In this webinar, we’ll cover the recent feature improvements in Spring Cloud Data Flow. More specifically, we’ll discuss data processing use cases and how they simplify the overall orchestration experience in cloud runtimes like Cloud Foundry and Kubernetes.
Please join us and be part of the community discussion!
Presenters :
Sabby Anandan, Product Manager
Mark Pollack, Software Engineer, Pivotal
The document outlines a multi-month implementation plan for a BI project with the following key stages:
1) Preparation and Planning in Month 1 involving prioritization, hardware installation, staffing, and software procurement.
2) ETL development from Month 1-3 involving requirement analysis, design, development and testing of the ETL processes.
3) Initial deployment from Month 2-3 setting up the metadata framework and data governance with report reductions.
4) Ongoing development from Month 4-10 involving further report reductions, incremental deployments, building the data library and dashboards. Headcount savings also take effect during this stage.
5) Long term operations starting from Month 11 involving targeting
This document provides an overview of Azure Data Factory (ADF), including why it is used, its key components and activities, how it works, and differences between versions 1 and 2. It describes the main steps in ADF as connect and collect, transform and enrich, publish, and monitor. The main components are pipelines, activities, datasets, and linked services. Activities include data movement, transformation, and control. Integration runtime and system variables are also summarized.
This document provides an overview of a course on implementing a modern data platform architecture using Azure services. The course objectives are to understand cloud and big data concepts, the role of Azure data services in a modern data platform, and how to implement a reference architecture using Azure data services. The course will provide an ARM template for a data platform solution that can address most data challenges.
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle) - Rittman Analytics
Oracle Data Integration Platform is a cornerstone for big data solutions that provides five core capabilities: business continuity, data movement, data transformation, data governance, and streaming data handling. It includes eight core products that can operate in the cloud or on-premise, and is considered the most innovative in areas like real-time/streaming integration and extract-load-transform capabilities with big data technologies. The platform offers a comprehensive architecture covering key areas like data ingestion, preparation, streaming integration, parallel connectivity, and governance.
This document provides information about inplant training programs offered by KAASHIV INFOTECH in Chennai, India. It outlines 5-day training schedules for students of CSE/IT/MCA, ECE/EEE, and Mechanical/Civil engineering. The CSE/IT/MCA schedule focuses on topics like Big Data, app development, ethical hacking, and cloud computing. The ECE/EEE schedule covers embedded systems, wireless systems, and CCNA networking. The mechanical/civil schedule includes aircraft design, vehicle movement, and 3D modeling and packaging. The training is handled by professionals and aims to equip students with strong technical skills.
This document provides information about Venkatesan Prabu Jayakantham (Venkat), the Managing Director of KAASHIV INFOTECH, a software company in Chennai. It outlines Venkat's experience in Microsoft technologies and certifications. It also details the various awards he has received throughout his career. Finally, it advertises KAASHIV INFOTECH's inplant training programs for students in fields like computer science, electronics, and mechanical engineering.
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando - Romit Mehta
This is my presentation at TDWI Leadership Summit. It talks about how products like Gimel, Unified Data Catalog and PayPal Notebooks help improve data scientist productivity and enable machine learning at scale at PayPal.
Using Databricks as an Analysis Platform - Databricks
Over the past year, YipitData spearheaded a full migration of its data pipelines to Apache Spark via the Databricks platform. Databricks now empowers its 40+ data analysts to independently create data ingestion systems, manage ETL workflows, and produce meaningful financial research for our clients.
The Magic Of Application Lifecycle Management In Vs Public - David Solivan
The document discusses challenges with software development projects and how tools from Microsoft can help address these challenges. It notes that most projects fail or are over budget and challenges include poor requirements gathering and testing. However, tools like Visual Studio and Team Foundation Server that integrate requirements, work tracking, source control, testing and other functions can help make successful projects more possible by facilitating team collaboration. The document outlines features of these tools and how they aim to make application lifecycle management a routine part of development.
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics - Informatica
This presentation is geared toward enterprise architects and senior IT leaders looking to drive more value from their data by learning about cloud data lake management.
As businesses focus on leveraging big data to drive digital transformation, technology leaders are struggling to keep pace with the high volume of data coming in at high speed and rapidly evolving technologies. What's needed is an approach that helps you turn petabytes into profit.
Cloud data lakes and cloud data warehouses have emerged as a popular architectural pattern to support next-generation analytics. Informatica's comprehensive AI-driven cloud data lake management solution natively ingests, streams, integrates, cleanses, governs, protects and processes big data workloads in multi-cloud environments.
Please leave any questions or comments below.
The document discusses Azure Data Factory and its capabilities for cloud-first data integration and transformation. ADF allows orchestrating data movement and transforming data at scale across hybrid and multi-cloud environments using a visual, code-free interface. It provides serverless scalability without infrastructure to manage along with capabilities for lifting and running SQL Server Integration Services packages in Azure.
This document summarizes the key points from a presentation on SQL Server 2016. It discusses in-memory and columnstore features, including performance gains from processing data in memory instead of on disk. New capabilities for real-time operational analytics are presented that allow analytics queries to run concurrently with OLTP workloads using the same data schema. Maintaining a columnstore index for analytics queries is suggested to improve performance.
The document discusses Delta Live Tables (DLT), a tool from Databricks that allows users to build reliable data pipelines in a declarative way. DLT automates complex ETL tasks, ensures data quality, and provides end-to-end visibility into data pipelines. It unifies batch and streaming data processing with a single SQL API. Customers report that DLT helps them save significant time and effort in managing data at scale, accelerates data pipeline development, and reduces infrastructure costs.
SmartNews uses various SaaS products like New Relic, Datadog, and Chartio to monitor applications and infrastructure, collect metrics, and create dashboards. SaaS allows SmartNews to focus on its core product without having to dedicate engineering resources to tasks like monitoring and visualization. It provides built-in integrations, easy setup processes, and expert support so SmartNews can move fast without having to develop and maintain these capabilities internally. SaaS makes operations more efficient and allows SmartNews to gather insights that help optimize performance and troubleshoot issues.
Similar to: Is there a way that we can build our Azure Data Factory all with parameters based on MetaData?
Azure Key Vault, Azure Dev Ops and Azure Synapse - how these services work pe... - Erwin de Kreuk
Can we store our connection strings, blob storage keys or other secret values somewhere else than in Azure Synapse Pipelines? Yes you can! You can store these valuable secrets in Azure Key Vault (AKV).
• But how can we achieve this in Azure Synapse Analytics?
• How do we deploy our Synapse Pipelines in Azure Dev Ops to Test, Acceptance and Production environments with these secrets?
• Can this be set up dynamically?
During this session I will give answers to all these questions. You will learn how to set up your Azure Key Vault, connect these secrets in Azure Synapse Analytics and finally deploy these secrets dynamically in Azure Dev Ops. As you can see, a lot to talk about during this session.
Data weekender4.2 azure purview - Erwin de Kreuk
This document provides information about Azure Purview and its capabilities for unified data governance. It discusses:
- Azure Purview allows for automated discovery of data across on-premises, multicloud and SaaS sources through its data map. It enables classification, lineage tracking and compliance.
- The data catalog provides semantic search and browse capabilities along with a business glossary and data lineage visualizations.
- Insights features provide reporting on assets, scans, the business glossary, classifications and labeling to give visibility into data usage across the organization.
- The document demonstrates registering and scanning a Power BI tenant to discover data with Azure Purview.
Data saturday Oslo Azure Purview - Erwin de Kreuk
Azure Purview provides unified data governance capabilities including automated data discovery, classification, and lineage visualization. It helps organizations overcome data governance silos, comply with regulations, and increase data agility. The key components of Azure Purview include the Data Map for automated metadata extraction and lineage, the Data Catalog for data discovery and governance, and Insights for monitoring data usage. It supports governance of data across cloud and on-premises environments in a serverless and fully managed platform.
Datasaturday Pordenone Azure Purview - Erwin de Kreuk
Azure Purview is Microsoft's solution for unified data governance. It includes three main components:
1. The Purview Data Map automates metadata scanning and lineage identification across hybrid data stores and applies over 100 classifiers and Microsoft sensitivity labels.
2. The Purview Data Catalog enables effortless discovery through semantic search and a business glossary, and shows data lineage with sources, owners, and transformations.
3. Purview Insights provides reports on assets, scans, the glossary, classification, and sensitive data labeling to give visibility into data usage across the estate.
SQL KONFERENZ 2020 Azure Key Vault, Azure Dev Ops and Azure Data Factory how... - Erwin de Kreuk
Can we store our connection strings, blob storage keys or other secret values somewhere else than in Azure Data Factory (ADF)? Yes you can! You can store these valuable secrets in Azure Key Vault (AKV).
But how can we achieve this in ADF? And finally, how do we deploy our Data Factories in Azure Dev Ops to Test, Acceptance and Production environments with these secrets? Can this be set up dynamically?
During this session I will give answers to all of these questions. You will learn how to set up your Azure Key Vault, connect these secrets in ADF and finally deploy these secrets dynamically in Azure Dev Ops. As you can see, a lot to talk about during this session.
DatamindsConnect2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory ho... - Erwin de Kreuk
Can we store our connection strings, blob storage keys or other secret values somewhere else than in Azure Data Factory (ADF)? Yes you can! You can store these valuable secrets in Azure Key Vault (AKV).
But how can we achieve this in ADF? And finally, how do we deploy our Data Factories in Azure Dev Ops to Test, Acceptance and Production environments with these secrets? Can this be set up dynamically?
During this session I will give answers to all of these questions. You will learn how to set up your Azure Key Vault, connect these secrets in ADF and finally deploy these secrets dynamically in Azure Dev Ops. As you can see, a lot to talk about during this session.
Help, I need to migrate my On Premise Database to Azure, which Database Tier ... - Erwin de Kreuk
Azure SQL Database provides several deployment options including single databases and elastic pools. The single database option provides resource guarantees at the database level while elastic pools allow for sharing of resources across multiple databases for better cost efficiency. Azure SQL Database offers different service tiers including Basic, Standard, and Premium that provide different performance levels and features. Customers can choose between DTU-based and vCore-based purchasing models, with vCores offering more flexibility and control over compute and storage. The Data Migration Assistant and Data Migration Service can help customers assess, plan, and execute migrations of databases to Azure SQL Database.
DataSaturdayNL 2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory h... - Erwin de Kreuk
Can we store our connection strings, blob storage keys or other secret values somewhere else than in Azure Data Factory (ADF)? Yes you can! You can store these valuable secrets in Azure Key Vault (AKV). But how can we achieve this in ADF? And finally, how do we deploy our Data Factories in Azure Dev Ops to Test, Acceptance and Production environments with these secrets? Can this be set up dynamically? During this session I will give answers to all of these questions. You will learn how to set up your Azure Key Vault, connect these secrets in ADF and finally deploy these secrets dynamically in Azure Dev Ops. As you can see, a lot to talk about during this session.
Airline Satisfaction Project using Azure
This presentation is created as a foundation of understanding and comparing data science/machine learning solutions made in Python notebooks locally and on Azure cloud, as a part of Course DP-100 - Designing and Implementing a Data Science Solution on Azure.
Applications of Data Science in Various Industries - IABAC
The wide-ranging applications of data science across industries.
From healthcare to finance, data science drives innovation and efficiency by transforming raw data into actionable insights.
Learn how data science enhances decision-making, boosts productivity, and fosters new advancements in technology and business. Explore real-world examples of data science applications today.
Is there a way that we can build our Azure Data Factory all with parameters based on MetaData?
1. #ScottishSummit2021
Erwin de Kreuk
Azure Data Factory
Elon 17:00 GMT
Is there a way that we can build our Azure Data Factory all with
parameters based on MetaData?
2. InSpark
Lead Data & AI
@erwindekreuk
Erwin
De Kreuk
Is there a way that we can build our Azure Data Factory all
with parameters based on MetaData?
6. What is Azure Data Factory?
• Hybrid data integration service
• With visual tools, you can build, debug, deploy, operationalize and monitor your (big) data pipelines
• Provides a way to transform data at scale without any coding required (ELT platform)
8. Global Parameters
Can be used across all your pipelines
Can be deployed in CI/CD
pipeline().globalParameters.<parameterName>
9. Global Parameters (Disabled)
Can be used across all your pipelines
Can be deployed in CI/CD
Global parameters - Azure Data Factory | Microsoft Docs
10. Global Parameters (Enabled)
Can be used across all your pipelines
Can be deployed in CI/CD
Global parameters - Azure Data Factory | Microsoft Docs
11. Dataset Parameters
Create 1 dataset for all your activities per Linked Service
You can't use Global Parameters
FileSystem Directory FileName
12. Dataset Parameters
Create 1 dataset for all your activities per Linked Service
You can't use Global Parameters
FileSystem Directory FileName
21. Can we get answers on the following questions?
Can we build ADF Pipelines dynamically?
Can we extract data from my sources based on MetaData?
Can we load the active (current) or historical records to a DataStore?
Can we build history from extracted data based on MetaData?
Can we log the execution of the Pipelines?
24. Lookup: Get Source data
ForEach: For Each
Execute Pipeline: Load
Lookup: Get LastLoadDate
Copy: Copy Source to ADLS
Stored Procedure: Set LastLoadDate
Command: Execute
SELECT [PipelineParameterId]
,[SourceName]
,[SourceSchema]
,[SelectQuery]
,[SelectLastLoaddate]
,[FilePath]
,[FileName]
,[TableDestinationName]
,[ProcessType]
,[IsActive]
,[IsIncremental]
,[IsIncrementalColumn]
,[LastLoadtime]
FROM [execution].[Pipeline_DataLake_Files]
SELECT CASE WHEN 1 = 1
            THEN CONVERT(varchar, MAX(LasteditedWhen), 120)
            ELSE CONVERT(varchar, GETDATE(), 120)
       END AS LastLoadDate
FROM SourceSchema.SourceTable
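To make the incremental pattern concrete: the query above returns the watermark (LastLoadDate) for a table, the Copy activity then only extracts rows changed since that watermark, and the "Set LastLoadDate" Stored Procedure activity writes the new watermark back to the parameter table. A minimal T-SQL sketch follows; the procedure name, its parameters and the exact filter column are assumptions for illustration, only the metadata table and column names come from the slides.

-- Example source query for an incremental Copy activity.
-- @LastLoadDate stands for the value returned by the "Get LastLoadDate" Lookup,
-- which ADF would inject into the query via dynamic content.
SELECT *
FROM SourceSchema.SourceTable
WHERE LasteditedWhen > @LastLoadDate;

-- Hypothetical "Set LastLoadDate" procedure that stores the new watermark
-- in the metadata table after a successful copy.
CREATE PROCEDURE [execution].[SetLastLoadDate]
    @PipelineParameterId INT,
    @LastLoadDate DATETIME2
AS
BEGIN
    UPDATE [execution].[Pipeline_DataLake_Files]
    SET [LastLoadtime] = @LastLoadDate
    WHERE [PipelineParameterId] = @PipelineParameterId;
END;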
Metadata
Source Parameter table
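The SELECT above shows the columns of this source parameter table. A minimal sketch of the corresponding DDL, assuming plausible data types (only the table and column names come from the deck; everything else is an assumption):

-- Hypothetical DDL for the metadata-driven source parameter table.
CREATE TABLE [execution].[Pipeline_DataLake_Files]
(
    [PipelineParameterId]   INT IDENTITY(1,1) PRIMARY KEY,
    [SourceName]            NVARCHAR(128) NOT NULL,   -- source table, without schema name
    [SourceSchema]          NVARCHAR(128) NOT NULL,   -- source schema
    [SelectQuery]           NVARCHAR(MAX) NULL,       -- full extract query
    [SelectLastLoaddate]    NVARCHAR(MAX) NULL,       -- query returning the watermark
    [FilePath]              NVARCHAR(512) NULL,       -- folder in the data lake
    [FileName]              NVARCHAR(256) NULL,
    [TableDestinationName]  NVARCHAR(128) NULL,
    [ProcessType]           NVARCHAR(50)  NULL,
    [IsActive]              BIT           NOT NULL DEFAULT 1,
    [IsIncremental]         BIT           NOT NULL DEFAULT 0,
    [IsIncrementalColumn]   NVARCHAR(128) NULL,       -- watermark column in the source
    [LastLoadtime]          DATETIME2     NULL        -- last successful load
);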
25. Logging
Log Start and End Time of records
Log Extracted Records
Log Execution Failure
Create Pipeline_ExecutionLog table
[audit].[Event_Pipeline_OnBegin]
[audit].[Event_Pipeline_OnEnd]
[audit].[Event_Pipeline_OnError]
PIPELINE ACTIVITY
26. Logging
Log Start and End Time of records
Log Extracted Records
Log Execution Failure
Create Pipeline_ExecutionLog table
Pipeline_ExecutionLog:
BEGIN: insert new record, insert metadata, insert start time
END: end time, Status(1), row counts, pipeline details
ERROR: end time, Status(2), failure message
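Based on the fields named on slides 25 and 26 (and the Parent Log Id mentioned in the speaker notes), a minimal sketch of what the Pipeline_ExecutionLog table could look like; the column names and data types are assumptions:

-- Hypothetical logging table for the auditing stored procedures.
CREATE TABLE [audit].[Pipeline_ExecutionLog]
(
    [ExecutionLogId]  BIGINT IDENTITY(1,1) PRIMARY KEY,
    [ParentLogId]     BIGINT           NULL,          -- links child pipelines to the daily run
    [PipelineName]    NVARCHAR(256)    NOT NULL,      -- pipeline details
    [PipelineRunId]   UNIQUEIDENTIFIER NULL,
    [SourceName]      NVARCHAR(128)    NULL,          -- metadata of the processed table
    [StartTime]       DATETIME2        NOT NULL,
    [EndTime]         DATETIME2        NULL,
    [RowsCopied]      BIGINT           NULL,          -- row counts from the Copy activity
    [Status]          TINYINT          NOT NULL DEFAULT 0,  -- 0 = running, 1 = succeeded, 2 = failed
    [FailureMessage]  NVARCHAR(MAX)    NULL
);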
30. Can we get answers on the following questions?
Can we build ADF Pipelines dynamically?
Can we extract data from my sources based on MetaData?
Can we load the active (current) or historical records to a DataStore?
Can we build history from extracted data based on MetaData?
Can we log the execution of the Pipelines?
31. HIGH OVERVIEW ARCHITECTURE: NITROGEN Data Accelerator
[Architecture diagram: on-premises data sources are extracted through the integration runtime by Data Factory into the Data Lake Raw Zone (Parquet), prepared with Databricks into the Data Lake Intermediate Zone (Parquet) and a Delta Lake data store, and loaded by Data Factory into Azure SQL Database or Azure Synapse Analytics for reporting with Power BI. Auditing, Logging, MetaData and Execution (Azure SQL) span the EXTRACT, PREP and LOAD stages.]
32. Process Flow
[Process flow diagram: a Daily Run pipeline runs a For Each loop over the Data Lake, Delta Lake and Data Store stages; each stage consists of a Command pipeline and an Execute pipeline (Pipeline_DataLake, Pipeline_DeltaLake, Pipeline_DataStore). Each stage writes Begin, End and Error auditing records to Pipeline_ExecutionLog.]
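The Delta Lake step in this flow is where history is built from the extracted Parquet files. A minimal Spark SQL sketch of what such a Databricks notebook command could run; the database and table names, the business key and the simple upsert pattern are assumptions for illustration (the actual accelerator may use a more elaborate slowly-changing-dimension approach):

-- Hypothetical Databricks (Spark SQL) command for the Delta Lake stage:
-- upsert the latest raw-zone extract into a Delta table.
MERGE INTO delta_lake.customer AS target
USING (
    SELECT *
    FROM parquet.`/mnt/datalake/raw/sales/customer/`   -- raw zone extract
) AS source
ON target.CustomerId = source.CustomerId
WHEN MATCHED AND target.LasteditedWhen < source.LasteditedWhen
    THEN UPDATE SET *
WHEN NOT MATCHED
    THEN INSERT *;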
36. Can we get answers on the following questions?
Can we build ADF Pipelines dynamically?
Can we extract data from my sources based on MetaData?
Can we load the active (current) or historical records to a DataStore?
Can we build history from extracted data based on MetaData?
Can we log the execution of the Pipelines?
Hello and welcome to my session about
Is there a way that we can build our Azure Data Factory all with parameters based on MetaData?
My name is Erwin de Kreuk and I'm working as Lead Data & AI for InSpark, a Microsoft partner in the Netherlands.
Azure Data Factory is a hybrid data integration service.
During the session today I will explain how you can use parameters within Azure Data Factory,
how you can replace these parameters with MetaData,
how we can log these dynamic pipelines,
and give a quick walk-through of a complete solution with Databricks and Azure SQL Database as endpoint.
A hybrid data integration service where you can easily extract data from on-premises sources, cloud sources and SaaS applications with more than 120 different data connectors.
With visual tools, you can build, debug, deploy, operationalize and monitor your data pipelines or big data pipelines.
It provides an easy way to transform data at scale without any coding required (ELT platform).
With parameters you can build a completely dynamic solution, and this is what I'm going to show you.
Passing parameters to ADF or Azure Synapse is quite important as it provides the flexibility required to create dynamic pipelines.
To reference a parameter, you must provide the fully qualified name of the parameter.
It is worth noting that parameter names are case sensitive.
A parameter could be a user input, which means that the parameter is passed from the pipeline layer or could be an input coming from an activity within the pipeline.
Global parameters can be used in any pipeline expression. If a pipeline is referencing another resource such as a dataset or data flow, you can pass down the global parameter value via that resource's parameters.
Global Parameters are only available in Azure Data Factory and not in Azure Synapse Analytics
You can create and manage Global Parameters in the Management Hub in ADF.
You must define the datatype of the Global Parameter
Global parameters are referenced as pipeline().globalParameters.<parameterName>.
There are two ways to integrate global parameters in your continuous integration and deployment solution:
Include global parameters in the ARM template
Deploy global parameters via a PowerShell script
Or you can enable this box
I'm not going into that much detail, but you can find the details via the added link.
Or you can enable this box
I will show you later in the demo how you deploy these parameters to the next environment.
Create 1 Dataset for all your Activities per Linked Service
Create 1 Dataset for all your Activities per Linked Service
Explain parameter name
Explain add dynamic content.
You can define pipeline parameters to pass values through to your dataset, for example.
How this works I will explain in the upcoming demo.
Use widgets in Databricks notebooks to receive these values as parameters from ADF.
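As a hedged illustration (the deck does not show the notebook code): in a Databricks notebook, a SQL cell can declare a widget and read it with getArgument; the widget name and the table being filtered below are made up for the example.

-- Hypothetical Databricks SQL notebook cell: declare a widget so the notebook
-- can receive a value passed as a base parameter from the ADF Notebook activity.
CREATE WIDGET TEXT tableName DEFAULT "";

-- Use the widget value inside the notebook.
SELECT *
FROM delta_lake.pipeline_runs
WHERE source_table = getArgument("tableName");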
Make your Linked Service dynamic, for example if you want to extract data from the same server but from different databases.
Now that we have learned to implement a dynamically set-up pipeline, it is time for the next stage:
how can we fill all the parameters based on metadata?
Source Name => the name of the source table, without schema name
Source Schema => the name of the source schema
DataLake Catalog => folder in the Data Lake
Table Destination Schema
Table Destination Table
IsActive
IsIncremental
IsIncrementalColumn
LastLoadDateTime
With the auditing we have created three stored procedures:
one that starts the execution,
one that ends the execution if it is successful,
one that ends the execution if it is not successful.
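A hedged sketch of what these three procedures ([audit].[Event_Pipeline_OnBegin], [audit].[Event_Pipeline_OnEnd], [audit].[Event_Pipeline_OnError]) could look like against the Pipeline_ExecutionLog table sketched earlier; the parameter lists and bodies are assumptions, only the procedure names come from the slides.

-- Hypothetical bodies for the three auditing procedures named on slide 25.
CREATE PROCEDURE [audit].[Event_Pipeline_OnBegin]
    @PipelineName NVARCHAR(256), @PipelineRunId UNIQUEIDENTIFIER,
    @SourceName NVARCHAR(128), @ParentLogId BIGINT = NULL
AS
BEGIN
    INSERT INTO [audit].[Pipeline_ExecutionLog]
        ([ParentLogId], [PipelineName], [PipelineRunId], [SourceName], [StartTime], [Status])
    VALUES (@ParentLogId, @PipelineName, @PipelineRunId, @SourceName, SYSUTCDATETIME(), 0);
    -- The new log id could be picked up in ADF (e.g. via a Lookup activity)
    -- and passed on to the End/Error procedures.
    SELECT SCOPE_IDENTITY() AS ExecutionLogId;
END;
GO

CREATE PROCEDURE [audit].[Event_Pipeline_OnEnd]
    @ExecutionLogId BIGINT, @RowsCopied BIGINT = NULL
AS
BEGIN
    UPDATE [audit].[Pipeline_ExecutionLog]
    SET [EndTime] = SYSUTCDATETIME(), [Status] = 1, [RowsCopied] = @RowsCopied
    WHERE [ExecutionLogId] = @ExecutionLogId;
END;
GO

CREATE PROCEDURE [audit].[Event_Pipeline_OnError]
    @ExecutionLogId BIGINT, @FailureMessage NVARCHAR(MAX)
AS
BEGIN
    UPDATE [audit].[Pipeline_ExecutionLog]
    SET [EndTime] = SYSUTCDATETIME(), [Status] = 2, [FailureMessage] = @FailureMessage
    WHERE [ExecutionLogId] = @ExecutionLogId;
END;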
Show Source Parameters table
Show [execution].[Pipeline_DataLake_Files]
Show Execution pipeline
Show Command Pipeline
Get Files
For Each
Pipeline Execution
Run Pipeline
DEMO 3
Show SP On Begin, On End, On Error
Explain Parent Log Id
Show Execution
Show Table