news.trivadis.com/blog | @trivadis
Data Warehouse – (high) added value in Azure Cloud
yves.mauron@trivadis.com
marco.amhof@trivadis.com
Agenda
• The modern data warehouse pattern
• Azure Cloud architecture pillars
• Ingest data (Azure Data Factory, ADF SSIS Runtime Integration, others)
• Store data (Azure SQL, Azure Data Lake, Azure SQL DWH)
• Process data (Azure Databricks)
• Azure SSAS – Advantages and possibilities of a BI semantic layer
• Power BI
• Trivadis Customer Cases
• Case 1 (Lift and Shift on-premise DWH to Azure Cloud)
• Case 2 (BI Solution in Azure Cloud with Data Lake and Databricks Technology)
• Case 3 (Greenfield BI solution built from scratch in Azure Cloud)
© Microsoft Corporation
Modern data warehousing pattern
[Diagram: data sources (Social, LOB, Graph, IoT, Image, CRM) pass through four stages: INGEST (data orchestration and monitoring), STORE (big data store), PREP (& store) (transform & clean), and MODEL & SERVE (data warehouse). The results feed AI, BI + reporting, and advanced analytics.]
The Microsoft offering
AI built-in | Most secure | Lowest TCO
• SQL Server | Hybrid | Azure Data Services
• Data warehouses, data lakes, operational databases
• Security and performance | Flexibility of choice | Reason over any data, anywhere
• Industry leader 4 years in a row
• #1 TPC-H performance
• T-SQL query over any data
• 70% faster
• 2x the global reach
• 99.9% SLA
• Easiest lift and shift with no code changes
• Data sources: Social, LOB, Graph, IoT, Image, CRM
The Azure data landscape
[Diagram: Azure services arranged across the stages INGEST, STORE, PREP, MODEL & SERVE. Services shown: Azure Data Factory, Azure Import/Export service, Azure SDK, Azure CLI, Cognitive Services, Bot Service, Azure Search, Azure Data Catalog, Azure ExpressRoute, Azure network security groups, Azure Functions, Visual Studio, Operations Management Suite, Azure Active Directory, Azure key management service, Azure Blob Storage, Azure Data Lake Store, Azure IoT Hub, Azure Event Hubs, Kafka on Azure HDInsight, Azure SQL Data Warehouse, Azure SQL DB, Azure Cosmos DB, Azure Analysis Services, Power BI, Azure HDInsight, Azure Databricks, Azure Stream Analytics, Azure ML, ML Server.]
Ingest
ETL vs. ELT
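The ETL vs. ELT contrast can be sketched in a few lines. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for the target warehouse; the sample rows, table names and the `is_number` helper are hypothetical and not from the slides. In ETL the pipeline cleans the data before loading; in ELT the raw rows land first and the target engine does the transformation.

```python
import sqlite3

# Hypothetical source rows: (sensor key, speed as text, year measured)
SOURCE_ROWS = [(1, "88.5", 2019), (2, "n/a", 2019), (3, "120.0", 2018)]


def is_number(text):
    """Return True when the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def run_etl(conn):
    """ETL: clean and convert in the pipeline, then load only finished rows."""
    cleaned = [(key, float(speed), year)
               for key, speed, year in SOURCE_ROWS if is_number(speed)]
    conn.execute("CREATE TABLE etl_speed (sensor_key INTEGER, speed REAL, year INTEGER)")
    conn.executemany("INSERT INTO etl_speed VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load raw rows unchanged, then transform inside the target engine."""
    conn.execute("CREATE TABLE raw_speed (sensor_key INTEGER, speed TEXT, year INTEGER)")
    conn.executemany("INSERT INTO raw_speed VALUES (?, ?, ?)", SOURCE_ROWS)
    conn.execute(
        "CREATE TABLE elt_speed AS "
        "SELECT sensor_key, CAST(speed AS REAL) AS speed, year "
        "FROM raw_speed WHERE speed <> 'n/a'"
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT * FROM etl_speed ORDER BY sensor_key").fetchall()
elt_rows = conn.execute("SELECT * FROM elt_speed ORDER BY sensor_key").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_speed").fetchone()[0]
print(etl_rows == elt_rows, raw_count)
```

Both routes end with the same cleaned table, but only the ELT route keeps the untouched raw rows in the target, which is what makes later re-processing possible.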
Azure Data Factory
• Data Integration Service: Serverless, Scalable, Hybrid
• Hybrid pipeline model: seamlessly span on-prem, Azure, other clouds & SaaS; run on demand, on a schedule, on data availability, or on event
• Data movement at scale: cloud & hybrid with 80+ connectors provided; up to 1 GB/s
• SSIS package execution: lift existing SQL Server ETL to Azure; use existing tools (SSMS, SSDT)
• Author & monitor: programmability with multi-language SDK; visual tools
[Diagram: Azure Data Factory building blocks: Trigger, Pipeline with foreach (…) activity, Linked Service]
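The building blocks named above (linked service, pipeline with a ForEach activity, trigger) are defined as JSON documents in Azure Data Factory. The sketch below builds such documents as Python dictionaries to show how they reference each other. All names (`OnPremSqlServer`, `CopyTablesToLake`, `DailyAt0200`, the dataset names and table list) are hypothetical, dataset definitions are omitted, and the property layout reflects the ADF JSON format as I understand it, so it should be checked against the current documentation before use.

```python
import json

# Hypothetical linked service: the connection to a source system.
linked_service = {
    "name": "OnPremSqlServer",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "<connection string>"},
    },
}

# Hypothetical copy step; the referenced datasets are not defined here.
copy_activity = {
    "name": "CopyOneTable",
    "type": "Copy",
    "inputs": [{"referenceName": "SourceTable", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "LakeFile", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "SqlServerSource"},
        "sink": {"type": "ParquetSink"},
    },
}

# Pipeline: a ForEach activity runs the copy step once per table name.
pipeline = {
    "name": "CopyTablesToLake",
    "properties": {
        "parameters": {"tableNames": {"type": "Array"}},
        "activities": [{
            "name": "ForEachTable",
            "type": "ForEach",
            "typeProperties": {
                "items": {"value": "@pipeline().parameters.tableNames",
                          "type": "Expression"},
                "activities": [copy_activity],
            },
        }],
    },
}

# Trigger: starts the pipeline once a day and passes the parameter values.
trigger = {
    "name": "DailyAt0200",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {"frequency": "Day", "interval": 1,
                           "startTime": "2019-06-01T02:00:00Z", "timeZone": "UTC"},
        },
        "pipelines": [{
            "pipelineReference": {"referenceName": pipeline["name"],
                                  "type": "PipelineReference"},
            "parameters": {"tableNames": ["dbo.Customer", "dbo.Orders"]},
        }],
    },
}


def triggered_pipelines(trigger_def):
    """List the pipeline names a trigger definition starts."""
    return [p["pipelineReference"]["referenceName"]
            for p in trigger_def["properties"]["pipelines"]]


print(json.dumps(pipeline, indent=2))
print(triggered_pipelines(trigger))
```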
No-code data transformation @ scale
Mapping Data Flow
Data cleansing, transformation, aggregation, conversion, etc.
Cloud scale via Spark execution
Easily build resilient data flows
Store
Azure SQL Database resource types
Azure SQL Database
• Single: Database-scoped deployment option with predictable workload performance. Best for apps that require resource guarantee at database level.
• Elastic Pool: Shared resource model optimized for greater efficiency of multi-tenant applications. Best for SaaS apps with multiple databases that can share resources at database level, achieving better cost efficiency.
• Managed Instance: Instance-scoped deployment option with high compatibility with SQL Server and full PaaS benefits. Best for modernization at scale with low friction and effort.
Azure SQL Data Warehouse
• Best-in-class price-performance: up to 14x faster and 94% less expensive than cloud competitors
• Industry-leading security: defense-in-depth security and 99.9% financially backed availability SLA
• Intelligent workload management: prioritize resources for the most valuable workloads
• Separation of compute and storage
• Developer productivity
• Data flexibility
Azure SQL Data Warehouse MPP Architecture
Data ingestion using external data sources
PolyBase
-- Create a reference to Azure Data Lake Storage Gen2
-- (the abfss:// scheme pairs with the dfs.core.windows.net endpoint)
CREATE EXTERNAL DATA SOURCE AzureStorage WITH
(
    TYPE = HADOOP,
    LOCATION = 'abfss://<container>@<storageaccnt>.dfs.core.windows.net',
    CREDENTIAL = AzureStorageCredential -- not required if using managed identity
);
-- File format type (delimited text such as CSV, RCFILE, ORC, PARQUET)
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
);
-- LOCATION: path to the file or directory that contains the data
CREATE EXTERNAL TABLE [dbo].[CarSensor_Data]
(
    [SensorKey] int NOT NULL,
    [Speed] float NOT NULL,
    [YearMeasured] int NOT NULL
)
WITH
(
    LOCATION = '/Demo/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFileFormat
);
Overview
PolyBase supports querying files (Parquet, Delimited Text) stored in a Hadoop File System (HDFS), Azure Blob storage, or Azure Data Lake Store.
To query files, users create three objects: external data source, external file format, external table.
Starting in SQL Server 2019, you can use PolyBase to access external data in SQL Server, Oracle, Teradata, and MongoDB.
DWUs (CPU, memory, and IO) in $$$
Azure SQL Data Warehouse (a preview)
What is a Data Lake?
A data lake is a central storage repository that holds data coming from many sources in a raw, granular format. It can store structured, semi-structured, or unstructured data, which means data can be ingested quickly and kept in a more flexible format for future use cases.
Characteristics
• Schema-on-read (ELT)
• Collection of data, not a platform
• Perfect place for evolving data
Benefits
• Quickly ingest high volumes of diverse data structures
• Enable advanced analytics and data exploration
• Scalability and storage cost reduction
Best Practices
• Data governance needed to avoid a data swamp
• Security considerations
• Design your data lake
• Metadata management
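Schema-on-read, listed under the characteristics above, means the lake stores records as they arrive and each reader applies the schema it needs. The following is a minimal sketch of that idea using plain JSON lines; the sample records, field names and both schemas are hypothetical.

```python
import json

# Raw content: records are stored exactly as they arrived (hypothetical data).
RAW_LINES = [
    '{"SensorKey": 1, "Speed": "88.5", "YearMeasured": 2019}',
    '{"SensorKey": 2, "Speed": 120, "YearMeasured": "2018", "Firmware": "v2"}',
]

# The schema lives with the reader, not with the storage.
SENSOR_SCHEMA = {"SensorKey": int, "Speed": float, "YearMeasured": int}
FIRMWARE_SCHEMA = {"SensorKey": int, "Firmware": str}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and cast only the fields the schema asks for.

    Fields missing from a record become None; extra fields are ignored.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({name: cast(record[name]) if name in record else None
                     for name, cast in schema.items()})
    return rows


sensor_rows = read_with_schema(RAW_LINES, SENSOR_SCHEMA)
firmware_rows = read_with_schema(RAW_LINES, FIRMWARE_SCHEMA)
print(sensor_rows)
print(firmware_rows)
```

The same raw lines serve two different readers without being rewritten, which is why evolving data fits a lake well; the cost is that type problems surface at read time rather than at load time.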
Data Lake Design Considerations
Data Lake Zones
Transient Landing Zone
Temporary storage of data to meet regulatory and quality control requirements. Limited access. May not be required depending on requirements.
Raw Zone
Original source of data ready for consumption. Metadata publicly available but access to data still limited.
Trusted Zone
Standardized and enriched datasets ready for consumption to those with appropriate role-based access. Metadata available to all.
Curated/Refined Zone
Data transformed from Trusted Zone to meet specific business requirements.
Sandbox Zone
Playground for Data Scientists for ad hoc exploratory use cases.
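One way to make the zones above concrete is a folder naming convention. The sketch below assumes a layout of `<zone>/<source>/<dataset>/yyyy/mm/dd`; that layout, the lowercase zone folder names and the `zone_path` and `promote` helpers are illustrative assumptions, not something the slides prescribe.

```python
from datetime import date
from pathlib import PurePosixPath

# Folder names mirror the zones listed above; the layout itself is an assumption.
ZONES = ("transient", "raw", "trusted", "curated", "sandbox")


def zone_path(zone, source, dataset, load_date):
    """Build <zone>/<source>/<dataset>/yyyy/mm/dd for one daily load."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return PurePosixPath(zone, source, dataset,
                         f"{load_date:%Y}", f"{load_date:%m}", f"{load_date:%d}")


def promote(path, target_zone):
    """Return the location of the same dataset and load date in another zone."""
    if target_zone not in ZONES:
        raise ValueError(f"unknown zone: {target_zone}")
    return PurePosixPath(target_zone, *path.parts[1:])


raw = zone_path("raw", "crm", "customers", date(2019, 6, 4))
trusted = promote(raw, "trusted")
print(raw)      # raw/crm/customers/2019/06/04
print(trusted)  # trusted/crm/customers/2019/06/04
```

Keeping the zone as the top-level folder makes it straightforward to grant access per zone, which matches the differing access rules described for each zone.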
Data Governance Considerations
Security and Compliance
Access Control
Encryption
Row-Level Security
Metadata Management
Data Quality
Metadata Management
Lifecycle Management
Transform & Clean
Azure Databricks
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Best of Databricks, best of Microsoft
• Designed in collaboration with the founders of Apache Spark
• One-click setup; streamlined workflows
• Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
• Native integration with Azure services (Power BI, SQL DW, Cosmos DB, ADLS, Azure Storage, Azure Data Factory, Azure AD, Event Hub, IoT Hub, HDInsight Kafka, SQL DB)
• Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
Azure Databricks
[Diagram: data sources (cloud storage, data warehouses, Hadoop storage, IoT / streaming data) feed Azure Databricks; outputs are REST APIs, machine learning models, BI tools, data exports, and data warehouses. Components: Optimized Databricks Runtime Engine (Apache Spark, Databricks I/O, Serverless); Collaborative Workspace for data engineer, data scientist, and business analyst; Deploy Production Jobs & Workflows (multi-stage pipelines, job scheduler, notification & logs). Captions: Enhance productivity; Build on secure & trusted cloud; Scale without limits.]
Azure Databricks Deployment
[Diagram: the Azure Portal and the Azure Resource Manager APIs create the Azure Databricks Workspace. A managed resource group contains the attached Azure BLOB storage (DBFS), the workspace VNET, the workspace NSG rules, and the cluster node(s). Notebooks, clusters, and jobs run on the cluster nodes.]
Interact using UI or Azure Databricks REST API
Integrate with other Azure services: Azure BLOBs, Data Lake, Event Hub, IoT Hub, Kafka, Cosmos DB, SQL DW, Data Factory
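Interacting through the REST API, as mentioned above, comes down to authenticated HTTPS calls against the workspace URL. The sketch below prepares, but does not send, a request that would start an existing job through the Jobs API `run-now` endpoint. The workspace URL, token placeholder, job id and the `build_run_now_request` helper are hypothetical, and the `2.1` API version is an assumption to verify for the workspace in question.

```python
import json
import urllib.request


def build_run_now_request(workspace_url, token, job_id):
    """Prepare (but do not send) a Jobs API run-now call for an existing job."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical workspace URL, token placeholder and job id.
request = build_run_now_request(
    "https://adb-1234567890123456.7.azuredatabricks.net/",
    "<personal-access-token>",
    42,
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example runnable without a workspace and makes the call easy to inspect or log first.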
Model & Serve
Why do I also need a cube if I have a data warehouse?
• Semantic layer
• Handle many concurrent users
• Implement complex business logic (DAX)
• Aggregating data for performance
• Multidimensional analysis
• No joins or relationships
• Hierarchies, KPIs
• Row-level security
• Advanced time calculations
• Slowly Changing Dimensions (SCD)
• Required for some reporting tools
What is Azure Analysis Services?
• Azure Analysis Services is a fully managed platform as a service (PaaS) that provides enterprise-grade data models in the cloud.
• Use advanced mashup and modeling features to combine data from multiple data sources, define metrics, and secure your data in a single, trusted tabular semantic data model.
• The data model provides an easier and faster way for users to browse massive amounts of data for ad hoc data analysis.
Modern Data Analytics Landscape
[Diagram: stages Ingest, Store, Prep & Train, Model & Serve, Intelligence. Sources: business / custom apps (structured); logs, files and media (unstructured); on-prem and cloud apps & data. Components: Data Factory, Azure Storage, Azure Databricks (Spark), PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, analytical dashboards (Power BI).]
Azure Data Factory orchestrates data pipeline activity workflow & scheduling
BI & Reporting
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Power BI
Power BI Report Server (on-premise)
Customer success stories
Case 1 – «on-premise» to Azure Cloud
Customer
Case 1 - Approach
Case 1 – Timeline
Case 2 – Modern Data Warehouse – (Delta) Lake
Case 3 – Management Reporting
[Diagram: stages Ingest, Store, Prep / Train, Model & Serve. Components: Sub1, Sub2, Sub n, Structures, Messaging, Logic App, Function App, BLOB Storage, SQL Database, Analysis Services, Azure AD, Power BI, Excel, …]
DEMO
Questions?