Lakehouse in Azure
Sergio Zenatti Filho
Sr Cloud Solution Architect - Data & Analytics
@Microsoft
Sergio has over 20 years of experience designing and
delivering Data and Analytics Solutions. He has extensive
experience in the Microsoft Data and Analytics Platform in the
cloud and also on-premises. Sergio is passionate about
learning new technology and helping customers to define the
best solution for their business.
©Microsoft Corporation
Azure
Agenda
• Lakehouse
• Delta Lake
• Ingestion and Transformation
• Architecture
• Power BI
• Next Steps
• Q&A
Data Warehouse and Data Lake
Data Warehouse
• Has powered BI for over 30 years
• Purpose-built for BI and reporting
• Limited support for semi-structured and unstructured data
• Limited support for streaming
Diagram labels (Data Warehouse): External Data, Operational Data, ETL, Data Warehouses, BI, Reports
Data Lake
• Powered by technological advances in data storage
• Cheap to store any data
• Supports machine learning use cases
• Poor BI support
• Complex to set up
• Hard to append data
Diagram labels (Data Lake): Structured, Semi-Structured and Unstructured Data, Data Lake, Data Prep and Validation, ETL, Real-Time Database, Reports, Data Warehouses, BI, Data Science, Machine Learning
Lakehouse
Diagram labels: Data Warehouse, Data Lake, Streaming Analytics, BI, Data Science, Machine Learning, Structured, Semi-Structured and Unstructured Data
Key features:
• Transaction support
• Schema enforcement and governance
• Data reliability and consistency
• Low query latency and high reliability for BI and advanced analytics
• Optimized for machine learning and data science
• Enable end-to-end streaming
Lakehouse Platform combines the best elements of data lakes and data warehouses to deliver the reliability, strong governance and performance of data warehouses with the openness, flexibility and machine learning support of data lakes.
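To make "schema enforcement" concrete, here is a minimal sketch in plain Python. It is a toy model, not how any lakehouse engine implements it: the `SCHEMA` mapping, `SchemaMismatchError`, `validate_row` and `append_rows` are hypothetical names invented for this illustration. The idea it shows is that a write is checked against the declared table schema first, and a batch with even one bad row is rejected as a whole.

```python
from typing import Any

# Hypothetical schema for illustration: column name -> expected Python type.
SCHEMA: dict[str, type] = {"id": int, "name": str, "amount": float}


class SchemaMismatchError(ValueError):
    """Raised when a row does not match the declared table schema."""


def validate_row(row: dict[str, Any], schema: dict[str, type]) -> None:
    """Reject rows with missing, extra, or wrongly typed columns."""
    missing = schema.keys() - row.keys()
    extra = row.keys() - schema.keys()
    if missing or extra:
        raise SchemaMismatchError(
            f"missing={sorted(missing)} extra={sorted(extra)}"
        )
    for column, expected in schema.items():
        if not isinstance(row[column], expected):
            raise SchemaMismatchError(
                f"column {column!r} expected {expected.__name__}, "
                f"got {type(row[column]).__name__}"
            )


def append_rows(
    table: list[dict[str, Any]],
    rows: list[dict[str, Any]],
    schema: dict[str, type],
) -> None:
    """Validate the whole batch before writing, so a bad row never
    leaves a partially written batch behind."""
    for row in rows:
        validate_row(row, schema)
    table.extend(rows)
```

Calling `append_rows(table, [{"id": "3", "name": "c", "amount": 3.0}], SCHEMA)` raises `SchemaMismatchError` because `id` is a string, and `table` is left unchanged.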
Delta Lake
Key features:
• ACID Transactions
• Scalable Metadata
• Unified Streaming and Batch
• Schema Evolution / Enforcement
• Time Travel
• Upserts and deletes
Delta Lake is an open source project that enables building a Lakehouse architecture on top of data lakes.
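The way versioned commits, time travel, upserts and deletes fit together can be sketched with a toy in-memory table. This is an illustration under loud assumptions, not Delta Lake's API or storage format: `ToyDeltaTable` and its methods are hypothetical, and each commit here stores a full snapshot, whereas the real project records file-level actions in a transaction log.

```python
import copy


class ToyDeltaTable:
    """Toy in-memory table with an append-only commit log (illustration only)."""

    def __init__(self, key):
        self.key = key   # column used to match rows during an upsert
        self._log = []   # one snapshot (dict keyed by `key`) per committed version

    @property
    def version(self):
        """Latest committed version number, or -1 if nothing is committed."""
        return len(self._log) - 1

    def _latest_copy(self):
        # Work on a private copy so readers never see a half-applied change.
        return copy.deepcopy(self._log[-1]) if self._log else {}

    def _commit(self, snapshot):
        # All-or-nothing: the snapshot becomes visible only once appended.
        self._log.append(snapshot)
        return self.version

    def upsert(self, rows):
        """Update rows whose key already exists and insert the rest."""
        snapshot = self._latest_copy()
        for row in rows:
            snapshot[row[self.key]] = dict(row)
        return self._commit(snapshot)

    def delete(self, keys):
        """Remove rows by key; unknown keys are ignored."""
        snapshot = self._latest_copy()
        for k in keys:
            snapshot.pop(k, None)
        return self._commit(snapshot)

    def read(self, version=None):
        """Read the latest snapshot, or an older version (time travel)."""
        if not self._log:
            return []
        index = self.version if version is None else version
        if not 0 <= index <= self.version:
            raise ValueError(f"unknown version {index}")
        return [dict(row) for row in self._log[index].values()]
```

A short usage example: after `t = ToyDeltaTable(key="id")`, two `t.upsert(...)` calls and one `t.delete([1])` produce versions 0, 1 and 2; `t.read()` returns the latest rows, while `t.read(version=0)` returns the table as it was after the first commit. In Delta Lake on Spark, the corresponding features are exposed through the `MERGE` operation and the `versionAsOf` read option.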
Demo
• Delta Lake
• Data Ingestion and Transformation
• Power BI
Data Ingestion
Azure Synapse Pipeline or Azure Data Factory
• 90+ Data Sources including files, databases, SaaS, PaaS and more
• Copy activity: supports Azure Databricks Delta Lake connector to copy data from any supported source to a Delta Lake table, and from a Delta Lake table to any supported sink data store.
• Mapping Data Flow: supports generic Delta format on Azure Storage as source and sink to read and write Delta files for code-free ETL, and runs on managed Azure Integration Runtime.
Databricks
• Data Formats: Delta Lake, Parquet, ORC, JSON, CSV, Avro, Text and Binary
• Data Sources: SQL Server, MariaDB, MySQL, PostgreSQL, Azure Synapse Analytics, Azure Cosmos DB, MongoDB, Cassandra, Couchbase, ElasticSearch, Neo4j, Redis, Snowflake and more.
Other Solutions
• Event Hub
• IoT Hub
• SQL Server BCP (bulk copy program)
• Polybase
• SAP Data Services
• Informatica
• Striim
• Fivetran
• Qlik
• Confluent
Data Transformation
Databricks
• Spark notebooks using Python, Scala, SQL and R
Synapse Spark
• Spark notebooks using Python, Scala, Spark SQL, C# and R (Preview)
Azure Synapse Pipeline and Azure Data Factory
• Mapping data flows: visually designed data transformations in Azure Data Factory and Azure Synapse Pipeline
• External Transformations: Azure Synapse Notebook and Databricks.
Architecture
Lakehouse Architecture - Databricks
Lakehouse Architecture – Azure Synapse
Lakehouse Architecture – Azure Synapse and Databricks
Power BI
Azure Synapse
• Azure Synapse Analytics SQL: connector for Lake DB (Spark), Serverless DB and Dedicated SQL Pool
• Azure Synapse Analytics workspace (beta): connector for Lake DB (Spark), Serverless DB and Dedicated SQL Pool
• Authentication using Microsoft Account, Windows and Database
Databricks
• Databricks (Beta): connector for Databricks SQL Warehouse running on AWS and using OAuth
• Azure Databricks: for Databricks SQL Warehouse in Azure or on AWS but not using OAuth
• Authentication using Personal Access Token or OAuth
Delta Sharing
• Import Mode Only
• Authentication using Token
Delta.io connector (Open Source)
• Reads Delta Lake tables natively in Power BI
• Supports all storage systems that are supported by Power BI
• https://github.com/delta-io/connectors/tree/master/powerbi
What next?
• Free training - Databricks Lakehouse Fundamentals: https://www.databricks.com/learn/training/lakehouse-fundamentals
• Free training - Use Delta Lake in Azure Synapse Analytics: https://learn.microsoft.com/en-us/training/modules/use-delta-lake-azure-synapse-analytics/
• Solution Accelerator for Financial Analytics: https://github.com/microsoft/Azure-Databricks-Solution-Accelerator-Financial-Analytics-Customer-Revenue-Growth-Factor
• Open Education Analytics: https://github.com/microsoft/OpenEduAnalytics
• Delta Lake: https://delta.io/
• Dynamics 365 Finance and Operations Apps - Export to data lake: https://github.com/microsoft/Dynamics-365-FastTrack-Implementation-Assets/tree/master/Analytics/ArchitecturePatterns
© Copyright Microsoft Corporation. All rights reserved.
Q&A
Thank you!
Sergio Zenatti Filho - Sr Cloud Solution Architect at Microsoft
Email: zenatti@gmail.com
LinkedIn: https://www.linkedin.com/in/sergiozenatti/
