©2021 Databricks Inc. — All rights reserved
Modernize your Data Warehouse
A migration journey to the Databricks Lakehouse Platform
Amit Kara, Director, Technical Product Marketing
Soham Bhatt, SME Lead, DW Migration
Agenda
• Why a lakehouse for data warehousing
• How Databricks helps with data warehousing
• Key differentiators of the Databricks Lakehouse Platform
• Demo: data warehousing on Databricks
• How to modernize your data warehouse into a lakehouse
• Key takeaways for migrating to the lakehouse
What problem are we solving?
Legacy data warehouses aren't keeping up
• Data warehouses can't keep up with data volume and variety
• Innovation hinges on integrating ML/AI and predictive insights
• Business agility requires reliable, real-time data
• Not cost-effective, especially at scale
• Data is locked in to a vendor and duplicated
The problem with legacy cloud data warehouses (CDWs): a fragmented approach to modernizing your architecture
[Diagram: a cloud data warehouse for structured data, serving BI reports, dashboards and SQL ELT/ETL, next to a separate data lake (ADLS, AWS S3, GCP) for unstructured and semi-structured data, serving data science model training, model scoring and model deployment.]
• Limited support for streaming
• Limited support for unstructured data (audio/images/video)
• Complex, with many stages
• Data is duplicated
• Lock-in / proprietary format
• Compute cost for all data access
• Disparate tooling decreases data team productivity
Why Data Warehousing on Databricks?
Your tools of choice
Use your favorite tools, such as Fivetran, dbt, Power BI, Tableau or Databricks, to ingest, transform and query all your data in place.
Serverless compute
Lower costs, eliminate the need to manage, configure or scale cloud infrastructure, and get the best price/performance with serverless compute.
Unified governance
Simplify your architecture with a single copy of all your data and one unified governance layer across all data teams, using standard SQL.
[Diagram: data warehousing, data engineering, data science and ML, and data streaming all run on Unity Catalog and Delta Lake, over all structured and unstructured data in the cloud data lake.]
Break down silos
Empower data scientists and analysts to access the most complete
and freshest data faster, and uncover new insights together.
Connect your data, analytics and AI tools to the Databricks Lakehouse
• Discover validated data and AI solutions for new use cases
• Set up in a few clicks with pre-built integrations
Integrated out-of-the-box with Partner Connect
[Diagram: Partner Connect categories: Business Intelligence, ML Tools, Data Preparation, Data Connectors, Solution Accelerators, Data Apps, Partners]
Discover data, analytics and AI tools, connect them to your lakehouse, and process your data with them
Databricks thrives within your modern data stack
[Diagram: the lakehouse (Unity Catalog and Delta Lake over all structured and unstructured data in the cloud data lake, supporting data warehousing, data engineering, data science and ML, and data streaming), connected to tools for data ingestion, data pipelines, BI and dashboards, data science, machine learning and data governance.]
First-class SQL development experience
Query data lake data using familiar ANSI SQL, and collaboratively find and share new insights faster with the built-in SQL query editor, alerts, visualizations and interactive dashboards.
Collaboratively query, explore and transform data in place
Elastic, instant compute decoupled from storage
• Quickly set up optimized compute resources with SQL endpoints (powered by the Photon vectorized engine)
• Built-in high concurrency with automatic load balancing
• Intelligent workload management and faster reads from cloud storage
• Instant startup and greater availability
• Available in Databricks Serverless (preview)
No resource management needed with Serverless
Built from the ground up for the best price/performance
Lightning-fast analytics
Query and analyze your most complete and freshest data with up to 12x better price/performance than traditional cloud data warehouses.
Source: Performance Benchmark with Barcelona Supercomputing Center
Unity Catalog: fine-grained governance on the lakehouse
● Centralized metadata and user management
● Centralized data access controls
● Data lineage (private preview)
● Data access auditing
● Data search and discovery (coming soon)
● Secure data sharing with Delta Sharing
● Standard SQL
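Centralized access controls in Unity Catalog are expressed in standard SQL with GRANT statements. As an illustration, here is a minimal sketch, assuming a hypothetical mapping of groups to table privileges, that renders those statements so they could be reviewed and then run from a SQL editor or notebook. The catalog, schema, table and group names are placeholders, not objects from this presentation.

```python
# Hypothetical access policy: group -> list of (privilege, fully qualified table).
# All names are placeholders for illustration only.
ACCESS_POLICY = {
    "analysts": [("SELECT", "main.sales.orders")],
    "data_engineers": [
        ("SELECT", "main.sales.orders"),
        ("MODIFY", "main.sales.orders"),
    ],
}


def quote_principal(name):
    """Quote a principal name with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def grant_statements(policy):
    """Render one GRANT statement per (group, privilege, table) entry, groups sorted."""
    statements = []
    for group, grants in sorted(policy.items()):
        for privilege, table in grants:
            statements.append(
                f"GRANT {privilege} ON TABLE {table} TO {quote_principal(group)}"
            )
    return statements


for stmt in grant_statements(ACCESS_POLICY):
    print(stmt + ";")
```

Keeping the policy as data makes grants reviewable and repeatable across workspaces; the generated SQL is still ordinary Unity Catalog GRANT syntax.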
Key considerations for modern analytics and DW
❏ Empower business units with self-service and advanced analytics
❏ Simple, collaborative, agile cross-functional teams
❏ Machine learning and artificial intelligence as CIO-level initiatives
❏ A platform that supports all data types, structured and unstructured
❏ Cloud: choose best-of-breed, with an open tech stack rather than a proprietary one
Demo
Modern Data Warehousing on Databricks
[Diagram: batch ingestion and stream ingestion land data in the BRONZE layer (raw ingestion and history), which is refined into the SILVER layer (filtered, cleaned, augmented) and the GOLD layer (business aggregates and data models) as curated data. Pipelines are built with Databricks Notebooks and Delta Live Tables, or with ETL partners. Curated data serves enterprise reporting and BI through Databricks SQL (DBSQL endpoints), and data science and machine learning through Databricks Machine Learning. Data governance is powered by Databricks Unity Catalog; EDC also appears in the governance layer.]
Select the ingestion, ETL, presentation layer and governance ecosystem on the Databricks Platform
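The bronze/silver/gold layering can be shown without a cluster. Below is a minimal pure-Python sketch, assuming hypothetical order records with `order_id`, `region` and `amount` fields. On Databricks each layer would typically be a Delta table built with notebooks or Delta Live Tables, not a Python list; only the responsibility of each layer is the point here.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_bronze(raw_records, source):
    """Bronze: keep every raw record as-is, plus ingestion metadata for history."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{**r, "_source": source, "_ingested_at": ingested_at} for r in raw_records]


def to_silver(bronze):
    """Silver: filter bad rows, clean types and values, deduplicate on the business key."""
    latest = {}
    for r in bronze:
        if r.get("order_id") is None or r.get("amount") is None:
            continue  # filter: drop rows missing required fields
        cleaned = {
            "order_id": int(r["order_id"]),
            "region": str(r.get("region", "unknown")).strip().lower(),
            "amount": float(r["amount"]),
        }
        latest[cleaned["order_id"]] = cleaned  # last record per key wins
    return list(latest.values())


def to_gold(silver):
    """Gold: a business-level aggregate (revenue per region) ready for BI."""
    totals = defaultdict(float)
    for r in silver:
        totals[r["region"]] += r["amount"]
    return dict(sorted(totals.items()))


# Hypothetical raw extract, including a duplicate and an invalid row.
raw = [
    {"order_id": "1", "region": " EMEA ", "amount": "10.0"},
    {"order_id": "2", "region": "amer", "amount": "5.5"},
    {"order_id": "2", "region": "amer", "amount": "5.5"},
    {"order_id": None, "region": "apac", "amount": "7.0"},
]
gold = to_gold(to_silver(to_bronze(raw, source="erp_orders")))
print(gold)  # {'amer': 5.5, 'emea': 10.0}
```

Bronze never discards data, so silver and gold can be rebuilt from it if cleaning rules change.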
©2022 Databricks Inc. — All rights reserved
Building your Lakehouse
Comprehensive investment in your success
[Diagram: your success, supported by solution accelerators, in-person and virtual training, co-located professional services, and 24/7/365 global production operations at scale]
Migration Methodology
Phase 1, Discovery: migration-specific discovery and consultation
Phase 2, Assessment: assessment, design, tooling, accelerators, sizing, partners
Phase 3, Strategy: technology mapping, migration workshop, migration planning
Databricks Migration Team, with or without a partner
Phase 4, Production Pilot: reference implementation of a production use case; overall migration implementation plan
Phase 5, Execution: migration execution and support
Databricks PS driven / Partner driven
Migration Approach
Architecture/Infrastructure
● Establish the deployment architecture
● Implement a security and governance framework
Data Migration
● Map data structures and layout
● Complete the one-time load
● Implement an incremental load approach
ETL and Pipelines
● Migrate data transformation and pipeline code, orchestration and jobs
● Speed up your migration using automation tools
● Validate: compare your results with on-prem data and expected results
BI and Analytics
● Repoint reports and analytics for business analysts and business outcomes
● Repoint the semantic layer/OLAP cubes
● Connect to reporting and analytics applications
Data Science/ML
● Establish connectivity to ML tools
● Onboard data science teams
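The validation step can be sketched as a reconciliation between an extract from the legacy warehouse and the migrated table. Below is a minimal sketch, assuming both sides are already available as lists of row tuples (for example, fetched over JDBC/ODBC). It compares row counts and an order-independent content fingerprint, and counts rows present on only one side. The function names are illustrative, and the string normalization is an assumption: it treats `None` and `""` as equal, and `10` and `10.0` as different, so adjust it to your type mappings.

```python
import hashlib
from collections import Counter


def row_digest(row):
    """Stable SHA-256 digest of one row; values are normalized to strings first."""
    joined = "\x1f".join("" if v is None else str(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows):
    """Compare two row sets regardless of order, keeping duplicate counts."""
    src = Counter(row_digest(r) for r in source_rows)
    tgt = Counter(row_digest(r) for r in target_rows)
    return {
        "source_count": sum(src.values()),
        "target_count": sum(tgt.values()),
        "missing_in_target": sum((src - tgt).values()),
        "unexpected_in_target": sum((tgt - src).values()),
        "match": src == tgt,
    }


# Hypothetical sample: same rows, different order.
legacy = [(1, "emea", 10.0), (2, "amer", 5.5)]
migrated = [(2, "amer", 5.5), (1, "emea", 10.0)]
print(reconcile(legacy, migrated)["match"])  # True
```

For large tables, the same comparison is usually pushed down as aggregate queries (counts and per-partition checksums) on each side, not by pulling every row to a client.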
Strategies for Data Migration
One-time loads, catch-up loads, real-time vs. batch ingestion
1. Extract from databases over JDBC/ODBC connectors with spark.read.jdbc (parallel ingestion)
2. Extract to cloud storage and use Databricks Auto Loader for streaming ingestion
3. Use ISV partners for real-time CDC ingestion (Arcion, Fivetran, Qlik, Rivery, StreamSets and others)
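For parallel extraction, PySpark's `spark.read.jdbc` accepts a `predicates` list of WHERE-clause expressions and reads one partition per predicate. It can also split a numeric column between `lowerBound` and `upperBound` into `numPartitions` ranges. Below is a minimal pure-Python sketch of generating non-overlapping range predicates for a hypothetical integer key `order_id`. The commented call shows how they might be passed; the JDBC URL, table and credentials are placeholders.

```python
def range_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper] into contiguous ranges, one WHERE clause per partition.

    The first range also captures NULL keys and values below `lower`; the last
    captures values above `upper`, so no row is skipped or read twice.
    """
    if num_partitions < 1 or upper < lower:
        raise ValueError("need num_partitions >= 1 and upper >= lower")
    step = -(-(upper - lower + 1) // num_partitions)  # ceiling division
    bounds = [lower + i * step for i in range(1, num_partitions)]
    bounds = [b for b in bounds if b <= upper]  # drop empty trailing ranges
    if not bounds:
        return ["1 = 1"]  # a single partition reads everything
    preds = [f"{column} < {bounds[0]} OR {column} IS NULL"]
    preds += [f"{column} >= {lo} AND {column} < {hi}" for lo, hi in zip(bounds, bounds[1:])]
    preds.append(f"{column} >= {bounds[-1]}")
    return preds


# Hypothetical usage on a Databricks cluster (all connection details are placeholders):
# df = spark.read.jdbc(
#     url="jdbc:sqlserver://legacy-dw:1433;databaseName=sales",
#     table="dbo.orders",
#     predicates=range_predicates("order_id", 1, 100, 4),
#     properties={"user": "...", "password": "..."},
# )
```

In practice, `lower` and `upper` would come from a `MIN`/`MAX` query on the source. A skewed key yields uneven partitions, so a date or hash column may split more evenly.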
Strategies for ETL/Code Migration
Automated tools or frameworks can reduce your migration timelines by over 50%!
Migration of stored procedures and/or ETL mappings
• For Databricks Notebooks-based ETL:
  • Delta Live Tables or Databricks Notebook-based ETL
  • Metadata-driven ingestion frameworks
• ETL tool partners:
  • Matillion, Prophecy, dbt, Informatica, Talend, Infoworks and many more
• Auto code converters accelerate migrations!
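A metadata-driven ingestion framework keeps the list of sources in configuration and generates the per-table load code. Adding a table then means adding a metadata row, not writing a new pipeline. Below is a minimal sketch, assuming a hypothetical metadata list, placeholder storage paths and target tables that already exist, that renders Databricks SQL `COPY INTO` statements. A real framework would also track load status, schemas and incremental watermarks.

```python
from dataclasses import dataclass, field

# File formats accepted by COPY INTO's FILEFORMAT clause.
SUPPORTED_FORMATS = {"CSV", "JSON", "PARQUET", "AVRO", "ORC", "TEXT", "BINARYFILE"}


@dataclass
class SourceSpec:
    """One row of hypothetical ingestion metadata."""
    target_table: str   # e.g. a three-level Unity Catalog name
    source_path: str    # cloud storage location of the extracted files
    file_format: str    # csv, parquet, json, ...
    format_options: dict = field(default_factory=dict)


def copy_into_sql(spec):
    """Render a COPY INTO statement for one metadata row."""
    fmt = spec.file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {spec.file_format}")
    lines = [
        f"COPY INTO {spec.target_table}",
        f"FROM '{spec.source_path}'",
        f"FILEFORMAT = {fmt}",
    ]
    if spec.format_options:
        opts = ", ".join(f"'{k}' = '{v}'" for k, v in spec.format_options.items())
        lines.append(f"FORMAT_OPTIONS ({opts})")
    return "\n".join(lines)


# Hypothetical metadata; table names and paths are placeholders.
METADATA = [
    SourceSpec("main.bronze.customers", "s3://example-bucket/exports/customers/", "csv", {"header": "true"}),
    SourceSpec("main.bronze.orders", "s3://example-bucket/exports/orders/", "parquet"),
]

for spec in METADATA:
    print(copy_into_sql(spec) + ";\n")
```

`COPY INTO` skips files it has already loaded, so re-running the generated statements is a simple way to handle catch-up loads.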
Repoint Cubes and Reports to Databricks
• As easy as repointing your reports to the Databricks SQL (DBSQL) JDBC/ODBC drivers (with Photon and the newest Cloud Fetch ODBC drivers)
• Key integrations:
  • Power BI Premium (semantic layers, composite models, up to 400 GB caching)
  • Tableau Hyper extracts
  • Looker
  • OLAP cube partners such as MicroStrategy
  • AtScale: universal semantic layer (aggregates built in Databricks)
Unleash self-service analytics with a semantic lakehouse
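Repointing a report usually comes down to changing its connection to a Databricks SQL endpoint. Below is a minimal sketch, assuming the Simba Spark ODBC driver and personal-access-token authentication, that assembles a DSN-less ODBC connection string. The hostname, HTTP path and token are placeholders. The key names follow Databricks' ODBC setup documentation as we understand it, so check them against the documentation for your driver version.

```python
def databricks_odbc_connection_string(host, http_path, token,
                                      driver="Simba Spark ODBC Driver"):
    """Build a DSN-less ODBC connection string for a Databricks SQL endpoint."""
    if not host or not http_path or not token:
        raise ValueError("host, http_path and token are required")
    settings = {
        "Driver": driver,        # driver name as registered on the client machine
        "Host": host,            # workspace server hostname
        "Port": "443",
        "HTTPPath": http_path,   # from the endpoint's connection details
        "SSL": "1",
        "ThriftTransport": "2",  # HTTP transport
        "AuthMech": "3",         # user name/password; the user name is literally "token"
        "UID": "token",
        "PWD": token,
    }
    if any(";" in value for value in settings.values()):
        raise ValueError("values must not contain ';'")  # would break the key=value list
    return ";".join(f"{key}={value}" for key, value in settings.items())


# Placeholder values for illustration only.
conn_str = databricks_odbc_connection_string(
    "dbc-example.cloud.databricks.com", "/sql/1.0/endpoints/abc123", "dapi-placeholder"
)
print(conn_str)
```

BI tools such as Power BI and Tableau ship native Databricks connectors that ask for the same hostname and HTTP path, so a hand-built string is mainly useful for generic ODBC clients.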
Key Takeaways
Migration is a team sport
● Data warehousing on the lakehouse is simple
● Migrations can be accelerated with automation tools
● An extensive partner ecosystem surrounds the Databricks modern data stack
● A large set of joint offerings with SI/consulting partners accelerates migrations
Next Steps
1. Learn more about the inner workings of the lakehouse
2. Schedule a data warehouse migration workshop
3. Schedule a Databricks SQL hands-on workshop
Customize your EDW/ETL migration success plan with an expert-led Migration Assessment Workshop