Modernizing to a cloud data architecture
Guido Oswald, Solutions Architect, Databricks
Matt Graves, VP of Enterprise Data & Analytics, GCI Communication Corp
Agenda
• Top reasons to modernize from Hadoop to Databricks
• Success stories, technical and business benefits
• Fast migrations with low costs & low risk
• Fireside Chat: Matt Graves
Digital transformation is
accelerating
E-Commerce
Wearables, medical IoT
Streaming
Mobile payments, food
service, grocery deliveries…
The data surge is placing
tremendous pressure on
traditional data and analytics
infrastructure
Cloud adoption is accelerating by $100B from 2021 to 2023
Source: Gartner, cited by Battery Ventures, Open Cloud report
Today, most enterprises struggle with data
Siloed stacks increase data architecture complexity
Data Warehousing, Data Engineering, Streaming, Data Science & Machine Learning
Extract
Transform
Streaming data sources
Streaming Data Engine
Analytics and BI
Data marts
Data warehouse
Structured data
Structured, semi-structured
and unstructured data
Structured, semi-structured
and unstructured data
Data Lake
Data prep
Data Lake
Machine
Learning
Data
Science
Data Warehousing: Amazon Redshift, Teradata, Azure Synapse, Google BigQuery, Snowflake, IBM Db2, SAP, Oracle Autonomous Data Warehouse
Data Engineering: Hadoop, Apache Airflow, Amazon EMR, Apache Spark, Google Dataproc, Cloudera
Data Science & Machine Learning: Jupyter, Amazon SageMaker, Azure ML Studio, MATLAB, Domino Data Lab, SAS, TensorFlow, PyTorch
Streaming: Apache Kafka, Apache Spark, Apache Flink, Amazon Kinesis, Azure Stream Analytics, Tibco Spotfire, Google Dataflow, Confluent
Disconnected systems and proprietary data formats make integration difficult
Data Scientists
Data Engineers
Data Analysts
Data Engineers
Siloed data teams decrease productivity
Load Real-time Database
Is your architecture enabling growth?
Legacy on-premises data and analytics architectures are not keeping up
• Hadoop costs rising when costs need to be cut
• Innovation hinges on ML and predictive insights
• Business agility requires real-time data
Hadoop is costly, complex and ineffective
• DEVOPS INTENSIVE: Hadoop ecosystem is complex, hard to manage, and prone to failures → Low Productivity
• RIGID AND INELASTIC: 24/7 HDFS clusters that need to be built for peak usage and are costly to upgrade → Cost Prohibitive
• LACKS AI CAPABILITIES: No out-of-box support for ML/AI, and separate data and AI environments → Slow Innovation
Enterprises need a modern data and analytics architecture
CRITICAL REQUIREMENTS
• Cost-effective scale and performance in the cloud
• Easy to manage and highly reliable for diverse data
• Predictive and real-time insights to drive innovation
Modernization delivers business value
Forrester TEI study finds 417% ROI for companies switching to Databricks
• 47% cost savings from retiring legacy infrastructure
• 5% increase in revenue
• 25% increase in data team productivity
Source: Forrester TEI: The Total Economic Impact of the Databricks Unified Analytics Platform
The Databricks Lakehouse Platform is one simple platform to unify all your data, analytics, and AI workloads
Original creators of popular data and machine learning open-source projects
Global company with over 5,000 customers and more than 450 partners
Data
Warehouse
Lakehouse
One platform to unify all of
your data, analytics, and AI workloads
Data
Lake
Structured Semi-structured Unstructured Streaming
Lakehouse Platform
Data Engineering
BI & SQL Analytics
Real-time Data Applications
Data Science & Machine Learning
Data Management & Governance
Open Data Lake
SIMPLE, OPEN, COLLABORATIVE
From BI to AI
All your data, analytics and AI on one Lakehouse platform
Technology mapping: deliver better outcomes
Data Eng, ML (Spark) → Databricks jobs / Delta Lake (highly tuned Spark engine: faster, less compute, one-stop shop)
Scalable apps on columnar store (HBase) → Databricks Spark integrates with HBase on cloud (alternatively: use cloud data stores well integrated with Databricks)
ETL, SQL (Hive / Impala) → Databricks jobs / Delta Lake / Spark SQL (highly tuned Spark engine: faster, less compute, one-stop shop)
Batch process (MapReduce) → Databricks Spark jobs (orders of magnitude faster, but may need manual work)
Real-time event processing (Storm / Spark) → Databricks Structured Streaming (Spark Structured Streaming + Delta Lake: streaming + batch ingest)
Automation for most workload types
Data Migration
Metastore Migration
SQL Migration
Security
Scheduled Data pulls
Orchestration
HDFS
Hive Databases / Tables / Views
Impala Databases / Tables / Views
HDFS
Hive Queries
Spark Queries
Sentry permissions / Ranger policies
HDFS access permissions
Sqoop statements
Oozie Jobs
Azure ADLS Gen 2, AWS S3, GCS
Databricks Tables
Databricks Tables
Spark SQL Databricks Notebooks
Spark SQL Databricks Notebooks
Databricks Notebooks
Databricks permissions
AWS IAM, ADLS ACLs
Databricks compatible PySpark code
Airflow DAGs & Databricks Jobs
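As an illustration of the "Sqoop statements" to "Databricks compatible PySpark code" row, here is a minimal sketch of what such a translation could look like. It is not the migration tooling referenced in the slides: the function name `sqoop_to_pyspark`, the small set of supported flags, and the example connection string and paths are all assumptions made for this example. The generated text uses the standard PySpark JDBC reader and a Delta write, and it leaves credentials such as passwords out entirely.

```python
import shlex

# Sqoop import flags this sketch understands, mapped to internal keys.
# Any other flag is skipped, so real statements would still need manual review.
SQOOP_FLAGS = {
    "--connect": "url",
    "--table": "dbtable",
    "--username": "user",
    "--target-dir": "target",
}


def sqoop_to_pyspark(statement: str) -> str:
    """Translate a simple `sqoop import` statement into PySpark source text.

    Hypothetical helper for illustration only.
    """
    tokens = shlex.split(statement)
    if tokens[:2] != ["sqoop", "import"]:
        raise ValueError("only 'sqoop import' statements are handled")

    # Collect the values of the flags we recognize.
    opts = {}
    i = 2
    while i < len(tokens):
        key = SQOOP_FLAGS.get(tokens[i])
        if key is not None and i + 1 < len(tokens):
            opts[key] = tokens[i + 1]
            i += 2
        else:
            i += 1

    missing = [k for k in ("url", "dbtable", "target") if k not in opts]
    if missing:
        raise ValueError(f"missing required options: {missing}")

    # Emit a JDBC read followed by a Delta write to the target location.
    lines = [
        "df = (spark.read.format('jdbc')",
        f"      .option('url', {opts['url']!r})",
        f"      .option('dbtable', {opts['dbtable']!r})",
    ]
    if "user" in opts:
        lines.append(f"      .option('user', {opts['user']!r})")
    lines.append("      .load())")
    lines.append(f"df.write.format('delta').save({opts['target']!r})")
    return "\n".join(lines)


# Example statement; host, table, user, and path are made up.
example = (
    "sqoop import --connect jdbc:mysql://db.example.com/sales "
    "--table orders --username etl --target-dir /mnt/lake/orders"
)
print(sqoop_to_pyspark(example))
```

The output is plain text meant to be pasted into a notebook and reviewed, which is why unsupported flags are ignored rather than guessed at.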
55-66% reduction in costs and 2-3x reduction in timelines by using automation tools
Manual migration (17-20 weeks): Assessment & Design, Data Migration, Workloads Migration, Validation, Cutover, Operations
Using automation (8 weeks): Accelerated Assessment & Design, Accelerated Data & Workloads Migration, Validation, Cutover, Operations
Same tool used for pre-migration assessment
* Typical implementation scenario: ~4 PB of data and 3,000 jobs with mixed workloads considered
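The timeline figures on this slide (17-20 weeks for a manual migration, 8 weeks using automation) can be checked against the "2-3x" headline with a short calculation. This is only a sketch: it assumes the 17-20 week range is the manual duration and 8 weeks the automated one, and the variable and function names are made up for the example. It covers the timeline only and says nothing about the separate 55-66% cost claim.

```python
# Durations taken from the slide; the pairing of figures to scenarios is an assumption.
MANUAL_WEEKS = (17, 20)   # manual migration, low and high end of the range
AUTOMATED_WEEKS = 8       # migration using automation tools


def speedup(manual_weeks: float, automated_weeks: float) -> float:
    """How many times shorter the automated timeline is than the manual one."""
    return manual_weeks / automated_weeks


def weeks_saved(manual_weeks: float, automated_weeks: float) -> float:
    """Absolute reduction in calendar weeks."""
    return manual_weeks - automated_weeks


low = speedup(MANUAL_WEEKS[0], AUTOMATED_WEEKS)
high = speedup(MANUAL_WEEKS[1], AUTOMATED_WEEKS)
print(f"Timeline speedup: {low:.3f}x to {high:.3f}x")
print(
    "Weeks saved:",
    weeks_saved(MANUAL_WEEKS[0], AUTOMATED_WEEKS),
    "to",
    weeks_saved(MANUAL_WEEKS[1], AUTOMATED_WEEKS),
)
```

Under these assumptions the ratio falls between 2x and 3x, which is consistent with the slide's headline.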
Our partner ecosystem accelerates migrations
ISV Partners and Migration Tools
Security
Governance
Consulting & SI Partners
Databricks
Migration
SWAT team +
CS Packaged
Services
For Migration
Cloud
Modernization with Databricks - recap
• Why: costs, productivity, innovation → business impact
• Your competitors and market leaders are doing it NOW
• Databricks experts and automation strategy can help you migrate faster, with much lower cost and risk
Visit databricks.com/migration to learn more
Fireside chat
Matt Graves
VP of Enterprise Data & Analytics
GCI Communication Corp
Backup
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
