

A Data Journey: from the Data Warehouse to the Data Lakehouse


SUPPORT
Pablo A. Doval
Principal Data Architect – Plain Concepts UK

“I work with code and data, but don’t tell my mom; she thinks I’m a
piano player in a whorehouse.”

@PabloDoval

palvarez@plainconcepts.com
Innovation through Data and AI…
Smart Maintenance, Video Content Classifier, Real Time Churn Analysis, Knowledge Extraction, Lead Cases Selection
Anomaly Detection, Audience Segmentation, Automatic Case Classifier, Document Classification, Semantic Search
Behavioural Analysis, Smart Pricing Models, Case Outcome Forecasting, Clause Outlier Detection, Settlement Forecast
… requires data first: available, trustworthy, accurate, easy to use, timely and authoritative data.
Where are we?

Unstructured, chaotic data estate

Lack of skills and productivity

Solutions not Enterprise-ready


We have evolved and transformed…
… but what about our data systems?
We have lots of data, but it is in silos
Reactive System Example
Customer Call → Data Entry → Fee earner work → Issue solved → Customer
Analytics
Proactive System Example
Customer Call → Data Entry → Fee earner work → Issue solved → Customer
Automation and AI plus Analytics at each of the four stages


Some challenges ahead…
Multi-modal support (tabular, video, audio…)

Support advanced analytics through the data lifetime

Promote rapid prototyping, and rapid productionalization

Data dispersal due to proprietary systems and formats

Lack of common semantic model


A bit of data history…
We need semantic models
IoT Social Ads Marketplace
Once upon a time…
Finances, Marketing, Operations, HR, IT
Lessons Learned

End users / departments require certain technical skills

Ensuring veracity and data quality is practically impossible

Data lineage management is also nearly impossible

Balance between cost and data governance


The mighty Data Warehouse…
Finances, Marketing, Operations, HR, IT
Lessons Learned

High cost of trying to create a single version of truth

Difficult to change without impacting the rest of the org

Lack of multi-modal support (Video, Audio…) and ML

Limited Support for Streaming


The Data Mart approach…
Finances, Operations, IT
Lessons Learned

Managing master data is a very complex challenge

Requires extra capacity and skills in the IT team

Lack of multi-modal support (Video, Audio…) and ML

Limited Support for Streaming


The New World

Real Time Data

Highly Volatile Data Structures

Hybrid and Multi-vendor Ecosystems

AI/ML Capabilities
Data Lakes
The promise of the Data Lake
1 Collect Everything
2 Store it all on the Data Lake
3 Develop business use cases

Storage · Data Lake Storage (Optimized)
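Steps 1 to 3 amount to schema-on-read: data is stored as it arrives, and each use case imposes its own structure only when it reads. A minimal Python sketch of that idea, assuming a hypothetical in-memory `lake` dictionary standing in for real Data Lake Storage and made-up `collect` / `read_with_schema` helpers:

```python
import json
from collections import defaultdict

# Hypothetical stand-in for lake storage: raw JSON payloads kept untouched,
# grouped by source. A real lake would write files to object storage.
lake = defaultdict(list)


def collect(source, record):
    """Steps 1-2: keep everything, as-is, with no upfront schema."""
    lake[source].append(json.dumps(record))


def read_with_schema(source, fields):
    """Step 3: a use case applies its own schema when reading.

    Fields missing from a raw record come back as None instead of
    having made the original ingestion fail.
    """
    rows = []
    for line in lake[source]:
        raw = json.loads(line)
        rows.append({f: raw.get(f) for f in fields})
    return rows


# Records from the same source need not share a structure.
collect("iot", {"device": "pump-1", "temp": 71.5})
collect("iot", {"device": "pump-2", "temp": 68.0, "vibration": 0.2})
collect("social", {"user": "a", "text": "great service"})

# A maintenance use case picks the fields it needs from the IoT feed.
print(read_with_schema("iot", ["device", "temp", "vibration"]))
```

The trade-off is that nothing stops malformed data from entering the lake; quality checks move to read time, which is one reason later tiers (and the lessons at the end of this deck) matter.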
What a Data Lake is *not*

Just storage / Azure Data Lake Storage ☺

Just a Hadoop/Spark/HPC cluster

A more modern kind of Data Warehouse


Disclaimer!

The next slides are heavily based on our approach to data lakes, as implemented by Sidra Data Platform. Far from being a sales pitch, they are used to explain our approach to some technical challenges: what has worked and what has needed improvement over time.
[Diagram: LANDING ZONE · DATA INTAKE · Client Storage (×3) · USE CASES / CLIENT APPLICATIONS · SECURITY AND COMPLIANCE · DATA GOVERNANCE · AI & ML ENABLEMENT]
Enabling Business Cases via Client Apps
The road ahead…
Challenges with Data Lakes
Deployment Complexity

Real-Time (lambda architectures)

Reprocessing (Validations, etc…)

Append, Update and Merge (Right to be forgotten, etc…)

Keeping Historical Versions of Data
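To see why "Append, Update and Merge" and the right to be forgotten are hard on a plain data lake, consider a table stored as immutable files. The sketch below is a toy model (the `forget` helper and the record layout are hypothetical): erasing a single customer forces a full rewrite of every file that mentions them.

```python
def forget(files, customer_id):
    """Erase every record for customer_id from a list of immutable files.

    Each file is a tuple of records and cannot be edited in place, so any
    file that mentions the customer is rewritten in full; other files are
    reused as-is. Returns the new file list and the number of rewrites.
    """
    new_files, rewritten = [], 0
    for f in files:
        if any(r["customer"] == customer_id for r in f):
            new_files.append(tuple(r for r in f if r["customer"] != customer_id))
            rewritten += 1
        else:
            new_files.append(f)
    return new_files, rewritten


# Hypothetical table spread over three files.
files = [
    ({"customer": "c1", "amount": 10}, {"customer": "c2", "amount": 5}),
    ({"customer": "c3", "amount": 7},),
    ({"customer": "c1", "amount": 3}, {"customer": "c3", "amount": 1}),
]

new_files, rewritten = forget(files, "c1")
print(rewritten)  # 2: two of three files rewritten to remove a single customer
```

Nothing here makes the swap from old files to new files atomic, so a concurrent reader could observe a half-updated table, and keeping the previous version around for history is left entirely to the caller.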


Enter Delta Lake

Full ACID Transactions

Based on Parquet – Open Standard, Open Source

Runs on Spark
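Delta Lake adds ACID guarantees on top of immutable Parquet files by keeping an ordered transaction log (the `_delta_log` directory of numbered JSON commits) that records which data files each commit adds or removes. The in-memory sketch below models only that idea; `ToyTable`, its methods and the commit layout are hypothetical and are not Delta Lake's file format or API.

```python
from dataclasses import dataclass, field
from typing import Optional


class ConcurrentCommitError(Exception):
    """Raised when another writer committed first (optimistic concurrency)."""


@dataclass
class ToyTable:
    # Immutable data "files" by name; a file is never modified once written.
    files: dict = field(default_factory=dict)
    # Ordered log: entry i holds the actions that produced version i.
    log: list = field(default_factory=list)

    def write_file(self, name: str, rows: tuple) -> None:
        # Writing a data file alone changes nothing that readers can see.
        self.files[name] = rows

    def commit(self, read_version: int, add=(), remove=()) -> int:
        """Publish a new version, or fail if the table moved on since read_version.

        read_version is the version the writer based its changes on
        (-1 for an empty table). Single-process toy: not thread-safe.
        """
        current = len(self.log) - 1
        if current != read_version:
            raise ConcurrentCommitError(
                f"table is at version {current}, writer read {read_version}")
        self.log.append({"add": list(add), "remove": list(remove)})
        return len(self.log) - 1

    def snapshot(self, version: Optional[int] = None) -> list:
        """Replay the log up to `version` (time travel) and return its rows."""
        last = len(self.log) - 1 if version is None else version
        live = []
        for entry in self.log[: last + 1]:
            live = [f for f in live if f not in entry["remove"]]
            live.extend(entry["add"])
        return [row for name in live for row in self.files[name]]


t = ToyTable()
t.write_file("part-0", ({"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}))
v0 = t.commit(read_version=-1, add=["part-0"])

# Erase Ann: write a rewritten copy of the file, then swap it in one commit.
t.write_file("part-1", ({"id": 2, "name": "Bob"},))
v1 = t.commit(read_version=v0, add=["part-1"], remove=["part-0"])

print(t.snapshot())            # latest version: only Bob
print(t.snapshot(version=v0))  # time travel: Ann and Bob

try:
    t.commit(read_version=v0, add=["part-0"])  # a writer holding a stale version
except ConcurrentCommitError as err:
    print("rejected:", err)
```

Readers only ever see whole commits, a stale writer is rejected instead of silently overwriting, and replaying a prefix of the log gives historical versions. In a real system the atomic step is creating the next log entry only if it does not already exist; this toy uses a plain list append instead. Note that the erased row stays reachable through older versions until the old files are physically removed, which in Delta Lake is the job of `VACUUM`.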
Ingestion Tables (Bronze) → Refined Tables (Silver) → Business-level Aggregates (Gold)
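The Bronze/Silver/Gold layering can be illustrated with a plain-Python sketch, assuming a hypothetical orders feed. In a Delta Lake setup each layer would typically be a Delta table written by Spark jobs; here each layer is just a Python list or dict so the flow between tiers is easy to follow.

```python
from collections import defaultdict

# Bronze: raw ingested rows, kept as they arrived (duplicates, bad values and all).
bronze = [
    {"order_id": "1", "country": "uk", "amount": "10.50"},
    {"order_id": "1", "country": "uk", "amount": "10.50"},  # duplicate delivery
    {"order_id": "2", "country": "ES", "amount": "7.25"},
    {"order_id": "3", "country": "UK", "amount": "n/a"},    # unparseable amount
    {"order_id": "4", "country": "es", "amount": "2.75"},
]


def to_silver(rows):
    """Silver: deduplicated, typed and standardised rows."""
    seen, out = set(), []
    for r in rows:
        if r["order_id"] in seen:
            continue
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine this row, not drop it
        seen.add(r["order_id"])
        out.append({"order_id": int(r["order_id"]),
                    "country": r["country"].upper(),
                    "amount": amount})
    return out


def to_gold(rows):
    """Gold: business-level aggregate (revenue per country)."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["country"]] += r["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'UK': 10.5, 'ES': 10.0}
```

Keeping Bronze untouched means Silver and Gold can always be rebuilt from it, which is what makes the reprocessing challenge listed earlier manageable.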
Delta Lake and the Data Lakehouse

Source: What is a Lakehouse? - The Databricks Blog (30-Jan-2020)


Our lessons learnt…
What have we learnt?

Integrated Data Quality processes

Integrated Data Mastering processes

3-tier Approach using Transactional Tables
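One reading of "integrated" data quality and data mastering is that both run inside the pipeline between tiers (for example from Bronze to Silver) rather than as separate afterthoughts. A minimal sketch under that assumption; the rule set, the field names and the "later non-empty value wins" survivorship rule are all hypothetical choices for illustration.

```python
# Hypothetical rule set: each rule is (name, predicate over a row).
RULES = [
    ("email present", lambda r: bool(r.get("email"))),
    ("email has @", lambda r: "@" in (r.get("email") or "")),
    ("country is ISO-2", lambda r: len(r.get("country") or "") == 2),
]


def check_quality(rows):
    """Split rows into passing rows and quarantined (row, failed rules) pairs."""
    passed, quarantined = [], []
    for r in rows:
        failures = [name for name, ok in RULES if not ok(r)]
        if failures:
            quarantined.append((r, failures))
        else:
            passed.append(r)
    return passed, quarantined


def master(rows, key="email"):
    """Naive mastering: one golden record per key; later non-empty values win."""
    golden = {}
    for r in rows:
        rec = golden.setdefault(r[key].lower(), {})
        rec.update({k: v for k, v in r.items() if v})
    return golden


rows = [
    {"email": "ann@example.com", "name": "Ann", "country": "GB"},
    {"email": "ANN@example.com", "name": "", "country": "GB", "phone": "555-0100"},
    {"email": "bob-at-example.com", "name": "Bob", "country": "ES"},
]

passed, quarantined = check_quality(rows)
golden = master(passed)
print(len(golden), [f for _, f in quarantined])  # 1 [['email has @']]
```

Quarantining failed rows with the names of the rules they broke, instead of dropping them, keeps the data quality process auditable; real mastering would also need fuzzier matching than a lower-cased key.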

