Chapter 1
Databricks Lakehouse
INTRODUCTION TO DATABRICKS
Kevin Barlow
Data Analytics Practitioner
The Data Warehouse
Pros
Highly performant
Cons
Very expensive
1 https://www.databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html
The Data Lake
Pros
Very flexible
Cost effective
Cons
Birth of the Lakehouse
The Databricks Lakehouse
The Databricks Lakehouse Platform
Simplified architecture
Databricks Architecture Benefits
Unification: Benefits of data warehouse and data lake
Multi-Cloud: No lock-in to a specific cloud platform
Databricks Development Benefits
Collaborative: Ability to work in the same platform in real time
Open-Source: Support for most popular languages (Python, R, Scala, SQL)
Let's practice!
Core features of the Databricks Lakehouse Platform
Apache Spark
Apache Spark is an open-source data processing framework and is the engine underneath
Databricks.
DataCamp Courses
Introduction to PySpark
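Spark's basic execution pattern is to split a dataset into partitions, process the partitions in parallel, and then combine the partial results. The sketch below imitates that pattern in plain Python so it runs without a cluster; it is a conceptual illustration under that assumption, not Spark's API, and `split_into_partitions`, `count_partition`, and `word_count` are hypothetical names. Threads stand in for the executors Spark would run on cluster worker nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Round-robin the records into num_partitions lists (one per 'worker')."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_partition(lines):
    """Work done independently on one partition: a partial word count."""
    partial = Counter()
    for line in lines:
        partial.update(line.lower().split())
    return partial


def word_count(lines, num_partitions=4):
    """Split the data, process partitions in parallel, then combine the results."""
    partitions = split_into_partitions(lines, num_partitions)
    # Hypothetical stand-in: threads play the role of Spark executors
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_partition, partitions))
    # Combine step: Counter.update adds the per-partition counts together
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    docs = ["The data lake", "The data warehouse", "The lakehouse"]
    print(word_count(docs, num_partitions=2).most_common(2))
    # [('the', 3), ('data', 2)]
```

In Spark itself the same job would be written against a DataFrame or RDD, and the engine, rather than your code, decides how the data is partitioned and where each piece runs.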
Benefits of Spark
Key Benefits:
4. Databricks optimizations
1 https://spark.apache.org/docs/latest/cluster-overview.html
Cloud computing basics
Databricks Compute
Clusters
SQL Warehouses
SQL only
BI use cases
Photon
Cloud data storage
Delta
Delta is an open-source data storage format that provides:
ACID transactions
Schema evolution
Table history
Time travel
1 delta.io
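The Delta features listed above can be pictured as a versioned table: each write is committed as a new table version, older versions remain readable, and a write whose columns do not match the schema is rejected unless schema changes are explicitly allowed. The class below is a minimal in-memory sketch of that behavior under these assumptions. `ToyDeltaTable` and its methods are hypothetical, and the `merge_schema` flag is only loosely modeled on Delta's `mergeSchema` write option. Real Delta tables store Parquet data files plus a transaction log and also coordinate concurrent writers, which this sketch does not attempt.

```python
from copy import deepcopy


class ToyDeltaTable:
    """Hypothetical, in-memory model of a versioned table (not Delta's real protocol).

    Every successful write appends one new snapshot plus one log entry, so
    earlier versions stay readable (table history and time travel).
    """

    def __init__(self, columns):
        self.columns = list(columns)
        self._snapshots = [[]]  # version 0 is the empty, newly created table
        self._log = [{"version": 0, "operation": "CREATE TABLE"}]

    def append(self, rows, merge_schema=False):
        """Commit rows (dicts) as a new version; the write is all-or-nothing."""
        rows = list(rows)
        new_cols = [c for row in rows for c in row if c not in self.columns]
        if new_cols and not merge_schema:
            # Schema enforcement: reject the whole batch, nothing is written
            raise ValueError(f"unexpected columns: {sorted(set(new_cols))}")
        # Schema evolution (opt-in): add unseen columns in first-seen order
        columns = self.columns + list(dict.fromkeys(new_cols))
        # Build the new snapshot off to the side; missing values become None
        snapshot = [{c: row.get(c) for c in columns}
                    for row in self._snapshots[-1] + rows]
        # Publish: only now do the schema, the data and the log change
        self.columns = columns
        self._snapshots.append(snapshot)
        version = len(self._snapshots) - 1
        self._log.append({"version": version, "operation": "WRITE",
                          "num_rows": len(rows)})
        return version

    def read(self, version=None):
        """Time travel: read the latest version, or any earlier one."""
        if version is None:
            version = len(self._snapshots) - 1
        return deepcopy(self._snapshots[version])

    def history(self):
        """Table history: one entry per committed version."""
        return [dict(entry) for entry in self._log]
```

For example, after `t = ToyDeltaTable(["id", "city"])` and `t.append([{"id": 1, "city": "Oslo"}])`, a later append that adds a `country` column only succeeds with `merge_schema=True`, and `t.read(version=1)` still returns the original one-column-pair rows. On Databricks the corresponding real operations are SQL commands such as `DESCRIBE HISTORY my_table` and `SELECT * FROM my_table VERSION AS OF 1`, where `my_table` is a placeholder.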
Unity Catalog
Unity Catalog is an open data governance solution that controls access to all data
assets in the Databricks Lakehouse Platform.
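Unity Catalog organizes data assets in a three-level namespace (`catalog.schema.table`), and reading a table requires privileges at each level. The sketch below is a hypothetical, in-memory model of that access check, assuming the rule that a principal needs `USE CATALOG` on the catalog, `USE SCHEMA` on the schema, and `SELECT` granted on the table or inherited from its schema or catalog. It is not the Unity Catalog API; the class name, principals, and table names are illustrative only.

```python
class ToyCatalogGrants:
    """Hypothetical model of Unity Catalog-style privilege checks (not the real API)."""

    def __init__(self):
        self._grants = set()  # entries are (principal, privilege, securable)

    def grant(self, privilege, securable, principal):
        self._grants.add((principal, privilege.upper(), securable))

    def _has(self, principal, privilege, securable):
        return (principal, privilege, securable) in self._grants

    def can_select(self, principal, table):
        """Can `principal` read `table`, given as 'catalog.schema.table'?"""
        catalog, schema, _ = table.split(".")
        schema_name = f"{catalog}.{schema}"
        # The principal must be allowed to use every parent container...
        if not self._has(principal, "USE CATALOG", catalog):
            return False
        if not self._has(principal, "USE SCHEMA", schema_name):
            return False
        # ...and hold SELECT on the table itself or inherit it from a parent
        return any(self._has(principal, "SELECT", securable)
                   for securable in (table, schema_name, catalog))


if __name__ == "__main__":
    grants = ToyCatalogGrants()
    grants.grant("USE CATALOG", "main", "analysts")
    grants.grant("USE SCHEMA", "main.sales", "analysts")
    print(grants.can_select("analysts", "main.sales.orders"))  # False: no SELECT yet
    grants.grant("SELECT", "main.sales", "analysts")           # granted at schema level
    print(grants.can_select("analysts", "main.sales.orders"))  # True: inherited
```

Granting `SELECT` once at the schema level, rather than table by table, is the design choice the inheritance check illustrates. In Databricks SQL, the grants themselves are issued with statements of the form `GRANT <privilege> ON <securable> TO <principal>`.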
Databricks UI
Designed for easier access to capabilities
based on your data workload.
Let's review!
Administering a Databricks workspace
Account Admin
Key Responsibilities:
Account Console
https://accounts.cloud.databricks.com/
Account Console - Workspaces
Account Console - Data
Account Console - Users & Groups
Account Console - Settings
Workspace Admin
Key Responsibilities:
Data Plane
Contains all of the customer's assets needed for computation with Databricks.
Control Plane
The portion of the platform that is managed and hosted by Databricks.
Databricks Platform Architecture
Each cloud will have the same general
options to create a workspace:
Account Console
1 https://docs.databricks.com/getting-started/overview.html
Let's review!
Setting up a Databricks workspace example
Let's practice!