Google Cloud
Data Platform
GoDataFest Workshop
28-10-2022
Agenda
1. Intro GCP for Data
09:00 - 09:30
2. Roles & tools (per role)
09:35 - 10:10
3. Build on GCP (workgroups)
10:30 - 12:30
● Data Democratization
● Why (Google) Cloud?
● Data platforms
● Data Engineer
● Analytics Engineer
● Analyst
● clean & prep data
● build the model
● create & share insights
Introductions
Thomas van Latum - thomasvanlatum@godatadriven.com
Who are we?
Bas Leenders - bas@gcompany.nl
data analytics & BI
prescriptive
predictive
descriptive
diagnostic
Workshop on Google Cloud Data Platform
Infrastructure
Big Data and
Machine Learning
Application
Development
G Suite
For the past 15 years, Google has been building out the fastest, most powerful, highest quality cloud infrastructure on the planet.
Images by Connie Zhou
Google Global Cache
(GGC) edge nodes
Points of presence (>100)
Network fiber
FASTER (US, JP, TW) 2016
Unity (US, JP) 2010
SJC (JP, HK, SG) 2013
Monet (US, BR) 2017
Google network
More than a collection of data centers
research.google.com/pubs/papers.html
Google has been innovating data technologies
Timeline 2002-2016: GFS, MapReduce, Bigtable, Dremel, Colossus, Flume, Megastore, Spanner, Millwheel, Pub/Sub, F1, TensorFlow
Google needed to invent data processing methods
Google has been innovating data technologies
Products on the timeline (2002-2016), in slide order: Cloud Storage, Dataproc, ML Engine, Bigtable, BigQuery, Cloud Storage, Dataflow, Datastore, Dataflow, Pub/Sub
Auto ML (2018)
Google then shared its innovations
research.google.com/pubs/papers.html
process & analyze
transform to information
● meaningful
● usable
ingest
read raw data
● streaming
● batch
● ad-hoc
store
store in the right format
● durable
● accessible
explore & visualize
convert to insights
● insightful
● shareable
data lifecycle – 4 steps
@pvergadia #GCPSketchnote
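The four lifecycle steps above can be sketched in a few lines of Python. This is only an illustration, not the workshop's pipeline: the CSV content, the `orders` table, and the function names are made up, and the standard-library `sqlite3` module stands in for a real warehouse such as BigQuery.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (made-up values).
RAW_CSV = """order_id,product,amount
1,widget,10.50
2,gadget,4.00
3,widget,2.50
"""


def ingest(raw_text):
    """Ingest: read raw data (here a small batch in CSV form)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def store(rows):
    """Store: keep the rows in a typed, queryable table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, product TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(int(r["order_id"]), r["product"], float(r["amount"])) for r in rows],
    )
    return conn


def process(conn):
    """Process & analyze: aggregate raw rows into totals per product."""
    cursor = conn.execute(
        "SELECT product, SUM(amount) FROM orders "
        "GROUP BY product ORDER BY product"
    )
    return dict(cursor.fetchall())


def explore(totals):
    """Explore & visualize: render a small shareable text report."""
    return [f"{product}: {total:.2f}" for product, total in totals.items()]


report = explore(process(store(ingest(RAW_CSV))))
print("\n".join(report))
```

Each function maps to one step, so a stage can be swapped (for example, streaming instead of batch ingest) without touching the others.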
the modern data platform
data life cycle with BigQuery & Looker
raw data clean sources business logic
data engineer analytics engineer explorer
BigQuery / dataform
ERP
source systems
Finance
HR
Marketing
Other
data platform
reports
viewer
Looker
● storage & compute
● extremely fast, very cost-efficient
● use (standard) SQL
● integrate
○ Cloud SQL
○ Data Studio
○ Connected Sheets
● BQ ML
○ machine learning “for business”
○ SQL-powered
○ brings ML to the data
modern data warehouse
BigQuery
What is BigQuery?
Big(!) Data with Big Query
Dataform & BigQuery
● Open source, SQL-based language to manage data transformations
● Fully managed, serverless orchestration for data pipelines
● Fully featured cloud development environment to develop with SQL
Looker
Looker Data Platform
● modern data technology
● modern problems
databases then databases now
maximize efficiency
compensate for inefficiency
database technology has changed
two problems Looker solves: bottleneck and/or chaos
Data Lake
Data Storage
Best practice for companies to centralise their data
Data Extraction
Data Analysts extracting your data into workbooks or aggregated cubes
HARD TO MAINTAIN, HARD TO SCALE → DATA CHAOS
Data Visualisation
BI tool sits on top of these siloed workbooks to present dashboards and reports
LIMITED DATA, MULTIPLE TRUTHS → DATA BOTTLENECK
Tech team headcount
legacy BI “workbook” architecture
Looker’s universal semantic model
governed metrics best-in-class APIs in-database
Git version control security Cloud
integrated insights
modern BI & analytics data-driven workflows custom applications
SQL in results back
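The idea of governed metrics with "SQL in, results back" can be sketched as a toy semantic layer: fields are defined once, and every query is generated from those shared definitions. This is not LookML and not the Looker API. The `MODEL` dictionary, the `build_query` helper, and all table and column names are hypothetical.

```python
# Toy semantic model: each dimension and measure is defined exactly once.
# All table and column names here are hypothetical.
MODEL = {
    "table": "orders",
    "dimensions": {"product": "product_name", "month": "order_month"},
    "measures": {"total_sales": "SUM(sale_price)", "order_count": "COUNT(*)"},
}


def build_query(model, dimensions, measures):
    """Generate SQL for the requested fields from the shared definitions."""
    select_parts = [f"{model['dimensions'][d]} AS {d}" for d in dimensions]
    select_parts += [f"{model['measures'][m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select_parts)} FROM {model['table']}"
    if dimensions:
        # Group by the position of each selected dimension column.
        positions = ", ".join(str(i + 1) for i in range(len(dimensions)))
        sql += f" GROUP BY {positions}"
    return sql


query = build_query(MODEL, ["product"], ["total_sales"])
print(query)
```

Because every user goes through the same definitions, two dashboards asking for `total_sales` cannot silently compute it in different ways. Asking for a field that is not in the model raises a `KeyError` instead of producing ad-hoc SQL.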
Agenda
1. Intro GCP for Data
09:00 - 09:30
2. Roles & tools (per role)
09:45 - 10:30
3. Build on GCP (workgroups)
10:45 - 12:30
● Data Democratization
● Why (Google) Cloud?
● Data platforms
● Data Engineer
● Analytics Engineer
● Analyst
● clean & prep data
● build the model
● create & share insights
Looker User
• Looker Explorer
• Looker Dashboarding
Looker Dev
• (BigQuery & Dataform)
• Looker & LookML
Data Engineer
• BigQuery
• Dataform
• Terraform
Tools by role
Pick your preferred role → https://leend.rs/GDF-pick (with your Google-account!)
Analytics Engineer
Data Analyst
Cloud Data Engineer
Data Analyst
Analytics Translator
Machine Learning Engineer
Data Architect
Data Scientist
Business Analyst
Analytics Engineer
Cloud Data Engineer
Tools by role
Looker User
• Looker Explorer
• Looker Dashboarding
Looker Data Explorer - Qwik Start
→ https://leend.rs/GDF-QL-An1
Filtering and Sorting Data in Looker
→ https://leend.rs/GDF-QL-An2
Data Engineer
• BigQuery
• Dataform
• Terraform
console.cloud.google.com/ ...
?project=go-data-fest
→ https://leend.rs/GDF-project
Looker Dev
• (BigQuery & Dataform)
• Looker & LookML
Looker Developer - Qwik Start
→ https://leend.rs/GDF-QL-AE1
Creating Measures and Dimensions Using LookML
→ https://leend.rs/GDF-QL-AE2
Agenda
1. Intro GCP for Data
2. Roles & tools
(per role)
3. Build on GCP
(mixed workgroups)
● Data Democratization
● Why (Google) Cloud?
● Data platforms
● Data Engineer
● Analytics Engineer
● Analyst
● clean & prep data
● build the model
● create & share insights
Groups & roles
Team Google-user Data Engineer Looker Dev Looker User
Arthur erik.clabbers@... 1
fcm073@... 1
haydnruthams@... 1
thomas.hantke@... 1
Ford caiofabiomc@... 1
chung.kally@... 1
debbysmit@... 1
mfharms6@... 1
spstrempel@... 1
Trillian christovvillamon@... 1
e.j.m.hamberg@... 1
saheli.de@... 1
vhverhagen@... 1
1. Build the dataset
2. Build the Model, Explore & Views
3. Create and Share Dashboards
→ leend.rs/GDF-project
→ gcompany.eu.looker.com
Now let’s have some fun!
Insights needed
● What products show yearly seasonal (sales) trends?
● How are stocks in the distribution centers doing?
● How are order prices related to product list prices?
● Is there a relationship between a product’s events & sales?
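As a starting point for the seasonal-trends question, one possible approach is to pool each product's sales by calendar month across years and look at the share each month contributes. The sketch below uses made-up records and hypothetical function names. In the workshop the same aggregation would more likely be written in SQL or modelled in Looker.

```python
from collections import defaultdict

# Hypothetical (product, "YYYY-MM", sales) records; values are made up.
SALES = [
    ("sunglasses", "2021-01", 10.0),
    ("sunglasses", "2021-07", 90.0),
    ("sunglasses", "2022-01", 20.0),
    ("sunglasses", "2022-07", 80.0),
    ("socks", "2021-01", 50.0),
    ("socks", "2021-07", 50.0),
    ("socks", "2022-01", 50.0),
    ("socks", "2022-07", 50.0),
]


def monthly_share(records):
    """Return {product: {month_number: share of that product's total sales}}.

    Assumes every product has a non-zero sales total.
    """
    per_month = defaultdict(lambda: defaultdict(float))
    for product, year_month, amount in records:
        month = int(year_month.split("-")[1])
        per_month[product][month] += amount
    shares = {}
    for product, months in per_month.items():
        total = sum(months.values())
        shares[product] = {m: v / total for m, v in sorted(months.items())}
    return shares


def peak_month(shares):
    """Return {product: (month with the largest share, that share)}."""
    return {p: max(s.items(), key=lambda kv: kv[1]) for p, s in shares.items()}


shares = monthly_share(SALES)
peaks = peak_month(shares)
for product, (month, share) in peaks.items():
    print(f"{product}: peak month {month} with {share:.0%} of sales")
```

A product whose peak month holds a share far above an even split is a candidate for a seasonal pattern. A flat profile, like the hypothetical socks here, is not.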
