The document provides an agenda and information about a GoDataFest workshop on Google Cloud Platform for data. The agenda includes an introduction to GCP for data, a session on roles and tools on GCP for different data roles, and a session where participants will build projects on GCP in mixed workgroups. It outlines the goals and tools used by different roles like data engineer, analytics engineer, and Looker user. It also provides information on Google Cloud technologies like BigQuery, Dataform, Looker, and how they fit into the modern data lifecycle and platform. Participants are then divided into mixed workgroups based on their preferred role and given insights to explore in their projects.
9. For the past 15 years, Google has been building out the fastest, most powerful, highest quality cloud infrastructure on the planet.
Images by Connie Zhou
10. Google network
More than a collection of data centers
Google Global Cache (GGC) edge nodes
Points of presence (>100)
Network fiber
FASTER (US, JP, TW) 2016
Unity (US, JP) 2010
SJC (JP, HK, SG) 2013
Monet (US, BR) 2017
11. Google has been innovating data technologies
Google needed to invent data processing methods
research.google.com/pubs/papers.html
2002 2004 2006 2008 2010 2012 2014 2016
GFS
MapReduce
Bigtable
Dremel
Colossus
Flume
Megastore
Spanner
Millwheel
Pub/Sub
F1
TensorFlow
12. Google has been innovating data technologies
Google then shared its innovations
research.google.com/pubs/papers.html
2002 2004 2006 2008 2010 2012 2014 2016 2018
Cloud Storage
Dataproc
Bigtable
BigQuery
Cloud Storage
Dataflow
Datastore
Dataflow
Pub/Sub
ML Engine
Auto ML
13. data lifecycle – 4 steps
ingest
read raw data
● streaming
● batch
● ad-hoc
store
store in the right format
● durable
● accessible
process & analyze
transform to information
● meaningful
● usable
explore & visualize
convert to insights
● insightful
● shareable
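The four steps can be sketched end to end with nothing but Python's standard library. This is only a toy illustration: the CSV sample, the `sales` table and the in-memory SQLite database are assumptions standing in for a real source system and warehouse, not part of the workshop material.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in practice this would arrive as a batch file or a stream.
RAW_CSV = """product,month,amount
bike,2023-01,100
bike,2023-02,150
helmet,2023-01,40
"""


def ingest(raw_text):
    """Ingest: read raw data (a small batch here) into Python dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def store(rows):
    """Store: load rows into a table (in-memory SQLite as a stand-in for a warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, month TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (:product, :month, :amount)", rows
    )
    return conn


def process(conn):
    """Process & analyze: transform raw rows into usable totals per product."""
    query = "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
    return conn.execute(query).fetchall()


def explore(totals):
    """Explore & visualize: turn the totals into shareable text lines."""
    return [f"{product}: {total:.0f}" for product, total in totals]


report = explore(process(store(ingest(RAW_CSV))))
print("\n".join(report))
```

Each function corresponds to one lifecycle step, so any single step could be swapped for a managed service without touching the others.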
16. data life cycle with BigQuery & Looker
source systems: ERP, Finance, HR, Marketing, Other
data platform: raw data, clean sources, business logic, reports
roles: data engineer, analytics engineer, explorer, viewer
tools: BigQuery / Dataform, Looker
17. What is BigQuery?
BigQuery: modern data warehouse
● storage & compute
● extremely fast, very cost-efficient
● use (standard) SQL
● integrate
○ Cloud SQL
○ Data Studio
○ Connected Sheets
● BQ ML
○ machine learning “for business”
○ SQL-powered
○ brings ML to the data
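As a rough sketch of what "SQL-powered" machine learning can look like, the helper below only assembles a BigQuery ML `CREATE MODEL` statement as a string. The function name, dataset, table and column names are hypothetical, and the statement is not executed here; running it would require a BigQuery project and credentials.

```python
def build_bqml_create_model(model, source_table, label, features,
                            model_type="linear_reg"):
    """Return a BigQuery ML CREATE MODEL statement as a string.

    All identifiers passed in are illustrative; nothing is sent to BigQuery.
    """
    columns = ",\n  ".join([*features, label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT\n  {columns}\n"
        f"FROM `{source_table}`"
    )


# Hypothetical names, chosen only for the example.
sql = build_bqml_create_model(
    model="demo_dataset.sales_model",
    source_table="demo_dataset.orders",
    label="sale_price",
    features=["list_price", "category"],
)
print(sql)
```

With the `google-cloud-bigquery` client library, a string like this could be submitted with `bigquery.Client().query(sql).result()`, assuming credentials are configured. The training then runs inside the warehouse, which is the sense in which ML is brought to the data.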
19. Dataform & BigQuery
● Open source, SQL-based language to manage data transformations
● Fully managed, serverless orchestration for data pipelines
● Fully featured cloud development environment to develop with SQL
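Orchestrating SQL transformations largely comes down to running them in dependency order. The sketch below is not Dataform itself; it uses Python's standard `graphlib` module (Python 3.9+) and hypothetical table names to show the idea of deriving a run order from declared dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical transformations: each table maps to the tables it reads from.
dependencies = {
    "raw_orders": set(),
    "raw_products": set(),
    "clean_orders": {"raw_orders"},
    "clean_products": {"raw_products"},
    "sales_by_product": {"clean_orders", "clean_products"},
}

# static_order() yields every table only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())

for step, table in enumerate(run_order, start=1):
    print(f"{step}. build {table}")
```

The relative order of independent tables (for example the two raw tables) is not significant; only the guarantee that a table is built after its inputs matters.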
23. legacy BI “workbook” architecture
Data Lake (Data Storage)
Best practice for companies to centralise their data
Data Extraction
Data analysts extracting your data into workbooks or aggregated cubes
HARD TO MAINTAIN, HARD TO SCALE → DATA CHAOS
Data Visualisation
BI tool sits on top of these siloed workbooks to present dashboards and reports
LIMITED DATA, MULTIPLE TRUTHS → DATA BOTTLENECK
Tech team headcount
24. Looker’s universal semantic model
● governed metrics
● best-in-class APIs
● in-database
● Git version control
● security
● Cloud
Use cases: modern BI & analytics, integrated insights, data-driven workflows, custom applications
SQL in, results back
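The "SQL in, results back" idea can be illustrated with a toy semantic layer: metrics are defined once, and every question is compiled to SQL that runs in the database. This is a plain-Python sketch, not LookML or the Looker API; the `orders` table, the field names and the SQLite database are assumptions made for the example.

```python
import sqlite3

# Hypothetical semantic model: dimensions and measures are defined once, centrally.
MODEL = {
    "table": "orders",
    "dimensions": {"status": "status"},
    "measures": {
        "order_count": "COUNT(*)",
        "total_revenue": "SUM(sale_price)",
    },
}


def compile_query(model, dimension, measure):
    """Compile a (dimension, measure) question into one SQL statement."""
    dim_sql = model["dimensions"][dimension]
    measure_sql = model["measures"][measure]
    return (
        f"SELECT {dim_sql} AS {dimension}, {measure_sql} AS {measure} "
        f"FROM {model['table']} GROUP BY {dim_sql} ORDER BY {dim_sql}"
    )


# In-memory SQLite stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("complete", 20.0), ("complete", 30.0), ("returned", 5.0)],
)

sql = compile_query(MODEL, "status", "total_revenue")  # SQL in
results = conn.execute(sql).fetchall()                 # results back
print(sql)
print(results)
```

Because `total_revenue` is defined in one place, every report that asks for it gets the same calculation, which is the point of governed metrics.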
25. Agenda
1. Intro GCP for Data (09:00 - 09:30)
● Data Democratization
● Why (Google) Cloud?
● Data platforms
2. Roles & tools (per role) (09:45 - 10:30)
● Data Engineer
● Analytics Engineer
● Analyst
3. Build on GCP (workgroups) (10:45 - 12:30)
● clean & prep data
● build the model
● create & share insights
26. Tools by role
Looker User
• Looker Explorer
• Looker Dashboarding
Looker Dev
• (BigQuery & Dataform)
• Looker & LookML
Data Engineer
• BigQuery
• Dataform
• Terraform
Job titles: Analytics Engineer, Data Analyst, Cloud Data Engineer
Pick your preferred role → https://leend.rs/GDF-pick (with your Google account!)
27. Tools by role
Looker User
• Looker Explorer
• Looker Dashboarding
Looker Dev
• (BigQuery & Dataform)
• Looker & LookML
Data Engineer
• BigQuery
• Dataform
• Terraform
Job titles: Data Analyst, Analytics Translator, Machine Learning Engineer, Data Architect, Data Scientist, Business Analyst, Analytics Engineer, Cloud Data Engineer
Pick your preferred role → https://leend.rs/GDF-pick (with your Google account!)
28. Tools by role
Looker User
• Looker Explorer
• Looker Dashboarding
Looker Data Explorer - Qwik Start → https://leend.rs/GDF-QL-An1
Filtering and Sorting Data in Looker → https://leend.rs/GDF-QL-An2
Data Engineer
• BigQuery
• Dataform
• Terraform
console.cloud.google.com/ ... ?project=go-data-fest → https://leend.rs/GDF-project
Looker Dev
• (BigQuery & Dataform)
• Looker & LookML
Looker Developer - Qwik Start → https://leend.rs/GDF-QL-AE1
Creating Measures and Dimensions Using LookML → https://leend.rs/GDF-QL-AE2
29. Agenda
1. Intro GCP for Data
● Data Democratization
● Why (Google) Cloud?
● Data platforms
2. Roles & tools (per role)
● Data Engineer
● Analytics Engineer
● Analyst
3. Build on GCP (mixed workgroups)
● clean & prep data
● build the model
● create & share insights
30. Groups & roles
Team Google-user Data Engineer Looker Dev Looker User
Arthur erik.clabbers@... 1
fcm073@... 1
haydnruthams@... 1
thomas.hantke@... 1
Ford caiofabiomc@... 1
chung.kally@... 1
debbysmit@... 1
mfharms6@... 1
spstrempel@... 1
Trillian christovvillamon@... 1
e.j.m.hamberg@... 1
saheli.de@... 1
vhverhagen@... 1
31. Now let’s have some fun!
1. Build the dataset
2. Build the Model, Explore & Views
3. Create and Share Dashboards
→ leend.rs/GDF-project
→ gcompany.eu.looker.com
Insights needed
● What products show yearly seasonal (sales) trends?
● How are stocks in the distribution centers doing?
● How are order prices related to product list prices?
● Is there a relationship between a product’s events & sales?
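As one possible starting point for the seasonality question, the sketch below computes each product's share of sales per calendar month across years and flags products whose busiest month dominates. The order rows are made up for illustration, the 50% threshold is an arbitrary assumption, and a share-of-total heuristic like this is far cruder than a proper seasonality analysis.

```python
from collections import defaultdict

# Hypothetical (product, "YYYY-MM", sales) rows; not workshop data.
orders = [
    ("sunglasses", "2021-07", 90), ("sunglasses", "2021-12", 10),
    ("sunglasses", "2022-07", 80), ("sunglasses", "2022-12", 20),
    ("socks", "2021-07", 50), ("socks", "2021-12", 50),
    ("socks", "2022-07", 50), ("socks", "2022-12", 50),
]


def monthly_share(rows):
    """Return {product: {month_number: share of that product's total sales}}."""
    per_month = defaultdict(lambda: defaultdict(float))
    for product, year_month, sales in rows:
        month = int(year_month.split("-")[1])
        per_month[product][month] += sales
    shares = {}
    for product, months in per_month.items():
        total = sum(months.values())
        shares[product] = {m: s / total for m, s in months.items()}
    return shares


def seasonal_products(rows, threshold=0.5):
    """Flag products whose busiest calendar month exceeds the share threshold."""
    shares = monthly_share(rows)
    return sorted(p for p, months in shares.items()
                  if max(months.values()) > threshold)


print(seasonal_products(orders))
```

In the workshop setting the same grouping could be expressed as a SQL aggregation in the dataset and surfaced as a dashboard tile; the Python version only makes the logic explicit.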