The New Big Data
Scott Shaw
© 2020 Cloudera, Inc. All rights reserved. 2
DATA MANAGEMENT IS SPREAD ALL OVER
[Chart: where data management happens. Categories: On-premises, Single cloud, Multi cloud, Hybrid cloud, Private cloud. Values shown: 47%, 21%, 24%, 26%, 32%]
Gartner recently warned that “Data and analytics leaders must
prepare for the complexities of multi cloud and intercloud
deployments to avoid potential performance issues… unplanned
cost overruns and ... difficulties with integration efforts.”
HBR June 2019
“Enterprise IT doesn’t operate
at the speed of business.
Your IT group needs to perform
better than shadow IT.”
[Chart: Shadow IT as a % of overall IT spend]
CIO Magazine
HOW TIMES HAVE CHANGED
2008: SCALE 1 JOB TO 1000s OF SERVERS
2020: SCALE 1 PLATFORM TO 1000s OF USERS
CLOUDERA - THE ENTERPRISE DATA CLOUD COMPANY
Data lifecycle stages:
01 Collect
02 Curate
03 Report
04 Serve
05 Predict
Capabilities:
• Data Engineering
• Streaming & Data Flow
• Data Warehouse
• Operational Database
• Machine Learning & AI
Security | Governance | Lineage | Management | Automation
Manage and secure the data lifecycle in any cloud or datacenter
BUSINESS USE CASES REQUIRE THE DATA LIFECYCLE
An integrated lifecycle is easier to use, manage and secure
REAL-TIME & TRANSACTIONAL DATA LIFECYCLE USE CASES
• SUPPLY CHAIN OPTIMIZATION
• COMPUTER VISION FOR QA
• PREDICTIVE MAINTENANCE
• PROCESS MONITORING DASHBOARDS
• THROUGHPUT OPTIMIZATION
ENTERPRISE DATA → ENTERPRISE DATA CLOUD → ENTERPRISE USE CASES
• CONNECTED PRODUCTS
• CONNECTED PRODUCTION
• CONNECTED SUPPLY CHAIN
• CONNECTED CONSUMER
SECURITY | GOVERNANCE | LINEAGE | MANAGEMENT | AUTOMATION
CLOUDERA DATA PLATFORM
COMPONENT ARCHITECTURE
THE ENTERPRISE DATA CLOUD
COMPONENTS
Traditional Platform Consumption:
• Data Hub Clusters
New analytic experiences:
• Data Warehouse
• Machine Learning
• Data Engineering
• Operational Database
• More to come
Control Plane services:
• Workload Manager
• Replication Manager
• Data Catalog
• Management Console
KEY CONCEPTS & COMPONENTS
Environment
• 1 Template
• 1 Region
• 1 VPC
• Multiple Roles/Buckets
Data Lake (1:1 with the environment)
• SDX: Atlas, Ranger, Knox, IdBroker, CM
• Associated with groups/users
Data Hub Clusters / Experiences (1:N, all sharing the data lake)
• DH templates
• ML environment
• DW database catalogs / virtual compute
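The cardinalities above (one data lake per environment, many Data Hub clusters or experiences sharing it) can be sketched as a small Python model. This is a hypothetical illustration, not the CDP API: the class names `Environment`, `DataLake` and `Workload`, the `kind` labels, and the region and VPC values are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataLake:
    # Shared SDX services named on the slide; kept here only as labels.
    services: tuple = ("Atlas", "Ranger", "Knox", "IdBroker", "CM")


@dataclass
class Workload:
    name: str
    kind: str  # hypothetical labels: "datahub" or "experience"


@dataclass
class Environment:
    name: str
    region: str  # one region per environment
    vpc: str     # one VPC per environment
    data_lake: Optional[DataLake] = None
    workloads: List[Workload] = field(default_factory=list)

    def create_data_lake(self) -> DataLake:
        # 1:1 rule: an environment holds at most one data lake.
        if self.data_lake is not None:
            raise ValueError(f"{self.name} already has a data lake")
        self.data_lake = DataLake()
        return self.data_lake

    def add_workload(self, name: str, kind: str) -> Workload:
        # 1:N rule: many Data Hub clusters/experiences share that one data lake.
        if self.data_lake is None:
            raise ValueError("create the data lake before adding workloads")
        workload = Workload(name, kind)
        self.workloads.append(workload)
        return workload


# Hypothetical environment with placeholder region and VPC values.
env = Environment("analytics", region="us-east-1", vpc="vpc-example")
env.create_data_lake()
env.add_workload("bi-team", "datahub")
env.add_workload("dw", "experience")
print(len(env.workloads))
```

Calling `env.create_data_lake()` a second time raises `ValueError`, which is how the sketch enforces the 1:1 relationship; the 1:N side needs no limit beyond requiring the data lake to exist first.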
KEY CONCEPTS & COMPONENTS
Typical user flow
[Diagram: Enterprise IT, the CDP Control Plane (Management Console) and enterprise cloud resources (IAM, Network, VMs, Buckets, etc.)]
Step 1: The user connects to CDP with their enterprise identity.
Step 2: They create an environment and data lake for their enterprise. The data lake contains Atlas, Ranger, Knox, IdBroker, FreeIPA, CM and HMS.
Step 3: They create Data Hub clusters for traditional workloads, such as a BI team cluster and an ETL team cluster.
Step 4: They create access points for containerized analytic experiences, such as the Data Warehouse Experience and the Machine Learning Experience.
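The ordering in the user flow can be read as a set of prerequisites. The Python sketch below encodes them; the step names and `run_flow` are hypothetical, and it assumes that experiences (Step 4), like Data Hub clusters (Step 3), depend only on the environment and data lake from Step 2 rather than on a Data Hub cluster.

```python
from typing import Dict, List, Set

# Hypothetical step names; each maps to the steps that must already be done.
# Assumption: Step 4 depends on Step 2 only, not on Step 3.
PREREQS: Dict[str, Set[str]] = {
    "connect_with_identity": set(),                            # Step 1
    "create_env_and_data_lake": {"connect_with_identity"},     # Step 2
    "create_data_hub_cluster": {"create_env_and_data_lake"},   # Step 3
    "create_experience": {"create_env_and_data_lake"},         # Step 4
}


def run_flow(steps: List[str]) -> List[str]:
    """Run steps in order, refusing any step whose prerequisites are missing."""
    done: Set[str] = set()
    for step in steps:
        missing = PREREQS[step] - done
        if missing:
            raise RuntimeError(f"{step} requires {sorted(missing)} first")
        done.add(step)
    return steps


# Dict insertion order follows Steps 1 to 4, so this run succeeds.
print(run_flow(list(PREREQS)))
```

Because Steps 3 and 4 share the same prerequisite, they can run in either order once the environment and data lake exist; only skipping Step 2 is rejected.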
ENVIRONMENT
What is an environment?
An environment defines where CDP creates
resources in the customer's cloud account.
A long-running, permanent cluster
called a Data Lake is created in it.
DATA LAKE
What is a Data Lake?
A common set of Services (SDX)
within an Environment that are
shared across multiple
Clusters/Experiences.
These include Services for:
• Security
• Auditing
• Governance
• Data Discovery
DATA HUB CLUSTERS AND EXPERIENCES
What are the consumption options?
A Data Hub cluster is a customizable environment
that runs like a traditional Hadoop cluster but is
designed to use cloud storage.
An Experience is a container-based compute
environment for a specific purpose:
ML (Machine Learning), DW (Data Warehouse),
DE (Data Engineering), OD (Operational Database)
or DF (Data Flow).
CONTROL PLANE
What is the Control Plane?
The Control Plane is the common set of tools
for management, workload analysis, data movement
and data discovery across multiple environments
(Management Console, Workload Manager,
Replication Manager and Data Catalog).
PRODUCT WALKTHROUGH
HYBRID ARCHITECTURE
TARGET ARCHITECTURE: THE ENTERPRISE DATA CLOUD
CDP Public Cloud (platform-as-a-service): AWS, Azure, GCP
CDP On-Prem (installable software): Private Cloud, CDP Datacenter
[Diagram layers: Control Plane; Data Hub / Virtual Private Clusters; Self-Serve Experiences (DW, ML, DE, …); Cloudera Runtime]
OpenShift 101
➔ OpenShift → Kubernetes++
➔ K8s → System to deploy, scale and manage apps
➔ Applications → exposed through services
➔ Service → collection of Pods
➔ Pods → collection of containers
➔ Containers → runtime environment
[Diagram: master nodes managing three worker nodes; each worker node has its own CPU, RAM and disk, runs a Kubelet, and hosts pods that wrap containers]
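The Service → Pod → container chain above hinges on how a Kubernetes Service finds its pods: through a label selector. The Python sketch below models that matching rule with plain data classes. It is not the Kubernetes client API; the pod, node and image names are hypothetical, and it ignores details such as pod readiness and namespaces that real endpoint selection takes into account.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    name: str
    image: str  # hypothetical image name, for illustration only


@dataclass
class Pod:
    name: str
    node: str                   # worker node the pod was scheduled on
    labels: Dict[str, str]
    containers: List[Container] = field(default_factory=list)


@dataclass
class Service:
    name: str
    selector: Dict[str, str]

    def endpoints(self, pods: List[Pod]) -> List[str]:
        # A Service with no selector gets no automatically selected pods.
        if not self.selector:
            return []
        # A pod backs the Service when every selector key/value is in its labels.
        return [
            p.name
            for p in pods
            if all(p.labels.get(k) == v for k, v in self.selector.items())
        ]


pods = [
    Pod("web-1", "worker-1", {"app": "web", "tier": "frontend"}, [Container("nginx", "nginx")]),
    Pod("web-2", "worker-2", {"app": "web"}, [Container("nginx", "nginx")]),
    Pod("db-1", "worker-3", {"app": "db"}, [Container("postgres", "postgres")]),
]
web = Service("web", {"app": "web"})
print(web.endpoints(pods))  # ['web-1', 'web-2']
```

Extra labels on a pod (such as `tier` on `web-1`) do not prevent a match; only the keys in the selector are checked, which is why one Service can span pods on different worker nodes.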
THANK YOU
