1© Cloudera, Inc. All rights reserved.
Simplifying Real-Time
Architectures for IoT using
Apache Kudu
Vijay Raja | Solutions Marketing Lead, IoT
Ryan Lippert | Product Marketing, Operational DB
IoT – Key Drivers & Objectives
Drive Internal
Efficiencies
Improve Product
& Customer Exp.
New Services &
Business Models
• Predictive Maintenance
• Real-time monitoring
• Ops optimization
• Reduced equipment
down-times
• Product Usage Analytics
• Personalized products &
offerings
• Improved Product
Development
• New usage based
business models
• New service offerings
• E.g. OnCommand Connection
• Remote Monitoring
Who are my customers?
How are they using my products?
How can I lower downtime?
How can I drive efficiencies?
How do we implement a usage-based
model?
How can I launch new revenue streams?
2 PB of data/car/ year 1 – 2 TB of data / day 1 – 5 TB of data / day
IoT Data Characteristics
- The Foundation of Hadoop’s Potential
IoT data comes from a variety of different sources
• Massive volumes of intermittent data streams
• Generated from a variety of data sources
• Predominantly time-series
• Can come in streams (real-time) or batches
• Diverse data structures and schemas
• Some of it may be perishable
Combining sensor data with contextual data is the key to
value creation from IoT
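As a minimal sketch of what combining sensor data with contextual data can look like, the Python below joins one reading with a context record for the same device. The device name, the context fields, and the temperature limit are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    device_id: str
    timestamp: int        # seconds since epoch
    temperature_c: float


# Hypothetical contextual data, e.g. exported from an asset-management system.
DEVICE_CONTEXT = {
    "pump-17": {"site": "Plant A", "model": "PX-200", "max_temp_c": 80.0},
}


def enrich(reading: SensorReading, context: dict) -> dict:
    """Join one sensor reading with the context record for its device."""
    ctx = context.get(reading.device_id, {})
    limit = ctx.get("max_temp_c")
    return {
        "device_id": reading.device_id,
        "timestamp": reading.timestamp,
        "temperature_c": reading.temperature_c,
        "site": ctx.get("site"),
        "model": ctx.get("model"),
        # The raw number only becomes actionable once the limit is known.
        "over_limit": limit is not None and reading.temperature_c > limit,
    }


enriched = enrich(SensorReading("pump-17", 1_700_000_000, 85.5), DEVICE_CONTEXT)
```

A reading from a device with no context record still passes through, with `over_limit` left as `False`.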
Polling Question - 1
Where is your organization in your IoT journey?
A. Not sure where to start
B. Currently exploring use cases
C. Implementing our first IoT use case
D. Already deployed first IoT use case
E. Multiple IoT use cases in production
(Single Choice)
The IoT Ecosystem & Architecture
IoT Gateway
Data Center
Gateway
• Data Routing
• Edge-Processing
• Edge-Storage
IoT Data Storage, Processing & Analytics
Centralized IoT Data Analytics
• Time Series Data, Trends
• Machine Learning
• Context Enrichment
• Deeper business insights
Distributed Data
Processing & Analytics
• Cloud & On-Premise
Cloud
Sensors/ Things
• Analytics at the edge
• For Immediate
response
IoT Analytics
Enterprise Data Sources
What Happens at the Edge & What Happens in the Cloud?
• Analytics that need to be acted upon
immediately
• Low-latency requirements: hazard detection,
collision avoidance, etc.
• Human response times
• Context Enrichment
• Time series Analysis
• Comparative / Trend analysis
• Machine Learning
Cloud
Analytics
Edge
Analytics
Cloud
Analytics
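One way to read the edge/cloud split is as a latency budget: work that must respond almost immediately runs at the edge, and everything else goes to the cloud. The sketch below assumes a 100 ms cut-off and invented per-workload budgets; none of these numbers come from the slides.

```python
EDGE_LATENCY_BUDGET_MS = 100  # assumed cut-off, not taken from the slides


def choose_tier(max_latency_ms: float) -> str:
    """Return where an analytic should run, given how quickly it must respond."""
    return "edge" if max_latency_ms <= EDGE_LATENCY_BUDGET_MS else "cloud"


# Assumed latency budgets in milliseconds, for illustration only.
workloads = {
    "collision_avoidance": 20,
    "hazard_detection": 50,
    "trend_analysis": 60_000,
    "model_training": 3_600_000,
}

placement = {name: choose_tier(ms) for name, ms in workloads.items()}
```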
Cloudera Enterprise – Hadoop as a Data Platform for IoT
Sensors/ IoT
Data Sources
Internal Systems External Sources
BI Solutions Real-Time Apps Search Data Science
Workbench
SQL
Machine
Learning
Data Center
Cloud
Sensor/ IoT Data
IoT Gateway
• Data Storage
• Data Processing
• Machine Learning
• Real-time Analytics
OPERATIONS: Cloudera Manager, Cloudera Director
DATA MANAGEMENT: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
INTEGRATE
• BATCH: Sqoop
• REAL-TIME: Kafka, Flume
PROCESS, ANALYZE, SERVE
• BATCH: Spark, Hive, Pig, MapReduce
• STREAM: Spark
• SQL: Impala
• SEARCH: Solr
• SDK: Partners
UNIFIED SERVICES
• RESOURCE MANAGEMENT: YARN
• SECURITY: Sentry, RecordService
STORE
• FILESYSTEM: HDFS
• RELATIONAL: Kudu
• NoSQL: HBase
IoT: Lots of Buzz, but what is the core concept?
And critically, what do we need from our infrastructure?
IoT promises prediction
and optimization, but
often delivers
monitoring.
The right solution allows you to
analyze data and serve
information in time to change
business outcomes.
That means the right solution is
built on real-time analytics.
IoT: Driven by Data
Polling Question - 2
What area of the real-time data chain does your organization need the
most help with?
A. Data ingest
B. Data processing
C. Data serving
D. All of the above
(Single Choice)
Traditional Hadoop Databases Leave a Gap
Use cases that fall between HDFS and HBase were difficult to manage
[Figure: storage engines positioned by Pace of Analysis and Pace of Data; axis labels include Unchanging, Append-Only, Fast Changing, Frequent Updates, and Real-Time]
• HDFS: Fast Scans, Analytics and Processing of Stored Data
• HBase: Fast On-Line Updates & Data Serving
• Arbitrary Storage (Active Archive)
• Analytic Gap (Complex Hybrid Architectures): Fast Analytics (on fast-changing or frequently-updated data)
The Trouble with Lambda
[Figure: Lambda architecture. New Data flows through three layers to the Application]
• Batch Layer (Hadoop): Data Lake (HDFS), Batch Recompute, Precompute Views
• Speed Layer (Storm/Spark): Stream or Micro Batch, “Real-time” Increment, Increment Views
• Serving Layer (HBase, Impala): Merge
Code must be kept in sync
Restatement is difficult
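The "code must be kept in sync" problem can be illustrated with a toy Lambda pipeline. In this sketch (pure Python, not tied to any of the tools named on the slide) the same per-device count is implemented twice, once for the batch layer and once for the speed layer, and the serving layer merges the two views. Any change to the aggregation has to be made in both places.

```python
from collections import Counter


def batch_view(events):
    """Batch layer: recompute counts per device over all stored events."""
    return Counter(e["device_id"] for e in events)


def speed_view(events):
    """Speed layer: the same aggregation, reimplemented for recent events."""
    counts = Counter()
    for e in events:
        counts[e["device_id"]] += 1
    return counts


def serve(batch, speed):
    """Serving layer: merge both views to answer a query."""
    return batch + speed


historical = [{"device_id": "a"}, {"device_id": "a"}, {"device_id": "b"}]
recent = [{"device_id": "a"}, {"device_id": "c"}]
merged = serve(batch_view(historical), speed_view(recent))
```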
Updateable Analytic Storage
Simple real-time analytics and updates with Apache Kudu
Kudu: Storage for fast analytics on fast data
• Simplified architecture for building real-time analytic
applications
• Designed for next-generation hardware for faster analytic
performance across frameworks
• Native Hadoop storage engine
Flexibility for the right tools for the right use
case in one platform
• Only analytic database for Hadoop with Kudu + Impala
• Simple real-time applications with Kudu + Spark
Use cases
• Time series data
• Machine data analytics
• Online reporting
INTEGRATE
• STRUCTURED: Sqoop
• UNSTRUCTURED: Kafka, Flume
PROCESS, ANALYZE, SERVE
• BATCH: Spark, Hive, Pig, MapReduce
• STREAM: Spark
• SQL: Impala
• SEARCH: Solr
• OTHER: Kite
UNIFIED SERVICES
• RESOURCE MANAGEMENT: YARN
• SECURITY: Sentry, RecordService
STORE
• NoSQL: HBase
• FILESYSTEM: HDFS
• RELATIONAL: Kudu
• OBJECT: Cloud
Kudu: Fast Analytics on Fast-Changing Data
New storage engine enables new Hadoop use cases
[Figure: storage engines positioned by Pace of Analysis and Pace of Data; axis labels include Unchanging, Append-Only, Fast Changing, Frequent Updates, and Real-Time]
• HDFS: Fast Scans, Analytics and Processing of Stored Data
• HBase: Fast On-Line Updates & Data Serving
• Arbitrary Storage (Active Archive)
• Kudu fills the Analytic Gap: Fast Analytics (on fast-changing or frequently-updated data)
Modern analytic applications often require complex data flow & difficult
integration work to move data between HBase & HDFS
Better Together
Kudu Benefits from Integration with the Apache Ecosystem
Spark – Stream Processing for Kudu
• Open standard for real-time stream processing
• Effective for automating decision processes and machine
learning
• Use Cases include: Time Series Data & Machine Data
Analytics
Impala – High-Performance BI & SQL for Kudu
• Open standard for interactive SQL queries
• Powers analytic database workloads with flexibility, scale, and
open architecture
• Use Cases include: Online Reporting
Why Kudu, Why Cloudera?
A simultaneous combination of sequential and random reads and writes
Can you insert time series data
in real time? How long does it
take to prepare it for analysis?
Can you get results and act fast
enough to change outcomes?
Can you handle large volumes
of machine-generated data? Do
you have the tools to identify
problems or threats? Can your
system do machine learning?
Time Series Data Machine Data Analytics
Kudu Increases the Value of Time Series Data
Time Series
Inserts, updates, scans, lookups
Workload
Examples
Streaming market data; IoT; fraud detection &
prevention; risk monitoring; connected cars
Time series data is most valuable if you can
analyze it to change outcomes in real time.
Kudu simultaneously enables:
• Time series data inserted/updated as it arrives
• Analytic scans to find trends on fresh time series data
• Lookups to quickly visit the point in time where an
event occurred
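The three access patterns above (insert/update on arrival, analytic scan, point lookup) can be sketched against a small in-memory table keyed by device and timestamp. This is a plain-Python stand-in for illustration only; it does not use or imitate the Kudu client API, and the class and method names are hypothetical.

```python
from bisect import bisect_left, insort


class TimeSeriesTable:
    """In-memory stand-in for a table with primary key (device_id, timestamp)."""

    def __init__(self):
        self._keys = []   # sorted list of (device_id, timestamp)
        self._rows = {}   # key -> value

    def upsert(self, device_id, timestamp, value):
        """Insert a new row, or overwrite the row that already has this key."""
        key = (device_id, timestamp)
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def lookup(self, device_id, timestamp):
        """Point lookup: the value at one exact point in time, or None."""
        return self._rows.get((device_id, timestamp))

    def scan(self, device_id, start, end):
        """Range scan: yield (timestamp, value) for start <= timestamp < end."""
        i = bisect_left(self._keys, (device_id, start))
        while i < len(self._keys):
            dev, ts = self._keys[i]
            if dev != device_id or ts >= end:
                break
            yield ts, self._rows[(dev, ts)]
            i += 1


table = TimeSeriesTable()
table.upsert("car-1", 100, 54.0)
table.upsert("car-1", 160, 61.0)
table.upsert("car-1", 100, 55.0)   # late correction overwrites the earlier value
table.upsert("car-2", 120, 30.0)

speeds = [value for _, value in table.scan("car-1", 0, 200)]
average_speed = sum(speeds) / len(speeds)
```

Keeping the keys sorted by device and then time is what lets one structure serve both the scan and the lookup without copying data between systems.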
Kudu Keeps Your Business Operational
Machine Data
Analytics
Inserts, scans, lookups
Workload
Examples
Network threat detection; network health
monitoring; application performance
monitoring
Kudu can help spot problems before they
happen. Real-time data inserts, combined with the ability to
analyze trends, identify potential problems.
Kudu identifies trouble through:
• Unlimited storage, yielding better historic trend analysis
• Fast inserts to enable an up-to-date network view
• Fast scans to identify and flag undesired states for remedy
Operational DB: Real-Time Architecture
Driving the Model Through Machine Learning
Kafka
Spark
Streaming
Spark MLlib
IoT Analytics
Individual Session
Full Model/Learning
Genesis
Spark
1. Event Occurs
2. Messaging
3. Stream Processing
4. Land in Relational Store
5. Apply ML Libraries
IoT Data
Sources
Other Data Sources
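The five stages on this slide can be walked through with a toy, single-process pipeline. In the sketch below a `Queue` stands in for the messaging layer, a list stands in for the relational store, and a mean stands in for the model; the field names `vin` and `speed_kmh` are assumptions. It shows the data flow only, not how Kafka, Spark, or Kudu are actually called.

```python
from queue import Queue
from statistics import mean


def run_pipeline(raw_events):
    topic = Queue()                    # 2. messaging (stand-in for a Kafka topic)
    for event in raw_events:           # 1. events occur
        topic.put(event)

    store = []                         # 4. relational store (stand-in for a table)
    while not topic.empty():
        event = topic.get()
        if event.get("speed_kmh") is None:
            continue                   # 3. stream processing: drop incomplete records
        store.append({"vin": event["vin"], "speed_kmh": float(event["speed_kmh"])})

    # 5. "learning" step: a trivial stand-in model, the mean speed over stored rows
    model = {"mean_speed_kmh": mean(row["speed_kmh"] for row in store)}
    return store, model


store, model = run_pipeline([
    {"vin": "V1", "speed_kmh": 50},
    {"vin": "V2", "speed_kmh": None},
    {"vin": "V3", "speed_kmh": 70},
])
```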
Operational DB: Real-Time Architecture
MLlib & K-Means: Defining Microsegments via Machine Learning
[Figure: four panels (1 to 4) plotting Height against Weight. Cluster labels: S, M, L, XL in one panel; XS, S, M, L in another; the final panel is labelled "Near Custom?"]
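To make the K-means idea concrete, here is a small pure-Python version of Lloyd's algorithm applied to invented height/weight pairs. It is a sketch of the technique, not Spark MLlib's implementation; the data points and starting centroids are made up, and a real model would normally scale its features first.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Invented (height in cm, weight in kg) samples, for illustration only.
people = [(160, 55), (162, 58), (175, 72), (178, 75), (190, 95), (192, 98)]
centroids, clusters = kmeans(people, centroids=[(160, 55), (175, 72), (190, 95)])
```

Raising the number of centroids splits the population into finer segments, which is the move from a few standard sizes toward "near custom" ones.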
Operational DB: Real-Time Architecture
Driving Prediction and Optimization
Kafka
Spark
Streaming
Spark MLlib
IoT Analytics
Individual Session
1. Data Processed
2. Request Processed/Kudu Queried
3. Results Returned
4. Results Processed
5. Processed Data Returned
Genesis
Spark
Full Model/Learning
IoT Data
Sources
Other Data Sources
Operational DB: Real-Time Architecture
Driving Prediction and Optimization
Step 1: Data Processed
Apache Spark processes the data from the event (car sensors, manufacturing,
wearables, etc.), which potentially involves keeping a running list of the last X
events
Step 2: Request Processed/Kudu Queried
A Spark application uses the data gathered in step 1 to query Kudu
in a predefined manner to look for similar patterns defined via machine learning
Step 3: Kudu Results Returned
Kudu returns the results from the query in step 2 back to Spark to determine what
needs to be returned to the application
Step 4: Results Processed
Spark associates the results from Kudu with the information stored from the
current event to determine the next step to feed back to the application
Step 5: Processed Data Returned
The machine-generated, best possible outcome is prescribed and served to the
application
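The five steps can be sketched as a single request loop. In this hypothetical example a `deque` holds the last X events, a list of labelled patterns stands in for rows stored in Kudu, and "querying" is a nearest-neighbour match. The class name, the patterns, and the action labels are all assumptions made for illustration.

```python
from collections import deque
from math import dist


class SessionScorer:
    def __init__(self, stored_patterns, window=3):
        # stored_patterns: list of (feature_vector, action); stand-in for stored rows
        self.patterns = stored_patterns
        self.recent = deque(maxlen=window)   # step 1: running list of the last X events

    def handle(self, value):
        self.recent.append(value)            # step 1: process the incoming event
        if len(self.recent) < self.recent.maxlen:
            return {"value": value, "action": "collect_more_data"}
        features = tuple(self.recent)        # step 2: build the predefined query
        # step 3: the store returns the most similar known pattern
        pattern, action = min(self.patterns, key=lambda row: dist(features, row[0]))
        # steps 4-5: tie the result to the current event and return the next step
        return {"value": value, "matched_pattern": pattern, "action": action}


patterns = [
    ((70, 70, 70), "no_action"),
    ((70, 85, 100), "schedule_inspection"),
]
scorer = SessionScorer(patterns, window=3)
actions = [scorer.handle(temp)["action"] for temp in (71, 84, 99)]
```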
Operational DB: IoT Use Case
Prediction and Optimization
Kafka
Spark
Streaming
Spark MLlib
Application
Individual Session
Sensor Data
Spark
Full Model/Learning
Data Request Sent For Stream Processing
Data Cleaned/Ordered/Processed, Then
Delivered to Kudu for Modelling
Automated processes based on machine
learning enable prediction and
optimization at a new level.
Illustrative;
models will likely
have more than two
dimensions
IoT Data
Sources
Kudu
Other Data Sources
Key IoT Use Cases
Using Predictive Maintenance to Improve
Performance and Reduce Fleet Downtime
• Real-time visibility of 300,000+ trucks in
order to improve uptime and vehicle
performance
• OnCommand Connection is collecting
telematics and geolocation data across
the fleet
• Reduced maintenance costs to $0.03 per
mile from $0.12-$0.15 per mile
• Centralizing data from 13 systems with
varying frequency and semantic
definitions
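The per-mile figures above imply a 75% to 80% reduction in maintenance cost. The short calculation below works in whole cents to avoid floating-point noise. The annual mileage passed to `savings_per_truck_usd` is a hypothetical input, not a figure from the case study.

```python
OLD_COST_CENTS = (12, 15)   # previous cost range, cents per mile (from the slide)
NEW_COST_CENTS = 3          # current cost, cents per mile (from the slide)


def reduction_percent(old_cents: int, new_cents: int) -> float:
    """Percentage drop in cost per mile."""
    return 100 * (old_cents - new_cents) / old_cents


def savings_per_truck_usd(miles_per_year: int) -> tuple:
    """Low and high annual savings in dollars for an assumed yearly mileage."""
    return tuple((old - NEW_COST_CENTS) * miles_per_year // 100 for old in OLD_COST_CENTS)


reductions = [reduction_percent(old, NEW_COST_CENTS) for old in OLD_COST_CENTS]
example_savings = savings_per_truck_usd(100_000)   # 100,000 miles is an assumption
```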
TRANSPORTATION
» PREDICTIVE MAINTENANCE
» IMPROVED SERVICE
» DATA DRIVEN PRODUCTS
DATA-DRIVEN
PRODUCTS
CASE STUDY
Predictive Maintenance on industrial-grade
turbines for hydro power stations
Challenge:
• Gather, store and analyze noise levels
from turbines for anomaly detection
Solution:
• Cloudera platform used to gather and
analyze acoustic data/audio files coming
from the turbines in real-time
• Using diagnostic solution to monitor the
health of turbines and predict failures
in advance
PREDICTIVE MAINTENANCE
» INDUSTRIAL IoT
» LOWERED DOWNTIME
» LOWERED COSTS
Predictive Maintenance - Turbines
DATA-DRIVEN
PROCESS
CASE STUDY
DATA-DRIVEN
PRODUCTS
#1 Telematics provider with 130 billion
miles of driving data collected from black
boxes in connected cars
Challenge:
• Drive analytics on 12 million miles of
driving data collected every hour
Solution:
• Telematics solution based on Cloudera
to process data from black boxes
• Analytics around driving behavior, risks,
location, braking patterns, contextual
elements and crash information
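For a sense of scale, the two figures on this slide can be related with simple arithmetic. The sketch below assumes, unrealistically, that the collection rate has always been a constant 12 million miles per hour; it is a rough order-of-magnitude check rather than a statement about the provider's history.

```python
MILES_PER_HOUR_COLLECTED = 12_000_000        # from the slide
TOTAL_MILES_COLLECTED = 130_000_000_000      # from the slide

miles_per_day = MILES_PER_HOUR_COLLECTED * 24
miles_per_year = miles_per_day * 365

# Days needed to reach the total at today's rate (constant rate is an assumption).
days_at_current_rate = TOTAL_MILES_COLLECTED / MILES_PER_HOUR_COLLECTED / 24
```

At the stated hourly rate the system takes in 288 million miles of driving data per day, so the 130 billion mile total corresponds to a bit over a year of collection at that pace.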
TELEMATICS
» CONNECTED VEHICLES
» INSURANCE TELEMATICS
» PREDICTIVE ANALYTICS
Connected Car Telematics for Insurance
CASE STUDY
DATA-DRIVEN
PROCESS
DATA-DRIVEN
PRODUCTS
Powering a Variety of IoT Use Cases…
Connected Vehicles
Usage Based Insurance
Industrial IoT
Predictive Maintenance
Smart Cities/ Ports Oil & Gas
Aerospace & Aviation Smart Healthcare
Connected Car Demo
Connected Car – Demo Architecture
OPERATIONS: Cloudera Manager, Cloudera Director
DATA MANAGEMENT: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
INTEGRATE
• BATCH: Sqoop
• REAL-TIME: Kafka, Flume
PROCESS, ANALYZE, SERVE
• BATCH: Spark, Hive, Pig, MapReduce
• STREAM: Spark
• SQL: Impala
• SEARCH: Solr
• SDK: Partners
UNIFIED SERVICES
• RESOURCE MANAGEMENT: YARN
• SECURITY: Sentry, RecordService
STORE
• FILESYSTEM: HDFS
• RELATIONAL: Kudu
• NoSQL: HBase
Cloudera Enterprise Data Hub
MQTT -
Kafka
Bridge
Connected Car
Simulator
Data Ingest &
Pipeline
Enterprise Data Hub BI & Visualization
Streaming Data:
• Time
• VIN
• Location
• Mileage
• Speed
• Acceleration
• Brakes applied?
• Turn signal on?
• Lane departed?
• Collision
detected?
• Hazard detected?
StreamSets Data
Collector
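The streaming fields listed for the simulator could be carried as one record per event. The sketch below models such a record as a Python dataclass and serializes it to JSON, which is one plausible wire format for an MQTT or Kafka payload. The field names, units, and timestamp format are assumptions; the slides name the fields but not their encoding.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CarEvent:
    time: str                 # ISO 8601 timestamp (assumed format)
    vin: str
    latitude: float           # "Location" split into two fields (assumption)
    longitude: float
    mileage_km: float         # units are assumptions
    speed_kmh: float
    acceleration_ms2: float
    brakes_applied: bool
    turn_signal_on: bool
    lane_departed: bool
    collision_detected: bool
    hazard_detected: bool


def to_message(event: CarEvent) -> bytes:
    """Serialize one event as UTF-8 JSON, ready to publish to a topic."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


def from_message(payload: bytes) -> CarEvent:
    """Parse a payload produced by to_message back into a CarEvent."""
    return CarEvent(**json.loads(payload.decode("utf-8")))


event = CarEvent(
    time="2017-06-01T12:00:00Z", vin="TESTVIN0000000001",
    latitude=37.42, longitude=-122.08, mileage_km=15230.5,
    speed_kmh=88.0, acceleration_ms2=-1.5,
    brakes_applied=True, turn_signal_on=False, lane_departed=False,
    collision_detected=False, hazard_detected=True,
)
payload = to_message(event)
```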
Connected Car – Demo Architecture
Cloudera Enterprise Data Hub
MQTT -
Kafka
Bridge
Connected Car
Simulator
Data Ingest &
Pipeline
Enterprise Data Hub BI & Visualization
Streaming Data:
• Time
• VIN
• Location
• Mileage
• Acceleration
• Speed
• Brakes applied?
• Turn signal on?
• Lane departed?
• Collision
detected?
• Hazard detected?
Data Storage Layer
Search
#2
#1
Pub-Sub Messaging
System
Real-Time
Processing Engine
StreamSets Data
Collector
Interactive SQL Engine
Thank You
