3 V’s of Big Data


4 V’s of Big Data
5 V’s of Big Data
6 V’s of Big Data
7 V’s of Big Data
9 V’s of Big Data
21 V’s of Big Data
1-VOLUME
(Generic View)

• Consider the following:

• Facebook has 2 billion users,
• YouTube has 1 billion users,
• Twitter has 350 million users, and
• Instagram has 700 million users.

Every day, these users contribute billions of images, posts, videos, tweets,
etc. You can now imagine the enormous amount, or Volume, of data that
is generated every minute and every hour.

Twitter alone generates more than 7 terabytes (TB) of data every day,
and Facebook generates 10 TB.
1-VOLUME
(Application View)

“The amount of data that needs to be processed in a given time”


Model of Generating/Consuming Data
Old Model: a few companies generate data, all others consume data

New Model: all of us generate data, and all of us consume data

Big Data sources

• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Mobile devices (tracking all objects all the time)
• Sensor technology and networks (measuring all kinds of data)
2-VELOCITY
(Generic View)

With Velocity we refer to the speed with which data are being generated.
Staying with our social media example,

• every day, 900 million photos are uploaded to Facebook,
• 500 million tweets are posted on Twitter,
• 0.4 million hours of video are uploaded to YouTube, and
• 3.5 billion searches are performed on Google.

This is like a nuclear data explosion.
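
To make these daily totals easier to compare with what an application must absorb, the short Python sketch below converts them to average per-second rates. It assumes events are spread evenly over the day, which real traffic is not, so treat the results as rough averages rather than peak loads. The dictionary keys and the `per_second` helper are illustrative names, not part of any library.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Daily figures quoted on the slide above.
daily_events = {
    "Facebook photo uploads": 900_000_000,
    "Twitter tweets": 500_000_000,
    "YouTube video hours uploaded": 400_000,
    "Google searches": 3_500_000_000,
}


def per_second(events_per_day: float) -> float:
    """Average arrival rate, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


# Average rate for every source (hypothetical helper output, not measured data).
rates = {name: per_second(count) for name, count in daily_events.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:,.1f} per second")
```

An application that has to keep up with such a stream needs to sustain at least the average rate, with headroom for bursts.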


Data never sleeps…

How much data is generated every minute, 24/7/365?

• Email users send 204,166,667 emails
• Google receives over 2,000,000 search queries
• Apple receives about 47,000 app downloads
• Brands on Facebook get 34,722 likes
Digital Data is Exploding

According to IBM, 90% of the world's information was created in the last 2 years.
2-VELOCITY
(Application View)

“The speed with which data are being generated
and need to be handled by the application”
3-VARIETY
(Generic View)

Variety in Big Data refers to all the structured, semi-structured, and
unstructured data that can be generated either by humans or by machines.

Examples: texts, tweets, pictures and videos, emails, voicemails, handwritten
text, ECG readings, audio recordings, logs, etc.
Variety (Complexity)

• Different types:
✓ Relational data (tables/transactions/legacy data)
✓ Text data (Web)
✓ Semi-structured data (XML)
✓ Graph data
– Social networks, Semantic Web (RDF), …
✓ Streaming data
– You can only scan the data once
✓ A single application can be generating/collecting many types of data

To extract knowledge ➔ all these types of data need to be linked together
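
The "you can only scan the data once" constraint on streaming data can be illustrated with a single-pass summary. The Python sketch below uses Welford's online algorithm to keep a running count, mean, and variance without storing the stream. The `sensor_stream` generator and its readings are made up for illustration; a generator is used because, like a real stream, it cannot be rewound.

```python
from typing import Iterable, Iterator


class RunningStats:
    """Count, mean, and population variance updated one value at a time."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford's update: no past values need to be kept.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.count if self.count else 0.0


def sensor_stream() -> Iterator[float]:
    """Hypothetical stream of readings; once consumed, it is gone."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def summarize(stream: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for value in stream:  # the one and only pass over the data
        stats.update(value)
    return stats


stats = summarize(sensor_stream())
print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream runs, which is the property a one-scan algorithm needs.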

3-VARIETY
(Application View)

“How many kinds of data, and what kinds, need to be processed and handled”


4-VERACITY

• Veracity means how reliable the data is.
• It is a characteristic related to consistency, accuracy, quality, and trustworthiness.
• It refers to bias, noise, and abnormality in data.
• It refers to incomplete data or the presence of errors, outliers, and missing values.
• It refers to inconsistencies and uncertainty in data.

Can we trust the answers to our queries?

An example of the effects of poor data veracity: communications with customers
that fail to convert to sales because of incorrect customer information.
Poor data quality or incorrect data can result in targeting the wrong customers
with the wrong communications, which ultimately causes a loss in revenue.
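
Two of the veracity problems named above, missing values and outliers, can be surfaced with a very small check before any analysis runs. The Python sketch below is one possible approach, not a standard procedure: it counts missing entries and flags outliers using the modified z-score based on the median absolute deviation (MAD), with the commonly used cutoff of 3.5. The `veracity_report` function and the sample `ages` list are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def veracity_report(values: Sequence[Optional[float]],
                    threshold: float = 3.5) -> dict:
    """Count missing values and flag outliers with the modified z-score."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    if not present:
        return {"missing": missing, "outliers": []}

    med = median(present)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median(abs(v - med) for v in present)

    outliers = []
    if mad > 0:
        outliers = [v for v in present
                    if 0.6745 * abs(v - med) / mad > threshold]
    return {"missing": missing, "outliers": outliers}


# Made-up customer ages: two are missing and one looks like a typing error.
ages = [34, 29, None, 41, 38, 420, None, 36]
report = veracity_report(ages)
print(report)
```

A check like this only flags suspicious records. Deciding whether to correct, drop, or keep them still depends on the application.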
5-VALUE

• Big data is meaningless if it does not provide value toward some meaningful goal.
• Value is the worth of the data being collected.
• Some data have little or no value in decision making or improving
operations.
• Extracted patterns may be uninteresting or not useful.
6-VARIABILITY

• How often does the meaning or shape of your data change?


• Variability mainly focuses on understanding and
interpreting the correct meanings of raw data.
• Variability refers to data whose meaning is constantly
changing.
• Randomness and uncertainty in meaning.
Evolution of V’s

Gartner / Douglas Laney: 3Vs definition

• 1. Volume, which means the incoming data stream and cumulative volume of data
• 2. Velocity, which represents the pace of data used to support interaction and generated by interactions
• 3. Variety, which signifies the variety of incompatible and inconsistent data formats and data structures

IBM: 4Vs definition

• 1. Volume stands for the scale of data
• 2. Velocity denotes the analysis of streaming data
• 3. Variety indicates different forms of data
• 4. Veracity implies the uncertainty of data

Microsoft: 6Vs definition (the four above, plus)

• 5. Variability refers to the complexity of a data set. In comparison with “Variety” (or different data
formats), it means the number of variables in data sets
• 6. Visibility emphasizes that you need to have a full picture of the data in order to make informed
decisions
Why Study Big Data Technologies?
• One of the hottest topics in both research and industry
• In high demand in the real world
• A promising future career
• Research and development of big data systems:
distributed systems (e.g., Hadoop), visualization
tools, data warehouses, OLAP, data integration, data
quality control, …
• Big data applications:
social marketing, healthcare, …
• Data analysis, to get value out of big data:
discovering and applying patterns, predictive
analysis, business intelligence, privacy and security, …
Big Data: why is it possible now?

• Traditional approach: Data to Function

Flow: User request ➔ Application server ➔ Query ➔ Database ➔ return data ➔ Application server processes data ➔ Send result

I. The application server and the database server are separate
II. Data can be on multiple servers
III. The analysis program can run on multiple application servers
IV. The network is still in the middle
V. Data have to go through the network

• Big Data approach: Function to Data

Flow: User request ➔ Master node ➔ Send function to the data nodes ➔ each data node queries and processes its own data ➔ Master node consolidates ➔ Send result

➢ The analysis program runs on the data, on the data node
➢ Only the analysis program has to go through the network
➢ The analysis program needs to be MapReduce-aware
➢ Highly scalable:
➢ 1000s of nodes
➢ Petabytes and more
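
The "function to data" idea can be sketched in a few lines of Python. This is a single-process toy that only imitates the flow described above: it does not use Hadoop or any real cluster API. The node names, the sample lines, and the helpers `run_on_node`, `shuffle`, and `master` are all hypothetical. The master sends a map function to each data node, each node processes its own local lines, and only the small intermediate pairs travel back to be consolidated.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, int]

# Each "data node" holds its own lines; the raw data never leaves the node.
data_nodes: Dict[str, List[str]] = {
    "node-1": ["big data volume", "data velocity"],
    "node-2": ["data variety", "big data value"],
}


def map_words(line: str) -> List[Pair]:
    """Map step: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def run_on_node(function: Callable[[str], List[Pair]],
                local_lines: List[str]) -> List[Pair]:
    """Runs where the data lives; only (key, value) pairs are returned."""
    pairs: List[Pair] = []
    for line in local_lines:
        pairs.extend(function(line))
    return pairs


def shuffle(pairs: List[Pair]) -> Dict[str, List[int]]:
    """Group intermediate values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def master(function: Callable[[str], List[Pair]],
           nodes: Dict[str, List[str]]) -> Dict[str, int]:
    """Send the function to every node, then consolidate the results."""
    all_pairs: List[Pair] = []
    for local_lines in nodes.values():
        all_pairs.extend(run_on_node(function, local_lines))
    # Reduce step: sum the counts collected for each word.
    return {key: sum(values) for key, values in shuffle(all_pairs).items()}


word_counts = master(map_words, data_nodes)
print(word_counts)
```

In a real cluster the per-node work runs in parallel on separate machines, which is where the scalability to thousands of nodes comes from.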
