Big Data Analytics PPT Fat 2
Presented by:
Tanushree Soni
Himanshu Wagh
Sanskruti Chandorkar
What is Big Data?
“Big Data” exceeds the capacity of traditional analytics and information management
paradigms across what is known as the 4 V’s: Volume, Variety, Velocity, and Veracity
• Volume: Reflects the size of a data set. New information is generated daily, and in some cases hourly, creating data sets that are measured in terabytes and petabytes.
• Variety: Represents the diversity of the data. Data sets vary by type (e.g. social networking, media, text) and in how well they are structured.
• Velocity: The speed at which data is generated and used. New data is being created every second, and in some cases it may need to be analyzed just as quickly.
• Veracity: With exponential increases of data from unfiltered and constantly flowing data sources, data quality often suffers, and new methods must find ways to “sift” through junk to find meaning.
The Promise of Big Data
Even more important than its definition is what Big Data promises to achieve:
intelligence in the moment.
Traditional Techniques & Issues:
• Does not account for biases (Veracity)
• Analysis is limited to small data sets (Volume)
• Analyzing large data sets = high costs and high memory

Big Data Differentiators:
• Analysis in real time (Velocity)
• Scalable for huge amounts of multi-sourced data
• Facilitation of massively parallel processing (see the sketch below)
• Low-cost data storage
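To make “massively parallel processing” concrete, here is a minimal map-reduce style sketch in Python; it is not taken from the deck, and the record list, worker count, and chunk size are illustrative assumptions. Each worker counts words in its own chunk of records (map), and the partial counts are merged into one result (reduce).

# Minimal map-reduce style sketch of parallel processing over chunks of data.
from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    """Map step: count word occurrences in one chunk of text records."""
    counts = Counter()
    for record in chunk:
        counts.update(record.lower().split())
    return counts

def parallel_word_count(records, workers=4, chunk_size=1000):
    """Split records into chunks, count them in parallel, then merge the results."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with Pool(workers) as pool:
        partial_counts = pool.map(count_words, chunks)   # map step, run across workers
    total = Counter()
    for partial in partial_counts:                       # reduce step
        total.update(partial)
    return total

if __name__ == "__main__":
    sample = ["big data exceeds traditional analytics", "data velocity and data volume"] * 2000
    print(parallel_word_count(sample).most_common(3))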
Types of Big Data
Variety is the most distinctive aspect of Big Data. New technologies and new types of data
have driven much of the evolution around Big Data.
Big Data: structured, semi-structured, or unstructured information distinguished by one or more of the four “V”s: Veracity, Velocity, Variety, Volume.
[Graphic: relationship between Big Data, Open Data, and Crowdsourced Data]
Graphic and definitions based on “Big Data in Action for Development,” World Bank, worldbank.org
It’s not just about the data…
It is important to understand the distinction between Big Data sets (large, unstructured,
fast, and uncertain data) and ‘Big Data Analytics’.
Data Mining, Text Mining, and Natural Language Processing
What are they and how are they used?
Data Mining: Extraction of implicit, previously unknown, and potentially useful information from data.

Text Mining: Analysis of large quantities of natural language text, detecting lexical or linguistic usage patterns to extract probably useful information (a small illustration follows the source note below).

Natural Language Processing (NLP): A theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis, for the purpose of achieving human-like language processing for a range of tasks or applications.
Source: Text Mining, Ian Witten, 2004
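As a rough illustration of the Text Mining idea above (not part of the original deck), the sketch below uses only the Python standard library to surface frequent terms and two-word sequences, one simple kind of lexical usage pattern; the sample documents and function names are hypothetical.

# Minimal text-mining sketch: detect simple lexical usage patterns
# (frequent terms and bigrams) in raw natural-language text.
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexical_patterns(documents, top_n=5):
    """Return the most frequent terms and two-word sequences (bigrams)."""
    term_counts = Counter()
    bigram_counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        term_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))
    return term_counts.most_common(top_n), bigram_counts.most_common(top_n)

docs = [
    "Big Data exceeds the capacity of traditional analytics.",
    "Text mining analyzes large quantities of natural language text.",
]
terms, bigrams = lexical_patterns(docs)
print(terms)
print(bigrams)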