Lecture 1
Data - Introduction
• 100s of millions of GPS-enabled devices sold annually
• ? TBs of data every day
• 25+ TBs of log data every day
• 2+ billion people on the Web by end 2011
• 76 million smart meters in 2009… 200M by 2014
Variety of big data
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data
• Semi-structured Data (XML)
• Graph Data
  • Social Network, Semantic Web (RDF), …
• Streaming Data
  • You can only scan the data once
• Multi Media Data
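The "scan only once" constraint on streaming data can be illustrated with a one-pass computation. The sketch below is only an illustration, not part of the lecture material: the function name `one_pass_stats` and the sample values are hypothetical. It keeps a running count, mean, and variance (Welford's method) so that no element has to be stored or read a second time.

```python
from typing import Iterable, Iterator, Tuple


def one_pass_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after a single scan."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def readings() -> Iterator[float]:
    # Hypothetical stream: a generator can be consumed only once,
    # which mimics data that cannot be re-read.
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


count, mean, variance = one_pass_stats(readings())
print(count, round(mean, 6), round(variance, 6))
```

Memory use stays constant no matter how long the stream is, which is the point of a single-scan algorithm.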
[Figure: the Customer at the center, surrounded by data sources: Social Media, Banking/Finance, Gaming, Our Known History, Purchase, Entertain]
Velocity
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
  • Late decisions -> missed opportunities
• Examples
  • E-Promotions: based on your current location, your purchase history, and what you like -> send promotions right now for the store next to you
  • Healthcare monitoring: sensors monitor your activities and body -> any abnormal measurement requires an immediate reaction
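The healthcare-monitoring example can be sketched as an online check that reacts to each reading as it arrives instead of analysing the data later. This is a minimal sketch under stated assumptions: the function `monitor`, the window size, the 25% tolerance, and the heart-rate values are all hypothetical and are not clinical thresholds.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def monitor(
    readings: Iterable[float], window: int = 5, tolerance: float = 0.25
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) as soon as a reading deviates from the
    average of the last `window` normal readings by more than
    `tolerance` (a fraction of that average)."""
    recent: Deque[float] = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > tolerance * baseline:
                yield i, value  # react immediately
                continue  # keep the abnormal value out of the baseline
        recent.append(value)


# Hypothetical heart-rate samples (beats per minute)
heart_rate = [72, 74, 73, 75, 74, 73, 120, 74, 72]
alerts = list(monitor(heart_rate))
print(alerts)  # [(6, 120)]
```

Because `monitor` is a generator, an alert is produced the moment the abnormal reading is seen, without waiting for the rest of the stream.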
Real-time/Fast Data
Mobile devices
(tracking all objects all the time)
[Figure: real-time decisions centered on the Customer]
• Friend invitations to join a game or activity that expands business
• Improving the marketing effectiveness of a promotion while it is still in play
• Preventing fraud as it is occurring and preventing more proactively
Veracity
• Data veracity is the degree of accuracy or
truthfulness of a data set
• Data Mining
• Machine Learning
• Recommendation
• Finance
• …
What about big data programming?
• If a single computer/server is not big enough for the
large amount of data, what shall we do?
• In this case, how do the computers communicate, and how are
they managed?
• If we would like some results on the entire data set,
how could it work?
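One common answer to these questions is to split the data into partitions, let each machine compute a partial result on its own partition, and then combine the partial results. The sketch below is only an illustration of that idea: the names `split`, `partial_sum`, and `global_mean` are hypothetical, and threads on one machine stand in for separate servers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split(data: List[int], n_parts: int) -> List[List[int]]:
    """Split the data into n_parts chunks, one per (simulated) machine."""
    return [data[i::n_parts] for i in range(n_parts)]


def partial_sum(chunk: List[int]) -> Tuple[int, int]:
    """Local work on one machine: (sum, count) of its own chunk."""
    return sum(chunk), len(chunk)


def global_mean(data: List[int], n_workers: int = 4) -> float:
    """Combine the per-chunk results into a result for the whole data set."""
    chunks = split(data, n_workers)
    # Threads stand in for separate servers (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


data = list(range(1, 101))
result = global_mean(data)
print(result)  # 50.5
```

Only the small (sum, count) pairs travel between workers, not the raw data, which is why this pattern scales to data sets that do not fit on one server.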
Other References
• https://en.wikipedia.org/wiki/Big_data
• https://www.slideshare.net/hktripathy/lecture1-introduction-to-big-data?from_action=save