
1

Mining Big Data: Current
State of work and Challenges
Group members:
Misbah Rashid
Mariam Rashid

2

About Journal
• The paper was published in 2015 in the International Journal of
Advanced Networking and Applications (IJANA).
• The paper was written by Kaushika Pal and Dr. Jatinderkumar R.
Saini.

3

Brief Overview
• Introduction to big data
• Big Data Mining
• Big data mining importance

4

Introduction To Big Data
• Huge amounts of data are generated and collected from various sources such as
sensors and devices, in different formats, from connected or independent
applications.
• This data has to be processed, investigated, stored and understood. Taking
internet data as an example, the web pages indexed by Google numbered one
million in 1998, one billion in 2000 and one trillion in 2008.
• Social media provides further examples: Facebook, Twitter, GooglePlus, YouTube
and LinkedIn.
• Each of these sites receives a huge volume of data on a daily basis.
• Smartphones are now highly connected to the internet; they use and store data
on the web and thus increase web volume. Twitter processes around 400 million
tweets each day.
• Smartphones are the real producers of big data, and it is up to us how we
use that data to change our lives.

5

• Data created via smartphones can be put to good use. Smartphone
usage patterns helped researchers in Africa determine where malaria
outbreaks were occurring and where the affected people went [10].
This information can be used to determine where best to distribute
medicines. This is the power of big data analysis, which can have a
positive impact on humanity.

6

Big Data Mining
• Big data mining refers collectively to the data mining and extraction
techniques that are performed on large volumes of data.
• We need new tools and new algorithms to deal with this huge amount of
data. When working with Big Data, 7 V’s have to be considered for Big Data
management:
• Volume: Every industry is flooded with data, which can be extremely
valuable if it can be used to retrieve important information.
• Variety: 90% of the data generated is amorphous, coming in all shapes and
forms: geo-spatial data, tweets, and photos and videos uploaded to social
networking sites, all of which can be analysed for content.

7

• Velocity: Velocity refers to the increasing speed at which this data is
created, and the increasing speed at which the data can be processed,
stored and analysed.
• Value: The potential value of Big Data is huge.
• Variability: Variability refers to data whose meaning is constantly changing.
There are changes in the structure of data and in how users want to interpret
that data.
• Veracity: Veracity refers to the noise and abnormality in data. When
scoping out a big data strategy, you need processes that keep your data clean
and keep ‘dirty data’ from accumulating in your systems.
• Visibility: Data from different sources should be visible to the technology
stack making up Big Data. Some crucial data is available but not visible to
that stack.
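The Veracity point above can be illustrated with a small cleaning step. This is only a sketch under stated assumptions: the record layout, the field names `id` and `value`, and the function name `clean_records` are hypothetical and do not come from the paper. It drops records with missing required fields and repeated ids so that dirty data does not accumulate.

```python
def clean_records(records, required=("id", "value")):
    """Drop records with missing required fields or an already-seen id.

    Assumption: each record is a dict, and "dirty" means a required
    field is absent, None, or an empty string, or the id is a repeat.
    """
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Skip records where any required field is missing or empty.
        if any(rec.get(key) in (None, "") for key in required):
            continue
        # Skip later records that repeat an id we already kept.
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        cleaned.append(rec)
    return cleaned
```

Real pipelines usually need domain-specific rules (valid ranges, unit checks, fuzzy duplicate matching); the point here is only that cleaning happens before mining.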

8

Literature Review
• Mining heterogeneous information networks is a new and promising
research frontier in Big Data mining. It treats interconnected data of many
different types, including relational database data, as
heterogeneous information networks.
• Mining Big Data in Real Time discusses the challenges in structured pattern
classification. Classification methods mostly deal with vector data. To
apply them to graph pattern classification, graphs can be converted into
vectors of attributes. Each attribute indicates the presence or absence of a
sub pattern. Attributes are created for every frequent sub pattern, and the
number of such sub patterns can be very large.
• Data Mining with Big Data draws attention to the challenges of
mining big data at three levels: data, model, and system.
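The conversion of graphs into attribute vectors described above can be sketched as follows. This is a simplified illustration, not the method of the cited work: it assumes a graph is a set of undirected edges and that a "sub pattern" is either a single edge or a pair of edges sharing a node. The function names and the `min_support` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations

def sub_patterns(graph):
    """Return the simple sub patterns present in one graph.

    Assumption: a sub pattern is a single edge, or two edges that
    share a node. Real graph miners enumerate richer subgraphs.
    """
    edges = sorted({tuple(sorted(edge)) for edge in graph})
    patterns = {(edge,) for edge in edges}
    for first, second in combinations(edges, 2):
        if set(first) & set(second):  # the two edges touch
            patterns.add((first, second))
    return patterns

def frequent_sub_patterns(graphs, min_support):
    """Keep sub patterns that occur in at least min_support graphs."""
    counts = Counter()
    for graph in graphs:
        counts.update(sub_patterns(graph))
    return sorted(p for p, count in counts.items() if count >= min_support)

def to_vector(graph, patterns):
    """One 0/1 attribute per frequent sub pattern: present or absent."""
    present = sub_patterns(graph)
    return [1 if pattern in present else 0 for pattern in patterns]
```

Even in this toy version the number of attributes grows quickly with graph size, which is the difficulty the slide points out. The resulting vectors can be passed to any classifier that expects vector data.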

9

Application Of Big Data Mining
• Business: Expands customer intelligence, improves
operational efficiency and enables customer personalization. To gain a deep
understanding of customer requirements, one needs strong personal connections
and, where possible, customized services, which will drive more
sales.
• Managing market demand: Capturing external
market and retailer data in real time makes it possible to sense, evaluate,
and respond to demand indicators faster than ever before.
• Fraud detection: By analysing abnormal patterns across
various data sources, fraud can be detected in financial
transactions, health insurance, etc.
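As a rough illustration of the fraud detection idea, the sketch below flags transaction amounts that deviate strongly from the rest. It uses the modified z-score based on the median absolute deviation, which is one common outlier rule and is not taken from the paper. The function name `flag_abnormal` and the default threshold of 3.5 are assumptions for this example.

```python
from statistics import median

def flag_abnormal(amounts, threshold=3.5):
    """Return the indices of amounts that look abnormal.

    Uses the modified z-score built on the median absolute deviation
    (MAD), which is less distorted by the outliers it is looking for
    than a mean-based score would be.
    """
    centre = median(amounts)
    mad = median(abs(amount - centre) for amount in amounts)
    if mad == 0:
        return []  # no spread, so nothing stands out under this rule
    return [
        index
        for index, amount in enumerate(amounts)
        if 0.6745 * abs(amount - centre) / mad > threshold
    ]
```

A flagged index is only a candidate for review. Real fraud detection combines many signals from several data sources, as the bullet above notes.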

10

Challenges
• Variety and Heterogeneity: Big Data is generated by different sources, which leads to
great variety or heterogeneity. Heterogeneous big data combines structured,
semi-structured, and even entirely unstructured data at once. The challenge is to unveil
or extract the hidden knowledge in such data sets.
• Scalability: The extraordinary volume requires high scalability of data management
and mining tools. However, most algorithms currently used in data mining do not scale
very well when applied to very large data sets, because they were initially developed
and tested on smaller data sets. We now have such large data sets that these algorithms
are no longer efficient enough for mining and analysis.
• Velocity/Speed: The ability to access and mine big data quickly is essential.
A mining task must be finished within a definite period of time; otherwise, the
results become less valuable or even worthless. The design of
new and more efficient indexing schemes is much desired but remains one of the
greatest challenges for the research community.
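One way to see the scalability and speed concerns is to compare batch algorithms, which need the whole data set in memory, with one-pass algorithms that update a small summary per item. The sketch below uses Welford's online algorithm for the mean and variance as an example of the second kind. It is offered as an illustration only and is not discussed in the paper; the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Mean and population variance over a stream, one item at a time.

    Welford's online algorithm: memory use is constant no matter how
    many values pass through, so the data never has to fit in memory.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0


# Usage: feed values as they arrive instead of loading them all first.
stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

The same pattern (a small state updated per record) underlies many stream mining methods, which is why it suits data that arrives faster than it can be stored.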

11

Challenges
• Privacy Crisis: Data privacy has always been an issue. The concern has become
extremely serious with big data mining, which often requires personal information in
order to produce relevant and accurate results, for example in location-based and
personalized services. Also, given the huge volume of big data such as social media, which
contains an incredible amount of highly interrelated personal information, each bit of
information can be mined out. Every transaction in our daily lives is being pushed online
and leaves a trace there: we communicate with friends via email, instant messages, blogs,
and Facebook; we shop and pay our bills online; credit card companies hold our
confidential identity information. As time goes on, your personal information will be
scattered here and there. Anyone could easily gain access to powerful
tools to extract your confidential information.
• Garbage Mining: As the volume of data increases day by day, the amount of
irrelevant and unnecessary data also increases. Garbage mining means finding this
hidden irrelevant data and cleaning it out from the important data. This is not easy,
because it is difficult to extract hidden data from the bulk of the data and then clean it.
Garbage mining remains one of the greatest challenges.

12

Appreciation
• In this paper, the authors have fully explained the insights about the
mining of big data, including the main concerns and main challenges
for the future.
• The most positive aspect of this article is its clarity in the statement of
the research problem.
• The authors selected 14 relevant sources published between
2012 and 2014. Ten of these references were primary sources.
The authors did a reasonable job of highlighting previous research on
topics related to their own and even provided comparisons of the
literature where possible.

13

Critique
• The statement of the problem was implied in the abstract of
the article, but the specific problem is not addressed until the
authors have described the usefulness of mining big data later in the
article.
• The authors have not clearly explained the applications of mining big
data in medicine, healthcare and engineering.
• The authors have discussed big data mainly in terms of mobile phones. The
scope of big data is far wider than what the authors have discussed.

14

Future work
• Techniques will be developed to overcome the challenges faced
in mining big data.
• Social media and Big Data can be used to understand public opinion
trends.

15

Thank You


Editor's Notes

  1. When conducting research, it is easy to go to one source: Wikipedia. However, you need to include a variety of sources in your research. Consider the following sources: Who can I interview to get more information on the topic? Is the topic current and will it be relevant to my audience? What articles, blogs, and magazines may have something related to my topic? Is there a YouTube video on the topic? If so, what is it about? What images can I find related to the topic?