DOI: 10.1007/978-3-642-40270-8_1
Article

From Big Data to Big Data Mining: Challenges, Issues, and Opportunities

Published: 22 April 2013
    Abstract

    While "big data" has become a prominent buzzword since last year, "big data mining", i.e., mining from big data, has followed almost immediately as an emerging, interrelated research area. This paper provides an overview of big data mining and discusses the related challenges and new opportunities. The discussion includes a review of state-of-the-art frameworks and platforms for processing and managing big data, as well as the efforts expected in big data mining. We address broad issues related to big data and/or big data mining, and point out opportunities and research topics as they take shape. We hope our effort will help reshape the subject area of today's data mining technology toward solving tomorrow's bigger challenges emerging in accordance with big data.

      Published In

      Proceedings of the 18th International Conference on Database Systems for Advanced Applications - Volume 7827
      April 2013
      243 pages
      ISBN: 9783642402692
      Editors: Bonghee Hong, Xiaofeng Meng, Lei Chen, Werner Winiwarter, Wei Song

      Publisher

      Springer-Verlag

      Berlin, Heidelberg

      Author Tags

      1. big data
      2. big data management
      3. big data mining
      4. data mining
      5. data-intensive computation
      6. knowledge discovery

      Cited By

      • (2021) Big Data Analysis for Drilling and Blasting in a Mine in the Central Andes. Proceedings of the 2021 9th International Conference on Communications and Broadband Networking, pp. 27-33. DOI: 10.1145/3456415.3456421. Online publication date: 25-Feb-2021
      • (2020) Tuoris. Future Generation Computer Systems 106:C, pp. 559-571. DOI: 10.1016/j.future.2020.01.015. Online publication date: 1-May-2020
      • (2019) Resource Allocation and Sharing for Transmission of Batched NB IoT Trafic over 3GPP LTE. Proceedings of the 24th Conference of Open Innovations Association FRUCT, pp. 422-429. DOI: 10.5555/3338290.3338349. Online publication date: 15-Apr-2019
      • (2018) Big Data Handling Over Cloud for Internet of Things. International Journal of Information Technology and Web Engineering 13:2, pp. 37-47. DOI: 10.4018/IJITWE.2018040104. Online publication date: 1-Apr-2018
      • (2018) Efficient Learning-Based Recommendation Algorithms for Top-N Tasks and Top-N Workers in Large-Scale Crowdsourcing Systems. ACM Transactions on Information Systems 37:1, pp. 1-46. DOI: 10.1145/3231934. Online publication date: 30-Oct-2018
      • (2018) Quality awareness for a Successful Big Data Exploitation. Proceedings of the 22nd International Database Engineering & Applications Symposium, pp. 37-44. DOI: 10.1145/3216122.3216124. Online publication date: 18-Jun-2018
      • (2017) Assessing causal claims about complex engineered systems with quantitative data. Systems Engineering 20:6, pp. 483-496. DOI: 10.1002/sys.21414. Online publication date: 1-Nov-2017
