
Data Cube Materialization and Mining over MapReduce

Published: 01 October 2012

Abstract

Computing interesting measures for data cubes and subsequent mining of interesting cube groups over massive data sets are critical for many important analyses done in the real world. Previous studies have focused on algebraic measures such as SUM that are amenable to parallel computation and can easily benefit from the recent advancement of parallel computing infrastructure such as MapReduce. Dealing with holistic measures such as TOP-K, however, is nontrivial. In this paper, we detail real-world challenges in cube materialization and mining tasks on web-scale data sets. Specifically, we identify an important subset of holistic measures and introduce MR-Cube, a MapReduce-based framework for efficient cube computation and identification of interesting cube groups on holistic measures. We provide extensive experimental analyses over both real and synthetic data. We demonstrate that, unlike existing techniques which cannot scale to the 100 million tuple mark for our data sets, MR-Cube successfully and efficiently computes cubes with holistic measures over billion-tuple data sets.
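The distinction the abstract draws between algebraic and holistic measures can be illustrated with a minimal sketch. It is not the MR-Cube algorithm; the data, the two-partition split, and the variable names are hypothetical, and MEDIAN is used here as a stand-in holistic measure (the abstract itself names TOP-K). The sketch only shows why an algebraic measure such as SUM can be merged from per-partition partial results, while a holistic measure generally cannot.

```python
from statistics import median

# Hypothetical values of a single cube group, split across two workers.
partition_a = [1, 2, 9]
partition_b = [3, 4]
full = partition_a + partition_b

# Algebraic measure (SUM): each worker emits one partial sum, and
# combining the partials gives exactly the same answer as a full scan.
sum_from_partials = sum([sum(partition_a), sum(partition_b)])
exact_sum = sum(full)

# Holistic measure (MEDIAN, chosen for illustration): combining the
# per-partition medians does NOT, in general, give the true median,
# so no fixed-size partial result is enough.
median_of_medians = median([median(partition_a), median(partition_b)])
exact_median = median(full)

print("SUM    partials:", sum_from_partials, "exact:", exact_sum)
print("MEDIAN partials:", median_of_medians, "exact:", exact_median)
```

In this toy case the combined SUM matches the exact SUM, while the median of the partition medians differs from the exact median, which is the kind of difficulty that makes holistic measures harder to parallelize.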



    Published In

    IEEE Transactions on Knowledge and Data Engineering  Volume 24, Issue 10
    October 2012
    188 pages

    Publisher

    IEEE Educational Activities Department

    United States


    Qualifiers

    • Research-article


    Cited By

    • (2019) Distributed graph cube generation using Spark framework. The Journal of Supercomputing 76:10 (8118-8139). DOI: 10.1007/s11227-019-02746-4. Online publication date: 10-Jan-2019
    • (2019) Scalable distributed data cube computation for large-scale multidimensional data analysis on a Spark cluster. Cluster Computing 22:1 (2063-2087). DOI: 10.1007/s10586-018-1811-1. Online publication date: 1-Jan-2019
    • (2018) A Session-Based Approach to Fast-But-Approximate Interactive Data Cube Exploration. ACM Transactions on Knowledge Discovery from Data 12:1 (1-26). DOI: 10.1145/3070648. Online publication date: 13-Feb-2018
    • (2016) A Workload Assignment Strategy for Efficient ROLAP Data Cube Computation in Distributed Systems. International Journal of Data Warehousing and Mining 12:3 (51-71). DOI: 10.5555/3147364.3147368. Online publication date: 1-Jul-2016
    • (2016) Visual exploration of machine learning results using data cube analysis. Proceedings of the Workshop on Human-In-the-Loop Data Analytics (1-6). DOI: 10.1145/2939502.2939503. Online publication date: 26-Jun-2016
    • (2016) Computing Marginals Using MapReduce. Proceedings of the 20th International Database Engineering & Applications Symposium (12-23). DOI: 10.1145/2938503.2939571. Online publication date: 11-Jul-2016
    • (2016) An Efficient MapReduce Cube Algorithm for Varied Data Distributions. Proceedings of the 2016 International Conference on Management of Data (1151-1165). DOI: 10.1145/2882903.2882922. Online publication date: 26-Jun-2016
    • (2016) Security and privacy aspects in MapReduce on clouds. Computer Science Review 20:C (1-28). DOI: 10.1016/j.cosrev.2016.05.001. Online publication date: 1-May-2016
    • (2015) SPOOL. Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services (1-10). DOI: 10.1145/2837185.2837230. Online publication date: 11-Dec-2015
    • (2015) A Map-Reduce based Approach for Mining Group Stock Portfolio. Proceedings of the ASE BigData & SocialInformatics 2015 (1-5). DOI: 10.1145/2818869.2818901. Online publication date: 7-Oct-2015