research-article

High-Dimensional Data Cubes

Authors: Sachin Basil John, Christoph Koch

Proceedings of the VLDB Endowment, Volume 15, Issue 13

Pages 3828 - 3840

https://doi.org/10.14778/3565838.3565839

Published: 01 September 2022

Abstract

This paper introduces an approach to supporting high-dimensional data cubes at interactive query speeds and moderate storage cost. The approach is based on binary(-domain) data cubes that are judiciously partially materialized; the missing information can be quickly reconstructed using statistical or linear programming techniques. This enables new applications such as exploratory data analysis for feature engineering and other fields of data science. Moreover, it removes the need to compromise when building a data cube: all columns that we might ever wish to use can be included as dimensions. Our approach also speeds up certain dice, roll-up, and drill-down operations on data cubes with hierarchical dimensions compared to traditional data cubes.
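The following toy sketch illustrates the general idea described in the abstract; it is not the paper's algorithm. It assumes a fact table whose dimensions have already been encoded as bits, materializes one small cuboid, answers a roll-up from it by summation, and then bounds a query that no materialized cuboid covers. For the two-bit case the range that a linear program would return has a closed form (the Fréchet bounds), which is used here in place of an LP solver. The function names `cuboid`, `project`, and `frechet_bounds` and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import product


def cuboid(rows, bits):
    """Count rows grouped by the chosen bit positions (a binary-domain group-by)."""
    counts = Counter(tuple(row[b] for b in bits) for row in rows)
    # Emit every cell, including empty ones, so the cuboid is dense.
    return {key: counts.get(key, 0) for key in product((0, 1), repeat=len(bits))}


def project(cub, bits, keep):
    """Roll a materialized cuboid up to a subset of its bits by summing cells."""
    idx = [bits.index(b) for b in keep]
    out = {key: 0 for key in product((0, 1), repeat=len(keep))}
    for key, n in cub.items():
        out[tuple(key[i] for i in idx)] += n
    return out


def frechet_bounds(count_a, count_b, total):
    """Feasible range for the number of rows with a=1 and b=1,
    given only the two one-bit counts and the total row count."""
    return max(0, count_a + count_b - total), min(count_a, count_b)


# Hypothetical fact table: each row is a tuple of three binary dimensions.
rows = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1), (0, 0, 0), (1, 0, 0)]

# Partially materialize: keep the cuboid on bits (0, 1) and the one on bit 2.
ab = cuboid(rows, [0, 1])
c_only = cuboid(rows, [2])

# A query on bit 0 alone is answered exactly from the (0, 1) cuboid.
a_only = project(ab, [0, 1], [0])

# A query on bits (0, 2) is not covered by any materialized cuboid,
# so only a range for the cell (bit0=1, bit2=1) can be derived.
low, high = frechet_bounds(a_only[(1,)], c_only[(1,)], len(rows))

# Ground truth, computed from the base data only to check the range.
truth = sum(1 for r in rows if r[0] == 1 and r[2] == 1)
print(a_only, (low, high), truth)
```

With more bits and more materialized cuboids the constraints no longer reduce to a closed form, which is where a general linear programming or statistical solver would come in.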


Cited By

P. Lindner, S. Basil John, C. Koch, and D. Suciu. 2024. The Moments Method for Approximate Data Cube Queries. Proceedings of the ACM on Management of Data 2, 2 (2024), 1--23. Online publication date: 14 May 2024. https://dl.acm.org/doi/10.1145/3651147

S. Basil John, P. Lindner, Z. Jiang, and C. Koch. 2023. Aggregation and Exploration of High-Dimensional Data Using the Sudokube Data Cube Engine. In Companion of the 2023 International Conference on Management of Data, S. Das, I. Pandis, K. Selçuk Candan, and S. Amer-Yahia (Eds.). 175--178. Online publication date: 4 June 2023. https://dl.acm.org/doi/10.1145/3555041.3589729


Published In

Proceedings of the VLDB Endowment, Volume 15, Issue 13, September 2022, 278 pages

ISSN: 2150-8097

Editors: Fatma Özcan (Google), Juliana Freire (New York University), Xuemin Lin (University of New South Wales)

Publisher: VLDB Endowment

Badges: Artifacts Available / v1.1



