Abstract
Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
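The idea of aggregating an N-cube down to lower dimensional spaces can be illustrated with a minimal Python sketch. It is not the paper's algorithm, just a naive enumeration under stated assumptions: the `data_cube` function, the `sales` rows, and the `ALL` placeholder used here to mark an aggregated-away dimension are all hypothetical names and data chosen for illustration, and SUM is the only aggregate shown.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder for a dimension that has been aggregated away
# (an assumption of this sketch, not taken from the abstract).
ALL = "ALL"


def data_cube(rows, dims, measure):
    """Return {coordinate tuple: SUM(measure)} for every subset of dims.

    Each key has one entry per dimension: either a concrete attribute
    value or ALL when that dimension was collapsed.
    """
    cube = defaultdict(int)
    n = len(dims)
    # Enumerate every subset of dimensions to keep, from none to all N.
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            for row in rows:
                key = tuple(
                    row[dims[i]] if i in kept else ALL for i in range(n)
                )
                cube[key] += row[measure]
    return dict(cube)


# Hypothetical sample data, invented for this example.
sales = [
    {"model": "Chevy", "year": 1994, "color": "black", "units": 50},
    {"model": "Chevy", "year": 1994, "color": "white", "units": 40},
    {"model": "Chevy", "year": 1995, "color": "black", "units": 85},
    {"model": "Ford", "year": 1994, "color": "black", "units": 50},
]

cube = data_cube(sales, ["model", "year", "color"], "units")
```

Here `cube[("Chevy", 1994, "black")]` is a point of the full 3-dimensional cube, `cube[("Chevy", ALL, ALL)]` is a one-dimensional super-aggregate, and `cube[(ALL, ALL, ALL)]` is the zero-dimensional grand total. Because the result is a flat mapping from coordinates to values, it can be read as a relation with one row per point. The sketch scans the input once per subset of dimensions, so it performs 2^N passes for N dimensions; it makes no attempt at the efficient computation the paper discusses.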
Cite this article
Gray, J., Chaudhuri, S., Bosworth, A. et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery 1, 29–53 (1997). https://doi.org/10.1023/A:1009726021843
DOI: https://doi.org/10.1023/A:1009726021843