
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

Published in Data Mining and Knowledge Discovery

Abstract

Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be embedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
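
The idea that each aggregate is a point in N-space, with super-aggregates in lower-dimensional spaces, can be illustrated with a small Python sketch. This is not the paper's algorithm or its SQL syntax. The function names (`aggregate`, `cube`, `rollup`), the `ALL` placeholder string, and the sales rows are all assumptions made up for illustration. For simplicity the sketch recomputes every group-by from the base rows rather than deriving super-aggregates from the finer cube cells, which gives the same answer for a sum.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # assumed placeholder marking a dimension that was aggregated away


def aggregate(rows, dims, keep, measure, agg=sum):
    """Group rows by the dimensions in `keep`; every other dimension becomes ALL."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] if d in keep else ALL for d in dims)
        groups[key].append(row[measure])
    return {key: agg(values) for key, values in groups.items()}


def cube(rows, dims, measure, agg=sum):
    """Union of the group-bys over every subset of dims (2**N subsets)."""
    result = {}
    for k in range(len(dims), -1, -1):
        for keep in combinations(dims, k):
            result.update(aggregate(rows, dims, set(keep), measure, agg))
    return result


def rollup(rows, dims, measure, agg=sum):
    """Union of the group-bys over each prefix of dims (N+1 prefixes)."""
    result = {}
    for k in range(len(dims), -1, -1):
        result.update(aggregate(rows, dims, set(dims[:k]), measure, agg))
    return result


# Hypothetical sales data, invented for this example.
sales = [
    {"model": "Chevy", "year": 1994, "color": "black", "units": 10},
    {"model": "Chevy", "year": 1994, "color": "white", "units": 20},
    {"model": "Chevy", "year": 1995, "color": "black", "units": 30},
    {"model": "Ford", "year": 1994, "color": "black", "units": 40},
]
DIMS = ["model", "year", "color"]

full = cube(sales, DIMS, "units")
partial = rollup(sales, DIMS, "units")

for key in sorted(full, key=str):
    print(key, full[key])
```

Each key in `full` is a point in the three-dimensional space, for example `("Chevy", ALL, ALL)` is the total for one model across all years and colors, and `(ALL, ALL, ALL)` is the grand total. The roll-up result contains only the cells along the model, year, color hierarchy, so a cell such as `(ALL, 1994, ALL)` appears in `full` but not in `partial`. Because both functions return plain key-to-value mappings with one uniform key shape, the output can be treated as a single relation, which is the property the abstract highlights.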



Cite this article

Gray, J., Chaudhuri, S., Bosworth, A. et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery 1, 29–53 (1997). https://doi.org/10.1023/A:1009726021843

  • DOI: https://doi.org/10.1023/A:1009726021843