DOI: 10.5555/781995.782005

Storage estimation for multidimensional aggregates in OLAP

Published: 08 November 1999

    Abstract

    On-line analytical processing (OLAP) is an important technique for analyzing data in decision support systems. Most analytical queries require aggregation of the data of interest. Pre-aggregation is one of the most important techniques used to speed up query response time. However, precomputing every aggregate takes a large amount of time and space. Deciding which aggregates to precompute and how much space they require is therefore important. By estimating the storage space required for each aggregate view, we can allocate space for aggregates efficiently and decide which aggregates to precompute. We investigate four existing strategies for this problem: two based on mathematical approximations, one based on sampling, and one hybrid approach based on mathematical approximation and sampling. We propose a new hybrid strategy that is also based on mathematical approximation and sampling and is easy to compute. We evaluate the accuracy of these algorithms in estimating the storage explosion due to aggregation for different data distributions and data densities. The results indicate that our proposed strategy approximates the explosion more accurately than the other strategies.
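    The abstract does not say which mathematical approximations are compared, so the following is only an illustration of the general idea, not the authors' method. It is a minimal sketch that assumes rows fall uniformly and independently into the possible group-by cells, and it uses the classic occupancy estimate usually attributed to Cardenas: with n rows and v possible cells, the expected number of non-empty cells is v(1 - (1 - 1/v)^n). The function names `cardenas_estimate` and `simulate_distinct` are hypothetical.

```python
import math
import random


def cardenas_estimate(n_rows: int, n_cells: int) -> float:
    """Expected number of non-empty cells when n_rows rows are placed
    uniformly at random into n_cells possible group-by cells.

    Assumption: uniform, independent placement. Skewed data will
    generally produce fewer distinct cells than this estimate.
    """
    if n_cells <= 0 or n_rows < 0:
        raise ValueError("n_cells must be positive and n_rows non-negative")
    if n_cells == 1:
        # Only one possible cell: it is occupied as soon as any row exists.
        return 1.0 if n_rows > 0 else 0.0
    # v * (1 - (1 - 1/v)^n), computed with log1p/expm1 for numerical stability.
    return n_cells * -math.expm1(n_rows * math.log1p(-1.0 / n_cells))


def simulate_distinct(n_rows: int, n_cells: int, seed: int = 0) -> int:
    """Count the distinct cells actually hit by n_rows uniform random draws.

    This is a brute-force check of the formula, not a sampling estimator.
    """
    rng = random.Random(seed)
    return len({rng.randrange(n_cells) for _ in range(n_rows)})


if __name__ == "__main__":
    rows, cells = 10_000, 10_000
    print("formula   :", round(cardenas_estimate(rows, cells), 1))
    print("simulation:", simulate_distinct(rows, cells))
```

    Under the uniformity assumption the formula and the simulation should agree closely. On skewed data the formula would be expected to overestimate the aggregate size, which is the kind of gap a sampling-based or hybrid estimator tries to close.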


    Cited By

    • (2001) A Pareto model for OLAP view size estimation. Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research. 10.5555/782096.782109. Online publication date: 5-Nov-2001

    Published In

    CASCON '99: Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research
    November 1999
    186 pages

    Sponsors

    • IBM Canada
    • NRC: National Research Council - Canada

    Publisher

    IBM Press

    Acceptance Rates

    Overall Acceptance Rate 24 of 90 submissions, 27%
