Efficient OLAP Query Processing in Distributed Data Warehouses

  • Conference paper
Advances in Database Technology — EDBT 2002 (EDBT 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2287)

Abstract

The success of Internet applications has led to an explosive growth in the demand for bandwidth from ISPs. Managing an IP network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current-day OLAP tools for this task assume that the data is available in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing being performed at the local sites. In this paper, we consider the problem of efficient evaluation of OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans, which are shipped to individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. Finally, we present an experimental study based on TPC(R) data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system.
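To make the "only partial results are shipped" idea concrete, here is a minimal Python sketch of a per-group average computed across two sites. It is not the Skalla implementation or its algebra: the function names `local_partial_aggregate` and `coordinator_merge`, the (source address, byte count) row layout, and the sample values are all assumptions chosen for illustration. Each site reduces its detail rows to per-group (sum, count) pairs, and the coordinator combines only those pairs.

```python
from collections import defaultdict


def local_partial_aggregate(rows):
    """Run at one local site: reduce detail rows to per-group (sum, count).

    `rows` is an iterable of (group_key, value) pairs. Only the returned
    dictionary would leave the site; the detail rows never do.
    """
    partial = defaultdict(lambda: [0, 0])
    for group_key, value in rows:
        acc = partial[group_key]
        acc[0] += value  # running sum for this group
        acc[1] += 1      # running row count for this group
    return {key: (total, count) for key, (total, count) in partial.items()}


def coordinator_merge(partials):
    """Run at the coordinator: combine partial results into per-group averages."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


# Hypothetical flow records held at two collection points: (source, bytes).
site_a = [("10.0.0.1", 100), ("10.0.0.2", 300), ("10.0.0.1", 200)]
site_b = [("10.0.0.1", 600), ("10.0.0.3", 50)]

partials = [local_partial_aggregate(site_a), local_partial_aggregate(site_b)]
averages = coordinator_merge(partials)
print(averages)
```

Shipping (sum, count) rather than a locally computed average is what keeps the merge correct: averages of averages would weight sites equally regardless of how many rows each one holds.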

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Akinde, M.O., Böhlen, M.H., Johnson, T., Lakshmanan, L.V., Srivastava, D. (2002). Efficient OLAP Query Processing in Distributed Data Warehouses. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_23

  • DOI: https://doi.org/10.1007/3-540-45876-X_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43324-8

  • Online ISBN: 978-3-540-45876-0

  • eBook Packages: Springer Book Archive
