Efficient OLAP Query Processing in Distributed Data Warehouses

Akinde, Michael O.; Böhlen, Michael H.; Johnson, Theodore; Lakshmanan, Laks V.S.; Srivastava, Divesh

doi:10.1007/3-540-45876-X_23


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2287)

Included in the following conference series:

International Conference on Extending Database Technology


Abstract

The success of Internet applications has led to an explosive growth in the demand for bandwidth from ISPs. Managing an IP network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Present-day OLAP tools for this task assume that the data is available in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing performed at the local sites. In this paper, we consider the problem of efficiently evaluating OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans that are shipped to the individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. Finally, we present an experimental study based on TPC(R) data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimizations that have gone into the Skalla system.
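The idea that sites ship only partial results, never detail data, can be illustrated with a small sketch. This is not the Skalla implementation or its plan format; it is a minimal illustration under stated assumptions. It assumes a hypothetical flow table with `src` and `bytes` columns partitioned across two hypothetical sites, and evaluates one correlated aggregate query ("for each source, how many flows are larger than that source's average flow size?") in two rounds of coordinator/site exchange. All names (`SITES`, `local_sum_count`, `merge_averages`, `local_count_above`, `evaluate`) are illustrative.

```python
from collections import defaultdict

# Hypothetical detail data: each site keeps its own (src, bytes) flow records.
# Only the per-source partial aggregates computed below ever leave a site.
SITES = {
    "site_a": [("10.0.0.1", 100), ("10.0.0.1", 300), ("10.0.0.2", 50)],
    "site_b": [("10.0.0.1", 200), ("10.0.0.2", 150), ("10.0.0.2", 250)],
}


def local_sum_count(rows):
    """Round 1 at a local site: per-source partial (sum, count) pairs."""
    partial = defaultdict(lambda: [0, 0])
    for src, nbytes in rows:
        partial[src][0] += nbytes
        partial[src][1] += 1
    return {src: (s, c) for src, (s, c) in partial.items()}


def merge_averages(partials):
    """Coordinator: combine partial (sum, count) pairs into global averages."""
    totals = defaultdict(lambda: [0, 0])
    for partial in partials:
        for src, (s, c) in partial.items():
            totals[src][0] += s
            totals[src][1] += c
    return {src: s / c for src, (s, c) in totals.items()}


def local_count_above(rows, averages):
    """Round 2 at a local site: count flows larger than their source's global average."""
    counts = defaultdict(int)
    for src, nbytes in rows:
        if nbytes > averages[src]:
            counts[src] += 1
    return dict(counts)


def evaluate(sites):
    """Run both rounds; return (global averages, global per-source counts)."""
    # Round 1: every site ships partial aggregates; the coordinator merges them.
    averages = merge_averages(local_sum_count(rows) for rows in sites.values())
    # Round 2: the coordinator ships the averages back; sites ship per-source counts.
    result = {src: 0 for src in averages}
    for rows in sites.values():
        for src, n in local_count_above(rows, averages).items():
            result[src] += n
    return averages, result


if __name__ == "__main__":
    print(evaluate(SITES))
```

Sites ship `(sum, count)` pairs rather than local averages because an average of local averages is not the global average in general. Each round is one exchange between the coordinator and the sites, i.e., the kind of synchronization traffic the optimizations described above aim to reduce.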




Author information

Authors and Affiliations

Aalborg University, Aalborg
Michael O. Akinde & Michael H. Böhlen
AT&T Labs-Research, USA
Theodore Johnson & Divesh Srivastava
University of British Columbia, British Columbia
Laks V.S. Lakshmanan


Editor information

Editors and Affiliations

Department of Computer Science, Aalborg University, Aalborg
Christian S. Jensen & Simonas Šaltenis
Business and Information Technology Dept., CLRC Rutherford Appleton Laboratory, UK
Keith G. Jeffery
Faculty of Mathematics and Physics, Charles University, Czech Republic
Jaroslav Pokorny
Department of Information Science, University of Milan, Milan
Elisa Bertino
Institute of Information Systems, ETH Zurich, Zurich
Klemens Böhm
Informatik V, RWTH Aachen, Aachen
Matthias Jarke


About this paper

Cite this paper

Akinde, M.O., Böhlen, M.H., Johnson, T., Lakshmanan, L.V., Srivastava, D. (2002). Efficient OLAP Query Processing in Distributed Data Warehouses. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_23


DOI: https://doi.org/10.1007/3-540-45876-X_23
Published: 14 March 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43324-8
Online ISBN: 978-3-540-45876-0
eBook Packages: Springer Book Archive
