Abstract
The success of Internet applications has led to an explosive growth in the demand for bandwidth from ISPs. Managing an IP network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current OLAP tools for this task assume that the data is available in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing being performed at the local sites. In this paper, we consider the problem of efficient evaluation of OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans which are shipped to individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. Finally, we present an experimental study based on TPC(R) data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system.
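To make the idea of shipping only partial results concrete, the following is a minimal Python sketch of a distributed average. It is not Skalla's algorithm or plan format, which the abstract does not detail. The function names (`local_partial`, `coordinator_merge`), the (group, value) row layout, and the sample traffic figures are all assumptions made for illustration. Each local site reduces its detail rows to a per-group (sum, count) pair, and the coordinator combines those pairs without ever seeing the detail data.

```python
from collections import defaultdict


def local_partial(rows):
    """Run at a local site: reduce detail rows to per-group (sum, count).

    `rows` is an iterable of (group, value) pairs; this layout is hypothetical.
    """
    partial = defaultdict(lambda: [0, 0])
    for group, value in rows:
        partial[group][0] += value  # running sum for the group
        partial[group][1] += 1      # running row count for the group
    return {group: tuple(sc) for group, sc in partial.items()}


def coordinator_merge(partials):
    """Run at the coordinator: merge partial results into per-group averages."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (total, count) in partial.items():
            merged[group][0] += total
            merged[group][1] += count
    return {group: total / count for group, (total, count) in merged.items()}


# Hypothetical flow records held at two collection points: (source IP, bytes).
site_a = [("10.0.0.1", 100), ("10.0.0.1", 300), ("10.0.0.2", 50)]
site_b = [("10.0.0.1", 200), ("10.0.0.2", 150)]

# Only the small per-group summaries cross the network.
averages = coordinator_merge([local_partial(site_a), local_partial(site_b)])
print(averages)
```

The average is carried as a (sum, count) pair because averages of averages are not correct in general, whereas sums and counts can be added across sites.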
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Akinde, M.O., Böhlen, M.H., Johnson, T., Lakshmanan, L.V., Srivastava, D. (2002). Efficient OLAP Query Processing in Distributed Data Warehouses. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_23
DOI: https://doi.org/10.1007/3-540-45876-X_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43324-8
Online ISBN: 978-3-540-45876-0
eBook Packages: Springer Book Archive