Performance analysis of "Groupby-After-Join" query processing in parallel database systems

Published: 03 December 2004

Abstract

Queries containing aggregate functions often combine multiple tables through join operations. Such queries are called "Groupby-Join" queries. In a special category of these queries, the group-by operation can only be performed after the join operation. These are known as "Groupby-After-Join" queries and are the focus of this paper. In parallel processing of such queries, it must be decided which attribute is used as the partitioning attribute: the join attribute or the group-by attribute. Based on the choice of partitioning attribute, two parallel processing methods are discussed, namely the join partition method (JPM) and the aggregate partition method (APM). The behaviours of these parallelization methods are described in terms of cost models. Experiments are performed based on simulations. The simulation results show that the aggregate partition method performs better than the join partition method.
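
The abstract names the two partitioning choices without spelling out the algorithms, so the following Python sketch is only an illustration under stated assumptions. It assumes that JPM hash-partitions both tables on the join attribute and then merges partial groups, and that APM partitions one table on the group-by attribute while replicating the other table to every processor. The tables, column names, and function names are hypothetical, and the processors are simulated by a plain loop.

```python
from collections import defaultdict

# Hypothetical toy tables (not from the paper).
# PROJECT(pid, dept) is joined with ASSIGNMENT(pid, hours) on pid, and the
# result is grouped by dept with SUM(hours). The group-by attribute (dept)
# differs from the join attribute (pid), so grouping must follow the join.
PROJECT = [(1, "A"), (2, "A"), (3, "B"), (4, "B")]
ASSIGNMENT = [(1, 10), (1, 5), (2, 7), (3, 2), (4, 8), (4, 1)]


def local_join_and_group(projects, assignments):
    """Hash-join on pid, then SUM(hours) per dept, on one simulated processor."""
    dept_of = {pid: dept for pid, dept in projects}
    totals = defaultdict(int)
    for pid, hours in assignments:
        if pid in dept_of:
            totals[dept_of[pid]] += hours
    return dict(totals)


def join_partition_method(n):
    """Assumed JPM: partition both tables on the join attribute (pid)."""
    merged = defaultdict(int)
    for p in range(n):
        projects = [r for r in PROJECT if r[0] % n == p]
        assignments = [r for r in ASSIGNMENT if r[0] % n == p]
        partial = local_join_and_group(projects, assignments)
        # One dept can be spread over several processors, so the partial
        # sums have to be consolidated in a final merge step.
        for dept, total in partial.items():
            merged[dept] += total
    return dict(merged)


def aggregate_partition_method(n):
    """Assumed APM: partition PROJECT on the group-by attribute (dept)
    and replicate ASSIGNMENT to every processor."""
    result = {}
    for p in range(n):
        projects = [r for r in PROJECT if sum(map(ord, r[1])) % n == p]
        # Every processor sees the full ASSIGNMENT table (replication).
        partial = local_join_and_group(projects, ASSIGNMENT)
        # Each dept lives on exactly one processor, so no merge is needed.
        result.update(partial)
    return result


if __name__ == "__main__":
    print(join_partition_method(2))
    print(aggregate_partition_method(2))
```

In this sketch both functions return the same totals. The difference lies in where the work goes: the join-attribute partitioning needs a final consolidation of partial groups, while the group-by partitioning avoids that step at the price of replicating one table.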

Published In

Information Sciences: an International Journal  Volume 168, Issue 1-4
3 December 2004
288 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. groupby queries
  2. groupby-join queries
  3. parallel databases
  4. parallel query optimization
  5. parallel query processing
  6. performance analysis

Qualifiers

  • Article

Cited By

  • (2012) Self-adaptive approximate queries for large-scale information aggregation. International Journal of Web and Grid Services 8:3 (225-247). DOI: 10.1504/IJWGS.2012.049168. Online publication date: 1-Sep-2012
  • (2011) Data warehouse design on the basis of Hierarchical Degenerate Snowflake (HDS). International Journal of Business Intelligence and Data Mining 6:2 (154-183). DOI: 10.1504/IJBIDM.2011.039410. Online publication date: 1-Apr-2011
  • (2011) Voronoi-based range and continuous range query processing in mobile databases. Journal of Computer and System Sciences 77:4 (637-651). DOI: 10.1016/j.jcss.2010.02.005. Online publication date: 1-Jul-2011
  • (2010) Beyond pages. Proceedings of the 13th International Conference on Extending Database Technology (15-26). DOI: 10.1145/1739041.1739047. Online publication date: 22-Mar-2010
  • (2009) Estimating software readiness using predictive models. Information Sciences: an International Journal 179:4 (430-445). DOI: 10.1016/j.ins.2008.10.005. Online publication date: 1-Feb-2009
  • (2007) Benchmarking data warehouses. International Journal of Business Intelligence and Data Mining 2:1 (79-104). DOI: 10.1504/IJBIDM.2007.012947. Online publication date: 1-Mar-2007
  • (2007) The use of Hints in SQL-Nested query optimization. Information Sciences: an International Journal 177:12 (2493-2521). DOI: 10.1016/j.ins.2006.12.015. Online publication date: 20-Jun-2007
