article

Performance analysis of "Groupby-After-Join" query processing in parallel database systems

Authors:

Rebecca Boon-Noi Tan,

C. H. C. Leung,

K. H. Liu

Information Sciences—Informatics and Computer Science, Intelligent Systems, Applications: An International Journal, Volume 168, Issue 1-4

Pages 25 - 50

https://doi.org/10.1016/j.ins.2003.09.029

Published: 03 December 2004

Abstract

Queries containing aggregate functions often combine multiple tables through join operations. Such queries are called "Groupby-Join" queries. In a special category of these queries, the group-by operation can only be performed after the join operation. These are known as "Groupby-After-Join" queries, and they are the focus of this paper. In parallel processing of such queries, it must be decided which attribute is used as the partitioning attribute: the join attribute or the group-by attribute. Based on the choice of partitioning attribute, two parallel processing methods are discussed, namely the join partition method (JPM) and the aggregate partition method (APM). The behaviours of these parallelization methods are described in terms of cost models. Experiments are performed based on simulations. The simulation results show that the aggregate partition method performs better than the join partition method.
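The abstract does not spell out the two methods, so the following is only a minimal sketch of the partitioning choice it describes, not the paper's algorithms. It assumes a hypothetical query of the form SELECT R.g, SUM(S.v) FROM R JOIN S ON R.j = S.j GROUP BY R.g, simulates the processors with plain lists, and assumes that JPM needs a second redistribution on the group-by attribute while APM broadcasts the other relation. The relation layouts and the names `bucket`, `local_join_sum`, `jpm`, and `apm` are illustrative only.

```python
import zlib
from collections import defaultdict


def bucket(value, n):
    """Stable hash of a value into one of n simulated processors."""
    return zlib.crc32(str(value).encode()) % n


def local_join_sum(r_rows, s_rows):
    """Join r(j, g) with s(j, v) on j, then compute SUM(v) grouped by g."""
    s_by_j = defaultdict(list)
    for j, v in s_rows:
        s_by_j[j].append(v)
    sums = defaultdict(int)
    for j, g in r_rows:
        for v in s_by_j.get(j, []):  # inner join: unmatched rows add nothing
            sums[g] += v
    return dict(sums)


def jpm(r, s, n):
    """Sketch of partitioning on the join attribute j (assumed JPM behaviour)."""
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for j, g in r:
        r_parts[bucket(j, n)].append((j, g))
    for j, v in s:
        s_parts[bucket(j, n)].append((j, v))
    # Each processor joins and aggregates its own fragments.
    local = [local_join_sum(r_parts[i], s_parts[i]) for i in range(n)]
    # A group may be split across processors, so the partial sums are
    # redistributed on the group-by attribute g and merged.
    merged = [defaultdict(int) for _ in range(n)]
    for partial in local:
        for g, total in partial.items():
            merged[bucket(g, n)][g] += total
    result = {}
    for m in merged:
        result.update(m)
    return result


def apm(r, s, n):
    """Sketch of partitioning on the group-by attribute g (assumed APM behaviour)."""
    r_parts = [[] for _ in range(n)]
    for j, g in r:
        r_parts[bucket(g, n)].append((j, g))
    result = {}
    for part in r_parts:
        # s is broadcast: every processor joins its fragment of r with all of s.
        # Groups are disjoint across processors, so no merge step is needed.
        result.update(local_join_sum(part, s))
    return result


# Toy relations: r holds (j, g) pairs, s holds (j, v) pairs.
r = [(1, "a"), (2, "a"), (3, "b"), (4, "c")]
s = [(1, 10), (1, 5), (2, 7), (3, 1), (5, 100)]
print(jpm(r, s, 3))
print(apm(r, s, 3))
```

Both functions return the same groups and sums. Under these assumptions, the difference lies in data movement: the join-attribute version redistributes partial aggregates a second time, while the group-by-attribute version replicates one relation to every processor. The sketch makes no claim about which is cheaper.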

Published In

Information Sciences: an International Journal Volume 168, Issue 1-4

3 December 2004

288 pages

ISSN:0020-0255

Publisher

Elsevier Science Inc.

United States
