On applying hash filters to improving the execution of multi-join queries

Published: 05 May 1997

Abstract

In this paper, we explore an approach that interleaves a bushy execution tree with hash filters to improve the execution of multi-join queries. Like semi-joins in distributed query processing, hash filters can be applied to eliminate non-matching tuples from the joining relations before a join is executed, thus reducing the join cost. Note that hash filters built at different execution stages of a bushy tree can have different costs and effects. The effect of hash filters is evaluated first. Then, an efficient scheme is developed to determine an effective sequence of hash filters for a bushy execution tree: hash filters are built and applied according to the join sequence specified in the bushy tree, so that the reduction effect is optimized and the associated cost is minimized. Various schemes using hash filters are implemented and evaluated via simulation. The experiments show that applying hash filters is, in general, a very powerful means of improving the execution of multi-join queries, and that the improvement becomes more prominent as the number of relations in a query increases.
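The basic mechanism the abstract describes, removing non-matching tuples before a join, can be illustrated with a minimal Python sketch. This is a generic bit-vector filter, not the scheme developed in the paper: the function names, the tuple-based row layout, and the `num_bits` parameter are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple  # a row is a plain tuple of attribute values (assumed layout)


def build_hash_filter(rows: Iterable[Row], key_index: int, num_bits: int) -> List[bool]:
    """Set one bit for every join-key value that occurs in `rows`."""
    bits = [False] * num_bits
    for row in rows:
        bits[hash(row[key_index]) % num_bits] = True
    return bits


def apply_hash_filter(rows: Iterable[Row], key_index: int, bits: List[bool]) -> List[Row]:
    """Keep only rows whose join key hashes to a set bit.

    Rows that are dropped cannot match; rows that are kept may still be
    false positives when two key values collide on the same bit.
    """
    num_bits = len(bits)
    return [row for row in rows if bits[hash(row[key_index]) % num_bits]]


def hash_join(left: Iterable[Row], left_key: int,
              right: Iterable[Row], right_key: int) -> List[Row]:
    """Plain in-memory hash join, used here only to check the filter."""
    table: Dict[object, List[Row]] = {}
    for row in left:
        table.setdefault(row[left_key], []).append(row)
    return [l + r for r in right for l in table.get(r[right_key], [])]


# Hypothetical relations R(key, a) and S(key, b).
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (4, "z"), (5, "w")]

filter_on_R = build_hash_filter(R, key_index=0, num_bits=64)
S_reduced = apply_hash_filter(S, key_index=0, bits=filter_on_R)

joined_plain = hash_join(R, 0, S, 0)
joined_filtered = hash_join(R, 0, S_reduced, 0)
```

In this sketch the filter shrinks `S` from four rows to two before the join, and the join result is unchanged. A smaller `num_bits` makes the filter cheaper to build but lets more non-matching rows through, which is one simple way to see why a filter's cost and its reduction effect have to be weighed against each other.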

Published In

The VLDB Journal — The International Journal on Very Large Data Bases  Volume 6, Issue 2
May 1997
100 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Bushy trees
  2. Hash filters
  3. Parallel query processing
  4. Sort-merge joins

Qualifiers

  • Article
