
Parallel Execution of Hash Joins in Parallel Databases

Published: 01 August 1997

Abstract

In this paper, we explore two important issues, processor allocation and the use of hash filters, to improve the parallel execution of hash joins. To exploit the opportunity of pipelining for hash join execution, a scheme to transform a bushy execution tree into an allocation tree is first devised. In an allocation tree, each node denotes a pipeline. Then, using the concept of synchronous execution time, processors are allocated to the nodes in the allocation tree in such a way that the inner relations in a pipeline become available at approximately the same time. The approach of hash filtering is also investigated to further improve the parallel execution of hash joins. Extensive performance studies are conducted via simulation to demonstrate the importance of processor allocation and to evaluate various schemes using hash filters. It is experimentally shown that processor allocation is, in general, the dominant factor in performance, and that the effect of hash filtering becomes more prominent as the number of relations in a query increases.
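
To make the idea of hash filtering concrete, here is a minimal single-process sketch. It is an illustration and not the paper's scheme: it assumes a hash filter is a plain bit array indexed by a hash of the join attribute, and every name in it (`build_hash_filter`, `apply_hash_filter`, `hash_join`, the `(key, payload)` row layout, the filter size of 16 bits) is hypothetical. The filter is built from one relation and used to discard tuples of the other relation that cannot match, before the join itself runs.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical row layout for this sketch: (join key, payload).
Row = Tuple[Hashable, str]


def build_hash_filter(rows: Iterable[Row], num_bits: int) -> List[bool]:
    """Set one bit for every join-key value that occurs in `rows`."""
    bits = [False] * num_bits
    for key, _ in rows:
        bits[hash(key) % num_bits] = True
    return bits


def apply_hash_filter(rows: Iterable[Row], bits: List[bool]) -> List[Row]:
    """Keep rows whose key hashes to a set bit.

    Rows that are dropped can never match; rows that are kept may still
    be false positives when two keys collide on the same bit.
    """
    num_bits = len(bits)
    return [row for row in rows if bits[hash(row[0]) % num_bits]]


def hash_join(inner: Iterable[Row], outer: Iterable[Row]) -> List[Tuple[Hashable, str, str]]:
    """Plain in-memory hash join: build a table on `inner`, probe with `outer`."""
    table: Dict[Hashable, List[str]] = {}
    for key, payload in inner:
        table.setdefault(key, []).append(payload)
    return [
        (key, inner_payload, outer_payload)
        for key, outer_payload in outer
        for inner_payload in table.get(key, [])
    ]


inner = [(1, "a"), (2, "b"), (3, "c")]
outer = [(2, "x"), (3, "y"), (40, "z"), (50, "w")]

bits = build_hash_filter(inner, num_bits=16)
filtered_outer = apply_hash_filter(outer, bits)

# The join result is the same with or without the filter; the filter
# only reduces how many outer tuples have to be shipped and probed.
result = hash_join(inner, filtered_outer)
```

In this toy input, the tuple with key 40 is removed by the filter, while the tuple with key 50 survives as a false positive (it collides with key 2 in a 16-bit filter) and is then rejected by the join itself. A larger filter lowers the false-positive rate at the cost of more space to build and transmit.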

Cited By

  • (2007) Partition search for non-binary constraint satisfaction. Information Sciences: an International Journal 177:18 (3639-3678). DOI: 10.1016/j.ins.2007.03.030. Online publication date: 1-Sep-2007
  • (2005) Towards distributed processing of RDF path queries. International Journal of Web Engineering and Technology 2:2/3 (207-230). DOI: 10.1504/IJWET.2005.008484. Online publication date: 1-Dec-2005
  • (2004) Index structures and algorithms for querying distributed RDF repositories. Proceedings of the 13th international conference on World Wide Web (631-639). DOI: 10.1145/988672.988758. Online publication date: 17-May-2004

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 8, Issue 8
August 1997
128 pages
ISSN: 1045-9219

Publisher

IEEE Press

Author Tags

  1. Hash filters
  2. bushy trees
  3. hash joins
  4. pipelining

Qualifiers

  • Research-article
