Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2236584.2236592acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

GPU join processing revisited

Published: 21 May 2012 Publication History

Abstract

Until recently, the use of graphics processing units (GPUs) for query processing was limited by the amount of memory on the graphics card, a few gigabytes at best. Moreover, input tables had to be copied to GPU memory before they could be processed, and after computation was completed, query results had to be copied back to CPU memory. The newest generation of Nvidia GPUs and development tools introduces a common memory address space, which now allows the GPU to access CPU memory directly, lifting size limitations and obviating data copy operations. We confirm that this new technology can sustain 98% of its nominal rate of 6.3 GB/sec in practice, and exploit it to process database hash joins at the same rate, i.e., the join is processed "on the fly" as the GPU reads the input tables from CPU memory at PCI-E speeds. Compared to the fastest published results for in-memory joins on the CPU, this represents more than half an order of magnitude speed-up. All of our results include the cost of result materialization (often omitted in earlier work), and we investigate the implications of changing join predicate selectivity and table size.

References

[1]
A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In VLDB'99.
[2]
D. A. Alcantara, V. Volkov, S. Sengupta, M. Mitzenmacher, J. D. Owens, and N. Ameta. GPU Computing Gems: Jade Edition, chapter 4, pages 39--53. Morgan Kaufmann, 2012.
[3]
S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD'11.
[4]
P. A. Boncz, S. Manegold, and M. L. Kersten. Database architecture optimized for the new bottleneck: Memory access. In VLDB'99.
[5]
R. Budruck, D. Anderson, and T. Shanley. PCI Express System Architecture. Addison-Wesley, 2003.
[6]
M. Garland, S. Le Grand, J. Nickolls, J. Anderson, J. Hardwick, S. Morton, E. Phillips, Y. Zhang, and V. Volkov. Parallel computing experiences with CUDA. IEEE Micro, 28(4).
[7]
N. K. Govindaraju and D. Manocha. Efficient relational database management using graphics processors. In DaMoN'05.
[8]
B. He, M. Lu, K. Yang, R. Fang, N. K. Govindaraju, Q. Luo, and P. V. Sander. Relational query coprocessing on graphics processors. ACM Trans. Database Syst., 34(4), Dec. 2009.
[9]
B. He, K. Yang, R. Fang, M. Lu, N. Govindaraju, Q. Luo, and P. Sander. Relational joins on graphics processors. In SIGMOD'08.
[10]
C. Kim, T. Kaldewey, V. W. Lee, E. Sedlar, A. D. Nguyen, N. Satish, J. Chhugani, A. Di Blas, and P. Dubey. Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs. Proc. VLDB Endow., 2(2), Aug. 2009.
[11]
S. Manegold, P. Boncz, and M. Kersten. Optimizing main-memory join on modern hardware. IEEE Trans. on Knowledge and Data Engineering, 14.
[12]
H. Pirk, S. Manegold, and M. Kersten. Accelerating foreign-key joins using asymmetric memory channels. In ADMS'11.

Cited By

View all
  • (2024)Accelerating Merkle Patricia Trie with GPUProceedings of the VLDB Endowment10.14778/3659437.365944317:8(1856-1869)Online publication date: 31-May-2024
  • (2024)GOLAP: A GPU-in-Data-Path Architecture for High-Speed OLAPProceedings of the ACM on Management of Data10.1145/36988122:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC EnvironmentProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673095(514-523)Online publication date: 12-Aug-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
DaMoN '12: Proceedings of the Eighth International Workshop on Data Management on New Hardware
May 2012
72 pages
ISBN:9781450314459
DOI:10.1145/2236584
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 May 2012

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '12
Sponsor:

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)55
  • Downloads (Last 6 weeks)3
Reflects downloads up to 14 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Accelerating Merkle Patricia Trie with GPUProceedings of the VLDB Endowment10.14778/3659437.365944317:8(1856-1869)Online publication date: 31-May-2024
  • (2024)GOLAP: A GPU-in-Data-Path Architecture for High-Speed OLAPProceedings of the ACM on Management of Data10.1145/36988122:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC EnvironmentProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673095(514-523)Online publication date: 12-Aug-2024
  • (2024)ArcaDB: A Disaggregated Query Engine for Heterogenous Computational Environments2024 IEEE 17th International Conference on Cloud Computing (CLOUD)10.1109/CLOUD62652.2024.00015(42-53)Online publication date: 7-Jul-2024
  • (2023)A Case for Graphics-Driven Query ProcessingProceedings of the VLDB Endowment10.14778/3603581.360359016:10(2499-2511)Online publication date: 1-Jun-2023
  • (2023)Accelerating User-Defined Aggregate Functions (UDAF) with Block-wide Execution and JIT Compilation on GPUsProceedings of the 19th International Workshop on Data Management on New Hardware10.1145/3592980.3595307(19-26)Online publication date: 18-Jun-2023
  • (2023)Distributed GPU Joins on Fast RDMA-capable NetworksProceedings of the ACM on Management of Data10.1145/35887091:1(1-26)Online publication date: 30-May-2023
  • (2023)Out-of-the-box library support for DBMS operations on GPUsDistributed and Parallel Databases10.1007/s10619-023-07431-341:3(489-509)Online publication date: 10-May-2023
  • (2022)Orchestrating data placement and query execution in heterogeneous CPU-GPU DBMSProceedings of the VLDB Endowment10.14778/3551793.355180915:11(2491-2503)Online publication date: 29-Sep-2022
  • (2022)Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast InterconnectsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517911(1017-1032)Online publication date: 10-Jun-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media