research-article

GPU join processing revisited

Authors:

Tim Kaldewey,

Guy Lohman,

Rene Mueller,

Peter Volk

DaMoN '12: Proceedings of the Eighth International Workshop on Data Management on New Hardware

Pages 55 - 62

https://doi.org/10.1145/2236584.2236592

Published: 21 May 2012

Abstract

Until recently, the use of graphics processing units (GPUs) for query processing was limited by the amount of memory on the graphics card, a few gigabytes at best. Moreover, input tables had to be copied to GPU memory before they could be processed, and after computation was completed, query results had to be copied back to CPU memory. The newest generation of Nvidia GPUs and development tools introduces a common memory address space, which now allows the GPU to access CPU memory directly, lifting size limitations and obviating data copy operations. We confirm that this new technology can sustain 98% of its nominal rate of 6.3 GB/sec in practice, and exploit it to process database hash joins at the same rate, i.e., the join is processed "on the fly" as the GPU reads the input tables from CPU memory at PCI-E speeds. Compared to the fastest published results for in-memory joins on the CPU, this represents a speed-up of more than half an order of magnitude. All of our results include the cost of result materialization (often omitted in earlier work), and we investigate the implications of changing join predicate selectivity and table size.
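The abstract makes two technical points: a hash join's cost should include materializing its output rows, and when the join keeps pace with its input, the running time is bounded by how fast the tables can be read over PCI-E. The Python sketch below illustrates both under stated assumptions. It is a CPU-only model, not the authors' GPU implementation (which the abstract does not describe); the row layout of (key, payload) integer pairs, the function names `hash_join` and `streaming_seconds`, and the reading of GB as 10^9 bytes are all hypothetical choices made for illustration.

```python
"""Illustrative CPU-only model of a hash join with result materialization,
plus a bandwidth-bound time estimate. Not the authors' GPU code; the row
layout and function names are hypothetical."""

from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, int]  # (join key, payload): an assumed tuple layout


def hash_join(build: Sequence[Row], probe: Sequence[Row]) -> List[Tuple[int, int, int]]:
    """Equi-join on the key column, returning materialized
    (key, build_payload, probe_payload) rows."""
    # Build phase: index the (usually smaller) build relation by key.
    # A key may repeat, so each bucket holds a list of payloads.
    table: Dict[int, List[int]] = {}
    for key, payload in build:
        table.setdefault(key, []).append(payload)

    # Probe phase: read the probe relation once and emit every match.
    # Appending output rows here is the result-materialization cost.
    result: List[Tuple[int, int, int]] = []
    for key, payload in probe:
        for build_payload in table.get(key, []):
            result.append((key, build_payload, payload))
    return result


def streaming_seconds(num_bytes: float, nominal_gb_per_s: float = 6.3,
                      efficiency: float = 0.98) -> float:
    """Seconds needed to read num_bytes over a link sustaining
    efficiency * nominal rate. Defaults mirror the abstract's figures;
    GB is assumed to mean 10**9 bytes."""
    return num_bytes / (nominal_gb_per_s * efficiency * 1e9)


if __name__ == "__main__":
    build = [(1, 10), (2, 20), (3, 30)]
    probe = [(2, 200), (3, 300), (3, 301), (4, 400)]
    print(hash_join(build, probe))
    # -> [(2, 20, 200), (3, 30, 300), (3, 30, 301)]
```

With the abstract's figures, 0.98 × 6.3 GB/sec ≈ 6.17 GB/sec sustained, so `streaming_seconds(6.174e9)` evaluates to about one second: under this simple model, a join that never stalls the link finishes in roughly the time needed to read its inputs once. In the sketch, join selectivity affects only how many rows the probe loop materializes; the build and probe passes each read their input once regardless.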

Cited By

Deng Y, Yan M, Tang B (2024). Accelerating Merkle Patricia Trie with GPU. Proceedings of the VLDB Endowment 17:8, 1856-1869. Online publication date: 31-May-2024.
https://doi.org/10.14778/3659437.3659443

Boeschen N, Ziegler T, Binnig C (2024). GOLAP: A GPU-in-Data-Path Architecture for High-Speed OLAP. Proceedings of the ACM on Management of Data 2:6, 1-26. Online publication date: 20-Dec-2024.
https://dl.acm.org/doi/10.1145/3698812

Yang F, Peng S, Sun N, Wang F, Wang Y, Wu F, Qiu J, Pan A (2024). Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment. Proceedings of the 53rd International Conference on Parallel Processing, 514-523. Online publication date: 12-Aug-2024.
https://dl.acm.org/doi/10.1145/3673038.3673095

Recommendations

Accelerating the discontinuous Galerkin method for seismic wave propagation simulations using the graphic processing unit (GPU)-single-GPU implementation

We have successfully ported an arbitrary high-order discontinuous Galerkin (ADER-DG) method for solving the three-dimensional elastic seismic wave equation on unstructured tetrahedral meshes to an Nvidia Tesla C2075 GPU using the Nvidia CUDA programming ...

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities

PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation

Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and ...

HG-Bitmap Join Index: A Hybrid GPU/CPU Bitmap Join Index Mechanism for OLAP

Web Information Systems Engineering – WISE 2013 Workshops

In-memory big data OLAP(on-line analytical processing) is time consuming task for data access latency and complex star join processing overhead. GPU is introduced to DBMSs for its remarkable parallel computing power but also restricted by its ...

Information

Published In

DaMoN '12: Proceedings of the Eighth International Workshop on Data Management on New Hardware

May 2012

72 pages

ISBN:9781450314459

DOI:10.1145/2236584

Editors: Shimin Chen (HP Labs China), Stavros Harizopoulos (Nou Data)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS '12: International Conference on Management of Data

May 21, 2012

Scottsdale, Arizona

Sponsor: SIGMOD

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%

Bibliometrics

Total Citations: 112

Total Downloads: 943

Downloads (last 12 months): 55

Downloads (last 6 weeks): 3

Reflects downloads up to 14 Jan 2025

Cited By

Ruiz-Rohena K, Rodriguez-Martínez M (2024). ArcaDB: A Disaggregated Query Engine for Heterogenous Computational Environments. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), 42-53. Online publication date: 7-Jul-2024.
https://doi.org/10.1109/CLOUD62652.2024.00015

Doraiswamy H, Kalagi V, Ramachandra K, Haritsa J (2023). A Case for Graphics-Driven Query Processing. Proceedings of the VLDB Endowment 16:10, 2499-2511. Online publication date: 1-Jun-2023.
https://dl.acm.org/doi/10.14778/3603581.3603590

Yogatama B, Miller B, Wang Y, Markall G, Hemstad J, Kimball G, Yu X (2023). Accelerating User-Defined Aggregate Functions (UDAF) with Block-wide Execution and JIT Compilation on GPUs. Proceedings of the 19th International Workshop on Data Management on New Hardware, 19-26. Online publication date: 18-Jun-2023.
https://doi.org/10.1145/3592980.3595307

Thostrup L, Doci G, Boeschen N, Luthra M, Binnig C (2023). Distributed GPU Joins on Fast RDMA-capable Networks. Proceedings of the ACM on Management of Data 1:1, 1-26. Online publication date: 30-May-2023.
https://dl.acm.org/doi/10.1145/3588709

Subramanian H, Gurumurthy B, Durand G, Broneske D, Saake G (2023). Out-of-the-box library support for DBMS operations on GPUs. Distributed and Parallel Databases 41:3, 489-509. Online publication date: 10-May-2023.
https://doi.org/10.1007/s10619-023-07431-3

Yogatama B, Gong W, Yu X (2022). Orchestrating data placement and query execution in heterogeneous CPU-GPU DBMS. Proceedings of the VLDB Endowment 15:11, 2491-2503. Online publication date: 29-Sep-2022.
https://dl.acm.org/doi/10.14778/3551793.3551809

Lutz C, Breß S, Zeuch S, Rabl T, Markl V (2022). Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. Proceedings of the 2022 International Conference on Management of Data (eds. Ives Z, Bonifati A, El Abbadi A), 1017-1032. Online publication date: 10-Jun-2022.
https://dl.acm.org/doi/10.1145/3514221.3517911
