DOI: 10.1145/3329785.3329922
Research article

Performance Analysis and Automatic Tuning of Hash Aggregation on GPUs

Published: 01 July 2019

Abstract

Hash aggregation is an important data processing primitive which can be significantly accelerated by modern graphics processors (GPUs). Previous work derived heuristics for GPU-accelerated hash aggregation from the study of a particular GPU. In this paper, we examine the influence of different execution parameters on GPU-accelerated hash aggregation on four NVIDIA and two AMD GPUs based on six different microarchitectures. While we are able to replicate some of the previous results, our main finding is that optimal execution parameters are highly GPU-dependent. Most importantly, execution parameters optimized for a specific GPU are up to 21x slower on other GPUs. Given this hardware dependency, we present an algorithm to optimize execution parameters at runtime. On average, our algorithm converges on a result in less than 1% of the time required for a full evaluation of the search space. In this time, it finds execution parameters that are at most 1% slower than the optimum in 90% of our experiments. In the worst case, our algorithm finds execution parameters that are at most 1.29x slower than the optimum.
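
For readers unfamiliar with the primitive, the following is a minimal, sequential CPU-side sketch of hash aggregation: a group-by SUM over an open-addressing table with linear probing. It is not taken from the paper. The function name `hash_aggregate_sum`, the `capacity` parameter, and the choice of linear probing are assumptions made for illustration only, and the GPU execution parameters examined in the paper are not modeled.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Sentinel marking an unused slot (hypothetical helper, not from the paper).
_EMPTY = object()


def hash_aggregate_sum(
    rows: Iterable[Tuple[Hashable, int]], capacity: int
) -> Dict[Hashable, int]:
    """Group rows by key and sum their values in a fixed-size hash table.

    Collisions are resolved with linear probing. The table does not grow,
    so `capacity` must be at least the number of distinct keys.
    """
    if capacity < 1:
        raise ValueError("capacity must be positive")

    keys: List[object] = [_EMPTY] * capacity
    sums: List[int] = [0] * capacity

    for key, value in rows:
        slot = hash(key) % capacity
        # Probe at most `capacity` slots before declaring the table full.
        for _ in range(capacity):
            if keys[slot] is _EMPTY:
                # First occurrence of this group: claim the slot.
                keys[slot] = key
                sums[slot] = value
                break
            if keys[slot] == key:
                # Existing group: fold the value into the running sum.
                sums[slot] += value
                break
            slot = (slot + 1) % capacity
        else:
            raise OverflowError("hash table is full")

    return {k: s for k, s in zip(keys, sums) if k is not _EMPTY}


if __name__ == "__main__":
    data = [(1, 10), (2, 5), (1, 7), (3, 1), (2, 2)]
    print(hash_aggregate_sum(data, capacity=8))
```

On a GPU, many threads typically update such a table concurrently (for example with atomic operations); the sketch stays sequential to keep the aggregation logic itself visible.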

Published In

DaMoN'19: Proceedings of the 15th International Workshop on Data Management on New Hardware
July 2019
150 pages
ISBN: 9781450368018
DOI: 10.1145/3329785

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGMOD/PODS '19

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%

Article Metrics

  • Downloads (Last 12 months): 17
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 25 Jan 2025

Cited By

  • (2024) Accelerating Merkle Patricia Trie with GPU. Proceedings of the VLDB Endowment 17:8 (1856-1869). DOI: 10.14778/3659437.3659443. Online publication date: 31-May-2024
  • (2023) A Case for Graphics-Driven Query Processing. Proceedings of the VLDB Endowment 16:10 (2499-2511). DOI: 10.14778/3603581.3603590. Online publication date: 1-Jun-2023
  • (2022) Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. Proceedings of the 2022 International Conference on Management of Data (1017-1032). DOI: 10.1145/3514221.3517911. Online publication date: 10-Jun-2022
  • (2022) Query Processing on Heterogeneous CPU/GPU Systems. ACM Computing Surveys 55:1 (1-38). DOI: 10.1145/3485126. Online publication date: 17-Jan-2022
  • (2022) A Collaborative Grouping Aggregation Query Scheme on Heterogeneous Computing Systems. 2022 7th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA) (53-61). DOI: 10.1109/ICCCBDA55098.2022.9778888. Online publication date: 22-Apr-2022
  • (2022) Accelerating Group-By and Aggregation on Heterogeneous CPU-GPU Platforms. Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery (980-990). DOI: 10.1007/978-3-030-89698-0_100. Online publication date: 4-Jan-2022
  • (2021) Agora. ACM SIGMOD Record 49:4 (6-11). DOI: 10.1145/3456859.3456861. Online publication date: 10-Mar-2021
  • (2020) Pump Up the Volume. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (1633-1649). DOI: 10.1145/3318464.3389705. Online publication date: 11-Jun-2020
