
Performance Analysis and Automatic Tuning of Hash Aggregation on GPUs

Authors: Viktor Rosenfeld, Sebastian Breß, Volker Markl

DaMoN'19: Proceedings of the 15th International Workshop on Data Management on New Hardware

Article No.: 8, Pages 1 - 11

https://doi.org/10.1145/3329785.3329922

Published: 01 July 2019

Abstract

Hash aggregation is an important data processing primitive which can be significantly accelerated by modern graphics processors (GPUs). Previous work derived heuristics for GPU-accelerated hash aggregation from the study of a particular GPU. In this paper, we examine the influence of different execution parameters on GPU-accelerated hash aggregation on four NVIDIA and two AMD GPUs based on six different microarchitectures. While we are able to replicate some of the previous results, our main finding is that optimal execution parameters are highly GPU-dependent. Most importantly, execution parameters optimized for a specific GPU are up to 21x slower on other GPUs. Given this hardware dependency, we present an algorithm to optimize execution parameters at runtime. On average, our algorithm converges on a result in less than 1% of the time required for a full evaluation of the search space. In this time, it finds execution parameters that are at most 1% slower than the optimum in 90% of our experiments. In the worst case, our algorithm finds execution parameters that are at most 1.29x slower than the optimum.
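The abstract does not spell out the runtime optimization algorithm, so the following is only a minimal sketch of the general idea of searching a space of execution parameters without evaluating it exhaustively. It uses a simple coordinate-wise search, which is an assumption for illustration and not the authors' method. The parameter names (work_group_size, groups_per_cu, items_per_thread) and the synthetic cost function standing in for a measured kernel runtime are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Params = Dict[str, int]


def coordinate_search(
    space: Dict[str, List[int]],
    measure: Callable[[Params], float],
    max_rounds: int = 10,
) -> Tuple[Params, float, int]:
    """Tune one parameter at a time while keeping the others fixed.

    Returns the best configuration found, its cost, and the number of
    distinct configurations that were actually measured.
    """
    names = list(space)
    current = {name: space[name][0] for name in names}
    cache: Dict[Tuple[int, ...], float] = {}

    def cost(params: Params) -> float:
        # Measure each configuration at most once.
        key = tuple(params[name] for name in names)
        if key not in cache:
            cache[key] = measure(params)
        return cache[key]

    best = cost(current)
    for _ in range(max_rounds):
        improved = False
        for name in names:
            for value in space[name]:
                candidate = dict(current, **{name: value})
                c = cost(candidate)
                if c < best:
                    best, current, improved = c, candidate, True
        if not improved:
            break  # no single-parameter change helps any more
    return current, best, len(cache)


def synthetic_runtime(params: Params) -> float:
    # Hypothetical stand-in for timing a hash aggregation kernel;
    # a real tuner would launch the kernel and measure its runtime.
    return (
        1.0
        + abs(math.log2(params["work_group_size"]) - 8)
        + abs(math.log2(params["groups_per_cu"]) - 2)
        + abs(math.log2(params["items_per_thread"]) - 1)
    )


# Hypothetical search space with 5 * 5 * 4 = 100 configurations.
search_space = {
    "work_group_size": [64, 128, 256, 512, 1024],
    "groups_per_cu": [1, 2, 4, 8, 16],
    "items_per_thread": [1, 2, 4, 8],
}

best_params, best_cost, evaluations = coordinate_search(
    search_space, synthetic_runtime
)
total = math.prod(len(values) for values in search_space.values())
print(best_params, best_cost, f"{evaluations} of {total} configurations measured")
```

On this separable synthetic cost the search reaches the optimum while measuring only a fraction of the space. On real hardware, parameters interact, so a coordinate-wise search like this may stop at a local optimum.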




Published In

DaMoN'19: Proceedings of the 15th International Workshop on Data Management on New Hardware

July 2019

150 pages

ISBN:9781450368018

DOI:10.1145/3329785

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Horizon 2020
Bundesministerium für Bildung und Forschung
Deutsche Forschungsgemeinschaft

Conference

SIGMOD/PODS '19

Sponsor:

SIGMOD

SIGMOD/PODS '19: International Conference on Management of Data

July 1, 2019

Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%


Citations

Cited By

Deng Y, Yan M, Tang B (2024). Accelerating Merkle Patricia Trie with GPU. Proceedings of the VLDB Endowment 17:8 (1856-1869). Online publication date: 31-May-2024.
https://doi.org/10.14778/3659437.3659443
Doraiswamy H, Kalagi V, Ramachandra K, Haritsa J (2023). A Case for Graphics-Driven Query Processing. Proceedings of the VLDB Endowment 16:10 (2499-2511). Online publication date: 1-Jun-2023.
https://dl.acm.org/doi/10.14778/3603581.3603590
Lutz C, Breß S, Zeuch S, Rabl T, Markl V, Ives Z, Bonifati A, El Abbadi A (2022). Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. Proceedings of the 2022 International Conference on Management of Data (1017-1032). Online publication date: 10-Jun-2022.
https://dl.acm.org/doi/10.1145/3514221.3517911
Rosenfeld V, Breß S, Markl V (2022). Query Processing on Heterogeneous CPU/GPU Systems. ACM Computing Surveys 55:1 (1-38). Online publication date: 17-Jan-2022.
https://dl.acm.org/doi/10.1145/3485126
Zhou K, Feng X (2022). A Collaborative Grouping Aggregation Query Scheme on Heterogeneous Computing Systems. 2022 7th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA) (53-61). Online publication date: 22-Apr-2022.
https://doi.org/10.1109/ICCCBDA55098.2022.9778888
Luan H, Fu Y (2022). Accelerating Group-By and Aggregation on Heterogeneous CPU-GPU Platforms. Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery (980-990). Online publication date: 4-Jan-2022.
https://doi.org/10.1007/978-3-030-89698-0_100
Traub J, Kaoudi Z, Quiané-Ruiz J, Markl V (2021). Agora. ACM SIGMOD Record 49:4 (6-11). Online publication date: 10-Mar-2021.
https://dl.acm.org/doi/10.1145/3456859.3456861
Lutz C, Breß S, Zeuch S, Rabl T, Markl V, Maier D, Pottinger R, Doan A, Tan W, Alawini A, Ngo H (2020). Pump Up the Volume. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (1633-1649). Online publication date: 11-Jun-2020.
https://dl.acm.org/doi/10.1145/3318464.3389705
