Orchestrating data placement and query execution in heterogeneous CPU-GPU DBMS

Published: 01 July 2022

Abstract

There has been growing interest in using GPUs to accelerate data analytics because of their massive parallelism and high memory bandwidth. The main constraint on using GPUs for data analytics is the limited capacity of GPU memory.
Heterogeneous CPU-GPU query execution is a compelling approach to mitigating the limited GPU memory capacity and PCIe bandwidth. However, the design space of heterogeneous CPU-GPU query execution has not been fully explored. We aim to improve state-of-the-art CPU-GPU data analytics engines by optimizing data placement and heterogeneous query execution. First, we introduce a semantic-aware, fine-grained caching policy that takes into account various aspects of the workload, such as query semantics, data correlation, and query frequency, when determining data placement between CPU and GPU. Second, we introduce a heterogeneous query executor that can fully exploit data in both CPU and GPU memory and coordinate query execution at a fine granularity. We integrate both solutions in Mordred, our novel hybrid CPU-GPU data analytics engine.
Evaluation on the Star Schema Benchmark shows that the semantic-aware caching policy outperforms the best traditional caching policy by up to 3x. Mordred also outperforms existing GPU DBMSs by an order of magnitude.
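
To make the two ideas in the abstract concrete, the following is a minimal sketch, assuming a segment-granular column cache and a simple greedy policy. The names, weights, and split logic are illustrative assumptions, not Mordred's actual algorithms.

```python
# Illustrative sketch only (not Mordred's implementation) of the two ideas in
# the abstract: (1) scoring column segments for GPU placement using query
# frequency, query semantics, and data correlation, and (2) routing each
# segment's scan to the device that holds it. Weights and the greedy strategy
# are assumptions.
from dataclasses import dataclass


@dataclass
class Segment:
    column: str
    segment_id: int
    size_bytes: int
    access_freq: int       # how often recent queries touched this segment
    filter_weight: float   # boost for columns used in selective predicates (query semantics)
    corr_bonus: float      # boost for segments co-accessed with other hot segments (data correlation)


def placement_score(seg: Segment) -> float:
    """Combine workload signals into a single placement priority."""
    return seg.access_freq * (1.0 + seg.filter_weight) + seg.corr_bonus


def choose_gpu_resident(segments: list[Segment], gpu_capacity: int) -> set[tuple[str, int]]:
    """Greedily cache the highest-priority segments that fit in GPU memory."""
    resident, used = set(), 0
    for seg in sorted(segments, key=placement_score, reverse=True):
        if used + seg.size_bytes <= gpu_capacity:
            resident.add((seg.column, seg.segment_id))
            used += seg.size_bytes
    return resident


def split_scan(segments: list[Segment], resident: set[tuple[str, int]]):
    """Fine-grained heterogeneous execution: scan GPU-resident segments on the
    GPU and the remaining segments on the CPU, merging partial results later."""
    gpu_part = [s for s in segments if (s.column, s.segment_id) in resident]
    cpu_part = [s for s in segments if (s.column, s.segment_id) not in resident]
    return gpu_part, cpu_part
```

In this sketch, a scan is split per segment so that GPU-resident data is processed on the GPU while the remainder runs on the CPU, mirroring the fine-grained coordination the abstract describes.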

Published In

Proceedings of the VLDB Endowment, Volume 15, Issue 11
July 2022, 980 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 July 2022
Published in PVLDB Volume 15, Issue 11

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months): 295
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 08 Feb 2025

Citations

Cited By

  • (2024) Accelerating Merkle Patricia Trie with GPU. Proceedings of the VLDB Endowment 17(8), 1856-1869. https://doi.org/10.14778/3659437.3659443. Online publication date: 1-Apr-2024.
  • (2024) GOLAP: A GPU-in-Data-Path Architecture for High-Speed OLAP. Proceedings of the ACM on Management of Data 2(6), 1-26. https://doi.org/10.1145/3698812. Online publication date: 20-Dec-2024.
  • (2024) How Does Software Prefetching Work on GPU Query Processing? Proceedings of the 20th International Workshop on Data Management on New Hardware, 1-9. https://doi.org/10.1145/3662010.3663445. Online publication date: 10-Jun-2024.
  • (2024) Heterogeneous Intra-Pipeline Device-Parallel Aggregations. Proceedings of the 20th International Workshop on Data Management on New Hardware, 1-10. https://doi.org/10.1145/3662010.3663441. Online publication date: 10-Jun-2024.
  • (2023) An Empirical Evaluation of Columnar Storage Formats. Proceedings of the VLDB Endowment 17(2), 148-161. https://doi.org/10.14778/3626292.3626298. Online publication date: 1-Oct-2023.
  • (2023) Random Forests over normalized data in CPU-GPU DBMSes. Proceedings of the 19th International Workshop on Data Management on New Hardware, 98-101. https://doi.org/10.1145/3592980.3595318. Online publication date: 18-Jun-2023.
  • (2023) Accelerating User-Defined Aggregate Functions (UDAF) with Block-wide Execution and JIT Compilation on GPUs. Proceedings of the 19th International Workshop on Data Management on New Hardware, 19-26. https://doi.org/10.1145/3592980.3595307. Online publication date: 18-Jun-2023.
