Abstract
Processor performance is now bound primarily by a fixed energy budget, the power wall. This forces hardware vendors to optimize processors for specific tasks, which leads to an increasingly heterogeneous hardware landscape. Although efficient algorithms for modern processors such as GPUs are heavily investigated, we also need to prepare the database optimizer to handle computations on heterogeneous processors. GPUs are a good basis for case studies because they already exhibit many of the difficulties we will face on future heterogeneous hardware.
In this paper, we present CoGaDB, a main-memory DBMS with built-in GPU acceleration that is optimized for OLAP workloads. CoGaDB uses the self-tuning optimizer framework HyPE to build a hardware-oblivious optimizer, which learns cost models for database operators and distributes a workload efficiently across the available processors. Furthermore, CoGaDB implements efficient algorithms on CPU and GPU and provides efficient support for star joins. We show how these novel techniques interact with each other in a single system. Our evaluation shows that CoGaDB quickly adapts to the underlying hardware by increasing the accuracy of its cost models at runtime.
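To make the idea of a learned, hardware-oblivious cost model more concrete, the following is a minimal sketch and not HyPE's actual interface. It assumes a simple per-(operator, processor) linear model fitted by least squares to observed runtimes; the names `LearnedCostModel`, `observe`, `estimate`, and `choose_processor` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


class LearnedCostModel:
    """Hypothetical sketch: one linear model (runtime ~ a * input_size + b)
    per (operator, processor) pair, refitted from observed executions."""

    def __init__(self):
        # Observed (input_size, runtime) pairs per (operator, processor).
        self.samples = defaultdict(list)

    def observe(self, operator, processor, input_size, runtime):
        """Record one measured execution as training data."""
        self.samples[(operator, processor)].append(
            (float(input_size), float(runtime))
        )

    def estimate(self, operator, processor, input_size):
        """Return the estimated runtime, or None with fewer than two samples."""
        pts = self.samples[(operator, processor)]
        if len(pts) < 2:
            return None
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in pts)
        if var_x == 0.0:
            # All samples share one input size: fall back to the mean runtime.
            return mean_y
        slope = sum((x - mean_x) * (y - mean_y) for x, y in pts) / var_x
        return mean_y + slope * (input_size - mean_x)


def choose_processor(model, operator, input_size, processors):
    """Pick the processor with the lowest estimated runtime.
    Processors without a usable model are tried first to gather data."""
    estimates = {}
    for proc in processors:
        est = model.estimate(operator, proc, input_size)
        if est is None:
            return proc
        estimates[proc] = est
    return min(estimates, key=estimates.get)


if __name__ == "__main__":
    model = LearnedCostModel()
    # Made-up measurements: the GPU pays a fixed overhead but scales better.
    model.observe("selection", "cpu", 1000, 1.0)
    model.observe("selection", "cpu", 2000, 2.0)
    model.observe("selection", "gpu", 1000, 0.6)
    model.observe("selection", "gpu", 2000, 0.7)
    for size in (100, 10000):
        print(size, choose_processor(model, "selection", size, ["cpu", "gpu"]))
```

With the made-up measurements above, the sketch places the small input on the CPU and the large input on the GPU. Each additional call to `observe` adds training data, which is the sense in which such a model can become more accurate at runtime.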
Notes
Many main-memory OLTP systems use a row-oriented data layout.
We are aware of Accelerated Processing Units (APUs) from AMD, which integrate a CPU and a GPU on a single chip. However, APUs increase only the raw processing power of the machine, not memory bandwidth.
Hyper-threading was enabled during our experiments.
Acknowledgement
We thank Jens Teubner from TU Dortmund University and Theo Härder from the University of Kaiserslautern for their helpful feedback.
Cite this article
Breß, S. The Design and Implementation of CoGaDB: A Column-oriented GPU-accelerated DBMS. Datenbank Spektrum 14, 199–209 (2014). https://doi.org/10.1007/s13222-014-0164-z
DOI: https://doi.org/10.1007/s13222-014-0164-z