research-article

Voodoo - a vector algebra for portable database performance on modern hardware

Editor: Surajit Chaudhuri

Authors: Holger Pirk, Oscar Moll, Matei Zaharia, and Sam Madden

Proceedings of the VLDB Endowment, Volume 9, Issue 14

Pages 1707 - 1718

https://doi.org/10.14778/3007328.3007336

Published: 01 October 2016

Abstract

In-memory databases require careful tuning and many engineering tricks to achieve good performance. Such database performance engineering is hard: a plethora of data- and hardware-dependent optimization techniques form a design space that is difficult to navigate even for a skilled engineer, and even more so for a query compiler. To facilitate performance-oriented design exploration and query plan compilation, we present Voodoo, a declarative intermediate algebra that abstracts the detailed architectural properties of the hardware, such as multi- or many-core architectures, caches and SIMD registers, without losing the ability to generate highly tuned code. Because it consists of a collection of declarative, vector-oriented operations, Voodoo is easier to reason about and tune than low-level C and related hardware-focused extensions (intrinsics, OpenCL, CUDA, etc.). This enables our Voodoo compiler to produce (OpenCL) code that rivals and even outperforms the fastest state-of-the-art in-memory databases on both GPUs and CPUs. In addition, Voodoo makes it possible to express techniques as diverse as cache-conscious processing, predication and vectorization (again on both GPUs and CPUs) with just a few lines of code. Central to our approach is a novel idea we termed control vectors, which allows a code-generating frontend to expose parallelism to the Voodoo compiler in an abstract manner, enabling portable performance across hardware platforms.
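To make the control-vector idea more concrete, the following Python sketch models one plausible reading of it. It assumes (a simplification of ours, not Voodoo's actual operator set or semantics) that the frontend tags every element with an integer control value, that adjacent elements sharing a value form a run that is folded sequentially, and that distinct runs share no state and may therefore be evaluated in parallel. The names `runs` and `fold_sum` are hypothetical.

```python
"""Hypothetical model of a control-vector-driven fold (not Voodoo's API)."""
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from typing import List, Sequence, Tuple


def runs(control: Sequence[int]) -> List[Tuple[int, int]]:
    """Split positions into maximal runs of equal, adjacent control values.

    Returns half-open (start, end) index pairs in input order.
    In this sketch, a value that reappears later starts a new run.
    """
    spans: List[Tuple[int, int]] = []
    start = 0
    for _value, group in groupby(control):
        length = sum(1 for _item in group)
        spans.append((start, start + length))
        start += length
    return spans


def fold_sum(values: Sequence[int], control: Sequence[int]) -> List[int]:
    """Assumed controlled fold: one partial sum per run, in run order.

    Each run is summed sequentially; runs are independent, so they are
    submitted to a thread pool as separate tasks.
    """
    if len(values) != len(control):
        raise ValueError("values and control must have the same length")
    spans = runs(control)
    with ThreadPoolExecutor() as pool:
        # Each task sums one contiguous slice; map() preserves run order.
        return list(pool.map(lambda span: sum(values[span[0]:span[1]]), spans))


if __name__ == "__main__":
    data = list(range(1, 9))                     # 1..8
    partial = fold_sum(data, [i // 4 for i in range(8)])
    print(partial)       # [10, 26]
    print(sum(partial))  # 36
```

In this model the frontend chooses the degree of parallelism purely by choosing the control vector: `[i // 4 for i in range(n)]` yields chunks of four that could go to separate workers, `[0] * n` forces one sequential fold, and `list(range(n))` exposes maximal parallelism but performs no folding at all. How runs are mapped to threads, SIMD lanes or GPU work items is left to the backend.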

We used Voodoo to build an alternative backend for MonetDB, a popular open-source in-memory database. Our backend allows MonetDB to perform at the same level as highly tuned in-memory databases, including HyPer and Ocelot. We also demonstrate Voodoo's usefulness when investigating hardware-conscious tuning techniques and assessing their performance on different queries, devices and data.
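Predication, one of the techniques named in the abstract, is a typical tuning decision whose payoff depends on device and data: a branching selection works well when almost all or almost no rows qualify, because the branch is then easy to predict, whereas a branch-free version avoids misprediction penalties when the outcome is close to random. The Python sketch below only illustrates the shape of the two variants; the function names are hypothetical, Python itself will not exhibit the hardware effect, and this is not code from Voodoo or MonetDB.

```python
"""Illustrative sketch: branching vs. predicated selection (not Voodoo code)."""
from typing import List, Sequence


def select_branching(column: Sequence[int], threshold: int) -> List[int]:
    """Return positions i where column[i] < threshold, using an if-statement."""
    out: List[int] = []
    for i, v in enumerate(column):
        if v < threshold:  # data-dependent branch
            out.append(i)
    return out


def select_predicated(column: Sequence[int], threshold: int) -> List[int]:
    """Same result without a data-dependent branch in the loop body.

    Every position is written unconditionally to out[count]; count then
    advances by the comparison result (0 or 1), so rejected positions are
    overwritten by the next write. count never exceeds i, so len(column)
    slots suffice.
    """
    out = [0] * len(column)
    count = 0
    for i, v in enumerate(column):
        out[count] = i
        count += int(v < threshold)
    return out[:count]


if __name__ == "__main__":
    column = [5, 1, 7, 2]
    print(select_branching(column, 3))   # [1, 3]
    print(select_predicated(column, 3))  # [1, 3]
```

Both functions return identical results; they differ only in control flow. A compiler can lower the predicated pattern to branch-free machine instructions, which is the property that matters on real CPUs.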


Cited By

H. Mohr-Daurat, G. Theodorakis, and H. Pirk. Hardware-Efficient Data Imputation through DBMS Extensibility. Proceedings of the VLDB Endowment, 17(11):3497-3510, 2024. Online publication date: 1 July 2024. https://dl.acm.org/doi/10.14778/3681954.3682016

P. Grulich, A. Lepping, D. Nugroho, V. Pandey, B. Del Monte, S. Zeuch, and V. Markl. Query Compilation Without Regrets. Proceedings of the ACM on Management of Data, 2(3):1-28, 2024. Online publication date: 30 May 2024. https://dl.acm.org/doi/10.1145/3654968

M. Jungmair and J. Giceva. Declarative Sub-Operators for Universal Data Processing. Proceedings of the VLDB Endowment, 16(11):3461-3474, 2023. Online publication date: 24 August 2023. https://dl.acm.org/doi/10.14778/3611479.3611539

Index Terms
1. Information systems
  1. Data management systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Recommendations

Performance Portable Applications for Hardware Accelerators: Lessons Learned from SPEC ACCEL
IPDPSW '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop

The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this ...
Exploring Fine-Grained In-Memory Database Performance for Modern CPUs
Modern CPUs keep integrating more cores and large size cache, which is beneficial for in-memory databases to improve parallel processing power and cache locality. While state-of-the-art CPUs have diverse architectures and roadmaps such as large core count ...
Evaluation of a performance portable lattice Boltzmann code using OpenCL
IWOCL '14: Proceedings of the International Workshop on OpenCL 2013 & 2014

With the advent of many-core computer architectures such as GPGPUs from NVIDIA and AMD, and more recently Intel's Xeon Phi, ensuring performance portability of HPC codes is potentially becoming more complex. In this work we have focused on one important ...

Information

Published In

Proceedings of the VLDB Endowment Volume 9, Issue 14

October 2016

96 pages

ISSN:2150-8097

Editor:
Surajit Chaudhuri
Microsoft Research

Publisher

VLDB Endowment

Publication History

Published: 01 October 2016

Published in PVLDB Volume 9, Issue 14

Qualifiers

Research-article

Bibliometrics

Article Metrics

Total citations: 52
Total downloads: 522
Downloads (last 12 months): 76
Downloads (last 6 weeks): 9

Reflects downloads up to 10 Oct 2024
