VIP: A SIMD vectorized analytical query engine

Authors:

Orestis Polychroniou,

Kenneth A. Ross

The VLDB Journal, Volume 29, Issue 6

Pages 1243 - 1261

https://doi.org/10.1007/s00778-020-00621-w

Published: 13 July 2020

Abstract

Query execution engines for analytics are continuously adapting to the underlying hardware in order to maximize performance. Wider SIMD registers and more complex SIMD instruction sets are emerging in mainstream CPUs and new processor designs such as the many-core Intel Xeon Phi CPUs that rely on SIMD vectorization to achieve high performance per core while packing a greater number of smaller cores per chip. In the database literature, using SIMD to optimize stand-alone operators with key–rid pairs is common, yet the state-of-the-art query engines rely on compilation of tightly coupled operators where hand-optimized individual operators become impractical. In this article, we extend a state-of-the-art analytical query engine design by combining code generation and operator pipelining with SIMD vectorization and show that the SIMD speedup is diminished when execution is dominated by random memory accesses. To better utilize the hardware features, we introduce VIP, an analytical query engine designed and built bottom up from pre-compiled column-oriented data parallel sub-operators and implemented entirely in SIMD. In our evaluation using synthetic and TPC-H queries on a many-core CPU, we show that VIP outperforms hand-optimized query-specific code without incurring the runtime compilation overhead, and highlight the efficiency of VIP at utilizing the hardware features of many-core CPUs.
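
The abstract describes an engine assembled from pre-compiled, column-oriented, data-parallel sub-operators rather than from query-specific generated code. As a rough illustration of that idea only, the sketch below composes a small filter-and-aggregate query from generic column-at-a-time functions. Every name here (`select_lt`, `gather`, `multiply`, `run_query`) is hypothetical and is not VIP's API; plain Python loops stand in for the SIMD lanes that the real system would use, and the row-id list plays the role of the rid column passed between sub-operators.

```python
from typing import List


def select_lt(column: List[int], bound: int) -> List[int]:
    """Selection sub-operator: return the row ids (rids) whose value is < bound."""
    return [rid for rid, value in enumerate(column) if value < bound]


def gather(column: List[int], rids: List[int]) -> List[int]:
    """Gather sub-operator: fetch the values of one column at the given rids."""
    return [column[rid] for rid in rids]


def multiply(left: List[int], right: List[int]) -> List[int]:
    """Arithmetic sub-operator: element-wise product of two equal-length columns."""
    return [a * b for a, b in zip(left, right)]


def run_query(price: List[int], discount: List[int],
              quantity: List[int], max_quantity: int) -> int:
    """Roughly: SELECT SUM(price * discount) WHERE quantity < max_quantity.

    The query is expressed purely by chaining generic sub-operators, so no
    query-specific code has to be compiled at run time (illustrative only).
    """
    rids = select_lt(quantity, max_quantity)
    products = multiply(gather(price, rids), gather(discount, rids))
    return sum(products)


if __name__ == "__main__":
    price = [10, 20, 30, 40]
    discount = [1, 2, 3, 4]
    quantity = [5, 50, 7, 70]
    print(run_query(price, discount, quantity, 10))  # rows 0 and 2 qualify: 10*1 + 30*3 = 100
```

Each function touches one column at a time and has no per-row branching on query structure, which is the property that makes such sub-operators candidates for SIMD implementation; how VIP actually schedules and vectorizes them is described in the article itself, not in this sketch.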

Index Terms

VIP: A SIMD vectorized analytical query engine
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Published In

The VLDB Journal — The International Journal on Very Large Data Bases Volume 29, Issue 6

Nov 2020

324 pages

ISSN:1066-8888

© Springer-Verlag GmbH Germany, part of Springer Nature 2020.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 13 July 2020

Accepted: 22 June 2020

Revision received: 10 June 2020

Received: 27 January 2020

Qualifiers

Research-article

Funding Sources

National Science Foundation

Cited By

Habich D, Pietrzyk J, Barcelo P, Sanchez-Pi N, Meliou A, Sudarshan S (2024) SIMDified Data Processing - Foundations, Abstraction, and Advanced Techniques. Companion of the 2024 International Conference on Management of Data, pp. 613-621. Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3654694
Jungmair M, Giceva J (2023) Declarative Sub-Operators for Universal Data Processing. Proceedings of the VLDB Endowment 16:11, pp. 3461-3474. Online publication date: 24-Aug-2023
https://dl.acm.org/doi/10.14778/3611479.3611539
Li Y, Lu J, Chandramouli B (2023) Selection Pushdown in Column Stores using Bit Manipulation Instructions. Proceedings of the ACM on Management of Data 1:2, pp. 1-26. Online publication date: 20-Jun-2023
https://dl.acm.org/doi/10.1145/3589323
Habich D, Pietrzyk J, Krause A, Hildebrandt J, Lehner W (2022) To use or not to use the SIMD gather instruction? Proceedings of the 18th International Workshop on Data Management on New Hardware, pp. 1-5. Online publication date: 12-Jun-2022
https://dl.acm.org/doi/10.1145/3533737.3535089
Lutz C, Breß S, Zeuch S, Rabl T, Markl V, Ives Z, Bonifati A, El Abbadi A (2022) Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. Proceedings of the 2022 International Conference on Management of Data, pp. 1017-1032. Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3517911
Fett J, Ungethüm A, Habich D, Lehner W (2021) The Case for SIMDified Analytical Query Processing on GPUs. Proceedings of the 17th International Workshop on Data Management on New Hardware, pp. 1-5. Online publication date: 20-Jun-2021
https://dl.acm.org/doi/10.1145/3465998.3466015
Zarubin M, Damme P, Krause A, Habich D, Lehner W, Wassermann B, Malka M, Chidambaram V, Raz D (2021) SIMD-MIMD cocktail in a hybrid memory glass. Proceedings of the 14th ACM International Conference on Systems and Storage, pp. 1-12. Online publication date: 14-Jun-2021
https://dl.acm.org/doi/10.1145/3456727.3463782
