VIP: A SIMD vectorized analytical query engine

Published: 13 July 2020 Publication History

Abstract

Query execution engines for analytics are continuously adapting to the underlying hardware in order to maximize performance. Wider SIMD registers and more complex SIMD instruction sets are emerging in mainstream CPUs and new processor designs such as the many-core Intel Xeon Phi CPUs that rely on SIMD vectorization to achieve high performance per core while packing a greater number of smaller cores per chip. In the database literature, using SIMD to optimize stand-alone operators with key–rid pairs is common, yet the state-of-the-art query engines rely on compilation of tightly coupled operators where hand-optimized individual operators become impractical. In this article, we extend a state-of-the-art analytical query engine design by combining code generation and operator pipelining with SIMD vectorization and show that the SIMD speedup is diminished when execution is dominated by random memory accesses. To better utilize the hardware features, we introduce VIP, an analytical query engine designed and built bottom up from pre-compiled column-oriented data parallel sub-operators and implemented entirely in SIMD. In our evaluation using synthetic and TPC-H queries on a many-core CPU, we show that VIP outperforms hand-optimized query-specific code without incurring the runtime compilation overhead, and highlight the efficiency of VIP at utilizing the hardware features of many-core CPUs.
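To make the idea of column-oriented sub-operators more concrete, the following is a minimal sketch of how a query can be composed from small, pre-written building blocks that each consume and produce whole columns. It is not VIP's actual code or API: the function names (`select_less_than`, `gather`, `multiply`, `run_query`) and the example query are hypothetical, and plain Python lists stand in for the SIMD registers and column buffers a real engine would use. Only the data flow between sub-operators is modeled here, not the vectorized execution itself.

```python
from typing import List


def select_less_than(column: List[int], bound: int) -> List[int]:
    """Selection sub-operator: return the row ids whose value is below bound."""
    return [rid for rid, value in enumerate(column) if value < bound]


def gather(column: List[int], rids: List[int]) -> List[int]:
    """Gather sub-operator: fetch the values of one column at the given row ids."""
    return [column[rid] for rid in rids]


def multiply(left: List[int], right: List[int]) -> List[int]:
    """Arithmetic sub-operator: element-wise product of two equal-length columns."""
    return [a * b for a, b in zip(left, right)]


def run_query(quantity: List[int], price: List[int],
              discount: List[int], max_quantity: int) -> int:
    """Hypothetical query: SUM(price * discount) WHERE quantity < max_quantity.

    Each step processes a full column before the next step starts, so no
    query-specific code has to be generated or compiled at runtime.
    """
    rids = select_less_than(quantity, max_quantity)
    products = multiply(gather(price, rids), gather(discount, rids))
    return sum(products)


if __name__ == "__main__":
    quantity = [5, 30, 10, 50]
    price = [100, 200, 300, 400]
    discount = [1, 2, 3, 4]
    print(run_query(quantity, price, discount, max_quantity=24))
```

In this sketch the selection produces a list of qualifying row ids, and later sub-operators only touch those rows. In a SIMD engine, each of these loops would be the natural place for data-parallel instructions, since every iteration is independent of the others.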

Information

Published In

The VLDB Journal — The International Journal on Very Large Data Bases  Volume 29, Issue 6
Nov 2020
324 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 13 July 2020
Accepted: 22 June 2020
Revision received: 10 June 2020
Received: 27 January 2020

Author Tags

  1. Query execution
  2. Modern hardware
  3. OLAP
  4. SIMD
  5. Vectorization

Qualifiers

  • Research-article
