Improving main memory hash joins on Intel Xeon Phi processors: an experimental approach

Published: 01 February 2015

Abstract

Modern processor technologies have driven new designs and implementations of main-memory hash joins. Recently, Intel Many Integrated Core (MIC) co-processors (commonly known as Xeon Phi) have embraced emerging x86 single-chip many-core techniques. Compared with contemporary multi-core CPUs, Xeon Phi has quite different architectural features: wider SIMD instructions, many cores and hardware contexts, and lower-frequency in-order cores. In this paper, we experimentally revisit state-of-the-art hash join algorithms on Xeon Phi co-processors. In particular, we study two camps of hash join algorithms: hardware-conscious ones, which advocate careful tailoring of the join algorithm to the underlying hardware architecture, and hardware-oblivious ones, which omit such tailoring. For each camp, we study the impact of architectural features and software optimizations on Xeon Phi in comparison with results on multi-core CPUs. Our experiments show two major findings on Xeon Phi, which are quantitatively different from those on multi-core CPUs. First, the impact of architectural features and software optimizations on Xeon Phi differs considerably from that on the CPU, which calls for new optimization and tuning on Xeon Phi. Second, hardware-oblivious algorithms can outperform hardware-conscious algorithms over a wide parameter window. These two findings further shed light on the design and implementation of query processing on new-generation single-chip many-core technologies.
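To make the two camps concrete, the following is a minimal single-threaded sketch, not the paper's implementation. It assumes the common reading in the hash join literature: a hardware-oblivious join builds one hash table over the whole build relation, while a hardware-conscious join first radix-partitions both inputs so that each partition's table can fit in cache. The function names, the `radix_bits` parameter, and the integer-key tuples are all hypothetical and chosen only for illustration; SIMD, multi-threading, and Xeon Phi specifics are deliberately left out.

```python
from collections import defaultdict


def no_partition_join(build, probe):
    """Hardware-oblivious style: one hash table over the entire build relation."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    # Probe phase: emit (key, build_payload, probe_payload) for every match.
    return [(key, b, p) for key, p in probe for b in table.get(key, ())]


def radix_partition(rel, bits):
    """Split a relation into 2**bits partitions using the low bits of the key."""
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for key, payload in rel:
        parts[key & mask].append((key, payload))
    return parts


def partitioned_join(build, probe, radix_bits=4):
    """Hardware-conscious style: partition both inputs, then join pairwise.

    radix_bits is a tuning knob (an assumption here); a real implementation
    would pick it so that each partition's hash table fits in cache.
    """
    out = []
    build_parts = radix_partition(build, radix_bits)
    probe_parts = radix_partition(probe, radix_bits)
    for b_part, p_part in zip(build_parts, probe_parts):
        out.extend(no_partition_join(b_part, p_part))
    return out


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (17, "c"), (2, "d")]
    S = [(2, "x"), (17, "y"), (3, "z")]
    print(sorted(no_partition_join(R, S)))
    print(sorted(partitioned_join(R, S)))
```

Both functions return the same set of matches; they differ only in memory access pattern. The partitioned variant pays for an extra pass over the data in exchange for smaller, cache-friendly hash tables, which is the kind of trade-off the experiments evaluate on Xeon Phi and on multi-core CPUs.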

Published In

Proceedings of the VLDB Endowment, Volume 8, Issue 6
February 2015
60 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Cited By

  • (2023) Micro Partitioning: Friendly to the Hardware and the Developer. Proceedings of the 19th International Workshop on Data Management on New Hardware, pp. 27-34. DOI: 10.1145/3592980.3595310. Online publication date: 18-Jun-2023
  • (2023) Exploring Fine-Grained In-Memory Database Performance for Modern CPUs. IEEE Transactions on Parallel and Distributed Systems, 34:6, pp. 1757-1772. DOI: 10.1109/TPDS.2023.3262782. Online publication date: 1-Jun-2023
  • (2022) Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. Proceedings of the 2022 International Conference on Management of Data, pp. 1017-1032. DOI: 10.1145/3514221.3517911. Online publication date: 10-Jun-2022
  • (2021) ThunderRW. Proceedings of the VLDB Endowment, 14:11, pp. 1992-2005. DOI: 10.14778/3476249.3476257. Online publication date: 27-Oct-2021
  • (2021) Adaptive code generation for data-intensive analytics. Proceedings of the VLDB Endowment, 14:6, pp. 929-942. DOI: 10.14778/3447689.3447697. Online publication date: 12-Apr-2021
  • (2020) A server-side accelerator framework for multi-core CPUs and Intel Xeon Phi co-processor systems. Cluster Computing, 23:4, pp. 2591-2608. DOI: 10.1007/s10586-019-03030-z. Online publication date: 1-Dec-2020
  • (2020) VIP: A SIMD vectorized analytical query engine. The VLDB Journal — The International Journal on Very Large Data Bases, 29:6, pp. 1243-1261. DOI: 10.1007/s00778-020-00621-w. Online publication date: 13-Jul-2020
  • (2019) Interleaved multi-vectorizing. Proceedings of the VLDB Endowment, 13:3, pp. 226-238. DOI: 10.14778/3368289.3368290. Online publication date: 1-Nov-2019
  • (2019) Deploying Hash Tables on Die-Stacked High Bandwidth Memory. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 239-248. DOI: 10.1145/3357384.3358015. Online publication date: 3-Nov-2019
  • (2019) Efficient Data-Parallel Primitives on Heterogeneous Systems. Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3337821.3337920. Online publication date: 5-Aug-2019
