research-article

Improving main memory hash joins on Intel Xeon Phi processors: an experimental approach

Editors: Chen Li, Volker Markl

Authors: Huynh Phung Huynh

Proceedings of the VLDB Endowment, Volume 8, Issue 6

Pages 642 - 653

https://doi.org/10.14778/2735703.2735704

Published: 01 February 2015

Abstract

Modern processor technologies have driven new designs and implementations of main-memory hash joins. Recently, Intel Many Integrated Core (MIC) co-processors (commonly known as Xeon Phi) have embraced emerging x86 single-chip many-core techniques. Compared with contemporary multi-core CPUs, Xeon Phi has quite different architectural features: wider SIMD instructions, many cores and hardware contexts, and lower-frequency in-order cores. In this paper, we experimentally revisit the state-of-the-art hash join algorithms on Xeon Phi co-processors. In particular, we study two camps of hash join algorithms: hardware-conscious ones, which advocate careful tailoring of the join algorithms to the underlying hardware architecture, and hardware-oblivious ones, which omit such tailoring. For each camp, we study the impact of architectural features and software optimizations on Xeon Phi in comparison with results on multi-core CPUs. Our experiments show two major findings on Xeon Phi, which are quantitatively different from those on multi-core CPUs. First, architectural features and software optimizations affect performance quite differently on Xeon Phi than on the CPU, which calls for new optimization and tuning on Xeon Phi. Second, hardware-oblivious algorithms can outperform hardware-conscious algorithms over a wide parameter window. These two findings shed further light on the design and implementation of query processing on new-generation single-chip many-core technologies.
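The abstract names the two camps without describing the algorithms themselves. As a rough illustration only, the sketch below contrasts the forms these camps commonly take in the literature: a join over one shared hash table with no partitioning (hardware-oblivious) and a radix-partitioned join (hardware-conscious). It is not the paper's implementation. The function names, the tuple layout `(key, payload)`, and the `bits` parameter are assumptions made for this example, and the sketch is single-threaded with no SIMD.

```python
from collections import defaultdict

def simple_hash_join(build, probe):
    """Hardware-oblivious style (assumed form): one hash table, no partitioning."""
    table = defaultdict(list)
    for key, payload in build:            # build phase
        table[key].append(payload)
    # probe phase: emit one output tuple per matching build tuple
    return [(key, b, p) for key, p in probe for b in table.get(key, ())]

def radix_partition(rel, bits):
    """Split a relation into 2**bits partitions on the low bits of the key."""
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for key, payload in rel:
        parts[key & mask].append((key, payload))
    return parts

def partitioned_hash_join(build, probe, bits=4):
    """Hardware-conscious style (assumed form): partition both inputs first,
    so that each per-partition hash table stays small. On real hardware,
    `bits` would be tuned to cache and TLB sizes; here it is arbitrary."""
    out = []
    build_parts = radix_partition(build, bits)
    probe_parts = radix_partition(probe, bits)
    for b_part, p_part in zip(build_parts, probe_parts):
        out.extend(simple_hash_join(b_part, p_part))
    return out

# Hypothetical toy relations with non-negative integer keys.
build = [(1, "a"), (2, "b"), (17, "c")]
probe = [(1, "x"), (17, "y"), (3, "z"), (1, "w")]
```

Both functions return the same set of matches. They differ only in memory access pattern, which is the property a hardware-conscious design tunes.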



Published In

Proceedings of the VLDB Endowment, Volume 8, Issue 6 (February 2015, 60 pages)

ISSN: 2150-8097

Editors: Chen Li (University of California, Irvine), Volker Markl (TU Berlin)

Publisher: VLDB Endowment

