research-article

MQJoin: efficient shared execution of main-memory joins

Published: 01 January 2016 Publication History

Abstract

Database architectures typically process queries one at a time, executing concurrent queries in independent execution contexts. Such a design often leads to unpredictable performance and poor scalability. One way to circumvent the problem is to exploit sharing opportunities across concurrently running queries. In this paper we propose Many-Query Join (MQJoin), a novel method for sharing the execution of a join that can efficiently handle hundreds of concurrent queries. It achieves this by minimizing redundant work and making efficient use of main-memory bandwidth and multi-core architectures. Compared to existing proposals, MQJoin efficiently handles larger workloads regardless of the schema by exploiting more sharing opportunities. We also compare MQJoin to two commercial main-memory column-store databases. For a TPC-H based workload, we show that MQJoin provides 2-5x higher throughput with significantly more stable response times.
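
To make the idea of sharing one join across many queries concrete, here is a minimal sketch of a generic shared hash join. It is not MQJoin's actual algorithm, which the abstract does not describe; it only illustrates the general principle of doing the build and probe work once and tagging each tuple with the set of queries interested in it. The function name shared_hash_join, the bitmask encoding of query sets, and the per-query predicate lists are all assumptions made for illustration.

```python
from collections import defaultdict


def shared_hash_join(build_rows, probe_rows, build_preds, probe_preds):
    """Join two inputs once on behalf of several queries (illustrative only).

    build_rows / probe_rows: iterables of (key, payload) tuples.
    build_preds / probe_preds: one predicate per query; the list index is
    the (hypothetical) query id. A row's bitmask has bit q set when the
    row satisfies query q's predicate.
    """

    def query_mask(row, preds):
        mask = 0
        for qid, pred in enumerate(preds):
            if pred(row):
                mask |= 1 << qid
        return mask

    # Build phase: a single hash table serves every query.
    table = defaultdict(list)
    for key, payload in build_rows:
        mask = query_mask((key, payload), build_preds)
        if mask:  # skip rows that no query selects
            table[key].append((payload, mask))

    # Probe phase: each probe row is looked up once; the intersection of
    # the two bitmasks says which queries receive the joined tuple.
    results = defaultdict(list)
    for key, payload in probe_rows:
        probe_mask = query_mask((key, payload), probe_preds)
        if not probe_mask:
            continue
        for build_payload, build_mask in table.get(key, ()):
            both = probe_mask & build_mask
            qid = 0
            while both:
                if both & 1:
                    results[qid].append((key, build_payload, payload))
                both >>= 1
                qid += 1
    return dict(results)
```

With N concurrent queries, this sketch builds one hash table and scans each input once instead of N times, which is the kind of redundant work that shared execution aims to remove. How a real system keeps the per-tuple query sets compact and uses memory bandwidth efficiently is exactly what the sketch leaves out.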

Published In

Proceedings of the VLDB Endowment  Volume 9, Issue 6
January 2016
60 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Published in PVLDB Volume 9, Issue 6

Cited By

  • (2023) Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries. Proceedings of the ACM on Management of Data 1:4 (1-26). DOI: 10.1145/3626734. Online publication date: 12-Dec-2023
  • (2023) SH2O: Efficient Data Access for Work-Sharing Databases. Proceedings of the ACM on Management of Data 1:3 (1-26). DOI: 10.1145/3617340. Online publication date: 13-Nov-2023
  • (2023) An Optimized Solution for Highly Contended Transactional Workloads. Dependable Software Engineering. Theories, Tools, and Applications (402-418). DOI: 10.1007/978-981-99-8664-4_23. Online publication date: 27-Nov-2023
  • (2022) To share or not to share vector registers? The VLDB Journal — The International Journal on Very Large Data Bases 31:6 (1215-1236). DOI: 10.1007/s00778-022-00744-2. Online publication date: 28-Apr-2022
  • (2021) Sharing opportunities for OLTP workloads in different isolation levels. Proceedings of the VLDB Endowment 13:10 (1696-1708). DOI: 10.14778/3401960.3401967. Online publication date: 10-Mar-2021
  • (2021) SIMD-MIMD cocktail in a hybrid memory glass. Proceedings of the 14th ACM International Conference on Systems and Storage (1-12). DOI: 10.1145/3456727.3463782. Online publication date: 14-Jun-2021
  • (2021) Resource-efficient Shared Query Execution via Exploiting Time Slackness. Proceedings of the 2021 International Conference on Management of Data (1797-1810). DOI: 10.1145/3448016.3457282. Online publication date: 9-Jun-2021
  • (2021) Scalable Multi-Query Execution using Reinforcement Learning. Proceedings of the 2021 International Conference on Management of Data (1651-1663). DOI: 10.1145/3448016.3452799. Online publication date: 9-Jun-2021
  • (2020) Generalized sub-query fusion for eliminating redundant I/O from big-data queries. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (209-224). DOI: 10.5555/3488766.3488778. Online publication date: 4-Nov-2020
  • (2020) AJoin. Proceedings of the VLDB Endowment 13:4 (435-448). DOI: 10.14778/3372716.3372718. Online publication date: 6-Jan-2020
