Sharing data and work across concurrent analytical queries

Authors:

Iraklis Psaroudakis,

Manos Athanassoulis,

Anastasia Ailamaki

Proceedings of the VLDB Endowment, Volume 6, Issue 9

Pages 637 - 648

https://doi.org/10.14778/2536360.2536364

Published: 01 July 2013

Abstract

Today's data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional data warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for resources. Thus, modern DW depart from the query-centric model to execution models involving sharing of common data and work. Our goal is to show when and how a DW should employ sharing. We evaluate experimentally two sharing methodologies, based on their original prototype systems, that exploit work sharing opportunities among concurrent queries at run-time: Simultaneous Pipelining (SP), which shares intermediate results of common sub-plans, and Global Query Plans (GQP), which build and evaluate a single query plan with shared operators.
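The difference between the two methodologies can be illustrated with a toy sketch. It is not taken from either prototype system: the function names, the string "signature" used to recognize identical sub-plans, and the sample table are all assumptions made for illustration. The SP-style function evaluates each distinct sub-plan once and hands the same output to every query that contains it, while the GQP-style function reads the table once and routes each row to every query whose predicate it satisfies.

```python
def run_with_sp(queries, table):
    """SP-style: evaluate each distinct sub-plan once and reuse its output."""
    shared_output = {}   # sub-plan signature -> rows produced by the first (host) query
    evaluations = 0
    results = {}
    for name, (signature, predicate) in queries.items():
        if signature not in shared_output:
            shared_output[signature] = [row for row in table if predicate(row)]
            evaluations += 1
        results[name] = shared_output[signature]
    return results, evaluations


def run_with_gqp(queries, table):
    """GQP-style: a single shared scan routes each row to every query it satisfies."""
    results = {name: [] for name in queries}
    for row in table:                      # the table is read exactly once
        for name, (_, predicate) in queries.items():
            if predicate(row):
                results[name].append(row)
    return results


# Hypothetical data and queries, for illustration only.
table = [{"id": 1, "price": 5}, {"id": 2, "price": 15}, {"id": 3, "price": 25}]
queries = {
    "q1": ("price > 10", lambda row: row["price"] > 10),
    "q2": ("price > 10", lambda row: row["price"] > 10),  # same sub-plan as q1
    "q3": ("price > 20", lambda row: row["price"] > 20),
}

sp_results, sp_evaluations = run_with_sp(queries, table)
gqp_results = run_with_gqp(queries, table)
```

In this toy setting q1 and q2 share one evaluation under SP because their sub-plans are identical, whereas q3 needs its own. The shared scan serves all three queries in a single pass even though their predicates differ.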

First, after a short review of sharing methodologies, we show that SP and GQP are orthogonal techniques. SP can be applied to shared operators of a GQP, reducing response times by 20%-48% in workloads with numerous common sub-plans. Second, we corroborate previous results on the negative impact of SP on performance for cases of low concurrency. We attribute this behavior to a bottleneck caused by the push-based communication model of SP. We show that pull-based communication for SP eliminates the overhead of sharing altogether for low concurrency, and scales better on multi-core machines than push-based SP, further reducing response times by 82%-86% for high concurrency. Third, we perform an experimental analysis of SP, GQP and their combination, and show when each one is beneficial. We identify a trade-off between low and high concurrency. In the former case, traditional query-centric operators with SP perform better, while in the latter case, GQP with shared operators enhanced by SP give the best results.
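One plausible reading of the push versus pull contrast is sketched below. It is a single-threaded toy, not the authors' implementation: the class names, the per-consumer queues, and the shared page list are assumptions. In the push model the producer copies every output page to each consumer, so its work grows with the number of consumers. In the pull model the producer appends each page once to a shared structure and every consumer reads it with its own cursor.

```python
class PushProducer:
    """Push model: the producer forwards a copy of each page to every consumer."""

    def __init__(self, n_consumers):
        self.queues = [[] for _ in range(n_consumers)]
        self.copies = 0

    def emit(self, page):
        for queue in self.queues:
            queue.append(list(page))   # one copy per consumer, made by the producer
            self.copies += 1


class SharedPageList:
    """Pull model: the producer appends once; consumers read at their own pace."""

    def __init__(self):
        self.pages = []
        self.appends = 0

    def emit(self, page):
        self.pages.append(tuple(page))  # stored once, immutable for all readers
        self.appends += 1

    def reader(self):
        position = 0                    # each consumer keeps a private cursor
        while position < len(self.pages):
            yield self.pages[position]
            position += 1


# Hypothetical workload: 3 pages of intermediate results, 4 consumers.
pages = [[1, 2], [3, 4], [5, 6]]
n_consumers = 4

push = PushProducer(n_consumers)
pull = SharedPageList()
for page in pages:
    push.emit(page)
    pull.emit(page)

pulled = [list(pull.reader()) for _ in range(n_consumers)]
```

Here the push producer performs one copy per page per consumer, while the pull producer performs one append per page regardless of how many consumers read it. A real engine would also need synchronization and reclamation of pages that all consumers have read, which this sketch leaves out.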

Index Terms
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Information

Published In

Proceedings of the VLDB Endowment, Volume 6, Issue 9

July 2013

180 pages

ISSN: 2150-8097

Publisher

VLDB Endowment

Qualifiers

Article
