DOI: 10.1145/3034786.3034792

Efficient and Provable Multi-Query Optimization

Published: 09 May 2017

Abstract

Complex queries for massive data analysis jobs have become increasingly commonplace. Many such queries contain common subexpressions, either within a single query or among multiple queries submitted as a batch. Conventional query optimizers do not exploit these subexpressions and produce sub-optimal plans. The problem of multi-query optimization (MQO) is to generate an optimal combined evaluation plan by computing common subexpressions once and reusing them. Exhaustive algorithms for MQO explore an O(n^n) search space. Thus, this problem has primarily been tackled using various heuristic algorithms, without providing any theoretical guarantees on the quality of their solution.
In this paper, instead of the conventional cost minimization problem, we treat the problem as maximizing a linear transformation of the cost function. We propose a greedy algorithm for this transformed formulation of the problem, which under weak, intuitive assumptions, provides an approximation factor guarantee for this formulation. We go on to show that this factor is optimal, unless P = NP. Another noteworthy point about our algorithm is that it can be easily incorporated into existing transformation-based optimizers. We finally propose optimizations which can be used to improve the efficiency of our algorithm.
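
To make the idea of maximizing a transformed cost concrete, the following is a minimal sketch of a generic greedy selection of shared subexpressions. It is not the paper's algorithm, and the abstract does not define the transformation; the sketch assumes the objective is benefit(S) = cost(no sharing) - cost(S), where S is the set of materialized subexpressions. The plan DAG, the node names, the flat `WRITE_COST` and `READ_COST`, and every function name are hypothetical and chosen only for illustration.

```python
# Hypothetical toy plan DAG: node -> (local_cost, children). Not from the paper.
DAG = {
    "scan_a": (10.0, ()),
    "scan_b": (8.0, ()),
    "join_ab": (5.0, ("scan_a", "scan_b")),
    "q1": (2.0, ("join_ab",)),
    "q2": (3.0, ("join_ab",)),
    "q3": (1.0, ("scan_a",)),
}
QUERIES = ("q1", "q2", "q3")
WRITE_COST = 4.0  # assumed flat cost to materialize one result
READ_COST = 1.0   # assumed flat cost to read a materialized result


def compute_cost(node, shared):
    """Cost of computing `node`, reading inputs that are in `shared`."""
    local, children = DAG[node]
    total = local
    for child in children:
        total += READ_COST if child in shared else compute_cost(child, shared)
    return total


def total_cost(shared):
    """Cost of running all queries plus building each shared node once."""
    run = sum(compute_cost(q, shared) for q in QUERIES)
    build = sum(compute_cost(n, shared) + WRITE_COST for n in shared)
    return run + build


def benefit(shared):
    """Assumed transformed objective: savings relative to no sharing."""
    return total_cost(frozenset()) - total_cost(shared)


def greedy_share(candidates):
    """Repeatedly add the candidate with the largest positive marginal benefit."""
    chosen = frozenset()
    while True:
        best, best_gain = None, 0.0
        for n in sorted(candidates - chosen):
            gain = benefit(chosen | {n}) - benefit(chosen)
            if gain > best_gain:
                best, best_gain = n, gain
        if best is None:
            return chosen
        chosen = chosen | {best}


if __name__ == "__main__":
    picked = greedy_share(frozenset({"scan_a", "scan_b", "join_ab"}))
    print(sorted(picked), total_cost(picked), benefit(picked))
```

In this toy instance the marginal benefit of sharing `scan_a` shrinks once `join_ab` is already shared, because fewer recomputations remain to be saved. Whether a greedy rule of this kind carries an approximation guarantee depends on assumptions about the cost model that this sketch does not verify.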


Published In

PODS '17: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
May 2017
458 pages
ISBN:9781450341981
DOI:10.1145/3034786
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. chase

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'17
Acceptance Rates

PODS '17 Paper Acceptance Rate 29 of 101 submissions, 29%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%
