research-article

Open access

Sub-optimal Join Order Identification with L1-error

Authors: Yesdaulet Izenov, Asoke Datta, Brian Tsan, Florin RusuAuthors Info & Claims

Proceedings of the ACM on Management of Data, Volume 2, Issue 1

Article No.: 17, Pages 1 - 24

https://doi.org/10.1145/3639272

Published: 26 March 2024 Publication History

Abstract

Q-error -- the standard metric for quantifying the error of individual cardinality estimates -- has been widely adopted as a surrogate for query plan optimality in recent work on learning-based cardinality estimation. However, the only result connecting Q-error with plan optimality is an upper-bound on the cost of the worst possible query plan computed from a set of cardinality estimates---there is no connection between Q-error and the real plans generated by standard query optimizers. Therefore, in order to identify sub-optimal query plans, we propose a learning-based method having as its main feature a novel measure called L1-error. Similar to Q-error, L1-error requires complete knowledge of true cardinalities and estimates for all the sub-plans of a query plan. Unlike Q-error, which considers the estimates independently, L1-error is defined as a permutation distance between true cardinalities and estimates for all the sub-plans having the same number of joins. Moreover, L1-error takes into account errors relative to the magnitude of their cardinalities and gives larger weight to small multi-way joins. Our experimental results confirm that, when L1-error is integrated into a standard decision tree classifier, it leads to the accurate identification of sub-optimal plans across four different benchmarks. This accuracy can be further improved by combining L1-error with Q-error into a composite feature that can be computed without overhead from the same data.

References

[1]

Peter Boncz. [n. d.]. The IMDB Dataset. http://homepages.cwi.nl/~boncz/job/imdb.tgz.

[2]

Peter Boncz, Angelos-Christos Anatiotis, and Steffen Klabe. 2017. JCC-H: Adding Join Crossing Correlations with Skew to TPC-H. In TPCTC 2017. 103--119.

[3]

Nicolas Bruno, César Galindo-Legaria, and Milind Joshi. 2010. Polynomial Heuristics for Query Optimization. In ICDE 2010. 589--600.

[4]

Walter Cai, Magdalena Balazinska, and Dan Suciu. 2019. Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities. In SIGMOD 2019. 18--35.

[5]

Surajit Chaudhuri. 1998. An Overview of Query Optimization in Relational Systems. In PODS 1998. 34--43.

[6]

David DeHaan and Frank Tompa. 2007. Optimal Top-Down Join Enumeration. In SIGMOD 2007. 785--796.

[7]

Bailu Ding, Surajit Chaudhuri, Johannes Gehrke, and Vivek Narasayya. 2021. DSB: A Decision Support Benchmark for Workload-Driven and Traditional Database Systems. PVLDB 14, 13 (2021), 3376--3388.

Digital Library

[8]

Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, and Surajit Chaudhuri. 2019. Selectivity Estimation for Range Predicates Using Lightweight Models. PVLDB 12, 9 (2019), 1044--1057.

Digital Library

[9]

Leonidas Fegaras. 1998. A New Heuristic for Optimizing Large Queries. In DEXA 1998. 726--735.

[10]

Immanuel Haffner and Jens Dittrich. 2023. Efficiently Computing Join Orders with Heuristic Search. PACMMOD 1, 1 (2023).

[11]

Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, and Bin Cui. 2022. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. PVLDB 15, 4 (2022), 752--765.

[12]

Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2020. DeepDB: Learn from Data, not from Queries! PVLDB 13, 7 (2020), 992--1005.

Digital Library

[13]

Toshihide Ibaraki and Tiko Kameda. 1984. On the Optimal Nesting Order for Computing N-relational Joins. ACM Transactions on Database Systems 9, 3 (1984), 482--502.

Digital Library

[14]

Yannis E. Ioannidis and Stavros Christodoulakis. 1991. On the Propagation of Errors in the Size of Join Results. SIGMOD Record 20, 2 (1991), 268--277.

Digital Library

[15]

Yesdaulet Izenov. 2024. Sub-optimal Join Order Identification with L1-error. https://github.com/yizenov/l1-error.

[16]

Yesdaulet Izenov, Asoke Datta, Florin Rusu, and Jun Hyung Shin. 2021. COMPASS: Online Sketch-based Query Optimization for In-memory Databases. In SIGMOD 2021. 106--117.

[17]

Yesdaulet Izenov, Asoke Datta, Florin Rusu, and Jun Hyung Shin. 2021. Online Sketch-based Query Optimization. CoRR arXiv:2102.02440v1 (2021).

[18]

Maurice G Kendall. 1938. A New Measure of Rank Correlation. Biometrika 30, 1/2 (1938), 81--93.

[19]

Martin Kiefer, Max Heimel, Sebastian Breß, and Volker Markl. 2017. Estimating Join Selectivities using Bandwidth- Optimized Kernel Density Models. PVLDB 10, 13 (2017), 2085--2096.

Digital Library

[20]

Andreas Kipf. [n. d.]. JOB-light Benchmark. https://github.com/andreaskipf/learnedcardinalities/blob/master/workloads/job-light.sql.

[21]

Andreas Kipf, Thomas Kipf, Bernhard Radke, Victor Leis, Peter Boncz, and Alfons Kemper. 2019. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR 2019.

[22]

Donald Kossmann and Konrad Stocker. 2000. Iterative Dynamic Programming: A New Class of Query Optimization Algorithms. TODS 25, 1 (2000), 43--82.

Digital Library

[23]

Ravi Kumar and Sergei Vassilvitskii. 2010. Generalized Distances Between Rankings. In WWW 2010. 571--580.

[24]

Hai Lan, Zhifeng Bao, and Yuwei Peng. 2021. Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration. Data Science and Engineering 6 (2021), 86--101.

[25]

Victor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (2015), 204--215.

Digital Library

[26]

Victor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2018. Query Optimization Through the Looking Glass, and What We Found Running the Join Order Benchmark. VLDBJ 27 (2018), 643--668.

Digital Library

[27]

Beibin Li, Yao Lu, Chi Wang, and Srikanth Kandula. 2021. Q-error Bounds of Random Uniform Sampling for Cardinality Estimation. CoRR arXiv:2108.02715v2 (2021).

[28]

Christina Lioma and Niels Dalum Hansen. 2017. A Study of Metrics of Distance and Correlation between Ranked Lists for Compositionality Detection. Cognitive Systems Research 44 (2017), 40--49.

Digital Library

[29]

Guy Lohman. 2014. Is Query Optimization a Solved Problem? https://wp.sigmod.org/?p=1075.

[30]

Guido Moerkotte and Pit Fender. 2013. Counter Strike: Generic Top-Down Join Enumeration for Hypergraphs. PVLDB 6, 14 (2013), 1822--1833.

[31]

Guido Moerkotte and Thomas Neumann. 2006. Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees Without Cross Products. In VLDB 2006. 930--941.

[32]

Guido Moerkotte and Thomas Neumann. 2008. Dynamic Programming Strikes Back. In SIGMOD 2008. 539--552.

[33]

Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors. PVLDB 2, 1 (2009), 982--993.

Digital Library

[34]

Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2021. Flow-Loss: Learning Cardinality Estimates That Matter. PVLDB 14, 11 (2021), 2019--2032.

Digital Library

[35]

Parimarjan Negi, Ryan Marcus, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2020. Cost-Guided Cardinality Estimation: Focus Where it Matters. In ICDE Workshops 2020. 154--157.

[36]

Thomas Neumann. 2009. Query Simplification: Graceful Degradation for Join-Order Optimization. In SIGMOD 2009. 403--414.

[37]

Open Source Relational Database [n. d.]. PostgreSQL. www.postgresql.org.

[38]

Matthew Perron, Zeyuan Shang, Tim Kraska, and Michael Stonebraker. 2019. How I Learned to Stop Worrying and Love Re-optimization. In ICDE 2019. 1758--1761.

[39]

Greg Rahn. [n. d.]. Join Order Benchmark (JOB). https://github.com/gregrahn/join-order-benchmark.

[40]

P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlain, R. A. Lorie, and T. G. Price. 1979. Access Path Selection in a Relational Database Management System. In SIGMOD 1979. 23--34.

[41]

Charles Spearman. 1904. The Proof and Measurement of Association between Two Things. The American Journal of Psychology 15, 1 (1904), 72--101.

[42]

Michael Steinbrunn, Guido Moerkotte, and Alfons Kemper. 1997. Heuristic and Randomized Optimization for the Join Ordering Problem. VLDBJ 6, 3 (1997), 191--208.

Digital Library

[43]

Arun Swami. 1989. Optimization of Large Join Queries: Combining Heuristics and Combinatorial Techniques. In SIGMOD 1989. 367--376.

[44]

Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2021. Are We Ready For Learned Cardinality Estimation? PVLDB 14, 9 (2021), 1640--1654.

Digital Library

[45]

Florian Wolf, Michael Brendle, Norman May, Paul R. Willems, Kai-Uwe Sattler, and Michael Grossniklaus. 2018. Robustness Metrics for Relational Query Execution Plans. PVLDB 11, 11 (2018), 1360--1372.

Digital Library

[46]

Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2021. NeuroCard: One Cardinality Estimator for All Tables. PVLDB 14, 1 (2021), 61--73.

Digital Library

[47]

Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep Unsupervised Cardinality Estimation. PVLDB 13, 3 (2019), 279--292.

Digital Library

[48]

Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. 2021. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. PVLDB 14, 9 (2021), 1489--1502.

Digital Library

Index Terms

Sub-optimal Join Order Identification with L1-error
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Query optimization
        Query planning

Recommendations

Query optimization through the looking glass, and what we found running the Join Order Benchmark

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark that works on real-life data riddled with correlations and introduces 113 complex join queries. We experimentally revisit the main ...
Index-Based Join Size Estimation Using Adaptive Sampling
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

Cost-based query optimizers rely on cardinality estimates of intermediate results to avoid suboptimal query execution plans. However, when confronted with ad-hoc queries on big data, said optimizers can produce large estimation errors, resulting in ...
Deep Reinforcement Learning for Join Order Enumeration
aiDM'18: Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management

Join order selection plays a significant role in query performance. However, modern query optimizers typically employ static join order enumeration algorithms that do not incorporate feedback about the quality of the resulting plan. Hence, optimizers ...

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data

Proceedings of the ACM on Management of Data Volume 2, Issue 1

SIGMOD

February 2024

1874 pages

EISSN:2836-6573

DOI:10.1145/3654807

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Issue’s Table of Contents

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2024

Published in PACMMOD Volume 2, Issue 1

Author Tags

Qualifiers

Research-article

Funding Sources

NSF

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
238
Total Downloads

Downloads (Last 12 months)238
Downloads (Last 6 weeks)44

Reflects downloads up to 30 Aug 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents