Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Open access

Sub-optimal Join Order Identification with L1-error

Published: 26 March 2024 Publication History

Abstract

Q-error -- the standard metric for quantifying the error of individual cardinality estimates -- has been widely adopted as a surrogate for query plan optimality in recent work on learning-based cardinality estimation. However, the only result connecting Q-error with plan optimality is an upper-bound on the cost of the worst possible query plan computed from a set of cardinality estimates---there is no connection between Q-error and the real plans generated by standard query optimizers. Therefore, in order to identify sub-optimal query plans, we propose a learning-based method having as its main feature a novel measure called L1-error. Similar to Q-error, L1-error requires complete knowledge of true cardinalities and estimates for all the sub-plans of a query plan. Unlike Q-error, which considers the estimates independently, L1-error is defined as a permutation distance between true cardinalities and estimates for all the sub-plans having the same number of joins. Moreover, L1-error takes into account errors relative to the magnitude of their cardinalities and gives larger weight to small multi-way joins. Our experimental results confirm that, when L1-error is integrated into a standard decision tree classifier, it leads to the accurate identification of sub-optimal plans across four different benchmarks. This accuracy can be further improved by combining L1-error with Q-error into a composite feature that can be computed without overhead from the same data.

References

[1]
Peter Boncz. [n. d.]. The IMDB Dataset. http://homepages.cwi.nl/~boncz/job/imdb.tgz.
[2]
Peter Boncz, Angelos-Christos Anatiotis, and Steffen Klabe. 2017. JCC-H: Adding Join Crossing Correlations with Skew to TPC-H. In TPCTC 2017. 103--119.
[3]
Nicolas Bruno, César Galindo-Legaria, and Milind Joshi. 2010. Polynomial Heuristics for Query Optimization. In ICDE 2010. 589--600.
[4]
Walter Cai, Magdalena Balazinska, and Dan Suciu. 2019. Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities. In SIGMOD 2019. 18--35.
[5]
Surajit Chaudhuri. 1998. An Overview of Query Optimization in Relational Systems. In PODS 1998. 34--43.
[6]
David DeHaan and Frank Tompa. 2007. Optimal Top-Down Join Enumeration. In SIGMOD 2007. 785--796.
[7]
Bailu Ding, Surajit Chaudhuri, Johannes Gehrke, and Vivek Narasayya. 2021. DSB: A Decision Support Benchmark for Workload-Driven and Traditional Database Systems. PVLDB 14, 13 (2021), 3376--3388.
[8]
Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, and Surajit Chaudhuri. 2019. Selectivity Estimation for Range Predicates Using Lightweight Models. PVLDB 12, 9 (2019), 1044--1057.
[9]
Leonidas Fegaras. 1998. A New Heuristic for Optimizing Large Queries. In DEXA 1998. 726--735.
[10]
Immanuel Haffner and Jens Dittrich. 2023. Efficiently Computing Join Orders with Heuristic Search. PACMMOD 1, 1 (2023).
[11]
Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, and Bin Cui. 2022. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. PVLDB 15, 4 (2022), 752--765.
[12]
Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2020. DeepDB: Learn from Data, not from Queries! PVLDB 13, 7 (2020), 992--1005.
[13]
Toshihide Ibaraki and Tiko Kameda. 1984. On the Optimal Nesting Order for Computing N-relational Joins. ACM Transactions on Database Systems 9, 3 (1984), 482--502.
[14]
Yannis E. Ioannidis and Stavros Christodoulakis. 1991. On the Propagation of Errors in the Size of Join Results. SIGMOD Record 20, 2 (1991), 268--277.
[15]
Yesdaulet Izenov. 2024. Sub-optimal Join Order Identification with L1-error. https://github.com/yizenov/l1-error.
[16]
Yesdaulet Izenov, Asoke Datta, Florin Rusu, and Jun Hyung Shin. 2021. COMPASS: Online Sketch-based Query Optimization for In-memory Databases. In SIGMOD 2021. 106--117.
[17]
Yesdaulet Izenov, Asoke Datta, Florin Rusu, and Jun Hyung Shin. 2021. Online Sketch-based Query Optimization. CoRR arXiv:2102.02440v1 (2021).
[18]
Maurice G Kendall. 1938. A New Measure of Rank Correlation. Biometrika 30, 1/2 (1938), 81--93.
[19]
Martin Kiefer, Max Heimel, Sebastian Breß, and Volker Markl. 2017. Estimating Join Selectivities using Bandwidth- Optimized Kernel Density Models. PVLDB 10, 13 (2017), 2085--2096.
[20]
Andreas Kipf. [n. d.]. JOB-light Benchmark. https://github.com/andreaskipf/learnedcardinalities/blob/master/workloads/job-light.sql.
[21]
Andreas Kipf, Thomas Kipf, Bernhard Radke, Victor Leis, Peter Boncz, and Alfons Kemper. 2019. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR 2019.
[22]
Donald Kossmann and Konrad Stocker. 2000. Iterative Dynamic Programming: A New Class of Query Optimization Algorithms. TODS 25, 1 (2000), 43--82.
[23]
Ravi Kumar and Sergei Vassilvitskii. 2010. Generalized Distances Between Rankings. In WWW 2010. 571--580.
[24]
Hai Lan, Zhifeng Bao, and Yuwei Peng. 2021. Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration. Data Science and Engineering 6 (2021), 86--101.
[25]
Victor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (2015), 204--215.
[26]
Victor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2018. Query Optimization Through the Looking Glass, and What We Found Running the Join Order Benchmark. VLDBJ 27 (2018), 643--668.
[27]
Beibin Li, Yao Lu, Chi Wang, and Srikanth Kandula. 2021. Q-error Bounds of Random Uniform Sampling for Cardinality Estimation. CoRR arXiv:2108.02715v2 (2021).
[28]
Christina Lioma and Niels Dalum Hansen. 2017. A Study of Metrics of Distance and Correlation between Ranked Lists for Compositionality Detection. Cognitive Systems Research 44 (2017), 40--49.
[29]
Guy Lohman. 2014. Is Query Optimization a Solved Problem? https://wp.sigmod.org/?p=1075.
[30]
Guido Moerkotte and Pit Fender. 2013. Counter Strike: Generic Top-Down Join Enumeration for Hypergraphs. PVLDB 6, 14 (2013), 1822--1833.
[31]
Guido Moerkotte and Thomas Neumann. 2006. Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees Without Cross Products. In VLDB 2006. 930--941.
[32]
Guido Moerkotte and Thomas Neumann. 2008. Dynamic Programming Strikes Back. In SIGMOD 2008. 539--552.
[33]
Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors. PVLDB 2, 1 (2009), 982--993.
[34]
Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2021. Flow-Loss: Learning Cardinality Estimates That Matter. PVLDB 14, 11 (2021), 2019--2032.
[35]
Parimarjan Negi, Ryan Marcus, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2020. Cost-Guided Cardinality Estimation: Focus Where it Matters. In ICDE Workshops 2020. 154--157.
[36]
Thomas Neumann. 2009. Query Simplification: Graceful Degradation for Join-Order Optimization. In SIGMOD 2009. 403--414.
[37]
Open Source Relational Database [n. d.]. PostgreSQL. www.postgresql.org.
[38]
Matthew Perron, Zeyuan Shang, Tim Kraska, and Michael Stonebraker. 2019. How I Learned to Stop Worrying and Love Re-optimization. In ICDE 2019. 1758--1761.
[39]
Greg Rahn. [n. d.]. Join Order Benchmark (JOB). https://github.com/gregrahn/join-order-benchmark.
[40]
P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlain, R. A. Lorie, and T. G. Price. 1979. Access Path Selection in a Relational Database Management System. In SIGMOD 1979. 23--34.
[41]
Charles Spearman. 1904. The Proof and Measurement of Association between Two Things. The American Journal of Psychology 15, 1 (1904), 72--101.
[42]
Michael Steinbrunn, Guido Moerkotte, and Alfons Kemper. 1997. Heuristic and Randomized Optimization for the Join Ordering Problem. VLDBJ 6, 3 (1997), 191--208.
[43]
Arun Swami. 1989. Optimization of Large Join Queries: Combining Heuristics and Combinatorial Techniques. In SIGMOD 1989. 367--376.
[44]
Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2021. Are We Ready For Learned Cardinality Estimation? PVLDB 14, 9 (2021), 1640--1654.
[45]
Florian Wolf, Michael Brendle, Norman May, Paul R. Willems, Kai-Uwe Sattler, and Michael Grossniklaus. 2018. Robustness Metrics for Relational Query Execution Plans. PVLDB 11, 11 (2018), 1360--1372.
[46]
Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2021. NeuroCard: One Cardinality Estimator for All Tables. PVLDB 14, 1 (2021), 61--73.
[47]
Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep Unsupervised Cardinality Estimation. PVLDB 13, 3 (2019), 279--292.
[48]
Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. 2021. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. PVLDB 14, 9 (2021), 1489--1502.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 2, Issue 1
SIGMOD
February 2024
1874 pages
EISSN:2836-6573
DOI:10.1145/3654807
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2024
Published in PACMMOD Volume 2, Issue 1

Author Tags

  1. "bad" query plans
  2. cardinality estimation
  3. database query processing
  4. feature engineering
  5. join ordering
  6. permutation distance

Qualifiers

  • Research-article

Funding Sources

  • NSF

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 238
    Total Downloads
  • Downloads (Last 12 months)238
  • Downloads (Last 6 weeks)44
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media