Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.5555/3507788.3507808dlproceedingsArticle/Chapter ViewAbstractPublication PagescasconConference Proceedingsconference-collections
research-article

Query predicate selectivity using machine learning in Db2®

Published: 22 November 2021 Publication History

Abstract

The accuracy of cardinality estimation or the number of rows flowing through the query execution plan operators plays an important role in SQL query optimization. Cost-based optimizers depend on cardinality estimation to evaluate execution costs to select an optimal access plan. Achieving accurate cardinality estimation is difficult or expensive on tables that have correlated or skewed columns. Inaccurate cardinality estimation can lead to slow or unstable query performance. Although collecting statistics on multiple column combinations can minimize estimation errors with multiple predicates, it is hard to cover all column combinations. This paper presents a novel integrated approach using Machine Learning (ML) to learn and approximate the multivariate Cumulative Frequency Function (CFF) of column values, which is used to estimate cardinality for predicates with various relational operators. The key idea is that a model can learn the distribution of the data in the relation and can be used to predict cardinality for the query predicates accurately. The CFF model is also extended to estimate join cardinalities between tables. Experimental results demonstrate a significant improvement of cardinality estimation accuracy, computation efficiency, and amount of input required to train the model. Integration with the traditional optimizer is key to a smooth transition towards use in a production environment. This paper covers earlier technology previews shipped with Db2®.

References

[1]
T. A. Beavin, B. R. Iyer, A. Shibamiya, H. S. Tie, and M. Wang. Query optimization through the use of multi-column statistics to avoid the problems of column correlation, Nov. 30, 1999. US Patent 5,995,957.
[2]
T. A. Beavin, B. R. Iyer, A. Shibamiya, H. S. Tie, and M. Wang. Query optimization through the use of multi-column statistics to avoid the problems of non-indexed column correlation, Aug. 7, 2001. US Patent 6,272,487.
[3]
N. Bruno and S. Chaudhuri. Exploiting statistics on query expressions for optimization. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD '02, pages 263--274, Madison, Wisconsin, USA, 2002. ACM.
[4]
C. M. Chen and N. Roussopoulos. Adaptive selectivity estimation using query feedback. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, SIGMOD '94, pages 161--172, Minneapolis, Minnesota, USA, 1994. ACM.
[5]
V. Corvinelli, H. Liu, M. Xu, Z. Yu, and C. P. Zuzarte. Cardinality estimation using artificial neural networks, Sep. 22, 2016. US Patent 2016/0,275,398.
[6]
A. Dutt, C. Wang, V. Narasayya, and S. Chaudhuri. Efficiently Approximating Selectivity Functions using Low Overhead Regression Models. PVLDB, 13(11): 2215--2228, 2020.
[7]
A. Dutt, C. Wang, A. Nazi, S. Kandula, V. Narasayya, and S. Chaudhuri. Selectivity Estimation for Range Predicates using Lightweight Models. PVLDB, 12(9):1044--1057, 2019.
[8]
J. C. Freytag, G. M. Lohman, D. E. Simmen, and C. P. Zuzarte. Query optimization method for incrementally estimating the cardinality of a derived relation when statistically correlated predicates are applied, May 18, 2004. US Patent 6,738,755.
[9]
S. Hasan, S. Thirumuruganathan, J. Augustine, N. Koudas, and G. Das. Deep learning models for selectivity estimation of multi-attribute queries. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD '20, pages 1035--1050. ACM.
[10]
J. C. Huang and B. J. Frey. Cumulative distribution networks and the derivative-sum-product algorithm. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI '08, Helsinki, Finland, pages 290--297, 2008.
[11]
O. Ivanov and S. Bartunov. Adaptive cardinality estimation. arXiv preprint arXiv:1711.08330, 2017.
[12]
A. Kipf, T. Kipf, B. Radke, V. Leis, P. A. Boncz, and A. Kemper. Learned cardinalities: Estimating correlated joins with deep learning. In CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 13--16, 2019.
[13]
V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? PVLDB, 9(3):204--215, 2015.
[14]
V. Leis, B. Radke, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. Query optimization through the looking glass, and what we found running the join order benchmark. The VLDB Journal, pages 1--26, 2018.
[15]
H. Lu and R. Setiono. Effective query size estimation using neural networks. Applied Intelligence, 16(3): 173--183, 2002.
[16]
M. Magdon-Ismail and A. F. Atiya. Neural networks for density estimation. In NIPS, pages 522--528, 1999.
[17]
R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, 13 T. Kraska, O. Papaemmanouil, and N. Tatbul. Neo: A learned query optimizer. PVLDB, 12(11):1705--1718, 2019.
[18]
V. Markl, V. Raman, D. E. Simmen, G. Lohman, and H. Pirahesh. Robust query processing through progressive optimization. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD '04, pages 659--670, Paris, France, 2004. ACM.
[19]
J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. An empirical analysis of deep learning for cardinality estimation. CoRR, abs/1905.06425, 2019.
[20]
D. E. Simmen. Query optimization technique for obtaining improved cardinality estimates using statistics on pre-defined queries, Sep. 16, 2004. US Patent 2004/0,181,521.
[21]
X. Wang, C. Qu, W. Wu, J. Wang, Q. Zhou. Are We Ready For Learned Cardinality Estimation? PVLDB, 14(9): 1640--1654, 2021.
[22]
L. Woltmann, C. Hartmann, M. Thiele, D. Habich, and W. Lehner. Cardinality estimation with local deep learning models. In Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM '19, pages 1--8, Amsterdam, Netherlands, 2019. ACM.
[23]
C. Wu, A. Jindal, S. Amizadeh, H. Patel, W. Le, S. Qiao, and S. Rao. Towards a learning optimizer for shared clouds. PVLDB, 12(3):210--222, Nov. 2018.
[24]
Z. Yang, E. Liang, A. Kamsetty, C. Wu, Y. Duan, X. Chen, P. Abbeel, J. M. Hellerstein, S. Krishnan, and I. Stoica. Deep unsupervised cardinality estimation. PVLDB, 13(3):279--292, Nov. 2019.
[25]
C. Zuzarte, and X. Yu. Fast approximate computation of statistics on views. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, SIGMOD '06, pages 724, Chicago, IL, USA, 2006. ACM.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image DL Hosted proceedings
CASCON '21: Proceedings of the 31st Annual International Conference on Computer Science and Software Engineering
November 2021
331 pages

Publisher

IBM Corp.

United States

Publication History

Published: 22 November 2021

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 58
    Total Downloads
  • Downloads (Last 12 months)16
  • Downloads (Last 6 weeks)3
Reflects downloads up to 10 Oct 2024

Other Metrics

Citations

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media