research-article

Query predicate selectivity using machine learning in Db2^®

Authors:

Mohammed Alhamid,

Vincent Corvinelli,

Calisto Zuzarte,

Liam FinnieAuthors Info & Claims

CASCON '21: Proceedings of the 31st Annual International Conference on Computer Science and Software Engineering

Pages 143 - 152

Published: 22 November 2021 Publication History

Abstract

The accuracy of cardinality estimation or the number of rows flowing through the query execution plan operators plays an important role in SQL query optimization. Cost-based optimizers depend on cardinality estimation to evaluate execution costs to select an optimal access plan. Achieving accurate cardinality estimation is difficult or expensive on tables that have correlated or skewed columns. Inaccurate cardinality estimation can lead to slow or unstable query performance. Although collecting statistics on multiple column combinations can minimize estimation errors with multiple predicates, it is hard to cover all column combinations. This paper presents a novel integrated approach using Machine Learning (ML) to learn and approximate the multivariate Cumulative Frequency Function (CFF) of column values, which is used to estimate cardinality for predicates with various relational operators. The key idea is that a model can learn the distribution of the data in the relation and can be used to predict cardinality for the query predicates accurately. The CFF model is also extended to estimate join cardinalities between tables. Experimental results demonstrate a significant improvement of cardinality estimation accuracy, computation efficiency, and amount of input required to train the model. Integration with the traditional optimizer is key to a smooth transition towards use in a production environment. This paper covers earlier technology previews shipped with Db2^®.

References

[1]

T. A. Beavin, B. R. Iyer, A. Shibamiya, H. S. Tie, and M. Wang. Query optimization through the use of multi-column statistics to avoid the problems of column correlation, Nov. 30, 1999. US Patent 5,995,957.

[2]

T. A. Beavin, B. R. Iyer, A. Shibamiya, H. S. Tie, and M. Wang. Query optimization through the use of multi-column statistics to avoid the problems of non-indexed column correlation, Aug. 7, 2001. US Patent 6,272,487.

[3]

N. Bruno and S. Chaudhuri. Exploiting statistics on query expressions for optimization. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD '02, pages 263--274, Madison, Wisconsin, USA, 2002. ACM.

Digital Library

[4]

C. M. Chen and N. Roussopoulos. Adaptive selectivity estimation using query feedback. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, SIGMOD '94, pages 161--172, Minneapolis, Minnesota, USA, 1994. ACM.

Digital Library

[5]

V. Corvinelli, H. Liu, M. Xu, Z. Yu, and C. P. Zuzarte. Cardinality estimation using artificial neural networks, Sep. 22, 2016. US Patent 2016/0,275,398.

[6]

A. Dutt, C. Wang, V. Narasayya, and S. Chaudhuri. Efficiently Approximating Selectivity Functions using Low Overhead Regression Models. PVLDB, 13(11): 2215--2228, 2020.

[7]

A. Dutt, C. Wang, A. Nazi, S. Kandula, V. Narasayya, and S. Chaudhuri. Selectivity Estimation for Range Predicates using Lightweight Models. PVLDB, 12(9):1044--1057, 2019.

Digital Library

[8]

J. C. Freytag, G. M. Lohman, D. E. Simmen, and C. P. Zuzarte. Query optimization method for incrementally estimating the cardinality of a derived relation when statistically correlated predicates are applied, May 18, 2004. US Patent 6,738,755.

[9]

S. Hasan, S. Thirumuruganathan, J. Augustine, N. Koudas, and G. Das. Deep learning models for selectivity estimation of multi-attribute queries. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD '20, pages 1035--1050. ACM.

[10]

J. C. Huang and B. J. Frey. Cumulative distribution networks and the derivative-sum-product algorithm. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI '08, Helsinki, Finland, pages 290--297, 2008.

Digital Library

[11]

O. Ivanov and S. Bartunov. Adaptive cardinality estimation. arXiv preprint arXiv:1711.08330, 2017.

[12]

A. Kipf, T. Kipf, B. Radke, V. Leis, P. A. Boncz, and A. Kemper. Learned cardinalities: Estimating correlated joins with deep learning. In CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 13--16, 2019.

[13]

V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? PVLDB, 9(3):204--215, 2015.

Digital Library

[14]

V. Leis, B. Radke, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. Query optimization through the looking glass, and what we found running the join order benchmark. The VLDB Journal, pages 1--26, 2018.

Digital Library

[15]

H. Lu and R. Setiono. Effective query size estimation using neural networks. Applied Intelligence, 16(3): 173--183, 2002.

Digital Library

[16]

M. Magdon-Ismail and A. F. Atiya. Neural networks for density estimation. In NIPS, pages 522--528, 1999.

Digital Library

[17]

R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, 13 T. Kraska, O. Papaemmanouil, and N. Tatbul. Neo: A learned query optimizer. PVLDB, 12(11):1705--1718, 2019.

[18]

V. Markl, V. Raman, D. E. Simmen, G. Lohman, and H. Pirahesh. Robust query processing through progressive optimization. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD '04, pages 659--670, Paris, France, 2004. ACM.

Digital Library

[19]

J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. An empirical analysis of deep learning for cardinality estimation. CoRR, abs/1905.06425, 2019.

[20]

D. E. Simmen. Query optimization technique for obtaining improved cardinality estimates using statistics on pre-defined queries, Sep. 16, 2004. US Patent 2004/0,181,521.

[21]

X. Wang, C. Qu, W. Wu, J. Wang, Q. Zhou. Are We Ready For Learned Cardinality Estimation? PVLDB, 14(9): 1640--1654, 2021.

Digital Library

[22]

L. Woltmann, C. Hartmann, M. Thiele, D. Habich, and W. Lehner. Cardinality estimation with local deep learning models. In Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM '19, pages 1--8, Amsterdam, Netherlands, 2019. ACM.

Digital Library

[23]

C. Wu, A. Jindal, S. Amizadeh, H. Patel, W. Le, S. Qiao, and S. Rao. Towards a learning optimizer for shared clouds. PVLDB, 12(3):210--222, Nov. 2018.

Digital Library

[24]

Z. Yang, E. Liang, A. Kamsetty, C. Wu, Y. Duan, X. Chen, P. Abbeel, J. M. Hellerstein, S. Krishnan, and I. Stoica. Deep unsupervised cardinality estimation. PVLDB, 13(3):279--292, Nov. 2019.

Digital Library

[25]

C. Zuzarte, and X. Yu. Fast approximate computation of statistics on views. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, SIGMOD '06, pages 724, Chicago, IL, USA, 2006. ACM.

Digital Library

Index Terms

Query predicate selectivity using machine learning in Db2^®

Index terms have been assigned to the content through auto-classification.

Recommendations

Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor cardinality ...
Selectivity estimation using probabilistic models

Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query processing. It arises in cost-based query optimization, query profiling, ...
PLAQUE: Automated Predicate Learning at Query Time
SIGMOD

Predicate pushing down is a key optimization used to speed up query processing. Much of the existing practice is restricted to pushing predicates explicitly listed in the query. In this paper, we consider the challenge of learning predicates during query ...

Comments

Information & Contributors

Information

Published In

cover image DL Hosted proceedings

CASCON '21: Proceedings of the 31st Annual International Conference on Computer Science and Software Engineering

November 2021

331 pages

Conference Chair:
Iosif-Viorel (Vio) Onuţ
IBM Canada Ltd.
,
Program Chair:
Farhana Zulkernine
Queen's University

Publisher

IBM Corp.

United States

Publication History

Published: 22 November 2021

Qualifiers

Research-article

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
58
Total Downloads

Downloads (Last 12 months)16
Downloads (Last 6 weeks)3

Reflects downloads up to 10 Oct 2024

Other Metrics

View Author Metrics

Citations

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents