Research article. DOI: 10.1145/3401071.3401658

Best of both worlds: combining traditional and machine learning models for cardinality estimation

Published: 14 June 2020 Publication History

Abstract

Cardinality estimation is a high-profile technique in database management systems (DBMS) with a serious impact on query performance. Accordingly, many traditional approaches, such as histogram-based and sampling-based methods, have been developed over the last decades. With the advance of machine learning (ML) into the database world, cardinality estimation benefits from several methods that improve its quality, as shown in a number of recent papers. However, neither an ML model nor a traditional approach meets all requirements for cardinality estimation, so a one-size-fits-all approach is difficult to imagine. For that reason, we advocate a better interlacing of ML models and traditional approaches for cardinality estimation and thoroughly consider their potential, advantages, and disadvantages in this paper. We start by proposing a classification of different estimation techniques and their usability for cardinality estimation. Then, we motivate a novel hybrid approach as the core proof of concept of this paper, which uses the best of both worlds: ML models and the proven histogram approach. To this end, we show in which cases it is beneficial to use ML models and when we can trust the traditional estimators. We evaluate our hybrid approach on two real-world data sets and conclude what can be done to improve the coexistence of traditional and ML approaches in DBMS. With all our proposals, we use ML to improve DBMS without abandoning years of valuable research in cardinality estimation.
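
As a rough illustration of the hybrid idea described in the abstract, the sketch below pairs a textbook equi-width histogram estimator with a stand-in for a learned model and routes each query to one of them. The routing rule (histogram for single-column range predicates, learned model otherwise) is an assumption made purely for illustration; the abstract does not state the paper's actual selection criteria. The names `EquiWidthHistogram`, `HybridEstimator`, and `learned_model` are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Range = Tuple[float, float]  # closed interval [lo, hi] on one column


class EquiWidthHistogram:
    """One-dimensional equi-width histogram over a numeric column."""

    def __init__(self, values: Sequence[float], buckets: int = 10) -> None:
        self.lo, self.hi = min(values), max(values)
        # Fall back to width 1.0 when all values are identical.
        self.width = (self.hi - self.lo) / buckets or 1.0
        self.counts = [0] * buckets
        for v in values:
            # The maximum value is clamped into the last bucket.
            idx = min(int((v - self.lo) / self.width), buckets - 1)
            self.counts[idx] += 1

    def estimate(self, lo: float, hi: float) -> float:
        """Estimate rows in [lo, hi], assuming uniform spread per bucket."""
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = min(hi, b_hi) - max(lo, b_lo)
            if overlap > 0:
                total += count * overlap / self.width
        return total


class HybridEstimator:
    """Routes a query to a histogram or to a learned model (assumed rule)."""

    def __init__(
        self,
        histograms: Dict[str, EquiWidthHistogram],
        learned_model: Callable[[Dict[str, Range]], float],
    ) -> None:
        self.histograms = histograms
        self.learned_model = learned_model  # placeholder for a trained ML model

    def estimate(self, predicates: Dict[str, Range]) -> Tuple[str, float]:
        # Illustrative assumption: trust the histogram for one-column
        # predicates and defer to the learned model for everything else.
        if len(predicates) == 1:
            ((column, (lo, hi)),) = predicates.items()
            if column in self.histograms:
                return "histogram", self.histograms[column].estimate(lo, hi)
        return "learned", self.learned_model(predicates)


if __name__ == "__main__":
    hist = EquiWidthHistogram(list(range(101)), buckets=10)
    hybrid = HybridEstimator({"a": hist}, learned_model=lambda preds: 7.0)
    print(hybrid.estimate({"a": (0, 50)}))
    print(hybrid.estimate({"a": (0, 50), "b": (1, 2)}))
```

In a real system the `learned_model` callable would wrap a trained regressor, and the routing decision would follow whatever criteria the evaluation supports rather than the fixed rule used here.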



Published In

aiDM '20: Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management
June 2020
33 pages
ISBN: 9781450380294
DOI: 10.1145/3401071
Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. cardinality estimation
2. hybrid
3. machine learning
4. neural networks
5. query optimization

Conference

SIGMOD/PODS '20

Acceptance Rates

aiDM '20 Paper Acceptance Rate: 6 of 6 submissions, 100%
Overall Acceptance Rate: 19 of 26 submissions, 73%
