DOI: 10.1145/3448016.3450572

Index-Based Join Size Estimation Using Adaptive Sampling

Published: 18 June 2021

Abstract

Cost-based query optimizers rely on cardinality estimates of intermediate results to avoid suboptimal query execution plans. However, when confronted with ad-hoc queries on big data, these optimizers can produce large estimation errors, resulting in drastic decreases in overall performance. Such errors occur because many estimation algorithms for joins rely on strong independence and uniformity assumptions, and equi-joins on skewed data with filter predicates tend to violate these assumptions [2]. Since the cardinality estimate of a result with many joins depends on the estimates of the underlying joins, it has been shown that improving the accuracy of join size estimates in a "bottom-up" order can significantly improve performance [2]. Thus, our research aims at improving the estimation of two-table join sizes.
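To illustrate how a uniformity assumption can fail on skewed data, the following minimal Python sketch compares the exact size of an equi-join with the common textbook estimate |R| * |S| / max(ndv(R), ndv(S)). The function names and the synthetic key columns are assumptions made for illustration only; they are not taken from the paper or from SAP IQ.

```python
from collections import Counter


def true_join_size(r_keys, s_keys):
    """Exact equi-join size: sum over join keys of freq_R(k) * freq_S(k)."""
    r_freq, s_freq = Counter(r_keys), Counter(s_keys)
    return sum(count * s_freq[key] for key, count in r_freq.items())


def uniform_join_estimate(r_keys, s_keys):
    """Textbook estimate that assumes every distinct key is equally frequent."""
    ndv = max(len(set(r_keys)), len(set(s_keys)))
    return len(r_keys) * len(s_keys) / ndv


# Hypothetical skewed join columns: key 0 accounts for 90% of both tables.
r_keys = [0] * 900 + list(range(1, 101))  # 1000 rows, 101 distinct keys
s_keys = [0] * 900 + list(range(1, 101))  # 1000 rows, 101 distinct keys

actual = true_join_size(r_keys, s_keys)            # 900*900 + 100*1 = 810100
estimate = uniform_join_estimate(r_keys, s_keys)   # 1000*1000 / 101

print(f"true size:       {actual}")
print(f"uniform estimate: {estimate:.2f}")
print(f"underestimate factor: {actual / estimate:.1f}x")
```

In this synthetic case the uniformity-based estimate is too low by a factor of roughly 80, which is the kind of error that can mislead join ordering.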
Certain join size estimation approaches that use offline samples fare poorly with filtering and can suffer from insufficient sample size [2]. Alternatively, query optimizers may make use of persisted histograms. However, the associated storage space is a large deterrent, as is the case with persisting offline samples [4]. In turn, algorithms have been presented wherein adaptive, block-level sampling is conducted during query optimization [5]. To the best of our knowledge, there is no algorithm for two-table join size estimation that 1) samples filtered base tables, 2) makes use exclusively of persisted counts in B+-tree indexes and 3) attempts to provide statistical confidence on estimates.
In this paper, we present and evaluate a novel join size estimation algorithm prototyped on SAP IQ. Our algorithm can be easily incorporated into query optimizers that utilize bottom-up enumeration and evaluate filter predicates prior to optimization. In contrast with existing machine-learning-based approaches [1], the algorithm is simpler to implement, or to derive from existing support for sampling. Join size estimates are produced by combining the variance-based calculations of Oracle 12c [5] with the persisted counts in indexes used by index-based sampling [2]. Sampling continues only until either a parameterized budget is surpassed or there is sufficient statistical confidence in an estimate, which keeps the amount of sampling conducted to a minimum. On a subset of industry-standard benchmark queries involving joins on skewed data, our method improved the overall execution time by 16%. Amongst queries with execution plans altered by our estimates, the mean percentage improvement in individual execution time was 34%.
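The general idea of sampling until a budget is spent or an estimate is statistically tight can be sketched as follows. This is not the paper's algorithm: it is a simplified illustration that assumes row-level sampling with replacement from the filtered table R, a plain dictionary `s_index_counts` standing in for the per-key counts persisted in a B+-tree index on S, and a normal-approximation confidence interval. All names and parameters (`budget`, `batch`, `rel_error`, `z`) are hypothetical.

```python
import math
import random


def adaptive_join_estimate(r_keys, s_index_counts, budget=1000, batch=50,
                           rel_error=0.1, z=1.96, seed=0):
    """Estimate |R join S| from samples of R's join keys.

    Each sampled key is looked up in s_index_counts (key -> number of
    matching rows in S). Sampling stops once the confidence-interval
    half-width is within rel_error of the estimate, or once at least
    `budget` rows have been sampled.
    Returns (estimate, half_width, rows_sampled).
    """
    rng = random.Random(seed)
    n = len(r_keys)
    if n == 0:
        return 0.0, 0.0, 0

    total = total_sq = 0.0
    sampled = 0
    estimate = half_width = 0.0
    while sampled < budget:
        for _ in range(batch):
            key = r_keys[rng.randrange(n)]       # sample with replacement
            matches = s_index_counts.get(key, 0)  # index count lookup
            total += matches
            total_sq += matches * matches
            sampled += 1

        mean = total / sampled
        # Unbiased sample variance of the per-row match counts.
        if sampled > 1:
            variance = max(total_sq - sampled * mean * mean, 0.0) / (sampled - 1)
        else:
            variance = 0.0
        estimate = n * mean
        half_width = z * n * math.sqrt(variance / sampled)

        # Stop early when the interval is narrow relative to the estimate.
        if estimate > 0 and half_width <= rel_error * estimate:
            break
    return estimate, half_width, sampled


# Hypothetical usage: every row of R matches exactly 3 rows of S.
est, hw, used = adaptive_join_estimate([7] * 100, {7: 3})
print(est, hw, used)  # zero variance, so sampling stops after one batch
```

Because the stopping rule depends on the observed variance, low-variance joins finish after a single batch, while skewed joins draw more samples up to the budget.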

References

[1] A. Kipf, T. Kipf, B. Radke, V. Leis, P. A. Boncz, and A. Kemper. Learned cardinalities: Estimating correlated joins with deep learning. CoRR, abs/1809.00677, 2018.
[2] V. Leis, B. Radke, A. Gubichev, A. Kemper, and T. Neumann. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8–11, 2017, Online Proceedings. www.cidrdb.org, 2017.
[3] G. Moerkotte, D. DeHaan, N. May, A. Nica, and A. Böhm. Exploiting ordered dictionaries to efficiently construct histograms with q-error guarantees in SAP HANA. In C. E. Dyreson, F. Li, and M. T. Özsu, editors, International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22–27, 2014, pages 361–372. ACM, 2014.
[4] T. Wang and C. Chan. Improved Correlated Sampling for Join Size Estimation. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20–24, 2020, pages 325–336. IEEE, 2020.
[5] M. Zait, S. Chakkappen, S. Budalakoti, S. R. Valluri, R. Krishnamachari, and A. Wood. Adaptive Statistics in Oracle 12c. Proc. VLDB Endow., 10(12):1813–1824, 2017.

Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive sampling
  2. cardinality
  3. cardinality estimation
  4. database indexes
  5. join ordering
  6. query optimization

Qualifiers

  • Abstract

Conference

SIGMOD/PODS '21
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 183
  • Downloads (Last 12 months): 25
  • Downloads (Last 6 weeks): 5

Reflects downloads up to 13 Sep 2024
