Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3448016.3452840acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open access

COMPASS: Online Sketch-based Query Optimization for In-Memory Databases

Published: 18 June 2021 Publication History

Abstract

Cost-based query optimization remains a critical task in relational databases even after decades of research and industrial development. Query optimizers rely on a large range of statistical synopses for accurate cardinality estimation. As the complexity of selections and the number of join predicates increase, two problems arise. First, statistics cannot be incrementally composed to effectively estimate the cost of the sub-plans generated in plan enumeration. Second, small errors are propagated exponentially through joins, which can lead to severely sub-optimal plans. In this paper, we introduce COMPASS, a novel query optimization paradigm for in-memory databases based on a single type of statistics---Fast-AGMS sketches. In COMPASS, query optimization and execution are intertwined. Selection predicates and sketch updates are pushed-down and evaluated online during query optimization. This allows Fast-AGMS sketches to be computed only over the relevant tuples---which enhances cardinality estimation accuracy. Plan enumeration is performed over the query join graph by incrementally composing attribute-level sketches---not by building a separate sketch for every sub-plan. We prototype COMPASS in MapD -- an open-source parallel database -- and perform extensive experiments over the complete JOB benchmark. The results prove that COMPASS generates better execution plans -- both in terms of cardinality and runtime -- compared to four other database systems. Overall, COMPASS achieves a speedup ranging from 1.35X to 11.28X in cumulative query execution time over the considered competitors.

Supplementary Material

Read me (3448016.3452840_readme.pdf)
Source Code (3448016.3452840_source_code.zip)
MP4 File (3448016.3452840.mp4)
Cost-based query optimization remains a critical task in relational databases even after decades of research and industrial development. Query optimizers rely on a large range of statistical synopses -- including attribute-level histograms and table-level samples -- for accurate cardinality estimation. As the complexity of selection predicates and the number of join predicates increase, two problems arise. First, statistics cannot be incrementally composed to effectively estimate the cost of the sub-plans generated in plan enumeration. Second, small errors are propagated exponentially through joins, which can lead to severely sub-optimal plans. In this paper, we introduce COMPASS, a novel query optimization paradigm for in-memory databases based on a single type of statistics---Fast-AGMS sketches. In COMPASS, query optimization and execution are intertwined. Selection predicates and sketch updates are pushed-down and evaluated online during query optimization. This allows Fast-AGMS sketches to be computed only over the relevant tuples---which enhances cardinality estimation accuracy. Plan enumeration is performed over the query join graph by incrementally composing attribute-level sketches---not by building a separate sketch for every sub-plan.We prototype COMPASS in MapD -- an open-source parallel database -- and perform extensive experiments over the complete JOB benchmark. The results prove the reduced overhead COMPASS incurs, while generating better execution plans -- both in terms of cardinality and runtime -- compared to four other database systems. Overall, COMPASS achieves a speedup ranging from 1.89X to 7.09X in cumulative query execution time over the considered competitors. Moreover, COMPASS is the only optimizer that consistently generates effective plans for complex queries with 10 or more joins.

References

[1]
N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. Tracking Join and Self-Join Sizes in Limited Storage. In PODS 1999, pages 10--20.
[2]
R. Avnur and J. M. Hellerstein. Eddies: Continuously Adaptive Query Processing. In SIGMOD 2000, pages 261--272.
[3]
W. Cai, M. Balazinska, and D. Suciu. Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities. In SIGMOD 2019, pages 18--35.
[4]
S. Chaudhuri. An Overview of Query Optimization in Relational Systems. In PODS 1998, pages 34--43.
[5]
G. Cormode and M. Garofalakis. Sketching Streams Through the Net: Distributed Approximate Query Tracking. In VLDB 2005, pages 13--24.
[6]
G. Cormode, M. Garofalakis, P. J. Haas, and C. Jermaine. Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches. Foundation and Trends in Databases, 4:1--294, 2012.
[7]
A. Deshpande, Z. Ives, and V. Raman. Adaptive Query Processing. Foundations and Trends in Databases, 1(1):1--140, 2007.
[8]
A. Dobra, M. Garofalakis, J. Gehrke, and R. Rastogi. Processing Complex Aggregate Queries over Data Streams. In SIGMOD 2002, pages 61--72.
[9]
A. Dobra, M. Garofalakis, J. Gehrke, and R. Rastogi. Sketch-Based Multi-query Processing over Data Streams. In EDBT 2004, pages 551--568.
[10]
A. Dutt, C. Wang, A. Nazi, S. Kandula, V. Narasayya, and S. Chaudhuri. Selectivity Estimation for Range Predicates Using Lightweight Models. PVLDB, 12(9):1044--1057, 2019.
[11]
S. Hasan, S. Thirumuruganathan, J. Augustine, N. Koudas, and G. Das. Multi-Attribute Selectivity Estimation Using Deep Learning. CoRR, arXiv:1903.09999v2, 2019.
[12]
M. Heimel, M. Kiefer, and V. Markl. Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation. In SIGMOD 2015, pages 1477--1492.
[13]
S. Idreos, F. Groffen, N. Nes, S. Manegold, S. Mullender, and M. Kersten. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Engineering Bulletin, 35(1):40--45, 2012.
[14]
Y. E. Ioannidis and S. Christodoulakis. On the Propagation of Errors in the Size of Join Results. SIGMOD Record, 20(2):268--277, 1991.
[15]
Y. Izenov, A. Datta, F. Rusu, and J. H. Shin. Online Sketch-based Query Optimization. CoRR, arXiv:2102.02440, 2021.
[16]
N. Kabra and D. J. DeWitt. Efficient Mid-query Re-optimization of Sub-optimal Query Execution Plans. In SIGMOD 1998, pages 106--117.
[17]
A. R. Kader, P. Boncz, S. Manegold, and M. van Keulen. ROX: Run-time Optimization of XQueries. In SIGMOD 2009, pages 615--626.
[18]
M. Kiefer, M. Heimel, S. Breß, and V. Markl. Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models. PVLDB, 10(13):2085--2096, 2017.
[19]
A. Kipf, T. Kipf, B. Radke, V. Leis, P. Boncz, and A. Kemper. Learned Cardinalities:Estimating Correlated Joins with Deep Learning. In CIDR 2019.
[20]
A. Kipf, D. Vorona, J. Muller, T. Kipf, B. Radke, V. Leis, P. Boncz, T. Neumann, and A. Kemper. Estimating Cardinalities with Deep Sketches. CoRR, arXiv:1904.08223v1, 2019.
[21]
T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. In SIGMOD 2018, pages 489--504.
[22]
S. Krishnan, Z. Yang, K. Goldberg, J. Hellerstein, and I. Stoica. Learning to Optimize Join Queries With Deep Reinforcement Learning. CoRR, arXiv:1808.03196v2, 2018.
[23]
V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. How Good Are Query Optimizers, Really? PVLDB, 9(3):204--215, 2015.
[24]
V. Leis, B. Radke, A. Gubichev, A. Kemper, and T. Neumann. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR 2017.
[25]
V. Leis, B. Radke, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. Query Optimization Through the Looking Glass, and What We Found Running the Join Order Benchmark. VLDB Journal, 27:643--668, 2018.
[26]
H. Liu, M. Xu, Z. Yu, V. Corvinelli, and C. Zuzarte. Cardinality Estimation Using Neural Networks. In CASCON 2015, pages 53--59.
[27]
T. Malik, R. C. Burns, and N. V. Chawla. A Black-Box Approach to Query Cardinality Estimation. In CIDR 2007.
[28]
R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul. Neo: A Learned Query Optimizer. VLDB Journal, 12(11), 2019.
[29]
R. Marcus and O. Papaemmanouil. Deep Reinforcement Learning for Join Order Enumeration. In aiDM 2018.
[30]
V. Markl, G. M. Lohman, and V. Raman. LEO: An Autonomic Query Optimizer for DB2. IBM Systems Journal, 42(1):98--106, 2003.
[31]
M. Muller, G. Moerkotte, and O. Kolb. Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses. PVLDB, 9(11):1016--1028, 2018.
[32]
H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-Case Optimal Join Algorithms. In PODS 2012, pages 37--48.
[33]
J. Ortiz, M. Balazinska, J. Gehrke, and S. Sathiya Keerthi. An Empirical Analysis of Deep Learning for Cardinality Estimation. CoRR, arXiv:1905.06425v2, 2019.
[34]
V. Poosala and Y. E. Ioannidis. Selectivity Estimation Without the Attribute Value Independence Assumption. In VLDB 1997, pages 486--495.
[35]
F. Rusu and A. Dobra. Sketching Sampled Data Streams. In ICDE 2009, pages 381--392.
[36]
F. Rusu and A. Dobra. Statistical Analysis of Sketch Estimators. In SIGMOD 2007, pages 187--198.
[37]
F. Rusu and A. Dobra. Sketches for Size of Join Estimation. TODS, 33(15), 2008.
[38]
J. H. Shin, F. Rusu, and A. Suhan. Exact Selectivity Computation for Modern In-Memory Database Query Optimization. CoRR, arXiv:1901.01488v1, 2019.
[39]
I. Trummer, J. Wang, D. Maram, S. Moseley, S. Jo, and J. Antonakakis. SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning. In SIGMOD 2019, pages 1153--1170.
[40]
D. Vengerov, A. C. Menck, M. Zait, and S. P. Chakkappen. Join Size Estimation Subject to Filter Condition. PVLDB, 8(12):1530--1541, 2015.
[41]
L. Woltmann, C. Hartmann, M. Thiele, D. Habich, and W. Lehner. Cardinality Estimation with Local Deep Learning Models. In aiDM 2019, pages 1--8.
[42]
W. Wu. Sampling-Based Cardinality Estimation Algorithms: A Survey and An Empirical Evaluation, 2012.
[43]
Z. Yang, E. Liang, A. Kamsetty, C. Wu, Y. Duan, X. Chen, P. Abbeel, J. M. Hellerstein, S. Krishnan, and I. Stoica. Selectivity Estimation with Deep Likelihood Models. CoRR, arXiv:1905.04278v2, 2019.
[44]
F. Yu, W. Hou, C. Luo, D. Che, and M. Zhu. CS2: A New Database Synopsis for Query Estimation. In SIGMOD 2013.
[45]
P. Boncz. The IMDB Dataset. http://homepages.cwi.nl/ boncz/job/imdb.tgz.
[46]
Y. Izenov. The COMPASS Query Optimizer. https://github.com/yizenov/compass_query_optimizer.
[47]
M. Kiefer. join-kde. https://github.com/martinkiefer/join-kde.
[48]
G. Lohman. Is Query Optimization a Solved Problem? https://wp.sigmod.org/?p=1075, 2014.
[49]
G. Rahn. Join Order Benchmark (JOB). https://github.com/gregrahn/join-order-benchmark.
[50]
F. Rusu. Sketches for Size of Join Estimation. https://faculty.ucmerced.edu/frusu/Projects/Sketches.
[51]
StackExchange. Distance Between Two Permutations? https://math.stackexchange.com/questions/2492954/distance-between-two-permutations.
[52]
Apache Calcite. https://calcite.apache.org.
[53]
MapD. www.omnisci.com.
[54]
MonetDB. www.monetdb.org.
[55]
PostgreSQL. www.postgresql.org.

Cited By

View all
  • (2024)POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least ResistanceProceedings of the VLDB Endowment10.14778/3648160.364817517:6(1350-1363)Online publication date: 3-May-2024
  • (2024)Convolution and Cross-Correlation of Count Sketches Enables Fast Cardinality Estimation of Multi-Join QueriesProceedings of the ACM on Management of Data10.1145/36549322:3(1-26)Online publication date: 30-May-2024
  • (2024)Approximate SketchesProceedings of the ACM on Management of Data10.1145/36393212:1(1-24)Online publication date: 26-Mar-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2021

Check for updates

Badges

Author Tags

  1. join cardinality estimation
  2. permutation distance
  3. sketches

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)304
  • Downloads (Last 6 weeks)33
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

Cited By

View all
  • (2024)POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least ResistanceProceedings of the VLDB Endowment10.14778/3648160.364817517:6(1350-1363)Online publication date: 3-May-2024
  • (2024)Convolution and Cross-Correlation of Count Sketches Enables Fast Cardinality Estimation of Multi-Join QueriesProceedings of the ACM on Management of Data10.1145/36549322:3(1-26)Online publication date: 30-May-2024
  • (2024)Approximate SketchesProceedings of the ACM on Management of Data10.1145/36393212:1(1-24)Online publication date: 26-Mar-2024
  • (2024)CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation ModelsProceedings of the ACM on Management of Data10.1145/36393062:1(1-28)Online publication date: 26-Mar-2024
  • (2024)BitMatcher: Bit-level Counter Adjustment for Sketches2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00366(4815-4827)Online publication date: 13-May-2024
  • (2024)Newton Sketches: Estimating Node Intimacy in Dynamic Graphs Using Newton's Law of Cooling2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00225(2904-2916)Online publication date: 13-May-2024
  • (2024)Sketches-Based Join Size Estimation Under Local Differential Privacy2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00140(1726-1738)Online publication date: 13-May-2024
  • (2024)CodingSketch: A Hierarchical Sketch with Efficient Encoding and Recursive Decoding2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00130(1592-1605)Online publication date: 13-May-2024
  • (2024)WavingSketch: an unbiased and generic sketch for finding top-k items in data streamsThe VLDB Journal10.1007/s00778-024-00869-633:5(1697-1722)Online publication date: 29-Jul-2024
  • (2023)An empirical comparison of the performances of single structure columnar in-memory and disk-resident data storage techniques using healthcare big dataJournal of Big Data10.1186/s40537-023-00691-x10:1Online publication date: 19-Feb-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media