research-article

Robust estimation of resource consumption for SQL queries using statistical techniques

Authors:

Arnd Christian König,

Vivek Narasayya,

Surajit ChaudhuriAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 5, Issue 11

Pages 1555 - 1566

https://doi.org/10.14778/2350229.2350269

Published: 01 July 2012 Publication History

Abstract

The ability to estimate resource consumption of SQL queries is crucial for a number of tasks in a database system such as admission control, query scheduling and costing during query optimization. Recent work has explored the use of statistical techniques for resource estimation in place of the manually constructed cost models used in query optimization. Such techniques, which require as training data examples of resource usage in queries, offer the promise of superior estimation accuracy since they can account for factors such as hardware characteristics of the system or bias in cardinality estimates. However, the proposed approaches lack robustness in that they do not generalize well to queries that are different from the training examples, resulting in significant estimation errors. Our approach aims to address this problem by combining knowledge of database query processing with statistical models. We model resource-usage at the level of individual operators, with different models and features for each operator type, and explicitly model the asymptotic behavior of each operator. This results in significantly better estimation accuracy and the ability to estimate resource usage of arbitrary plans, even when they are very different from the training instances. We validate our approach using various large scale real-life and benchmark workloads on Microsoft SQL Server.

References

[1]

Decision Tree Learning, Bregman Divergences, and the Histogram Trick. In submission.

[2]

Program for TPC-H data generation with Skew. ftp://ftp.research.microsoft.com/users/viveknar/TPCDSkew/.

[3]

TPC-H and TPC-DS Benchmarks. http://www.tpc.org.

[4]

WEKA workbench. http://www.cs.waikato.ac.nz/ml/weka/.

[5]

A. Aboulnaga and S. Chaudhuri. Self-tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD, pages 181--192, 1999.

Digital Library

[6]

M. Ahmad, S. Duan, A. Aboulnaga, and S. Babu. Predicting Completion Times of Batch Query Workloads using Interaction-aware Models and Simulation. In EDBT, pages 449--460, 2011.

Digital Library

[7]

M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. Zdonik. The Case for Predictive Database Systems: Opportunities and Challenges. In CIDR, pages 167--174, 2011.

[8]

M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. Zdonik. Learning-based Query Performance Modeling and Prediction. In ICDE, 2012.

Digital Library

[9]

N. Bruno, S. Chaudhuri, and L. Gravano. STHoles: A Multidimensional Workload-Aware Histogram. In SIGMOD, pages 211--222, 2001.

Digital Library

[10]

C. Curino, E. P. C. Jones, S. Madden, and H. Balakrishnan. Workload-aware Database Monitoring and Consolidation. In SIGMOD, pages 313--324, 2011.

Digital Library

[11]

D. J. DeWitt, J. F. Naughton, and J. Burger. Nested Loops Revisited. In PDIS, pages 230--242, 1993.

Digital Library

[12]

J. Duggan, U. Cetintemel, O. Papaemmanouil, and E. Upfal. Performance Prediction for Concurrent Database Workloads. In SIGMOD, pages 337--348, 2011.

Digital Library

[13]

M. Elhemali, C. A. Galindo-Legaria, T. Grabs, and M. M. Joshi. Execution Strategies for SQL Subqueries. In SIGMOD, pages 993--1004, 2007.

Digital Library

[14]

J. Friedman. Greedy Function Approximation: a Gradient Boosting Machine. Annals of Statistics, 29(5):1189--1232, 2001.

[15]

A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In ICDE, pages 592--603, 2009.

Digital Library

[16]

G. Graefe. Query Evaluation Techniques for Large Databases. ACM Comput. Surv., 25(2):73--169, 1993.

Digital Library

[17]

H. Herodotou and S. Babu. Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs. In VLDB, pages 1111--1122, 2011.

Digital Library

[18]

E. P. D. Pednault. Transform Regression and the Kolmogorov Superposition Theorem. In SDM, pages 35--46, 2006.

[19]

M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - DB2's LEarning Optimizer. In VLDB, pages 19--28, 2001.

Digital Library

[20]

V. Vapnik. Statistical Learning Theory. Wiley, 2000.

[21]

Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Ranking, Boosting, and Model Adaptation. Technical report, Microsoft Research, 2008.

[22]

N. Zhang, P. J. Haas, V. Josifovski, G. M. Lohman, and C. Zhang. Statistical Learning Techniques for Costing XML Queries. In VLDB, pages 289--300, 2005.

Digital Library

Cited By

Yu GWu ZKossmann FLi TMarkakis MNgom AMadden SKraska T(2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682026
Wang XWu WWang CNarasayya VChaudhuri S(2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654985
Siddiqui TWu W(2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
https://dl.acm.org/doi/10.1145/3641832.3641836
Show More Cited By

Recommendations

Online estimation for subset-based SQL queries
VLDB '05: Proceedings of the 31st international conference on Very large data bases

The largest databases in use today are so large that answering a query exactly can take minutes, hours, or even days. One way to address this problem is to make use of approximation algorithms. Previous work on online aggregation has considered how to ...
Robust variance estimation for random effects meta-analysis

In random effects meta-analysis, an overall effect is estimated using a weighted mean, with weights based on estimated marginal variances. The variance of the overall effect is often estimated using the inverse of the sum of the estimated weights, and ...
Equivalence of SQL queries in presence of embedded dependencies
PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

We consider the problem of finding equivalent minimal-size reformulations of SQL queries in presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive (CQ) queries, ...

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 5, Issue 11

July 2012

608 pages

ISSN:2150-8097

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 July 2012

Published in PVLDB Volume 5, Issue 11

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

54
Total Citations
View Citations
317
Total Downloads

Downloads (Last 12 months)18
Downloads (Last 6 weeks)1

Reflects downloads up to 30 Aug 2024

Other Metrics

View Author Metrics

Citations

Cited By

Yu GWu ZKossmann FLi TMarkakis MNgom AMadden SKraska T(2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682026
Wang XWu WWang CNarasayya VChaudhuri S(2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654985
Siddiqui TWu W(2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
https://dl.acm.org/doi/10.1145/3641832.3641836
Brucato MSiddiqui TWu WNarasayya VChaudhuri S(2024)Wred: Workload Reduction for Scalable Index TuningProceedings of the ACM on Management of Data10.1145/36393052:1(1-26)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639305
Wu ZMarcus RLiu ZNegi PNathan VPfeil PSaxena GRahman MNarayanaswamy BKraska TBarcelo PSanchez-Pi NMeliou ASudarshan S(2024)Stage: Query Execution Time Prediction in Amazon RedshiftCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653391(280-294)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3653391
Pavlenko ACahoon JZhu YKroth BNelson MCarter ALiao DWright TCamacho-Rodríguez JSaur KBarcelo PSanchez-Pi NMeliou ASudarshan S(2024)Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the CloudCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653378(241-254)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3653378
He ZYu JGu TYang D(2024)Query execution time estimation in graph databases based on graph neural networksJournal of King Saud University - Computer and Information Sciences10.1016/j.jksuci.2024.10201836:4Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.1016/j.jksuci.2024.102018
Kraska TLi TMadden SMarkakis MNgom AWu ZYu G(2023)Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data MeshesProceedings of the VLDB Endowment10.14778/3611479.361152616:11(3293-3301)Online publication date: 24-Aug-2023
https://dl.acm.org/doi/10.14778/3611479.3611526
Yang JWu SZhang DDai JLi FChen G(2023)Rethinking Learned Cost Models: Why Start from Scratch?Proceedings of the ACM on Management of Data10.1145/36267691:4(1-27)Online publication date: 12-Dec-2023
https://dl.acm.org/doi/10.1145/3626769
Paun IMoshfeghi YNtarmos N(2023)White Box: On the Prediction of Collaborative Filtering Recommendation Systems’ PerformanceACM Transactions on Internet Technology10.1145/355497923:1(1-29)Online publication date: 23-Feb-2023
https://dl.acm.org/doi/10.1145/3554979
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents