Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Robust estimation of resource consumption for SQL queries using statistical techniques

Published: 01 July 2012 Publication History

Abstract

The ability to estimate resource consumption of SQL queries is crucial for a number of tasks in a database system such as admission control, query scheduling and costing during query optimization. Recent work has explored the use of statistical techniques for resource estimation in place of the manually constructed cost models used in query optimization. Such techniques, which require as training data examples of resource usage in queries, offer the promise of superior estimation accuracy since they can account for factors such as hardware characteristics of the system or bias in cardinality estimates. However, the proposed approaches lack robustness in that they do not generalize well to queries that are different from the training examples, resulting in significant estimation errors. Our approach aims to address this problem by combining knowledge of database query processing with statistical models. We model resource-usage at the level of individual operators, with different models and features for each operator type, and explicitly model the asymptotic behavior of each operator. This results in significantly better estimation accuracy and the ability to estimate resource usage of arbitrary plans, even when they are very different from the training instances. We validate our approach using various large scale real-life and benchmark workloads on Microsoft SQL Server.

References

[1]
Decision Tree Learning, Bregman Divergences, and the Histogram Trick. In submission.
[2]
Program for TPC-H data generation with Skew. ftp://ftp.research.microsoft.com/users/viveknar/TPCDSkew/.
[3]
TPC-H and TPC-DS Benchmarks. http://www.tpc.org.
[4]
WEKA workbench. http://www.cs.waikato.ac.nz/ml/weka/.
[5]
A. Aboulnaga and S. Chaudhuri. Self-tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD, pages 181--192, 1999.
[6]
M. Ahmad, S. Duan, A. Aboulnaga, and S. Babu. Predicting Completion Times of Batch Query Workloads using Interaction-aware Models and Simulation. In EDBT, pages 449--460, 2011.
[7]
M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. Zdonik. The Case for Predictive Database Systems: Opportunities and Challenges. In CIDR, pages 167--174, 2011.
[8]
M. Akdere, U. Cetintemel, M. Riondato, E. Upfal, and S. Zdonik. Learning-based Query Performance Modeling and Prediction. In ICDE, 2012.
[9]
N. Bruno, S. Chaudhuri, and L. Gravano. STHoles: A Multidimensional Workload-Aware Histogram. In SIGMOD, pages 211--222, 2001.
[10]
C. Curino, E. P. C. Jones, S. Madden, and H. Balakrishnan. Workload-aware Database Monitoring and Consolidation. In SIGMOD, pages 313--324, 2011.
[11]
D. J. DeWitt, J. F. Naughton, and J. Burger. Nested Loops Revisited. In PDIS, pages 230--242, 1993.
[12]
J. Duggan, U. Cetintemel, O. Papaemmanouil, and E. Upfal. Performance Prediction for Concurrent Database Workloads. In SIGMOD, pages 337--348, 2011.
[13]
M. Elhemali, C. A. Galindo-Legaria, T. Grabs, and M. M. Joshi. Execution Strategies for SQL Subqueries. In SIGMOD, pages 993--1004, 2007.
[14]
J. Friedman. Greedy Function Approximation: a Gradient Boosting Machine. Annals of Statistics, 29(5):1189--1232, 2001.
[15]
A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In ICDE, pages 592--603, 2009.
[16]
G. Graefe. Query Evaluation Techniques for Large Databases. ACM Comput. Surv., 25(2):73--169, 1993.
[17]
H. Herodotou and S. Babu. Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs. In VLDB, pages 1111--1122, 2011.
[18]
E. P. D. Pednault. Transform Regression and the Kolmogorov Superposition Theorem. In SDM, pages 35--46, 2006.
[19]
M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - DB2's LEarning Optimizer. In VLDB, pages 19--28, 2001.
[20]
V. Vapnik. Statistical Learning Theory. Wiley, 2000.
[21]
Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Ranking, Boosting, and Model Adaptation. Technical report, Microsoft Research, 2008.
[22]
N. Zhang, P. J. Haas, V. Josifovski, G. M. Lohman, and C. Zhang. Statistical Learning Techniques for Costing XML Queries. In VLDB, pages 289--300, 2005.

Cited By

View all
  • (2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
  • (2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 5, Issue 11
July 2012
608 pages

Publisher

VLDB Endowment

Publication History

Published: 01 July 2012
Published in PVLDB Volume 5, Issue 11

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)1
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRADProceedings of the VLDB Endowment10.14778/3681954.368202617:11(3629-3643)Online publication date: 1-Jul-2024
  • (2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • (2024)Wred: Workload Reduction for Scalable Index TuningProceedings of the ACM on Management of Data10.1145/36393052:1(1-26)Online publication date: 26-Mar-2024
  • (2024)Stage: Query Execution Time Prediction in Amazon RedshiftCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653391(280-294)Online publication date: 9-Jun-2024
  • (2024)Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the CloudCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653378(241-254)Online publication date: 9-Jun-2024
  • (2024)Query execution time estimation in graph databases based on graph neural networksJournal of King Saud University - Computer and Information Sciences10.1016/j.jksuci.2024.10201836:4Online publication date: 1-Apr-2024
  • (2023)Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data MeshesProceedings of the VLDB Endowment10.14778/3611479.361152616:11(3293-3301)Online publication date: 24-Aug-2023
  • (2023)Rethinking Learned Cost Models: Why Start from Scratch?Proceedings of the ACM on Management of Data10.1145/36267691:4(1-27)Online publication date: 12-Dec-2023
  • (2023)White Box: On the Prediction of Collaborative Filtering Recommendation Systems’ PerformanceACM Transactions on Internet Technology10.1145/355497923:1(1-29)Online publication date: 23-Feb-2023
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media