research-article

A statistical approach towards robust progress estimation

Published: 01 December 2011

Abstract

The need for accurate SQL progress estimation in the context of decision support administration has led to a number of proposed techniques for this task. Unfortunately, no single one of these progress estimators behaves robustly across the variety of SQL queries encountered in practice, meaning that each technique performs poorly for a significant fraction of queries. This paper proposes a novel estimator selection framework that uses a statistical model to characterize the sets of conditions under which certain estimators outperform others, leading to a significant increase in estimation robustness. The generality of this framework also enables us to add a number of novel "special purpose" estimators that increase accuracy further. Most importantly, the resulting model generalizes well to queries very different from the ones used to train it. We validate our findings using a large number of real-life industrial and benchmark workloads.
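The abstract does not specify the model, the query features, or the error metric, so the following is only a minimal Python sketch of the general idea of per-query estimator selection, under loudly stated assumptions: each training query is summarized by a numeric feature vector; the error of every candidate progress estimator on it has already been measured (here, hypothetically, as the mean absolute difference between estimated and true progress); and a simple k-nearest-neighbour rule stands in for the statistical model. The names `TrainingQuery`, `mean_abs_error`, `select_estimator`, `est_A`, `est_B` and all feature and error values are invented for illustration and are not the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import dist
from statistics import fmean


@dataclass
class TrainingQuery:
    """A previously executed query (hypothetical record layout)."""
    features: tuple[float, ...]  # assumed numeric query/plan characteristics
    errors: dict[str, float]     # measured error of each candidate estimator


def mean_abs_error(estimates: list[float], truth: list[float]) -> float:
    """Assumed error metric: mean |estimated - true| progress over the
    observation points of one query (both lists hold fractions in [0, 1])."""
    if len(estimates) != len(truth):
        raise ValueError("estimates and truth must have the same length")
    return fmean(abs(e - t) for e, t in zip(estimates, truth))


def select_estimator(training: list[TrainingQuery],
                     features: tuple[float, ...],
                     k: int = 3) -> str:
    """Return the estimator with the lowest average error on the k training
    queries closest (Euclidean distance) to `features`. The k-nearest-neighbour
    rule is only an illustrative stand-in for a learned statistical model."""
    if not training:
        raise ValueError("need at least one training query")
    nearest = sorted(training, key=lambda q: dist(q.features, features))[:k]
    names = nearest[0].errors.keys()
    avg_error = {name: fmean(q.errors[name] for q in nearest) for name in names}
    return min(avg_error, key=avg_error.__getitem__)


# Toy data (invented): est_A is accurate on "small" queries, est_B on "large" ones.
training = [
    TrainingQuery((1.0, 0.1), {"est_A": 0.05, "est_B": 0.30}),
    TrainingQuery((1.2, 0.2), {"est_A": 0.08, "est_B": 0.25}),
    TrainingQuery((9.0, 5.0), {"est_A": 0.40, "est_B": 0.10}),
    TrainingQuery((8.5, 4.0), {"est_A": 0.35, "est_B": 0.12}),
]
print(select_estimator(training, (1.1, 0.15), k=2))  # est_A
print(select_estimator(training, (8.8, 4.5), k=2))   # est_B
```

Replacing the nearest-neighbour rule with a regression or ranking model would change only the body of `select_estimator`; the part that mirrors the selection idea in the abstract is the structure itself: measure each estimator's error on training queries, then choose an estimator per query from its characteristics instead of fixing one for all queries.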


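Cited By

  • (2024) Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management Systems. Proceedings of the VLDB Endowment 17(11), pages 3680-3693. DOI: 10.14778/3681954.3682030. Online publication date: 1-Jul-2024.
  • (2019) MIFO. Proceedings of the 2019 International Conference on Management of Data, pages 1678-1695. DOI: 10.1145/3299869.3319902. Online publication date: 25-Jun-2019.
  • (2019) A QueryRating-Based Statistical Model for Predicting Concurrent Query Response Time. Web Information Systems and Applications, pages 704-713. DOI: 10.1007/978-3-030-30952-7_71. Online publication date: 20-Sep-2019.
  • (2017) Resource bricolage and resource selection for parallel database systems. The VLDB Journal 26(1), pages 31-54. DOI: 10.1007/s00778-016-0435-4. Online publication date: 1-Feb-2017.
  • (2016) Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics. Proceedings of the 2016 International Conference on Management of Data, pages 1753-1764. DOI: 10.1145/2882903.2903728. Online publication date: 26-Jun-2016.
  • (2013) Workload management for big data analytics. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 929-932. DOI: 10.1145/2463676.2467801. Online publication date: 22-Jun-2013.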

Information

Published In

Proceedings of the VLDB Endowment, Volume 5, Issue 4
December 2011
120 pages

Publisher

VLDB Endowment

Publication History

Published: 01 December 2011
Published in PVLDB Volume 5, Issue 4

Qualifiers

  • Research-article

