DOI: 10.1145/1951365.1951419
research-article

Predicting completion times of batch query workloads using interaction-aware models and simulation

Published: 21 March 2011

Abstract

A question that database administrators (DBAs) routinely need to answer is how long a batch query workload will take to complete. This question arises, for example, while planning the execution of different report-generation workloads to fit within available time windows. To answer this question accurately, we need to take into account that the typical workload in a database system consists of mixes of concurrent queries. Interactions among different queries in these mixes need to be modeled, rather than following the conventional approach of considering each query separately. This paper presents a new approach for estimating workload completion times that takes the significant impact of query interactions into account. This approach builds performance models using an experiment-driven technique, by sampling the space of possible query mixes and fitting statistical models to the observed performance at these samples. No prior assumptions are made about the internal workings of the database system or the cause of query interactions, making the models robust and portable. We show that a careful choice of sampling and statistical modeling strategies can result in accurate models, and we present a novel interaction-aware workload simulator that uses these models to estimate workload completion times. An experimental evaluation with complex TPC-H queries on IBM DB2 shows that this approach consistently predicts workload completion times with less than 20% error.
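The abstract combines two pieces: statistical models fitted to performance observed on sampled query mixes, and a simulator that replays a batch workload against those models. The Python sketch below shows one way such pieces can fit together; it is an illustration under stated assumptions, not the paper's implementation. It assumes (hypothetically) that the completion time of a query type in a mix is linear in N_j, the number of type-j queries in that mix, fitted by ordinary least squares; that the workload runs at a fixed multiprogramming level (MPL), admitting the next queued query whenever one finishes; and that a running query completes a fraction 1/A of its work per unit time, where A is the model's prediction for the current mix. The names `solve`, `fit_linear`, `predict`, and `simulate` are illustrative.

```python
"""Hypothetical sketch: linear interaction models plus an event-driven
batch-workload simulator. Illustrative only; not the paper's implementation."""
from collections import deque


def solve(a, y):
    """Solve the square system a x = y (Gauss-Jordan, partial pivoting)."""
    n = len(y)
    m = [list(row) + [y[k]] for k, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[k][n] / m[k][k] for k in range(n)]


def fit_linear(samples):
    """Least-squares fit of time ~ b0 + sum_j b_j * N_j.

    samples: list of (mix_counts, observed_time), where mix_counts[j] = N_j
    is the number of type-j queries in the sampled mix. Returns
    [b0, b_1, ..., b_T] by solving the normal equations (X^T X) b = X^T y.
    """
    xs = [[1.0] + [float(n) for n in counts] for counts, _ in samples]
    ys = [float(t) for _, t in samples]
    k = len(xs[0])
    xtx = [[sum(x[r] * x[c] for x in xs) for c in range(k)] for r in range(k)]
    xty = [sum(x[r] * y for x, y in zip(xs, ys)) for r in range(k)]
    return solve(xtx, xty)


def predict(coeffs, counts):
    """Predicted completion time of one query type in a mix with these counts."""
    return coeffs[0] + sum(b * n for b, n in zip(coeffs[1:], counts))


def simulate(workload, mpl, models, num_types):
    """Estimate the completion time of a batch workload at a fixed MPL.

    workload: query-type indices in submission order.
    models[i]: coefficients for type i (e.g. from fit_linear).
    A running type-i query finishes a fraction 1 / predict(models[i], mix)
    of its work per unit time, so rates change only when the mix changes.
    """
    if mpl < 1:
        raise ValueError("mpl must be >= 1")
    queue = deque(workload)
    running = []  # [query_type, remaining_fraction_of_work]
    clock = 0.0
    while queue or running:
        # Admit queued queries until the MPL is reached.
        while queue and len(running) < mpl:
            running.append([queue.popleft(), 1.0])
        counts = [0] * num_types
        for qtype, _ in running:
            counts[qtype] += 1
        # Guard against non-positive predictions from an extrapolated model.
        rates = [1.0 / max(predict(models[q], counts), 1e-9) for q, _ in running]
        # Advance the clock to the next completion under the current mix.
        dt = min(rem / rate for (_, rem), rate in zip(running, rates))
        clock += dt
        for query, rate in zip(running, rates):
            query[1] -= rate * dt
        running = [q for q in running if q[1] > 1e-9]  # drop finished queries
    return clock
```

For example, `simulate([0, 0], 2, [[10.0, 5.0]], 1)` runs two queries of a single type whose predicted time grows by 5 units per concurrent instance, so both finish together after 20 units, whereas `mpl=1` runs them back to back for 30 units. Jumping from one completion to the next, rather than stepping a fixed time increment, works because in this model the progress rates change only when the mix changes. Comparing such estimates with measured completion times (relative error |predicted - actual| / actual) is one way to check a model against an error bound such as the 20% reported above.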

Information

Published In

ACM Other conferences
EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
March 2011
587 pages
ISBN:9781450305280
DOI:10.1145/1951365
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • Microsoft Research

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

EDBT/ICDT '11: EDBT/ICDT '11 joint conference
March 21 - 24, 2011
Uppsala, Sweden

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 1
Reflects downloads up to 30 Aug 2024

Cited By

  • (2021) CPRQ: Cost Prediction for Range Queries in Moving Object Databases. ISPRS International Journal of Geo-Information 10:7 (468). DOI: 10.3390/ijgi10070468. Online publication date: 8-Jul-2021.
  • (2021) MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems. Proceedings of the 2021 International Conference on Management of Data, pp. 1248-1261. DOI: 10.1145/3448016.3457276. Online publication date: 9-Jun-2021.
  • (2021) Towards a Holistic Controller. Proceedings of the Twelfth ACM International Conference on Future Energy Systems, pp. 424-429. DOI: 10.1145/3447555.3466581. Online publication date: 22-Jun-2021.
  • (2021) Survey on Query Optimization of GPU database. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pp. 2283-2287. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00343. Online publication date: Dec-2021.
  • (2020) Processing Big Data Across Infrastructures. Big Data – BigData 2020, pp. 38-51. DOI: 10.1007/978-3-030-59612-5_4. Online publication date: 18-Sep-2020.
  • (2020) DeepQT: Learning Sequential Context for Query Execution Time Prediction. Database Systems for Advanced Applications, pp. 188-203. DOI: 10.1007/978-3-030-59419-0_12. Online publication date: 22-Sep-2020.
  • (2019) A Hybrid Machine Learning Approach to Concurrent Query Performance Prediction. 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), pp. 1170-1177. DOI: 10.1109/ISKE47853.2019.9170460. Online publication date: Nov-2019.
  • (2019) A Novel Auction-Based Query Pricing Schema. International Journal of Parallel Programming 47:4, pp. 759-780. DOI: 10.1007/s10766-017-0534-x. Online publication date: 1-Aug-2019.
  • (2019) A QueryRating-Based Statistical Model for Predicting Concurrent Query Response Time. Web Information Systems and Applications, pp. 704-713. DOI: 10.1007/978-3-030-30952-7_71. Online publication date: 16-Sep-2019.
  • (2018) Learning-based SPARQL query performance modeling and prediction. World Wide Web 21:4, pp. 1015-1035. DOI: 10.5555/3220754.3220861. Online publication date: 1-Jul-2018.
