Article

Statistical machine learning makes automatic control practical for internet datacenters

Published: 15 June 2009

Abstract

Horizontally-scalable Internet services on clusters of commodity computers appear to be a great fit for automatic control: there is a target output (service-level agreement), observed output (actual latency), and gain controller (adjusting the number of servers). Yet few datacenters are automated this way in practice, due in part to well-founded skepticism about whether the simple models often used in the research literature can capture complex real-life workload/performance relationships and keep up with changing conditions that might invalidate the models. We argue that these shortcomings can be fixed by importing modeling, control, and analysis techniques from statistics and machine learning. In particular, we apply rich statistical models of the application's performance, simulation-based methods for finding an optimal control policy, and change-point methods to find abrupt changes in performance. Preliminary results running a Web 2.0 benchmark application driven by real workload traces on Amazon's EC2 cloud show that our method can effectively control the number of servers, even in the face of performance anomalies.
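The abstract does not say which change-point method the authors use. As a hedged illustration of the general idea only, the sketch below applies a one-sided CUSUM test, a standard textbook detector, to a series of latency samples. The function name `cusum_change_point`, its parameters (`baseline_mean`, `slack`, `threshold`), and the latency values are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def cusum_change_point(
    samples: Sequence[float],
    baseline_mean: float,
    slack: float,
    threshold: float,
) -> Optional[int]:
    """Return the index at which an upward shift is flagged, or None.

    One-sided CUSUM: accumulate how far each sample exceeds
    baseline_mean + slack, never letting the sum drop below zero,
    and raise an alarm once the sum exceeds threshold.
    """
    cumulative = 0.0
    for index, value in enumerate(samples):
        cumulative = max(0.0, cumulative + (value - baseline_mean - slack))
        if cumulative > threshold:
            return index
    return None


# Made-up latencies in milliseconds: steady near 100, then a jump to about 130.
latencies_ms = [100, 102, 98, 101, 99, 100, 130, 128, 131, 129]
alarm_index = cusum_change_point(
    latencies_ms, baseline_mean=100.0, slack=5.0, threshold=40.0
)
print(alarm_index)
```

In a controller of the kind the abstract describes, an alarm like this could be the signal that the current performance model no longer holds and should be re-estimated. That reading is an assumption; the abstract does not describe the mechanism.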


    Published In

    HotCloud'09: Proceedings of the 2009 conference on Hot topics in cloud computing
    June 2009
    22 pages

    Publisher

    USENIX Association

    United States
