
Quantifying Cloud Performance and Dependability: Taxonomy, Metric Design, and Emerging Challenges

Published: 25 August 2018

Abstract

In only a decade, cloud computing has grown from the pursuit of a service-driven information and communication technology (ICT) into a significant fraction of the ICT market. Responding to this growth, many alternative cloud services and their underlying systems now vie for the attention of cloud users and providers. To make informed choices between competing cloud service providers, permit cost-benefit analysis of cloud-based systems, and enable DevOps teams to evaluate and tune the performance of these complex ecosystems, appropriate performance metrics, benchmarks, tools, and methodologies are necessary. This requires re-examining old system properties and considering new ones, possibly leading to the re-design of classic benchmarking metrics that express performance as throughput and latency (response time). In this work, we address these requirements by focusing on four system properties: (i) elasticity of the cloud service, to accommodate large variations in the amount of service requested; (ii) performance isolation between the tenants of shared cloud systems, and the resulting performance variability; (iii) availability of cloud services and systems; and (iv) the operational risk of running a production system in a cloud environment. Focusing on key metrics for each of these properties, we review the state of the art, then select or propose new metrics together with measurement approaches. We see the presented metrics as a foundation toward upcoming industry-standard cloud benchmarks.
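The abstract argues that classic benchmarking metrics expressing performance as throughput and latency may need re-design for cloud settings. As a baseline for that discussion, the sketch below (not taken from the paper; the request log and window length are hypothetical) shows how the classic formulation is typically computed from a log of request start/end timestamps, reporting throughput alongside median and tail (p99) response time:

```python
from statistics import median, quantiles

def throughput(requests, window_s):
    """Completed requests per second over an observation window (seconds)."""
    return len(requests) / window_s

def latency_stats(requests):
    """Median and 99th-percentile response time (seconds) from (start, end) pairs."""
    latencies = sorted(end - start for start, end in requests)
    # method='inclusive' keeps the cut points within the observed
    # min/max even for small logs, unlike the default 'exclusive' method.
    p = quantiles(latencies, n=100, method='inclusive')
    return median(latencies), p[98]  # p[98] is the 99th percentile

# Hypothetical request log: four requests completing within a 2-second window.
log = [(0.0, 0.1), (0.2, 0.5), (0.6, 0.7), (1.0, 1.8)]
tput = throughput(log, window_s=2.0)
p50, p99 = latency_stats(log)
print(f"throughput={tput:.1f} req/s, p50={p50:.2f}s, p99={p99:.3f}s")
```

Reporting a tail percentile rather than only a mean matters here: the performance variability the paper identifies as a key cloud property shows up in the tail of the latency distribution, which averages hide.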




Published In

ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), Volume 3, Issue 4, December 2018, 175 pages
ISSN: 2376-3639
EISSN: 2376-3647
DOI: 10.1145/3271433
Editors: Sem Borst, Carey Williamson
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 August 2018
Accepted: 01 July 2018
Revised: 01 June 2018
Received: 01 September 2017
Published in TOMPECS Volume 3, Issue 4


Author Tags

  1. metrics
  2. availability
  3. benchmarking
  4. cloud
  5. elasticity
  6. operational risk
  7. performance isolation
  8. performance variability

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 58
  • Downloads (last 6 weeks): 13
Reflects downloads up to 16 Oct 2024


Cited By

  • (2024) Mapping DevOps capabilities to the software life cycle: A systematic literature review. Information and Software Technology, Article 107583 (Sep. 2024). DOI: 10.1016/j.infsof.2024.107583
  • (2024) Dependability of Network Services in the Context of NFV: A Taxonomy and State of the Art Classification. Journal of Network and Systems Management 32, 2 (Mar. 2024). DOI: 10.1007/s10922-024-09810-2
  • (2024) Exploring the impact of chaos engineering with various user loads on cloud native applications: An exploratory empirical study. Computing 106, 7 (Jul. 2024), 2389-2425. DOI: 10.1007/s00607-024-01292-z
  • (2024) An Initial Insight into Measuring Quality in Cloud-Native Architectures. Knowledge Management in Organisations (Jun. 2024), 341-351. DOI: 10.1007/978-3-031-63269-3_26
  • (2023) Adopting Continuous Integration Practices to Achieve Quality in DevOps. International Journal of Advanced Research in Science, Communication and Technology (Feb. 2023), 101-119. DOI: 10.48175/IJARSCT-8368
  • (2023) Efficient Resource Utilization in IoT and Cloud Computing. Information 14, 11 (Nov. 2023), Article 619. DOI: 10.3390/info14110619
  • (2023) A systematic mapping of performance in distributed stream processing systems. In Proceedings of the 49th Euromicro Conference on Software Engineering and Advanced Applications (SEAA'23), 293-300. DOI: 10.1109/SEAA60479.2023.00052
  • (2023) Deep Reinforcement Learning in Cloud Elasticity Through Offline Learning and Return Based Scaling. In Proceedings of the IEEE 16th International Conference on Cloud Computing (CLOUD'23), 13-23. DOI: 10.1109/CLOUD60044.2023.00012
  • (2023) Serverless Web Application for the Life Cycle of Software Development Projects using Scrum in South America. In Proceedings of the 2nd Asia-Pacific Computer Technologies Conference (APCT'23), 1-7. DOI: 10.1109/APCT58752.2023.00008
  • (2022) Benchmarking ISO Risk Management Systems to Assess Efficacy and Help Identify Hidden Organizational Risk. Sustainability 14, 9 (Apr. 2022), Article 4937. DOI: 10.3390/su14094937
