Fault tolerance and quality of service aware virtual machine scheduling algorithm in cloud data centers

Xu, Heyang; Xu, Sen; Wei, Wei; Guo, Naixuan

doi:10.1007/s11227-022-04760-5

Published: 22 August 2022

Volume 79, pages 2603–2625, (2023)

The Journal of Supercomputing

Heyang Xu (ORCID: orcid.org/0000-0002-5135-4939)¹, Sen Xu¹, Wei Wei² & Naixuan Guo¹

551 Accesses
8 Citations

Abstract

Improving the resource utilization of cloud data centers (CDCs) while ensuring users’ quality of service (QoS) through efficient virtual machine (VM) scheduling is an urgent problem, and it becomes more challenging when service reliability is taken into account. However, most existing studies ignore reliability factors such as the failure and recovery of computing nodes (CNs), and therefore do not reflect the conditions of real-life CDCs. This paper investigates fault tolerance-aware VM scheduling and formulates it as a multi-objective optimization model with multiple QoS constraints. The model seeks to minimize users’ total expenditure while maximizing the successful execution rate of their businesses. To solve the model, a greedy-based best fit decreasing (GBFD) algorithm is developed. GBFD uses a cost efficiency factor, defined according to the characteristics of CNs, to select a suitable CN for each VM request. Extensive experiments on both real-world CDC cluster data sets and simulated ones verify the feasibility of the proposed model and algorithm. The results show that, first, as expected, fault tolerance significantly influences the performance criteria of VM scheduling and, second, in most cases the developed algorithm decreases users’ expenditure, increases the success rate of executing their businesses and improves their overall satisfaction. Specifically, under the real-world CDC cluster scenario, GBFD increases the overall satisfaction of all cloud users by 38.3%, 20.9% and 14.6%, respectively, compared with the three other algorithms. The developed algorithm therefore performs better in fault tolerance-aware cloud environments.
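
The greedy selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' GBFD implementation: the abstract does not give the definition of the cost efficiency factor, so the ratio of reliability to unit price used here is a hypothetical stand-in, and the class names, fields, single resource dimension and tie-breaking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ComputingNode:
    name: str
    capacity: int        # free resource units (assumed single dimension)
    unit_price: float    # assumed price per resource unit
    reliability: float   # assumed probability of running without failure

    def cost_efficiency(self) -> float:
        # Hypothetical stand-in for the paper's cost efficiency factor:
        # more reliability per unit of price is treated as better.
        return self.reliability / self.unit_price


@dataclass
class VMRequest:
    name: str
    demand: int          # resource units requested


def schedule(requests: List[VMRequest],
             nodes: List[ComputingNode]) -> Dict[str, Optional[str]]:
    """Place the largest requests first, each on the feasible node with the
    highest cost efficiency; ties go to the node with the tightest fit."""
    free = {n.name: n.capacity for n in nodes}   # remaining capacity per node
    placement: Dict[str, Optional[str]] = {}
    for req in sorted(requests, key=lambda r: r.demand, reverse=True):
        feasible = [n for n in nodes if free[n.name] >= req.demand]
        if not feasible:
            placement[req.name] = None           # rejected in this sketch
            continue
        best = max(
            feasible,
            key=lambda n: (n.cost_efficiency(), -(free[n.name] - req.demand)),
        )
        free[best.name] -= req.demand
        placement[req.name] = best.name
    return placement


# Illustrative data only; none of these values come from the paper.
nodes = [
    ComputingNode("cn-a", capacity=8, unit_price=1.0, reliability=0.90),
    ComputingNode("cn-b", capacity=4, unit_price=0.5, reliability=0.80),
]
requests = [VMRequest("vm-1", 2), VMRequest("vm-2", 4), VMRequest("vm-3", 6)]
result = schedule(requests, nodes)
```

Sorting the requests in decreasing order of demand places the hardest-to-fit VMs while capacity is still plentiful, which is the usual motivation for "decreasing" variants of bin-packing heuristics. A request that fits on no CN is simply marked as unplaced here; how the paper handles such requests is not stated in the abstract.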

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (No. 62076215), the Natural Science Foundation of the Jiangsu Higher Education Institutions (No. 21KJD520006), the Future Network Scientific Research Fund Project (No. FNSRFP-2021-YB-46), the Funding for School-Level Research Projects of Yancheng Institute of Technology (No. xjr2021047 and No. xjr2022028) and the Natural Science Project of Zhengzhou Science and Technology Bureau (No. 21ZZXTCX20).

Author information

Authors and Affiliations

School of Information Engineering, Yancheng Institute of Technology, Yancheng, 224051, Jiangsu, China
Heyang Xu, Sen Xu & Naixuan Guo
College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, Henan, China
Wei Wei

Corresponding author

Correspondence to Heyang Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, H., Xu, S., Wei, W. et al. Fault tolerance and quality of service aware virtual machine scheduling algorithm in cloud data centers. J Supercomput 79, 2603–2625 (2023). https://doi.org/10.1007/s11227-022-04760-5

Accepted: 09 August 2022
Published: 22 August 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s11227-022-04760-5
