Abstract
Improving the resource utilization of cloud data centers (CDCs) while ensuring users’ quality of service (QoS) through efficient virtual machine (VM) scheduling is an urgent problem, and it becomes more challenging when service reliability is taken into consideration. However, most existing studies ignore reliability factors such as failures and recoveries of computing nodes (CNs), and therefore do not reflect the situation in real-life CDCs. This paper investigates the problem of fault tolerance-aware VM scheduling and formulates it as a multi-objective optimization model with multiple QoS constraints. The proposed model seeks to minimize users’ total expenditure and, at the same time, maximize the successful execution rate of their businesses. To solve the model, a greedy-based best fit decreasing (GBFD) algorithm is developed. The GBFD algorithm uses a cost efficiency factor, defined according to the characteristics of CNs, to select a suitable CN for each VM request. Finally, extensive experiments on both real-world CDC cluster data sets and simulated ones are conducted to verify the feasibility of the proposed models and algorithm. The results show that, first, as expected, fault tolerance significantly influences the performance criteria of VM scheduling and, second, in most cases the developed algorithm decreases users’ expenditure, increases the success rate of executing their businesses and improves their overall satisfaction. Specifically, under the real-world CDC cluster scenario, the GBFD algorithm increases the overall satisfaction of all cloud users by 38.3%, 20.9% and 14.6%, respectively, compared with the three other algorithms evaluated. Thus, the developed algorithm performs better in fault tolerance-aware cloud environments.
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (No. 62076215), the Natural Science Foundation of the Jiangsu Higher Education Institutions (No. 21KJD520006), the Future Network Scientific Research Fund Project (No. FNSRFP-2021-YB-46), the Funding for School-Level Research Projects of Yancheng Institute of Technology (No. xjr2021047 and No. xjr2022028) and the Natural Science Project of Zhengzhou Science and Technology Bureau (No. 21ZZXTCX20).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Xu, H., Xu, S., Wei, W. et al. Fault tolerance and quality of service aware virtual machine scheduling algorithm in cloud data centers. J Supercomput 79, 2603–2625 (2023). https://doi.org/10.1007/s11227-022-04760-5
DOI: https://doi.org/10.1007/s11227-022-04760-5