Efficiency-First Fault-Tolerant Replica Scheduling Strategy for Reliability Constrained Cloud Application

Zhang, Yingxue; Fan, Guisheng; Yu, Huiqun; Chen, Xingpeng

doi:10.1007/978-3-030-93571-9_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13152)

Included in the following conference series:

IFIP International Conference on Network and Parallel Computing

Abstract

Assuring reliability requirements is an important prerequisite for executing applications in the cloud. Although replica management can improve application reliability, it also wastes resources and adds overhead. We therefore propose the efficiency-first fault-tolerant (EFFT) algorithm, which minimizes the execution cost of a cloud application under a reliability constraint and avoids the excessive overhead caused by too many replicas. EFFT has two stages: initial allocation and dynamic adjustment. In the initial allocation stage, a sorting rule determines the priority of tasks and instances. In the adjustment stage, an actual efficiency ratio indicator measures the cost-effectiveness of each instance, which lets EFFT trade off cost against reliability in order to minimize execution cost. We run the algorithm on randomly generated parallel applications of different scales and compare the results with four advanced algorithms. The experiments show that the proposed algorithm outperforms the other algorithms in terms of execution cost and fault tolerance.
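
The abstract does not give the definition of the actual efficiency ratio or the sorting rule, so the following is only a minimal sketch of the general idea of choosing replicas by cost-effectiveness under a reliability constraint, not the EFFT algorithm itself. Everything in it is an assumption made for illustration: the `Instance` fields, the exponential (Poisson) failure model, the stand-in ratio "reliability gain per unit cost", and the greedy loop.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical cloud instance description (not taken from the paper)."""
    name: str
    failure_rate: float  # failures per time unit (assumed Poisson failures)
    exec_time: float     # time the task needs on this instance
    price: float         # cost per time unit

    @property
    def reliability(self) -> float:
        # Probability that the task finishes on this instance without a failure.
        return math.exp(-self.failure_rate * self.exec_time)

    @property
    def cost(self) -> float:
        return self.price * self.exec_time


def combined_reliability(replicas):
    """A replicated task succeeds if at least one replica succeeds."""
    all_fail = 1.0
    for inst in replicas:
        all_fail *= 1.0 - inst.reliability
    return 1.0 - all_fail


def choose_replicas(instances, target):
    """Greedily add the replica with the best reliability gain per unit cost
    (an illustrative stand-in ratio) until the target reliability is met."""
    chosen = []
    remaining = list(instances)
    while combined_reliability(chosen) < target and remaining:
        current = combined_reliability(chosen)
        best = max(
            remaining,
            key=lambda i: (combined_reliability(chosen + [i]) - current) / i.cost,
        )
        chosen.append(best)
        remaining.remove(best)
    if combined_reliability(chosen) < target:
        raise ValueError("target reliability cannot be met with these instances")
    return chosen


if __name__ == "__main__":
    pool = [
        Instance("a", failure_rate=0.01, exec_time=10.0, price=1.0),
        Instance("b", failure_rate=0.05, exec_time=5.0, price=0.5),
        Instance("c", failure_rate=0.002, exec_time=20.0, price=3.0),
    ]
    picked = choose_replicas(pool, target=0.95)
    print([i.name for i in picked], sum(i.cost for i in picked))
```

In this toy pool, the cheap but less reliable instances are preferred over the single most reliable and most expensive one, which mirrors the cost versus reliability trade-off the abstract describes. A greedy choice like this is not guaranteed to be cost-optimal.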

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61772200) and the Shanghai Natural Science Foundation (No. 21ZR1416300).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China
Yingxue Zhang, Guisheng Fan, Huiqun Yu & Xingpeng Chen
Shanghai Key Laboratory of Computer Software Evaluating and Testing, Shanghai, China
Yingxue Zhang & Xingpeng Chen

Corresponding author

Correspondence to Guisheng Fan.

Editor information

Editors and Affiliations

Université Sorbonne Paris Nord, LIPN, Villetaneuse, France
Christophe Cérin
Beihang University, Beijing, China
Depei Qian
University of California at Irvine, Irvine, CA, USA
Jean-Luc Gaudiot
Institute of Computing Technology, Beijing, China
Guangming Tan
ETIS Laboratory, CY Cergy Paris Universités, ENSEA, CNRS, Cergy, France
Stéphane Zuckerman

About this paper

Cite this paper

Zhang, Y., Fan, G., Yu, H., Chen, X. (2022). Efficiency-First Fault-Tolerant Replica Scheduling Strategy for Reliability Constrained Cloud Application. In: Cérin, C., Qian, D., Gaudiot, JL., Tan, G., Zuckerman, S. (eds) Network and Parallel Computing. NPC 2021. Lecture Notes in Computer Science, vol 13152. Springer, Cham. https://doi.org/10.1007/978-3-030-93571-9_11

DOI: https://doi.org/10.1007/978-3-030-93571-9_11
Published: 13 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93570-2
Online ISBN: 978-3-030-93571-9
eBook Packages: Computer Science, Computer Science (R0)
