Abstract
In this paper we propose WorkQueue with Replication Fault Tolerant (WQR-FT), a fault-tolerant scheduler for Bag-of-Tasks Grid applications obtained by adding checkpointing and replication to the WorkQueue with Replication (WQR) scheduling algorithm. Using discrete-event simulation, we show that WQR-FT not only ensures the successful completion of all the tasks in a bag, but also achieves better performance than both WQR and other fault-tolerant schedulers obtained by coupling WQR with replication only or with checkpointing only.
This work has been supported by the Italian MIUR under the project Società dell’Informazione, Sottoprogetto 3 – Grid Computing: Tecnologie abilitanti ed applicazioni per eScience, L. 449/97, anno 1999.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Anglano, C., Canonico, M. (2005). Fault-Tolerant Scheduling for Bag-of-Tasks Grid Applications. In: Sloot, P.M.A., Hoekstra, A.G., Priol, T., Reinefeld, A., Bubak, M. (eds) Advances in Grid Computing - EGC 2005. EGC 2005. Lecture Notes in Computer Science, vol 3470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508380_64
DOI: https://doi.org/10.1007/11508380_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26918-2
Online ISBN: 978-3-540-32036-4
eBook Packages: Computer Science (R0)