Significant checkpoint in distributed system

Tanaka, Katsuya; Higaki, Hiroaki; Takizawa, Makoto

doi:10.1007/BFb0034721


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1134)

Included in the following conference series: International Conference on Database and Expert Systems Applications


Abstract

In distributed applications, a group of objects cooperates to achieve some objective. The objects may suffer from various kinds of faults. If an object o is faulty, o is rolled back to its checkpoint, and the objects that have received messages from o must also be rolled back. In this paper, on the basis of message semantics, we define influential messages: messages whose receivers must, from the application point of view, be rolled back if the senders are rolled back. Using influential messages, we define a significant checkpoint, which denotes a global state of the system that is consistent from the application point of view even though it may be inconsistent under the traditional definition. We present protocols for taking significant checkpoints and for rolling back the objects.
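The rollback rule described in the abstract can be illustrated with a minimal sketch. This is not the protocol from the paper, whose details are not given here. It assumes a hypothetical boolean `influential` flag on each message (how the paper derives it from message semantics is not stated in the abstract), and it assumes every message listed was sent and received after the latest checkpoints of the objects involved. The names `Message` and `rollback_set` are illustrative only.

```python
from collections import deque
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    receiver: str
    influential: bool  # hypothetical flag; its derivation is not given in the abstract


def rollback_set(faulty, messages, only_influential):
    """Return the objects that must roll back when `faulty` rolls back.

    With only_influential=False every message propagates the rollback
    (the traditional rule). With only_influential=True only messages
    flagged as influential do.
    """
    to_roll_back = {faulty}
    queue = deque([faulty])
    while queue:
        obj = queue.popleft()
        for m in messages:
            if m.sender != obj:
                continue
            if only_influential and not m.influential:
                continue
            if m.receiver not in to_roll_back:
                to_roll_back.add(m.receiver)
                queue.append(m.receiver)
    return to_roll_back


# Assumed example: o1 fails after sending to o2 and o3; o3 then sent to o4.
messages = [
    Message("o1", "o2", influential=True),
    Message("o1", "o3", influential=False),
    Message("o3", "o4", influential=True),
]

traditional = rollback_set("o1", messages, only_influential=False)
significant = rollback_set("o1", messages, only_influential=True)
print(sorted(traditional))  # ['o1', 'o2', 'o3', 'o4']
print(sorted(significant))  # ['o1', 'o2']
```

In this assumed example, treating the message from o1 to o3 as non-influential keeps o3 and o4 out of the rollback, which is the kind of saving the abstract attributes to influential messages.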



Author information

Authors and Affiliations

Dept. of Computers and Systems Engineering, Tokyo Denki University, Ishizaka, Hatoyama, 350-03, Saitama, Japan
Katsuya Tanaka, Hiroaki Higaki & Makoto Takizawa


Editor information

Roland R. Wagner, Helmut Thoma


About this paper

Cite this paper

Tanaka, K., Higaki, H., Takizawa, M. (1996). Significant checkpoint in distributed system. In: Wagner, R.R., Thoma, H. (eds) Database and Expert Systems Applications. DEXA 1996. Lecture Notes in Computer Science, vol 1134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0034721

DOI: https://doi.org/10.1007/BFb0034721
Published: 26 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61656-6
Online ISBN: 978-3-540-70651-9
eBook Packages: Springer Book Archive
