Abstract
In distributed applications, a group of objects cooperates to achieve some objective. The objects may suffer from various kinds of faults. If an object o is faulty, o is rolled back to its checkpoint, and the objects that have received messages from o must also be rolled back. In this paper, on the basis of message semantics, we define influential messages: messages whose receivers must, from the application point of view, be rolled back if their senders are rolled back. Using influential messages, we define a significant checkpoint, which denotes a global state of the system that is consistent with respect to the influential messages even though it may be inconsistent under the traditional definition. We present protocols for taking significant checkpoints and for rolling back the objects.
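The difference between traditional rollback and rollback restricted to influential messages can be illustrated with a small sketch. This is not the protocol from the paper, only a reachability computation under stated assumptions: the `Message` type, its `influential` flag, and the `rollback_set` function are hypothetical names introduced here, every listed message is assumed to have been sent after its sender's latest checkpoint, and the relative timing of sends and receives is ignored.

```python
from collections import deque
from typing import Iterable, NamedTuple, Set


class Message(NamedTuple):
    """A message sent after the sender's latest checkpoint (assumption)."""
    sender: str
    receiver: str
    influential: bool  # hypothetical flag supplied by the application


def rollback_set(faulty: str, messages: Iterable[Message],
                 semantic: bool = True) -> Set[str]:
    """Return the objects to roll back when `faulty` is rolled back.

    With semantic=False every message propagates the rollback
    (the traditional rule). With semantic=True only influential
    messages do.
    """
    # Map each sender to the receivers that depend on it.
    dependents = {}
    for m in messages:
        if semantic and not m.influential:
            continue  # a non-influential message does not force a rollback
        dependents.setdefault(m.sender, set()).add(m.receiver)

    # Breadth-first search over the dependency relation.
    to_roll = {faulty}
    queue = deque([faulty])
    while queue:
        obj = queue.popleft()
        for receiver in dependents.get(obj, ()):
            if receiver not in to_roll:
                to_roll.add(receiver)
                queue.append(receiver)
    return to_roll


messages = [
    Message("o1", "o2", influential=True),
    Message("o1", "o3", influential=False),
    Message("o3", "o4", influential=True),
]
traditional = rollback_set("o1", messages, semantic=False)
semantic = rollback_set("o1", messages, semantic=True)
```

In this example the traditional rule rolls back all four objects, while the semantic rule rolls back only o1 and o2, because the message from o1 to o3 is marked non-influential and so the rollback never reaches o3 or o4.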
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tanaka, K., Higaki, H., Takizawa, M. (1996). Significant checkpoint in distributed system. In: Wagner, R.R., Thoma, H. (eds) Database and Expert Systems Applications. DEXA 1996. Lecture Notes in Computer Science, vol 1134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0034721
DOI: https://doi.org/10.1007/BFb0034721
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61656-6
Online ISBN: 978-3-540-70651-9
eBook Packages: Springer Book Archive