Significant checkpoint in distributed system

  • Parallel and Distributed Systems
  • Conference paper
Database and Expert Systems Applications (DEXA 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1134)

Abstract

In distributed applications, a group of objects cooperates to achieve some objective. The objects may suffer from various kinds of faults. If an object o is faulty, o is rolled back to its checkpoint, and the objects that have received messages from o must also be rolled back. In this paper, on the basis of message semantics, we define influential messages: messages whose receivers must, from the application's point of view, be rolled back if their senders are rolled back. Using influential messages, we define a significant checkpoint, which denotes a global state of the system that is consistent from the application's point of view even if it is inconsistent under the traditional definition. We present protocols for taking significant checkpoints and for rolling back the objects.
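The difference between the two rollback rules described in the abstract can be illustrated with a small sketch. This is not the protocol from the paper, which the abstract does not spell out; it only shows how restricting propagation to influential messages can shrink the set of objects that roll back. The `Message` type, the `influential` flag, the `rollback_set` function, and the sample log are all hypothetical, and the sketch assumes every logged message was sent after the sender's checkpoint.

```python
from collections import deque
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    receiver: str
    influential: bool  # hypothetical flag; the paper derives this from message semantics


def rollback_set(faulty, messages, only_influential):
    """Return the objects that roll back when `faulty` rolls back.

    Assumes every message in `messages` was sent after its sender's
    checkpoint. With only_influential=False, every receiver is dragged
    along (traditional rule); with True, only receivers of influential
    messages are.
    """
    to_roll_back = {faulty}
    queue = deque([faulty])
    while queue:
        current = queue.popleft()
        for m in messages:
            if m.sender != current:
                continue
            if only_influential and not m.influential:
                continue  # receiver is unaffected from the application's view
            if m.receiver not in to_roll_back:
                to_roll_back.add(m.receiver)
                queue.append(m.receiver)
    return to_roll_back


# Hypothetical message log: o1 -> o2 is influential, o1 -> o3 is not.
log = [
    Message("o1", "o2", influential=True),
    Message("o1", "o3", influential=False),
    Message("o3", "o4", influential=True),
]

traditional = rollback_set("o1", log, only_influential=False)
significant = rollback_set("o1", log, only_influential=True)
```

In this sample log the traditional rule rolls back all four objects, while the influential-message rule rolls back only o1 and o2, because o3 received nothing influential from o1 and so never propagates the rollback to o4.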

Editor information

Roland R. Wagner, Helmut Thoma

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tanaka, K., Higaki, H., Takizawa, M. (1996). Significant checkpoint in distributed system. In: Wagner, R.R., Thoma, H. (eds) Database and Expert Systems Applications. DEXA 1996. Lecture Notes in Computer Science, vol 1134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0034721

  • DOI: https://doi.org/10.1007/BFb0034721

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61656-6

  • Online ISBN: 978-3-540-70651-9

  • eBook Packages: Springer Book Archive
