DOI: 10.5555/876878.879309

On the Minimal Characterization of the Rollback-Dependency Trackability Property

Published: 13 November 2019

Abstract

Checkpoint and communication patterns that enforce rollback-dependency trackability (RDT) have only on-line trackable checkpoint dependencies and allow efficient solutions to the determination of consistent global checkpoints. Baldoni, Helary, and Raynal have explored RDT at the message level, in which checkpoint dependencies are represented by zigzag paths. They have presented many characterizations of RDT and conjectured that a certain communication pattern characterizes the minimal set of zigzag paths that must be tested on-line by a checkpointing protocol in order to enforce RDT. The contributions of this work are (i) a proof that their conjecture is false, (ii) a minimal characterization of RDT, and (iii) an original approach to analyzing RDT checkpointing protocols.

References

[1]
R. Baldoni, J. M. Helary, A. Mostefaoui, and M. Raynal. A communication-induced checkpoint protocol that ensures rollback dependency trackability. In IEEE Symposium on Fault Tolerant Computing (FTCS'97), pages 68-77, 1997.
[2]
Ö. Babaoğlu and K. Marzullo. Consistent global states of distributed systems: Fundamental concepts and mechanisms. In S. Mullender, editor, Distributed Systems, pages 55-96. Addison-Wesley, 1993.
[3]
E. N. Elnozahy, D. Johnson, and Y. M. Wang. A survey of rollback-recovery protocols in message-passing systems. Technical Report CMU-CS-96-181, Carnegie Mellon University, 1996.
[4]
I. C. Garcia and L. E. Buzato. Progressive construction of consistent global checkpoints. In 19th IEEE Int. Conf. on Distributed Computing Systems (ICDCS'99), Austin, Texas, USA, June 1999.
[5]
L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558-565, July 1978.
[6]
D. Manivannan and M. Singhal. Quasi-synchronous checkpointing: Models, characterization, and classification. IEEE Trans. on Parallel and Distributed Systems, 10(7), July 1999.
[7]
R. H. B. Netzer and J. Xu. Necessary and sufficient conditions for consistent global snapshots. IEEE Trans. on Parallel and Distributed Systems, 6(2):165-169, 1995.
[8]
R. Baldoni, J. M. Helary, and M. Raynal. Rollback-dependency trackability: A minimal characterization and its protocol. Technical Report 1173, IRISA, Mar. 1998.
[9]
R. Baldoni, J. M. Helary, and M. Raynal. Rollback-dependency trackability: Visible characterizations. In 18th ACM Symposium on the Principles of Distributed Computing (PODC'99), Atlanta (USA), May 1999.
[10]
J. Tsai, S. Y. Kuo, and Y. M. Wang. Theoretical analysis for communication-induced checkpointing protocols with rollback-dependency trackability. IEEE Trans. on Parallel and Distributed Systems, Oct. 1998.
[11]
Y. M. Wang. Consistent global checkpoints that contain a given set of local checkpoints. IEEE Trans. on Computers, 46(4):456-468, Apr. 1997.

Cited By

  • (2011) Theoretical and experimental evaluation of communication-induced checkpointing protocols in FE and FLazy-E families. Performance Evaluation, 68:5 (429-445). DOI: 10.1016/j.peva.2011.01.005. Online publication date: 1-May-2011
  • (2008) Checkpointing and rollback recovery in distributed systems. Proceedings of the 12th WSEAS international conference on Systems (569-574). DOI: 10.5555/1580134.1580272. Online publication date: 22-Jul-2008
  • (2003) On Properties of RDT Communication-Induced Checkpointing Protocols. IEEE Transactions on Parallel and Distributed Systems, 14:8 (755-764). DOI: 10.1109/TPDS.2003.1225055. Online publication date: 1-Aug-2003

Published In

ICDCS '01: Proceedings of the 21st International Conference on Distributed Computing Systems
April 2001

Publisher

IEEE Computer Society

United States

Author Tags

  1. Distributed algorithms
  2. distributed checkpointing
  3. fault-tolerance
  4. rollback recovery
  5. zigzag paths

Qualifiers

  • Article
