DOI: 10.1007/11408901_3
Article

Failure detection with booting in partially synchronous systems

Published: 20 April 2005

Abstract

Unreliable failure detectors are a well-known means of enriching asynchronous distributed systems with time-free semantics that allow consensus to be solved in the presence of crash failures. Implementing unreliable failure detectors requires a system that provides some synchrony, typically an upper bound on end-to-end message delays. Recently, we introduced an implementation of the perfect failure detector in a novel partially synchronous model, referred to as the Θ-Model, where only the ratio Θ of maximum vs. minimum end-to-end delay of messages that are simultaneously in transit must be known a priori (the actual delays need not be known, nor even be bounded). In this paper, we present an alternative failure detector algorithm, which is based on a clock synchronization algorithm for the Θ-Model. It not only surpasses our first implementation with respect to failure detection time, but also works during the system booting phase.
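
To make the Θ-Model condition from the abstract more concrete, the following minimal sketch checks a recorded message trace against a given Θ. It is only an illustration of the informal statement above (the ratio of maximum to minimum delay among messages simultaneously in transit), not the paper's formal definition or its failure detector algorithm. The function names `observed_delay_ratio` and `respects_theta` and the trace format of `(send_time, receive_time)` pairs are assumptions made for this example.

```python
from itertools import combinations


def observed_delay_ratio(messages):
    """Largest delay ratio among messages that were in transit together.

    `messages` is a hypothetical trace: a list of (send_time, receive_time)
    pairs with receive_time > send_time. Returns 1.0 if no two messages
    were ever in transit at the same time.
    """
    worst = 1.0
    for (s1, r1), (s2, r2) in combinations(messages, 2):
        # Two messages are simultaneously in transit if their
        # (send, receive) intervals overlap.
        if max(s1, s2) < min(r1, r2):
            d1, d2 = r1 - s1, r2 - s2
            worst = max(worst, max(d1, d2) / min(d1, d2))
    return worst


def respects_theta(messages, theta):
    """True if the trace never exceeds the assumed ratio bound `theta`."""
    return observed_delay_ratio(messages) <= theta


# The third message is very slow, but nothing else is in transit with it,
# so it does not affect the ratio; only the first two messages overlap.
trace = [(0, 2), (1, 5), (10, 110)]
print(observed_delay_ratio(trace))
print(respects_theta(trace, theta=2.0))
```

In this reading, absolute delays can grow arbitrarily as long as concurrent messages slow down together, which is the sense in which the delays need not be bounded.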

Published In

EDCC'05: Proceedings of the 5th European conference on Dependable Computing
April 2005
470 pages
ISBN:3540257233
Editors: Mario Cin, Mohamed Kaâniche, András Pataricza

Publisher

Springer-Verlag

Berlin, Heidelberg

Cited By
  • (2012) Reconciling fault-tolerant distributed computing and systems-on-chip. Distributed Computing 24:6 (323-355). DOI: 10.1007/s00446-011-0151-7. Online publication date: 1-Jan-2012
  • (2009) Optimal message-driven implementations of omega with mute processes. ACM Transactions on Autonomous and Adaptive Systems 4:1 (1-22). DOI: 10.1145/1462187.1462191. Online publication date: 9-Feb-2009
  • (2009) Towards a real-time distributed computing model. Theoretical Computer Science 410:6-7 (629-659). DOI: 10.1016/j.tcs.2008.10.012. Online publication date: 20-Feb-2009
  • (2009) The Theta-Model. Distributed Computing 22:1 (29-47). DOI: 10.1007/s00446-009-0080-x. Online publication date: 1-Apr-2009
  • (2008) A general characterization of indulgence. ACM Transactions on Autonomous and Adaptive Systems 3:4 (1-19). DOI: 10.1145/1452001.1452010. Online publication date: 12-Dec-2008
  • (2008) The asynchronous bounded-cycle model. Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing (423-423). DOI: 10.1145/1400751.1400815. Online publication date: 18-Aug-2008
  • (2008) The Asynchronous Bounded-Cycle Model. Proceedings of the 10th International Symposium on Stabilization, Safety, and Security of Distributed Systems (246-262). DOI: 10.1007/978-3-540-89335-6_20. Online publication date: 21-Nov-2008
  • (2007) Booting clock synchronization in partially synchronous systems with hybrid process and link failures. Distributed Computing 20:2 (115-140). DOI: 10.1007/s00446-007-0026-0. Online publication date: 1-Aug-2007
  • (2006) Optimal message-driven implementation of omega with mute processes. Proceedings of the 8th international conference on Stabilization, safety, and security of distributed systems (110-121). DOI: 10.5555/1759076.1759086. Online publication date: 17-Nov-2006
  • (2005) Brief announcement. Proceedings of the twenty-fourth annual ACM symposium on Principles of distributed computing (208-208). DOI: 10.1145/1073814.1073852. Online publication date: 17-Jul-2005