Abstract
We investigate whether fast failure detectors can be useful, and if so by how much, in the design of real-time fault-tolerant systems. Specifically, for synchronous systems with crash failures, we show how fast failure detectors can speed up consensus and fault-tolerant broadcasts: we give fast algorithms and derive some matching lower bounds. These results show that a fast failure detector service (implemented using specialized hardware or expedited message delivery) can be an important tool in the design of real-time mission-critical systems.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aguilera, M.K., Le Lann, G., Toueg, S. (2002). On the Impact of Fast Failure Detectors on Real-Time Fault-Tolerant Systems. In: Malkhi, D. (eds) Distributed Computing. DISC 2002. Lecture Notes in Computer Science, vol 2508. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36108-1_24
DOI: https://doi.org/10.1007/3-540-36108-1_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00073-0
Online ISBN: 978-3-540-36108-4
eBook Packages: Springer Book Archive