On the Impact of Fast Failure Detectors on Real-Time Fault-Tolerant Systems

  • Conference paper
Distributed Computing (DISC 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2508))

Abstract

We investigate whether fast failure detectors can be useful, and if so by how much, in the design of real-time fault-tolerant systems. Specifically, we show how fast failure detectors can speed up consensus and fault-tolerant broadcasts, by providing fast algorithms and deriving some matching lower bounds, for synchronous systems with crashes. These results show that a fast failure detector service (implemented using specialized hardware or expedited message delivery) can be an important tool in the design of real-time mission-critical systems.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aguilera, M.K., Le Lann, G., Toueg, S. (2002). On the Impact of Fast Failure Detectors on Real-Time Fault-Tolerant Systems. In: Malkhi, D. (eds) Distributed Computing. DISC 2002. Lecture Notes in Computer Science, vol 2508. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36108-1_24

  • DOI: https://doi.org/10.1007/3-540-36108-1_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00073-0

  • Online ISBN: 978-3-540-36108-4

  • eBook Packages: Springer Book Archive
