Reaching Approximate Agreement with Mixed-Mode Faults

Published: 01 January 1994

Abstract

In a fault-tolerant distributed system, different non-faulty processes may arrive at different values for a given system parameter. To resolve this disagreement, processes must exchange and vote upon their respective local values. Faulty processes may attempt to inhibit agreement by acting in a malicious or "Byzantine" manner. Approximate agreement defines one form of agreement in which the voted values obtained by the non-faulty processes need not be identical. Instead, they need only agree to within a predefined tolerance. Approximate agreement can be achieved by a sequence of convergent voting rounds, in which the range of values held by non-faulty processes is reduced in each round. Historically, each new convergent voting algorithm has been accompanied by ad-hoc proofs of its convergence rate and fault-tolerance, using an overly conservative fault model in which all faults exhibit worst-case Byzantine behavior. This paper presents a general method to quickly determine convergence rate and fault-tolerance for any member of a broad family of convergent voting algorithms. This method is developed under a realistic mixed-mode fault model comprised of asymmetric, symmetric, and benign fault modes. These results are employed to more accurately analyze the properties of several existing voting algorithms, to derive a sub-family of optimal mixed-mode voting algorithms, and to quickly determine the properties of proposed new voting algorithms.
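
The idea of a convergent voting round can be illustrated with a minimal sketch. It is not the paper's algorithm or its fault model: it assumes a simple trimmed-mean vote (discard the `tau` lowest and `tau` highest received values, then average the rest) and a hypothetical adversary whose faulty senders report a different extreme value to each receiver. The names `voting_round`, `spread`, and `simulate` are illustrative only.

```python
from statistics import mean


def voting_round(received, tau):
    """One trimmed-mean vote: drop the tau lowest and tau highest values, average the rest."""
    if len(received) <= 2 * tau:
        raise ValueError("need more than 2*tau values to vote")
    ordered = sorted(received)
    kept = ordered[tau:len(ordered) - tau]
    return mean(kept)


def spread(values):
    """Range of the values currently held by the non-faulty processes."""
    return max(values) - min(values)


def simulate(good_values, tau, rounds):
    """Run several rounds with tau faulty senders (assumed adversary, not from the paper).

    Every non-faulty process receives all non-faulty values plus tau lies.
    The lies differ per receiver: far below the range for even-indexed
    receivers, far above it for odd-indexed ones.
    """
    values = list(good_values)
    history = [spread(values)]
    for _ in range(rounds):
        lo, hi = min(values), max(values)
        new_values = []
        for i in range(len(values)):
            lie = lo - 100.0 if i % 2 == 0 else hi + 100.0
            received = values + [lie] * tau
            new_values.append(voting_round(received, tau))
        values = new_values
        history.append(spread(values))
    return values, history


if __name__ == "__main__":
    final, history = simulate([0.0, 1.0, 2.0, 3.0, 10.0], tau=1, rounds=4)
    print("spread per round:", history)
    print("final values:", final)
```

With these inputs the spread of non-faulty values drops from 10.0 to 2.5 after the first round and keeps shrinking, while every voted value stays inside the initial range of non-faulty values. Rounds would be repeated until the spread falls within the predefined tolerance.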

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 5, Issue 1
January 1994
110 pages

Publisher

IEEE Press

Author Tags

  1. fault tolerant computing
  2. approximate agreement
  3. convergent voting algorithm
  4. distributed processing
  5. fault-tolerant distributed system
  6. mixed-mode faults
  7. reliability
  8. synchronisation
  9. voting algorithms
  10. worst-case Byzantine behavior

Qualifiers

  • Research-article

Cited By

  • (2021) Global Convergence in Large Sensor Networks under Hybrid Faults. Proceedings of the 2021 9th International Conference on Communications and Broadband Networking, pp. 201-208. doi:10.1145/3456415.3456448. Online publication date: 25-Feb-2021
  • (2021) Accurate Resilient Average Consensus via Detection and Compensation. 2021 60th IEEE Conference on Decision and Control (CDC), pp. 5502-5507. doi:10.1109/CDC45484.2021.9682843. Online publication date: 14-Dec-2021
  • (2020) Asynchronous Byzantine Approximate Consensus in Directed Networks. Proceedings of the 39th Symposium on Principles of Distributed Computing, pp. 149-158. doi:10.1145/3382734.3405724. Online publication date: 31-Jul-2020
  • (2020) Resilient Self/Event-Triggered Consensus Based on Ternary Control. 2020 59th IEEE Conference on Decision and Control (CDC), pp. 6240-6245. doi:10.1109/CDC42340.2020.9304448. Online publication date: 14-Dec-2020
  • (2019) Switching Topology for Resilient Consensus using Wi-Fi Signals. 2019 International Conference on Robotics and Automation (ICRA), pp. 2018-2024. doi:10.1109/ICRA.2019.8793788. Online publication date: 20-May-2019
  • (2018) Resilient Consensus for Multi-agent Networks with Mobile Detectors. Neural Information Processing, pp. 291-302. doi:10.1007/978-3-030-04239-4_26. Online publication date: 13-Dec-2018
  • (2017) Distributed consensus in resilient networks. 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 6383-6388. doi:10.1109/CDC.2017.8264622. Online publication date: 12-Dec-2017
  • (2017) r-Robustness and (r, s)-robustness of circulant graphs. 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 4416-4421. doi:10.1109/CDC.2017.8264310. Online publication date: 12-Dec-2017
  • (2010) Strong order-preserving renaming in the synchronous message passing model. Theoretical Computer Science 411:40-42, pp. 3787-3794. doi:10.1016/j.tcs.2010.06.001. Online publication date: 1-Sep-2010
  • (2009) Tiered fault tolerance for long-term integrity. Proceedings of the 7th conference on File and storage technologies, pp. 267-282. doi:10.5555/1525908.1525928. Online publication date: 24-Feb-2009
