
1

Dealing with Byzantine Faults
CS 686 Final Project, brought to you by Chris Sosa

2

Overview
  • Motivation in Dependable Systems
  • Common Types of Byzantine Faults
  • Solutions in Real Systems

3

The Myths
  • Hardware cannot be “traitorous”!
      • Anthropomorphic model
      • Any system with consensus is susceptible
  • It’s never happened before
      • Often misclassified
      • Legionnaire's Disease

4

The Awful Truth
  • Time-Triggered Architecture
      • Radioactive fault injection to one node
      • Messed up timing protocol (SOS)
      • Formed cliques until system failed
  • Quad Redundant Control System
      • No message exchange
      • Lots of redundancy
      • One fault propagated to look like many
  • Professor Knight’s Computer

5

Trends in Dependable Systems
  • Device Physics
      • Smaller and faster not always better
      • Cosmic rays, etc.
  • Movement to Distributed Topologies
  • Usage of Commercial Off-the-Shelf (COTS) Technology

6

Common Types of Observed Faults
  • Value
      • Issues related to digital values being the extreme of analog
      • Propagation
  • Temporal
      • Different observations at same time
      • Synchronization doesn’t help very much
  • Value + Temporal
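A minimal sketch of the value-type fault listed above, assuming three receivers whose input thresholds differ slightly. The names `read_bit`, `observe`, and `THRESHOLDS`, and the voltages used, are hypothetical and chosen only for illustration. A solid logic level is read identically everywhere, while a marginal level from a faulty driver is read as 1 by some receivers and 0 by others.

```python
def read_bit(voltage, threshold):
    """Digitize an analog level against this receiver's own threshold."""
    return 1 if voltage >= threshold else 0

# Hypothetical per-receiver thresholds in volts; real parts vary within tolerance.
THRESHOLDS = {"A": 1.4, "B": 1.5, "C": 1.6}

def observe(voltage):
    """Return the bit each receiver believes it saw for the same analog level."""
    return {name: read_bit(voltage, t) for name, t in THRESHOLDS.items()}

clean = observe(3.3)     # solid logic high: every receiver reads 1
marginal = observe(1.5)  # marginal level: receivers A and B read 1, C reads 0
```

The disagreement in `marginal` comes from a single faulty source, yet downstream it looks like the receivers themselves are inconsistent.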

7

Solutions?

8

Solutions (1)
  • Full Exchange
      • Uses classical Byzantine agreement
      • SPIDER: bus (ROBUS) design

9

Solutions (2)
  • Hierarchical
      • Uses hierarchy of different fault-tolerant techniques, including Byzantine agreement
      • Seen with fail-stop processors
  • SAFEbus
      • Communication backplane for Boeing 777
      • Uses two buses which are themselves dual redundant; different forms of parity detect errors
      • Uses self-checking pairs on top of buses
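The self-checking pair idea can be sketched as follows. This is an illustration of the general technique, not SAFEbus's actual design: two lanes compute the same result, and output is released only when they agree, so a disagreeing pair goes silent. The function names and the corrupted lane are hypothetical.

```python
def self_checking_pair(primary, checker, x):
    """Return the result only if both lanes agree; otherwise stay silent (None)."""
    a, b = primary(x), checker(x)
    return a if a == b else None

def good_lane(x):
    return 2 * x

def faulty_lane(x):
    # Hypothetical corrupted lane used only to trigger a mismatch.
    return 2 * x + 1

healthy = self_checking_pair(good_lane, good_lane, 21)   # lanes agree
silenced = self_checking_pair(good_lane, faulty_lane, 21)  # lanes disagree
```

Turning an arbitrary fault into silence is what lets the higher layers of the hierarchy use cheaper techniques than full Byzantine agreement.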

10

Solutions (3)
  • Filtering
      • Targets propagation of Byzantine faults
      • Tries to either
          • Mask faults by forcing output to some straight value (removes value-type faults), or
          • Segment system into Fault Containment Regions (FCRs) where we put protections to stop propagation
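A rough sketch of the masking approach, under the idealized assumption that the filter itself always resolves to a clean level. The rail voltages, thresholds, and the names `mask` and `receivers_agree` are hypothetical. The filter re-drives whatever it receives to a full rail, so no marginal value reaches receivers with differing thresholds.

```python
LOW, HIGH = 0.0, 3.3  # hypothetical rail voltages

def mask(voltage, threshold=1.65):
    """Re-drive an incoming level to a full rail so no marginal value passes on."""
    return HIGH if voltage >= threshold else LOW

def receivers_agree(voltage, thresholds=(1.4, 1.5, 1.6)):
    """True if every receiver threshold yields the same bit for this level."""
    return len({voltage >= t for t in thresholds}) == 1

unfiltered = receivers_agree(1.5)      # marginal level splits the receivers
filtered = receivers_agree(mask(1.5))  # masked level is read the same by all
```

The masked value may not be the one the sender intended, but every receiver now sees the same thing, which removes the value-type disagreement.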

11

Ignorance is not Bliss
  • Can invalidate failure model
      • Propagation of one fault can be disastrous
      • No amount of redundancy can help
  • Large Economic Factor
      • Possible costs of recall and redeployment

12

Conclusions
  • Byzantine faults are real!
  • Problems with ignoring them
  • No amount of redundancy can tolerate them without message exchange
  • Three categories of solutions to deal with them

13

Questions?

14

BGP Quick Review
  • Algorithm is expensive:
      • Each processor has to broadcast its values for many rounds
      • Chooses majority value
      • Requires n > 3f, where f is the # of failures and n is the # of processors
  • With signed messages
      • Can tolerate more failures
      • Still expensive
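The review above can be made concrete with a small simulation of the classical oral-messages algorithm OM(m) for the Byzantine Generals Problem. This is a sketch under stated assumptions, not code from the project: the function names are hypothetical, and traitors follow one fixed hypothetical strategy (alternating 0 and 1 across receivers). Ties in the majority vote fall back to a default of 0.

```python
from collections import Counter

def majority(values, default=0):
    """Most common value; ties fall back to a fixed default."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]

def om(m, commander, lieutenants, value, traitors):
    """Run OM(m); return {lieutenant: value it settles on}."""
    received = {}
    for i, lt in enumerate(lieutenants):
        # Hypothetical traitor strategy: alternate 0/1 across receivers.
        received[lt] = i % 2 if commander in traitors else value
    if m == 0:
        return received
    relayed = {lt: [] for lt in lieutenants}
    for lt in lieutenants:
        # Each lieutenant relays what it received, acting as commander in OM(m-1).
        others = [x for x in lieutenants if x != lt]
        for peer, v in om(m - 1, lt, others, received[lt], traitors).items():
            relayed[peer].append(v)
    return {lt: majority([received[lt]] + relayed[lt]) for lt in lieutenants}

# n = 4, f = 1 satisfies n > 3f: loyal lieutenants 1 and 2 follow the loyal commander.
four = om(1, 0, [1, 2, 3], 1, traitors={3})
# n = 3, f = 1 violates n > 3f: loyal lieutenant 1 cannot trust what it hears.
three = om(1, 0, [1, 2], 1, traitors={2})
```

With four processors, the loyal lieutenants settle on the commander's value of 1. With three, lieutenant 1 hears 1 from the commander and 0 from the traitor, ties, and falls back to the default, so it fails to follow a loyal commander. The recursion also shows where the cost comes from: every extra tolerated fault adds another round in which each lieutenant re-broadcasts to all the others.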
