An Introduction To Fault Tree Analysis
Key Words: fault trees, failure probability, importance measures, system failure intensity
1. INTRODUCTION

Fault tree analysis is now a commonly applied method to predict the failure probability or failure frequency of engineering systems in terms of the failure and repair parameters of the system components. The concept of expressing the system failure causes in a logic diagram, which became known as a fault tree, was established in the early 1960s by Watson, working at Bell Telephone Labs on the launch control system of the Minuteman intercontinental ballistic missile. The time-dependent methodology to quantify the system failure likelihood or frequency, known as kinetic tree theory, was developed almost 10 years later by Vesely [ref

[Figure: fault tree for a fire protection system, annotated with the symbol names TOP Event, OR Gate, Intermediate Event, AND Gate and Basic Event. TOP event: Fire Protection System Fails to Respond to a Fire. Intermediate events: Fire Detection System Fails to Detect a Fire; Water Deluge System Fails to Activate. Basic events: Failure to Detect Smoke (SD), Failure to Detect Heat (HD), Pump Fails to Start (PUMP), Nozzles Blocked (NOZ).]
Figure 4. 2-out-of-3:F Vote Gate
The house event, shown in figure 2, is an event which terminates a branch of the fault tree but, unlike the basic event, the house event is known to be either true or false. Setting such events to true or false on the fault tree has the effect of turning
Figure 7. Example System Fault Tree Structure.
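For the example tree of figure 7, the text gives the minimal cut sets {B,D} and {A,B,C}. The following is a minimal Python sketch of how the inclusion-exclusion expansion of equation 12 could be evaluated for such a list of cut sets, and of why it becomes impractical for large lists. It assumes independent component failures. The component probabilities in `q` are hypothetical values chosen only for illustration, and the function names are not from the paper. The two bounds use the usual textbook forms of the rare event approximation and the minimal cut set upper bound, which are assumed here to correspond to equations 14 and 15.

```python
from itertools import combinations
from math import comb, prod

# Minimal cut sets of the figure 7 example, as stated in the text.
CUT_SETS = [frozenset("BD"), frozenset("ABC")]

# Hypothetical component failure probabilities (illustration only).
q = {"A": 0.01, "B": 0.02, "C": 0.03, "D": 0.04}


def p_all_fail(events, q):
    """Probability that every event in `events` occurs (independence assumed)."""
    return prod(q[e] for e in events)


def inclusion_exclusion(cut_sets, q):
    """Exact top event probability by the expansion of equation 12."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = (-1) ** (k - 1)
        for group in combinations(cut_sets, k):
            # The intersection of cut set events is the union of their components.
            total += sign * p_all_fail(frozenset().union(*group), q)
    return total


def rare_event(cut_sets, q):
    """Rare event approximation: the first term of the expansion only."""
    return sum(p_all_fail(cs, q) for cs in cut_sets)


def mcs_upper_bound(cut_sets, q):
    """Minimal cut set upper bound: 1 - product of (1 - P(C_i))."""
    return 1.0 - prod(1.0 - p_all_fail(cs, q) for cs in cut_sets)


if __name__ == "__main__":
    print("exact          :", inclusion_exclusion(CUT_SETS, q))
    print("MCS upper bound:", mcs_upper_bound(CUT_SETS, q))
    print("rare event     :", rare_event(CUT_SETS, q))
    # Number of probabilities in the first three terms for 100,000 cut sets.
    print([comb(100_000, k) for k in (1, 2, 3)])
```

With two cut sets the expansion has only three probabilities to evaluate. The last line shows how quickly the count grows with the number of cut sets, which is why the bounds are preferred in practice.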
Figure 11. Unrevealed Component Failure

The average unavailability is given by:

$$Q_{AV} = \frac{1}{\theta}\int_0^{\theta}\left(1 - e^{-\lambda t}\right)dt = 1 - \frac{1}{\lambda\theta}\left(1 - e^{-\lambda\theta}\right) \tag{8}$$

where $\theta$ is the interval between inspections and $\lambda$ is the component failure rate. Alternatively this can be approximated by:

$$Q_{AV} \approx \frac{\lambda\theta}{2} \tag{9}$$

... where the events in the minimal cut set, $C_i$, are $X_1, X_2, \dots, X_n$.

6. SYSTEM FAILURE PROBABILITY

Using fault tree analysis, predictions for the failure probability or the failure frequency of the system (top event) can be made. In this section we will concentrate on the top event probability. Having obtained the minimal cut sets we can express the top event logic equation as the disjunction (OR) of the $N_C$ minimal cut sets, $C_i$. The system failure probability, $Q_{SYS}$, is then the probability of this disjunction:

$$Q_{SYS} = P(C_1 \cup C_2 \cup \dots \cup C_{N_C})$$

Expanding this by the inclusion-exclusion principle gives:

$$Q_{SYS} = \sum_{i=1}^{N_C} P(C_i) - \sum_{i=2}^{N_C}\sum_{j=1}^{i-1} P(C_i \cap C_j) + \sum_{i=3}^{N_C}\sum_{j=2}^{i-1}\sum_{k=1}^{j-1} P(C_i \cap C_j \cap C_k) - \dots + (-1)^{N_C-1} P(C_1 \cap C_2 \cap \dots \cap C_{N_C}) \tag{12}$$

Consider the example fault tree in figure 7, which has minimal cut sets {B,D} and {A,B,C}. Applying equation 12 gives:

$$Q_{SYS} = q_A q_B q_C + q_B q_D - q_A q_B q_C q_D \tag{13}$$

where $q_A$, $q_B$, $q_C$, $q_D$ are the failure probabilities of components A, B, C and D respectively.

In this particular example it is a simple calculation. However, consider a moderate to large sized fault tree which delivered 100,000 minimal cut sets. The number of elements in the first term of equation 12 would be $10^5$, in the second term $5 \times 10^9$, in the third term $1.7 \times 10^{14}$, and so on for the $10^5$ terms in the equation. Even with modern fast digital computers this is an enormous number of calculations and would take a considerable time to complete. In practice, acceptably accurate upper bound approximations are used, such as the Rare Event approximation (equation 14) or the Minimal Cut Set Upper Bound (equation 15).

7. IMPORTANCE MEASURES

Should a system not perform to the reliability or availability target required, then modifications to the design or operation have to be made to address the weaknesses. An output from a fault tree analysis which can help to identify the weaknesses is importance measures. Importance measures provide an indication, in some sense, of the contribution that each basic event or minimal cut set makes to the system failure mode. There are many different types of importance measure and each calculates a different means of ranking the contribution to the
top event. More details can be found in references 12 and 14.

Consider the basic event importance measures. The vulnerability of the system to the occurrence of each component failure event is indicated by a numerical value. The higher the importance value, the greater the contribution of that basic event to the system failure. Depending on the nature of the importance measure, it can take into account such things as the structure of the system (levels of redundancy etc.), the failure rate of the component, and the time taken to repair the component. To improve the system performance, the basic events which have the highest importance measure can be addressed. Importance measures can be deterministic, considering only the system structure, or probabilistic, accounting for the likelihood of component failures.

A concept which is fundamental in developing component importance measures is that of a critical system state. A Critical System State for a component i is a state for the remaining n-1 components such that failure of component i causes the system to go from a working to a failed state.

Figure 12. Simple Four Component System Fault Tree

Taking each component in turn, the critical system states can be identified by constructing a table which considers the states of all the other components in the system. Some of these states may already satisfy the conditions which mean the system is failed. Others will mean that the system still functions. From these states, those which will fail the system when the component being considered fails are critical and are identified. These tables are illustrated for components A, B and C in tables 2, 3 and 4 respectively. Due to the symmetry of the system, component D will have the same number of critical states as component C.

In the tables, W denotes a working component and F a failed component.

State  B  C  D  Critical for A?
1      W  W  W  Y
2      W  W  F  Y
3      W  F  W  Y
4      W  F  F  Y
5      F  W  W  Y
6      F  W  F  N
7      F  F  W  N
8      F  F  F  N

Table 2. Criticality of Component A

State  A  B  D  Critical for C/D?
1      W  W  W  N
2      W  W  F  N
3      W  F  W  Y
4      W  F  F  N
5      F  W  W  N
6      F  W  F  N
7      F  F  W  N
8      F  F  F  N

Table 4. Criticality of Component C

The fraction of the 8 states that are critical for each component gives the structural importance:

$$I_A = 5/8, \qquad I_B = 3/8, \qquad I_C = I_D = 1/8 \tag{17}$$

Birnbaum Measure of Importance

The Criticality Function, $G_i(q)$, is the probability that the system is in a critical state for component i. This is also known
as Birnbaum's measure of importance.

From table 2, Birnbaum's measure of importance for component A is given by summing the probabilities of being in each critical state. This is:

$$\begin{aligned} G_A &= (1-q_B)(1-q_C)(1-q_D) + (1-q_B)(1-q_C)\,q_D + (1-q_B)\,q_C(1-q_D) \\ &\quad + (1-q_B)\,q_C\,q_D + q_B(1-q_C)(1-q_D) \\ &= (1-q_B) + q_B(1-q_C)(1-q_D) \end{aligned} \tag{18}$$

$$G_A = 0.944$$

Similarly, from tables 3 and 4 we get:

$$G_B = (1-q_A)(1-q_C)\,q_D + (1-q_A)\,q_C(1-q_D) + (1-q_A)\,q_C\,q_D \tag{19}$$

$$G_B = 0.252$$

and

$$G_C = (1-q_A)\,q_B(1-q_D) = 0.144 \tag{20}$$

$$G_D = (1-q_A)\,q_B(1-q_C) = 0.162 \tag{21}$$

Whilst the structural and Birnbaum measures can be produced using the tabular approach, this soon becomes impractical for real systems due to the size of the tables. An alternative means of calculating Birnbaum's measure is to use:

$$G_i(q) = \frac{\partial Q_{sys}}{\partial q_i} = Q_{sys}(1_i, q) - Q_{sys}(0_i, q) \tag{22}$$

where $Q_{sys}(1_i, q)$ is the system failure probability with $q_i = 1$ and $Q_{sys}(0_i, q)$ is the system failure probability with $q_i = 0$.

Criticality Measure of Importance

The criticality importance measures for the components are then:

$$I_{CM_A} = \frac{(0.944)(0.1)}{0.1504} = 0.6277$$

$$I_{CM_B} = \frac{(0.252)(0.2)}{0.1504} = 0.3351$$

$$I_{CM_C} = \frac{(0.144)(0.1)}{0.1504} = 0.0957$$

$$I_{CM_D} = \frac{(0.162)(0.2)}{0.1504} = 0.2154 \tag{25}$$

Fussell-Vesely Measure of Importance

The Fussell-Vesely measure of component importance for component i is defined as the ratio of the probability of the union of all minimal cut sets containing i to the system failure probability.

$$I_{FV_i} = \frac{P\left(\bigcup_{j \,:\, i \in C_j} C_j\right)}{Q_{SYS}} \tag{26}$$

For the simple system shown in figure 12 this measure gives:

$$I_{FV_A} = \frac{q_A}{Q_{SYS}} = \frac{0.1}{0.1504} = 0.6649$$

$$I_{FV_B} = \frac{q_B\,(q_C + q_D - q_C q_D)}{Q_{SYS}} = \frac{0.2\,(0.1 + 0.2 - 0.02)}{0.1504} = 0.3723 \tag{27}$$

$$I_{FV_C} = \frac{q_C\,q_B}{Q_{SYS}} = \frac{0.02}{0.1504} = 0.1330$$
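The importance measures for the figure 12 system can be cross-checked numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes the minimal cut sets {A}, {B,C} and {B,D}, which are inferred here from tables 2 and 4 and equations 18 to 21 rather than stated in the text. The component failure probabilities are those appearing in the arithmetic of equations 25 and 27, component failures are assumed independent, and all function names are illustrative. The criticality measure is computed as G_i q_i / Q_SYS, which matches the arithmetic shown in equation 25.

```python
from itertools import product

COMPONENTS = ("A", "B", "C", "D")
# Assumed minimal cut sets, inferred from tables 2 and 4 and equations 18-21.
CUT_SETS = [{"A"}, {"B", "C"}, {"B", "D"}]
# Failure probabilities as they appear in the arithmetic of equations 25 and 27.
Q = {"A": 0.1, "B": 0.2, "C": 0.1, "D": 0.2}


def fails(failed, cut_sets=CUT_SETS):
    """True if every component of at least one cut set is in `failed`."""
    return any(cs <= failed for cs in cut_sets)


def prob_union(cut_sets, q):
    """P(at least one cut set occurs), by enumerating all component states."""
    total = 0.0
    for states in product((False, True), repeat=len(COMPONENTS)):
        failed = {c for c, s in zip(COMPONENTS, states) if s}
        if fails(failed, cut_sets):
            p = 1.0
            for c in COMPONENTS:
                p *= q[c] if c in failed else 1.0 - q[c]
            total += p
    return total


def structural(i):
    """Fraction of states of the other components that are critical for i."""
    others = [c for c in COMPONENTS if c != i]
    critical = 0
    for states in product((False, True), repeat=len(others)):
        failed = {c for c, s in zip(others, states) if s}
        # Critical: system works without i failed, and fails once i fails.
        if not fails(failed) and fails(failed | {i}):
            critical += 1
    return critical / 2 ** len(others)


def birnbaum(i, q=Q):
    """Equation 22: Q_sys with q_i = 1 minus Q_sys with q_i = 0."""
    return prob_union(CUT_SETS, {**q, i: 1.0}) - prob_union(CUT_SETS, {**q, i: 0.0})


def criticality(i, q=Q):
    """G_i * q_i / Q_SYS, as in the arithmetic of equation 25."""
    return birnbaum(i, q) * q[i] / prob_union(CUT_SETS, q)


def fussell_vesely(i, q=Q):
    """Equation 26: P(union of cut sets containing i) / Q_SYS."""
    containing = [cs for cs in CUT_SETS if i in cs]
    return prob_union(containing, q) / prob_union(CUT_SETS, q)


if __name__ == "__main__":
    print(f"Q_SYS = {prob_union(CUT_SETS, Q):.4f}")
    for c in COMPONENTS:
        print(c, structural(c), f"{birnbaum(c):.4f}",
              f"{criticality(c):.4f}", f"{fussell_vesely(c):.4f}")
```

Exhaustive enumeration is used only because the system has four components. It has the same scaling problem as the tabular approach, so for realistic trees the cut set bounds or a Binary Decision Diagram would be the practical route.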
component condition will mean that the pump keeps running and the problem is revealed by the tank overfilling. Others, such as relay R1 contacts fail closed, will be unrevealed as this is the normal operating state for that component. All of the

Figure 13. Simple Tank Level Control System

[Figure: fault tree for the top event 'Tank Overfills'. Box labels include: Pump Motor energised too long; Relay R2 contacts closed too long; Relay R2 remains energised; Relay R2 contacts fail closed; Power across the PB/R1 contacts section; Switch SW1 remains closed; SW2 remains closed; R1 remains energised; Relay R1 contacts closed; PB contacts closed; R1 fails closed; Switch 2 fails closed; Level sensor 2 fails. Component labels: R1, R2, SW1, SW2, L1, L2.]
The fault tree for the undesired top event 'tank overfills' is developed in figure 14. The text boxes specify exactly what each gate output event in the fault tree represents. Each branch is developed downward using AND and OR gates until basic events (component failure events) are encountered and the failure causality development is terminated. The final fault tree structure, showing how the basic events combine to cause the system level failure event, is illustrated in figure 15.

Figure 15. Tank Overfill Fault Tree Structure

Table 6. Minimal Cut Sets

Using the component failure data in table 1, the system failure parameters can be calculated:

Top Event Probability = $1.39 \times 10^{-3}$
Top Event Frequency = $1.919 \times 10^{-4}$ per hour

If the system failure predictions indicate an unacceptable performance, the weaknesses can be identified using component importance measures. The Fussell-Vesely measure is indicated in table 7. This shows that component L1 provides the biggest contribution to system failure.

10. CONCLUSIONS

A fault tree represents the causes of a specified system failure mode in terms of the failure modes of the system components. A summary of the features of fault tree analysis is:
Provides a well structured development of the system failure logic.
Forms a documented record of analysis which can be used to communicate fault development with regulators etc.
Directly developed from the engineering system structure.
Easily interpreted from the engineering viewpoint.
Analysis gives all minimal cut sets.
Quantification gives the top system failure mode probability or frequency.
Vulnerability to system failure can be identified using importance measures.

11. REFERENCES

1. Vesely W.E., ‘A Time Dependent Methodology for Fault Tree Evaluation’, Nuclear Engineering and Design, no. 13 (1970): 337-360.
2. Birnbaum Z.W., ‘On the importance of different components in a multi-component system’, Multivariate Analysis II, P.R. Krishnaiah, ed., Academic Press, 1969.
3. Fussell J.B., ‘How to Hand-Calculate System Reliability Characteristics’, IEEE Transactions on Reliability, R-24, (3), 1975.
4. Lambert H.E. and Dunglinson C., ‘Interval Reliability for Initiating and Enabling events’, IEEE Transactions on Reliability, Vol 32, June 1983, pp 150-163.
5. Akers B., ‘Binary Decision Diagrams’, IEEE Trans on Computers, 27(6), 509-516, 1978.
6. Bryant R., ‘Graph Based Algorithms for Boolean Function Manipulation’, IEEE Trans on Computers, 35(8), 677-691, 1986.
7. Schneeweiss W., ‘Fault Tree Analysis Using Binary Decision Diagrams’, IEEE Trans on Reliability, 34(5), 453-457, 1985.
8. Rauzy A., ‘New Approaches for Fault Tree Analysis’, Reliability Engineering and System Safety, 05(59), 203-211, 1993.
9. Sinnamon R.M. and Andrews J.D., ‘Quantitative Fault Tree Analysis Using Binary Decision Diagrams’, European Journal of Automation, 30 (8), 1996, 1051-1071.
10. Sinnamon R.M. and Andrews J.D., ‘Improved Efficiency in Qualitative Fault Tree Analysis’, Quality and Reliability Engineering International, Vol 13, 1997, pp 293-298.
11. Sinnamon R.M. and Andrews J.D., ‘Improved Accuracy in Quantitative Fault Tree Analysis’, Quality and Reliability Engineering International, Vol 13, 1997, pp 285-292.
12. Andrews J.D. and Moss T.R., ‘Reliability and Risk Assessment’, Professional Engineering Publications Ltd, 2002.
13. Haasl D.F., Roberts N.H., Vesely W.E. and Goldberg F.F., ‘Fault Tree Handbook’, US Nuclear Regulatory Commission NUREG-0492, 1981.
14. Henley E.J. and Kumamoto H., ‘Reliability Engineering and Risk Assessment’, Prentice-Hall, 1981.

BIOGRAPHIES

John Andrews, Ph.D, FIMechE, CEng, MIMA, CMath, MSaRS
Professor of Infrastructure Asset Management
Head of the Resilience Engineering Research Group
University of Nottingham
Faculty of Engineering,
University Park
Nottingham, NG7 2RD, England

email: john.andrews@nottingham.ac.uk

John Andrews is Professor of Infrastructure Asset Management in the Faculty of Engineering at the University of Nottingham, UK. He is also the Head of the Resilience Engineering Research Group. He moved to Nottingham in 2009, having previously worked for 20 years at Loughborough University. The focus of his research has been on methods for predicting system reliability and availability in terms of the component failure probabilities and a representation of the system structure. Much of his early work has concentrated on the Fault Tree technique and the use of Binary Decision Diagrams (BDDs) as an efficient and accurate solution method. More recently his main interest has been on modelling the effects of maintenance in order to identify the optimal strategy for asset management. He is the author of around 350 research papers on this topic and is joint author, with Bob Moss, of a text book, Reliability and Risk Assessment, now in its second edition, published by ASME. John was the founding Editor of the Journal of Risk and Reliability and is a member of the Editorial Boards for Reliability Engineering and System Safety, and Quality and Reliability Engineering International.

Sally Lunt, BSc, Ph.D
Research Fellow in Risk and Reliability Engineering
Resilience Engineering Research Group
University of Nottingham
Faculty of Engineering,
University Park
Nottingham, NG7 2RD, England

email: sally.lunt@nottingham.ac.uk

Sally Lunt is a Research Fellow at the University of Nottingham. She graduated in Mathematical Education from Loughborough University and then went on to study for her doctorate in the Risk and Reliability Engineering Group at the University. The subject of her thesis was importance measures for non-coherent fault trees. Sally has spent a significant proportion of her career to date in education. She recently returned to research with the Resilience Engineering Research Group and specializes in advanced methods for fault tree analysis and phased mission modelling.