Neural CATDEX considers each fault and symptom node as a neural-network processing element. Most importantly, hidden layers are added between symptoms and their immediate faults to enable the development of internal representations. The simulation issues for Neural CATDEX are presented in the following sections.
Network training

In this initial study, training was performed only on the bottom-most inference network layer, consisting of the elemental symptoms and their immediate faults (Figure 4). In addition, to simplify the ensuing analysis, the excessive catalyst loss and yield loss malfunction categories were chosen because they contain the most intriguing Boolean combinations for learning and subsequent generalization. During network training, the input patterns consist of symptoms resulting from a fault, and the output patterns contain the specific fault. Thus, as numerically depicted in Figure 4, the neural network contains 18 input nodes and 13 output nodes, which represent the symptoms and their immediate faults, respectively. In this study, the network was trained on all single fault occurrences in the partial FCCU inference network. Table 1 lists the training data for the 13 faults involved. The training data were presented for a total of 500, 1,000, 2,000, 4,000, 6,000, and 8,000 iterations, or time steps. Lastly, all network training was repeated three times with different initial random weight values to average over variations in performance.

Back-propagation learning parameters

In all simulation experiments, the learning rate parameter, η, was set at 0.9; the momentum term, α, was set at 0.6; and the gain term, β, was set at 1.0. Furthermore, the initial weights of the network were assigned small, uniformly-distributed random values between -0.1 and +0.1 to prevent the hidden units from acquiring identical weights during training. Finally, the results of this study were obtained with the NeuralWorks Professional II neural network simulation package.
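As an illustration of these settings, the weight update implied by back-propagation with momentum can be sketched as follows. This is a minimal reconstruction in NumPy, assuming an 18-27-13 topology and omitting bias terms; it is not the NeuralWorks Professional II implementation actually used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Learning parameters from the text: eta = 0.9, alpha = 0.6, beta = 1.0.
ETA, ALPHA, BETA = 0.9, 0.6, 1.0

def sigmoid(x, beta=BETA):
    # Logistic activation; beta is the gain term.
    return 1.0 / (1.0 + np.exp(-beta * x))

def init_layer(n_in, n_out):
    # Small uniformly-distributed random weights in [-0.1, +0.1],
    # so the hidden units do not start with identical weights.
    return rng.uniform(-0.1, 0.1, size=(n_in, n_out))

W1, W2 = init_layer(18, 27), init_layer(27, 13)   # 18 symptoms -> 27 hidden -> 13 faults
dW1_prev, dW2_prev = np.zeros_like(W1), np.zeros_like(W2)

def train_step(x, d):
    """One back-propagation step with momentum for a single pattern."""
    global W1, W2, dW1_prev, dW2_prev
    h = sigmoid(x @ W1)                       # hidden activations
    y = sigmoid(h @ W2)                       # predicted fault vector
    # Deltas for the logistic nonlinearity with gain beta.
    delta_out = (d - y) * BETA * y * (1.0 - y)
    delta_hid = (delta_out @ W2.T) * BETA * h * (1.0 - h)
    # Gradient step plus momentum on the previous weight change.
    dW2 = ETA * np.outer(h, delta_out) + ALPHA * dW2_prev
    dW1 = ETA * np.outer(x, delta_hid) + ALPHA * dW1_prev
    W2, W1 = W2 + dW2, W1 + dW1
    dW2_prev, dW1_prev = dW2, dW1
    return y
```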
Accuracy calculations

It is evident from Table 1 that the training data are binary values, in which 1 represents a fault and 0 a nonfault. The outputs, however, are real values, since infinite weights would be required to drive the results exactly to 1 or 0.

In this study, the accuracy of a single output node is defined as one minus the absolute difference between the desired fault value and the predicted fault value determined during the recall or generalization phase. The network accuracy in fault detection for a single training pattern is computed as the average of the 13 output node accuracies. Consequently, the overall accuracy is the average network accuracy over all training patterns and all training repetitions.
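Written out (in our notation, not the paper's), these definitions are:

```latex
% a_j: accuracy of output node j; d_j: desired binary value; y_j: predicted real value
a_j = 1 - \lvert d_j - y_j \rvert
% A_p: network accuracy for a single training pattern p
A_p = \frac{1}{13} \sum_{j=1}^{13} a_j
% Overall accuracy: average over all P training patterns and R = 3 repetitions
\bar{A} = \frac{1}{P R} \sum_{r=1}^{R} \sum_{p=1}^{P} A_{p,r}
```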
Number of hidden layers and hidden units

As previously mentioned, neural networks partition the problem space into numerous decision regions when performing classifications. The partitioning capability is a function of both the number of hidden layers and the number of hidden units. For example, networks utilizing sigmoid nonlinearities form fuzzy hyperplanes with no hidden layers, convex polygonal boundaries with one hidden layer, and arbitrarily complex decision regions with two hidden layers (Lippmann, 1987; also see Kolmogorov, 1956, for a mathematical treatment of the capabilities of multilayer networks). Likewise, the number of hidden units must be determined such that the complexity of the decision regions formed corresponds to the problem complexity needed for correct classification.

In the present study, neural networks with one hidden layer were used primarily in the simulations. Nevertheless, experiments on networks with two hidden layers were performed to examine fault diagnosis on the FCCU with more complex decision regions.
[Figure 4. Partial FCCU inference network. The top layer represents the operational problem categories; the next layer (1-13) represents the faults; the bottom layer (1-18) contains the elemental symptoms. Fault nodes shown include vanadium poisoning of catalyst, sodium poisoning of catalyst, nickel poisoning of catalyst, hydrothermal deactivation, high sodium on catalyst, and increased coke make.]
The number of hidden units required to perform accurate diagnosis was determined empirically. All series of experiments were repeated using networks of 5, 7, 11, 15, 21, and 27 hidden units. Due to overcrowding of the plots, only the results of the networks that contain 5, 7, 15, and 27 hidden units are shown; the other results were similar to these.
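The experimental grid just described (hidden-unit counts, checkpoint iterations, and random restarts) can be organized as in the following sketch; run_experiment is a hypothetical placeholder for one training run, not a routine from the study.

```python
from itertools import product

# Grid taken from the text; everything else here is illustrative scaffolding.
HIDDEN_UNITS = (5, 7, 11, 15, 21, 27)
TIME_STEPS = (500, 1000, 2000, 4000, 6000, 8000)
REPETITIONS = 3  # restarts with different initial random weights

def run_experiment(n_hidden: int, n_steps: int, seed: int) -> float:
    """Hypothetical hook: train an 18-n_hidden-13 network for n_steps
    iterations from seed-dependent initial weights; return overall accuracy."""
    raise NotImplementedError

jobs = list(product(HIDDEN_UNITS, TIME_STEPS, range(REPETITIONS)))
print(f"Full sweep: {len(jobs)} training runs")  # 6 x 6 x 3 = 108
```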
A Performance Evaluation of Neural CATDEX
In the ensuing sections, the fault diagnosis simulation results are presented in their entirety. First, the facility of neural networks for recalling trained faults is explored. Next, the trained network's capacity to generalize to multiple faults is investigated by presenting symptoms resulting from two and three process malfunctions. Following this, the fault diagnosis behavior is analyzed for cases where faults share common symptoms and for cases where partial symptoms of a fault are presented to the network. Lastly, the recall and generalization results are examined for networks trained on a subset of the single fault input-output patterns. The preceding experiments are then duplicated for two hidden-layer neural network architectures.

Recall results of trained single fault patterns

To test the learning proficiency, and thus the network convergence, the symptoms resulting from the 13 faults were presented to all of the trained networks. During this stage, all output values were compared with the desired fault patterns of Table 1.
Table 1. Training Data for the Single Fault Patterns (rows for faults 6-13 shown)

Fault   Input pattern i (symptom nodes 1-18)     Desired output d (fault nodes 1-13)
 6      0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0     0 0 0 0 0 1 0 0 0 0 0 0 0
 7      0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0     0 0 0 0 0 0 1 0 0 0 0 0 0
 8      0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0     0 0 0 0 0 0 0 1 0 0 0 0 0
 9      0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0     0 0 0 0 0 0 0 0 1 0 0 0 0
10      0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0     0 0 0 0 0 0 0 0 0 1 0 0 0
11      0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0     0 0 0 0 0 0 0 0 0 0 1 0 0
12      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0     0 0 0 0 0 0 0 0 0 0 0 1 0
13      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1     0 0 0 0 0 0 0 0 0 0 0 0 1

Consult Figure 4 for the input-output pattern legend. 1 = fault; 0 = nonfault.
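For concreteness, a few of the Table 1 rows can be transcribed into array form as follows (a sketch; the helper names and the use of NumPy are ours):

```python
import numpy as np

# Selected single-fault training pairs from Table 1:
# 18 binary symptom inputs, keyed by the index of the single active fault node.
PATTERNS = {
    6:  "000000111100000000",
    7:  "000000001000000000",
    8:  "000000001010000000",
    13: "000000000000000011",
}

def as_vector(bits: str) -> np.ndarray:
    return np.array([float(b) for b in bits])

def desired_output(fault: int, n_faults: int = 13) -> np.ndarray:
    d = np.zeros(n_faults)
    d[fault - 1] = 1.0   # 1 = fault, 0 = nonfault, as in Table 1
    return d

training_set = [(as_vector(bits), desired_output(f)) for f, bits in PATTERNS.items()]
```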
Figure 6 shows the overall average two-fault generalization curves for networks trained on single faults. Similarly, Figure 7 shows the overall average three-fault generalization curves.

[Figure 6. Network performance curves for two-fault generalization.]

[Figure 7. Network performance curves for three-fault generalization; x-axis: time steps; one curve per number of hidden units.]

The networks are thus able to perform novel generalizations to multiple fault conditions. As in recall, the asymptotic limits exhibited by these curves suggest a sufficient number of hidden units to perform the task. However, unlike recall, where learning performance is not significantly affected by the number of hidden units, useful generalization seems to require a certain minimum number of hidden units, which depends on the complexity of the particular task. If too few are present, all the essential attributes of the input-output patterns cannot be completely represented.

The best average performance was achieved by a network with 27 hidden units, which attained 77.54% accuracy for two-fault generalization and 53.60% accuracy for three-fault generalization at 8,000 time steps. It appears from Figures 6 and 7 that the generalization accuracy can be improved further with more hidden units, as saturation levels have definitely not been attained. Thus, for the task of generalizing to multiple faults, performance tends to increase with the number of hidden units.

Lastly, Figures 8 and 9 display bar charts of selected two- and three-fault generalizations, respectively, for networks trained on single faults. All the bar charts correspond to a network with 27 hidden units in a single hidden layer. Figure 8 displays the network's detection accuracy for two simultaneous fault occurrences. One of the two faults in Figure 8 is always fault 1, namely "Hole in reactor plenum," as seen in Figure 4. The other member of the two-fault pair varies from fault 2 through fault 13; thus, the x-axis displays the fault combinations (1, 2), (1, 3), and so on. The dark-colored bars correspond to the accuracy of the detection of fault 1, while the hatched bars correspond to the accuracy of the other fault in the set 2 through 13. Figure 9 shows similar results, except that they now correspond to three-fault combinations.
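A sketch of how such multiple-fault test inputs can be formed from the single-fault rows of Table 1: the symptom vectors of the faults involved are combined by an elementwise OR (this is our reading of "presenting symptoms resulting from two and three process malfunctions"; the function name is ours).

```python
import numpy as np

def multi_fault_pattern(*single_fault_inputs: np.ndarray) -> np.ndarray:
    # Elementwise OR of binary symptom vectors: a symptom is present
    # if any of the simultaneous faults produces it.
    return np.clip(np.sum(single_fault_inputs, axis=0), 0.0, 1.0)

# e.g., the test input for the fault pair (6, 7), reusing the Table 1 sketch:
# x_67 = multi_fault_pattern(as_vector(PATTERNS[6]), as_vector(PATTERNS[7]))
```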
[Figure 8. Bar chart for two-fault generalization: detection accuracy (%) vs. output node number for the fault pairs (1, 2) through (1, 13).]
[Figure 9. Bar chart for three-fault generalization; x-axis: output node number "A,B". Predicted faults: 1 (dark bars); "A" and "B" (hatched bars).]

Generalization results for faults sharing common symptoms

From the preceding, it is evident that Neural CATDEX, unlike CATDEX, which uncovers only a single causal source for a given symptom pattern, performs multiple fault diagnosis. This section presents the generalization results for faults which share common symptoms. It examines the two-fault generalizations involving fault node 4 with fault nodes 5-8 for the "high catalyst losses in regenerator" operational problem. The common symptom of faults 4-8, node number 9 ("losses are high and steady"), is explicit in Figure 4.
The generalization results are shown in Figure 10. From this figure, fault nodes 4 and 6, whose symptoms are exhibited in the input, get weighted more in the corresponding outputs generated by the neural network. This is reflected in the accuracies shown in Figure 10, as in the case of faults 4 and 6. Fault 4 has two symptoms and fault 6 has four symptoms; as a result, when all of these symptoms are observed, fault 6 is recommended with a higher accuracy than fault 4. The accuracy may be interpreted in a limited sense as the probability of the recommendations for the different faults predicted by the network.

[Figure 10. Two-fault generalization for faults which share common symptoms for the "high catalyst losses in regenerator" operational problem. Predicted faults: 4 (dark bars); "A" (hatched bars).]
Probabilistic generalization for partial symptoms

As the preceding results show, the network gives greater emphasis to faults displaying more symptoms. This behavior is a result of the cooperative nature of the computations performed by neural networks. In this section, we present the network's response to input patterns with an incomplete set of symptoms for a fault occurrence. This is examined for the "high catalyst losses in reactor" operational problem, which contains four symptom nodes and three fault nodes (Figure 4).
The results are presented in Figure 11, in which the abscissa contains selected symptom patterns represented as vertical binary strings with symptom node 1 located at the base. From this figure, it is evident that the network generalizes to a "no fault" condition for an input of no faulty symptoms ("0000"). However, this is not the case for other partial symptom patterns. Even though the symptom patterns are incomplete, and thus do not warrant the existence of a fault, the network generalizes to predict the occurrence of an appropriate fault with some nonzero accuracy.

[Figure 11. Probabilistic generalization for partial symptoms for the "high catalyst losses in reactor" operational problem. Predicted faults: 1, 2, and 3.]

For example, the appearance of the "rate of loss increasing" symptom ("0100"), which is shared by fault nodes 1, 2, and 3, produces nonzero predictions for all three faults. Because fault node 3 exhibits only one of its three symptoms, the probability of fault 1 or 2 occurring is higher, and this is reflected in the predicted accuracies (i.e., 28%, 17%, and 0.3% for faults 1, 2, and 3, respectively). Compare these accuracies with the case when all the evidence for these faults is present, from Table 2 (the average overall recall accuracy for these faults is 98.18%). Thus, for incomplete symptom patterns, the neural network gives greater emphasis to a fault displaying a higher percentage of the symptoms that may result from its occurrence. Therefore, the network is able to generalize its diagnostic knowledge even for cases with incomplete evidence and produce probabilistic recommendations which are quite accurate. This is a very useful diagnostic characteristic in practical situations.
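The partial-symptom probes of Figure 11 can be sketched as follows, given a trained forward function net that maps the 18 symptom inputs to the 13 fault outputs. The index assignments for the reactor symptom and fault nodes are hypothetical placeholders, since Figure 4's exact numbering would be needed to fix them.

```python
import numpy as np

REACTOR_SYMPTOMS = (0, 1, 2, 3)  # assumed input positions of the 4 symptom nodes
REACTOR_FAULTS = (0, 1, 2)       # assumed output positions of fault nodes 1-3

def probe(net, bits: str) -> dict:
    """Present a partial symptom pattern such as '0100' and report the
    outputs of the associated fault nodes as rough fault probabilities."""
    x = np.zeros(18)
    for pos, b in zip(REACTOR_SYMPTOMS, bits):
        x[pos] = float(b)
    y = net(x)
    return {fault + 1: round(float(y[fault]), 3) for fault in REACTOR_FAULTS}

# probe(net, "0000") -> near zero for all faults (the "no fault" case)
# probe(net, "0100") -> e.g., {1: 0.28, 2: 0.17, 3: 0.003}, as reported above
```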
Recall and generalization results for incomplete training of single fault patterns

This experiment investigates the generalizations produced by networks trained on a subset of the single-fault input-output patterns. Networks in this experiment are trained with one of the 13 faults withheld. However, during the recall phase, symptoms resulting from all 13 faults are presented to the network. Similarly, during generalization, symptoms resulting from both trained and untrained faults are presented.

The recall on trained faults was very good, similar to the accuracies shown in Figure 5 and Table 2. However, the recall accuracy for untrained faults was close to zero, indicating that the network exhibited no knowledge of the untrained faults. Figure 12 displays the two-fault generalization results for cases where a single fault pattern is withheld during training (i.e., faults 5, 6, 9, 10, and 11 withheld one at a time). In all instances, the network is unable to generalize to the untrained faults. This is illustrated in Figure 12, as all untrained fault output nodes display near-zero accuracies (the hatched bars are indistinguishable from the x-axis). As expected, the network is capable of performing multiple fault diagnosis for all initially-trained faults.

[Figure 12. Generalization results for incomplete single-fault training; x-axis: output node number "A" (withheld faults 5, 6, 9, 10, and 11). Predicted faults: 1 (dark bars); "A" (hatched bars; untrained single fault).]
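The withholding procedure itself is simple to sketch (the names are ours; training and testing would reuse the earlier sketches):

```python
WITHHELD_FAULTS = (5, 6, 9, 10, 11)  # withheld one at a time, per the text

def without_fault(training_set, fault: int):
    # Drop the single-fault pattern whose desired output activates
    # fault node 'fault'; the network then trains on the remaining 12.
    return [(x, d) for (x, d) in training_set if d[fault - 1] != 1.0]

# for f in WITHHELD_FAULTS:
#     reduced = without_fault(training_set, f)
#     ...train on 'reduced', then present all 13 faults' symptoms;
#     the withheld fault's output node remains near zero (Figure 12).
```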
Performance results for two hidden layers

The final series of simulations analyzes the decision regions obtained for four-layer neural networks (i.e., networks with two hidden layers). The topologies under consideration contain 14-13 and 27-27 hidden units in the two hidden layers, respectively. As in the previous cases, the trained networks are tested on both recall and multiple fault generalization. These results are subsequently compared with those obtained for the 27 hidden-unit single-layer network.

As anticipated, the increased connections of the four-layered networks require significantly more training iterations to attain comparable single hidden layer recall accuracies. For example, a recall accuracy of 98.18% was achieved by an 18-27-13 network in 8,000 time steps, while an 18-14-13-13 network (96.12%) and an 18-27-27-13 network (97.66%) require 15,000 time steps. Therefore, adding a hidden layer, either by dividing the single hidden layer units into two layers (i.e., 14-13) or by incorporating another hidden layer of equal size (i.e., 27-27), does not significantly aid the recall process.

While the preceding results are expected, the generalization results produced by an additional hidden layer are unpredictable. A partial listing of two- and three-fault generalizations for the four-layer networks trained for 15,000 time steps is given in Tables 3 and 4, respectively. The fault discriminatory capability with two hidden layers leads to erroneous fault predictions, while single hidden layer networks offer useful generalizations.
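For reference, the four-layer topologies compared here can be sketched as follows (a minimal NumPy forward pass under the same weight-initialization convention; biases again omitted):

```python
import numpy as np

def make_mlp(layer_sizes, seed=0):
    """Forward pass for an arbitrary-depth sigmoid network,
    e.g., [18, 14, 13, 13] or [18, 27, 27, 13]."""
    rng = np.random.default_rng(seed)
    weights = [rng.uniform(-0.1, 0.1, size=(a, b))
               for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
    def forward(x):
        for W in weights:
            x = 1.0 / (1.0 + np.exp(-(x @ W)))  # sigmoid layer by layer
        return x
    return forward

net_14_13 = make_mlp([18, 14, 13, 13])   # the 18-14-13-13 topology
net_27_27 = make_mlp([18, 27, 27, 13])   # the 18-27-27-13 topology
```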
Table 3. Partial Listing of Two-Fault Generalization for Two Hidden Layer Networks @ 15,000 Time Steps

Network Architecture   Actual Faults   Predicted Faults
18-14-13-13            1, 6            6
                       1, 12           9, 12
                       1, 8            3, 7, 8
18-27-27-13            1, 3            2, 3
                       1, 9            1, 2, 9
                       1, 4            1, 4, 12

Table 4. Partial Listing of Three-Fault Generalization for Two Hidden Layer Networks @ 15,000 Time Steps

Network Architecture   Actual Faults   Predicted Faults
18-14-13-13            1, 4, 13        13
                       1, 6, 11        9, 12
                       1, 5, 13        5, 13
18-27-27-13            1, 5, 9         None
                       1, 7, 11        6
                       1, 8, 12        4, 8, 12

Conclusions

In this paper, we have proposed a neural-network-based methodology for developing automated systems for process fault diagnosis. As demonstrated, neural networks are able to acquire diagnostic knowledge from examples of fault scenarios.