Residual generation for diagnosis of additive faults in linear systems
F. Gustafsson
We here analyze the parity space approach to fault detection and isolation in a stochastic setting, using a state space model with both deterministic and stochastic unmeasurable inputs. We first show the similarity and a formal relationship between a Kalman filter approach and the parity space.
A first main contribution is a probabilistic design of a parity space detection and diagnosis algorithm, which enables an explicit computation of the probability of incorrect diagnosis.
A second main contribution is a comparison of a range of related methods, from model-based diagnosis to completely data-driven approaches: (1) the analytical parity space is computed from a known state space model, (2) this state space model is estimated from data, (3) the parity space is estimated using subspace identification techniques and (4) principal component analysis (PCA) is applied to data. The methods are here presented in a common parity space framework.
The methods are applied to two application examples: a DC motor, which is a two-state SISO model with two faults, and a larger F16 vertical dynamics five-state MIMO model with six faults. Different user choices and design parameters are compared, for instance how the matrix of diagnosis probabilities can be used as a design tool for performance optimization with respect to design variables and sensor placement and quality.
Key words: fault detection, diagnosis, Kalman filtering, adaptive filters, linear systems, principal component analysis, subspace identification
1 Introduction
The parity space approach to fault detection [1,3,4,7,8] is an elegant and general tool for additive faults in linear systems and is based on intuitively simple algebraic projections and geometry. Simply speaking, a residual $r_t$ is a data projection

$$r_t = P^T Z_t, \qquad Z_t = \begin{pmatrix} Y_t \\ U_t \end{pmatrix}, \tag{1}$$

where the data vector $Z_t$ contains the measured input ($U_t$) and output ($Y_t$) over a certain time window. The parity space approach provides a tool to compute $P$ to yield a residual vector that is zero when there is no fault in the system and reacts to different faults in different patterns, enabling a simple algorithm for fault isolation (deciding which fault actually occurred). Examples on simulated data often show very good results. Consider for instance Figure 1, where a DC motor is subject first to an offset in the control input and then to an offset in the velocity sensor.

Preprint submitted to Elsevier Preprint 2005-01-19
[Figure 1: residuals $r_1$ and $r_2$ versus time (0 to 80). Upper panel: "Structured residuals for L = 2". Lower panel: "Structured residuals for L = 2 with measurement noise (SNR=221)".]
Fig. 1. Parity space residual for a DC motor, as described in Section 5, subject first to an input voltage offset and then to a sensor offset. The two residuals are designed to be non-zero for only one fault each. The lower plot illustrates the extremely high sensitivity of the residuals to measurement noise (SNR=221).
The upper plot shows how structured parity space residuals correctly point out which fault has occurred. A main drawback is that the approach does not take measurement errors and state noise into consideration, as is done in the classical Kalman filter literature. The lower plot in Figure 1 illustrates the high sensitivity to even quite small measurement noise.
The first main contribution is a stochastic design and analysis of the parity space approach. We here mix the linear state space models used in fault detection and Kalman filtering, treating deterministic and stochastic disturbances in different ways. Previous work in this direction includes [14], [1] (Ch. 7) and [8] (Ch. 11). Related ideas using principal component analysis (PCA) are found in the chemical diagnosis literature, such as [2,5]. This work is a continuation of [11], where an additive fault was included in an augmented state vector, and observability of the fault was used as the tool to assess diagnosability. In this paper, an explicit expression for $P_{i,j} = P(\text{diagnosis } j \mid \text{fault } i)$ is given for any parity space, and the proposed detection and isolation algorithm is optimally designed to minimize the probabilities of incorrect diagnosis ($i \neq j$).
The second main contribution is a comparison of alternative approaches to compute the projection $P$ in (1):
(i) The model-based parity space, where $P(A, B, C, D)$ depends on the known state space model, described by the quadruple $(A, B, C, D)$.
(ii) System identification gives $(\hat A, \hat B, \hat C, \hat D)$, from which the parity space can be approximated as $P(\hat A, \hat B, \hat C, \hat D)$. One here needs to know the structure of the state space model.
(iii) Subspace approaches to system identification provide a way to compute $\hat P$ directly. Again, one needs to know the structure of the state space model.
(iv) The principal component approach, where one directly estimates $\hat P$ from data. Compared to the above, one needs to know the state order, but not how the data $Z_t$ is split into inputs and outputs. That is, causality is no concern in the PCA approach. This is one main reason for its widespread use [2] in chemical engineering, where sometimes thousands of variables are measured.
Simulations on a DC motor and F16 vertical dynamics will be used to illustrate the contributions. Preliminary results of the two main contributions have previously been published in [12,13].
2 Models and notation
2.1 System model
The linear system is here defined as the state space model

$$\begin{aligned} x_{t+1} &= A_t x_t + B_{u,t} u_t + B_{d,t} d_t + B_{f,t} f_t + B_{v,t} v_t, \\ y_t &= C_t x_t + D_{u,t} u_t + D_{d,t} d_t + D_{f,t} f_t + e_t. \end{aligned} \tag{2}$$
The matrices $A, B, C, D$ depend on the system, while the signals belong to the following categories:
• Deterministic known input $u_t$, as is common in control applications.
• Deterministic unknown disturbance $d_t$, as is also common in control applications.
• Deterministic unknown fault input $f_t$, which is used in the fault detection literature. We here assume that $f_t$ is either zero (no fault) or proportional to a unit vector, $f_t = m_t f_i$, where $f_i$ is all zero except for element $i$, which is one. Exactly which part of the system fault $i$ affects is determined by the corresponding columns in $B_{f,t}$ and $D_{f,t}$. This fault model covers, for instance, offsets in actuators and sensors. The fault magnitude $m_t$ can be arbitrary, but in most of the discussion we consider a constant magnitude $m_t = m$ within the analysed data window.
• Stochastic unknown state disturbance $v_t$ and measurement noise $e_t$, as used in a Kalman filter setting. There is an ambiguity in the interpretations of $v_t$ and $d_t$. We might treat $v_t$ as a deterministic disturbance, but in many cases this leads to an infeasible problem where no parity space exists. Both $v_t$ and $e_t$ are here assumed to be independent with zero mean and covariance matrices $Q_t$ and $R_t$, respectively.
• The initial state is treated as an unknown variable, so no prior information is needed.
The dimension of any signal $s_t$ is denoted as $n_s = \dim(s_t)$. Traditionally, either a stochastic ($d_t = 0$) or a deterministic ($v_t = 0$, $e_t = 0$) framework is used in the literature, but here we aim to mix them and combine the theories.
The work concerns primarily tests based on data from a sliding window, in which case the signal model can be written

$$Y_t = O x_{t-L+1} + H_u U_t + H_d D_t + H_v V_t + H_f F_t + E_t. \tag{3}$$

To establish the correspondence of models (2) and (3), stack $L$ signal values to define the signal vectors $Y_t = \big(y_{t-L+1}^T, \dots, y_t^T\big)^T$, etc., for all signals. We here use the time index $t$ to note that fault detection is a recursive task. Also define the Hankel matrices

$$H_s = \begin{pmatrix} D_s & 0 & \cdots & 0 \\ CB_s & D_s & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ CA^{L-2}B_s & \cdots & CB_s & D_s \end{pmatrix} \tag{4}$$

for all signals $s = u, d, f, v$, and the observability matrix

$$O = \begin{pmatrix} C \\ CA \\ \vdots \\ CA^{L-1} \end{pmatrix}. \tag{5}$$

The covariance of the measurement vector is denoted

$$S = \operatorname{Cov}(H_v V_t + E_t). \tag{6}$$

If the system is time-varying, then $O$, $H_s$ and $S$ will all be time-varying as well.
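For a time-invariant model, the stacked matrices in (4) and (5) can be built mechanically from $(A, B_s, C, D_s)$. The following Python sketch is a minimal illustration, assuming NumPy is available; the helper name `sliding_window_matrices` is hypothetical. With the DC motor matrices of Section 5 it reproduces the $L = 2$ matrices $O$, $H_u$ and $H_f$ listed there.

```python
import numpy as np


def sliding_window_matrices(A, Bs, C, Ds, L):
    """Return the observability matrix O in (5) and the matrix H_s in (4)
    for a time-invariant model and one signal s with matrices (B_s, D_s)."""
    ny = C.shape[0]
    ns = Bs.shape[1]
    # O = [C; CA; ...; CA^(L-1)]
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(L)])
    # H_s is block lower triangular: D_s on the diagonal and
    # C A^(i-j-1) B_s in block row i, block column j < i
    Hs = np.zeros((L * ny, L * ns))
    for i in range(L):
        for j in range(i + 1):
            if i == j:
                block = Ds
            else:
                block = C @ np.linalg.matrix_power(A, i - j - 1) @ Bs
            Hs[i * ny:(i + 1) * ny, j * ns:(j + 1) * ns] = block
    return O, Hs


# DC motor matrices from Section 5
A = np.array([[1.0, 0.3297], [0.0, 0.6703]])
C = np.eye(2)
Bu, Du = np.array([[0.0703], [0.3297]]), np.zeros((2, 1))
Bf, Df = np.array([[0.0703, 0.0], [0.3297, 0.0]]), np.array([[0.0, 0.0], [0.0, 1.0]])

O, Hu = sliding_window_matrices(A, Bu, C, Du, L=2)
_, Hf = sliding_window_matrices(A, Bf, C, Df, L=2)
```

The loop mirrors the block structure of (4) directly; for long windows one would rather accumulate the powers $A^k$ once instead of recomputing them.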
2.2 Projections and whitening operations
The basic tools and mathematical notation in the derivation are the following:
• Pseudo-inverse. The pseudo-inverse is defined as $A^\dagger = (A^T A)^{-1} A^T$.
• Projection operator. A projection on the range space $\mathcal{R}(A)$ spanned by the columns in $A$ is given by $P_A = A(A^T A)^{-1} A^T = A A^\dagger$, with the obvious properties $P_A A = A$ and $P_A P_A = P_A$. $R_A$ denotes an arbitrary basis for $\mathcal{R}(A)$.
• Projection on null space. To remove the state dependence in (3), the orthogonal projection $I - P_O$ is used, with the obvious properties $(I - P_O)O = 0$ and $(I - P_O)(I - P_O) = I - P_O$. $N_O$ denotes an arbitrary basis for the orthogonal complement of $\mathcal{R}(O)$, that is, for the null space $\mathcal{N}(O^T)$, so that $N_O^T O = 0$.
• Whitening. If $\operatorname{Cov}(r) = P$, then $\operatorname{Cov}(P^{-1/2} r) = I$, so pre-multiplying with the inverse of a symmetric matrix square root $P^{1/2}$, with $P^{1/2} P^{1/2} = P$, is a whitening operation.
• Least squares (LS) estimation. For the equation system $Ax = r$, the least squares (LS) solution is $x^{LS} = A^\dagger r$.
• Minimum variance (MV) estimation. For the equation system $Ax = r$, the least squares solution $x^{LS} = A^\dagger r$ is the minimum variance estimate if and only if $\operatorname{Cov}(r) = I$. That is, if $\operatorname{Cov}(r) = P$, using the pre-whitened residual we have
$$x^{MV} = (P^{-1/2} A)^\dagger P^{-1/2} r = (A^T P^{-1} A)^{-1} A^T P^{-1} r.$$
• Angle between subspaces. Let $A$ and $B$ be two $M \times N$ matrices with $M > N$. The gap metric distance between the subspaces spanned by the columns of $A$ and $B$, respectively, is given by
$$d(A, B) = \| P_A - P_B \| = \big\| A(A^T A)^{-1} A^T - B(B^T B)^{-1} B^T \big\| \tag{7}$$
for some matrix norm, where we can choose the Frobenius norm.
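The operations above map onto a few lines of linear algebra. The NumPy sketch below is illustrative only (all function names are hypothetical); it assumes full column rank wherever a pseudo-inverse is formed, computes $N_O$ from the singular value decomposition, and uses the Frobenius norm in (7). The small $3 \times 2$ system and the covariance `Pcov` are made-up numbers used only to check that the whitened LS estimate equals the closed-form MV estimate.

```python
import numpy as np


def pinv(A):
    # A^dagger = (A^T A)^{-1} A^T, assuming A has full column rank
    return np.linalg.solve(A.T @ A, A.T)


def proj(A):
    # Orthogonal projection P_A onto the range space R(A)
    return A @ pinv(A)


def null_basis(A, tol=1e-10):
    # Orthonormal basis N_A with N_A^T A = 0: left singular vectors
    # that belong to zero singular values
    U, s, _ = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s[0]))
    return U[:, rank:]


def sqrtm_sym(P):
    # Symmetric square root P^{1/2} of a symmetric positive definite P
    w, V = np.linalg.eigh(P)
    return V @ np.diag(np.sqrt(w)) @ V.T


def gap(A, B):
    # Gap metric (7) with the Frobenius norm
    return np.linalg.norm(proj(A) - proj(B), "fro")


O = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.3297], [0.0, 0.6703]])
N_O = null_basis(O)

# Minimum variance estimate via whitening, compared with the closed form
A2 = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
Pcov = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
r = np.array([1.0, 2.0, 3.0])
Pw = np.linalg.inv(sqrtm_sym(Pcov))           # whitening matrix P^{-1/2}
x_mv_whitened = pinv(Pw @ A2) @ Pw @ r
Pinv = np.linalg.inv(Pcov)
x_mv_closed = np.linalg.solve(A2.T @ Pinv @ A2, A2.T @ Pinv @ r)
```

Since $P_O$ and $P_{N_O}$ are complementary orthogonal projections of rank two each, their gap in the Frobenius norm is exactly 2, while two bases of the same range space have gap zero.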
2.3 State estimation
From the properties above, the state estimator over a sliding window for the model (3) is immediately derived. The least squares estimate gives the state observer, while the minimum variance estimate gives the Kalman filter state estimate:

$$\hat x^{LS}_{t-L+1} = O^\dagger (Y_t - H_u U_t), \tag{8a}$$
$$\hat x^{MV}_{t-L+1} = (S^{-1/2} O)^\dagger S^{-1/2} (Y_t - H_u U_t). \tag{8b}$$

Here, we can interpret $K = (S^{-1/2} O)^\dagger S^{-1/2} = (O^T S^{-1} O)^{-1} O^T S^{-1}$ as the Kalman gain. For more details, see [11].
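As a sketch of (8a) and (8b), the code below forms the observer and the Kalman filter estimates of $x_{t-L+1}$ for the DC motor matrices of Section 5 with $L = 2$, using the closed form of the gain. The covariance `S` is an arbitrary positive definite example chosen for illustration, not the $S$ of the paper, and the helper name is hypothetical. With noise-free, fault-free data both estimates recover the true state, and the gain satisfies $KO = I$.

```python
import numpy as np


def window_state_estimates(O, S, Y, U, Hu):
    """Sliding-window state estimates (8a) and (8b) of x_{t-L+1}."""
    z = Y - Hu @ U                                    # remove the known input
    x_ls = np.linalg.lstsq(O, z, rcond=None)[0]       # (8a): O^dagger z
    Si = np.linalg.inv(S)
    K = np.linalg.solve(O.T @ Si @ O, O.T @ Si)       # (O^T S^-1 O)^-1 O^T S^-1
    x_mv = K @ z                                      # (8b)
    return x_ls, x_mv, K


O = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.3297], [0.0, 0.6703]])
Hu = np.array([[0.0, 0.0], [0.0, 0.0], [0.0703, 0.0], [0.3297, 0.0]])
S = 0.01 * np.eye(4) + 0.005 * np.ones((4, 4))        # illustrative covariance

x0 = np.array([0.5, -1.0])                            # true x_{t-L+1}
U = np.array([1.0, 2.0])                              # (u_{t-1}, u_t)
Y = O @ x0 + Hu @ U                                   # noise-free, fault-free window
x_ls, x_mv, K = window_state_estimates(O, S, Y, U, Hu)
```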
3 Residual generation
3.1 Parity space
Without loss of generality, the residual generating matrix in (1) can be factorized as

$$\begin{aligned} r_t &= W^T \begin{pmatrix} I, & -H_u \end{pmatrix} \begin{pmatrix} Y_t \\ U_t \end{pmatrix} && \text{(9a)} \\ &= W^T (Y_t - H_u U_t) && \text{(9b)} \\ &= W^T (O x_{t-L+1} + H_d D_t + H_f F_t + H_v V_t + E_t) && \text{(9c)} \\ &= W^T (H_f F_t + H_v V_t + E_t). && \text{(9d)} \end{aligned}$$

The parity space is defined to be insensitive to the input (yielding the factorization in (9a)), the initial state and deterministic disturbances. This implies that $r_t = 0$ for any initial state $x_{t-L+1}$ and any disturbance sequence $d_k$, $k = t-L+1, \dots, t$, provided that there is no stochastic term present ($e_k = 0$, $v_k = 0$ for $k = t-L+1, \dots, t$) and no fault ($f_k = 0$, $k = t-L+1, \dots, t$).
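Definition 1 (Parity space) The parity space is defined as in (1), with $P^T = W^T [I, -H_u]$ for any data projection $W$ whose columns are orthogonal to those of $[O \;\, H_d]$. That is,

$$W^T [O \;\, H_d] = 0 \quad \Leftrightarrow \quad W \in \mathcal{N}\big([O \;\, H_d]^T\big). \tag{10}$$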
From (9) we get

$$\operatorname{E}(r_t) = W^T H_f F_t, \tag{11a}$$
$$\operatorname{Cov}(r_t) = W^T S W. \tag{11b}$$

Any deviation of $r_t$ from zero is due either to the noise or to one of the possible faults, and the diagnosis task is to distinguish these causes.
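A parity space in the sense of Definition 1 can be computed numerically as an orthonormal basis orthogonal to $[O \;\, H_d]$. The NumPy sketch below does this with an SVD, which is one of several valid choices since any basis of the same subspace works, for the DC motor of Section 5 where $H_d = 0$. It then checks (9) and (11a) on noise-free data; the helper name, initial state, inputs and fault magnitude are illustrative assumptions.

```python
import numpy as np


def parity_space(O, Hd, tol=1e-10):
    """Orthonormal W with W^T [O Hd] = 0, cf. (10)."""
    M = np.hstack([O, Hd])
    U, s, _ = np.linalg.svd(M)
    rank = int(np.sum(s > tol * s[0]))
    return U[:, rank:]


O = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.3297], [0.0, 0.6703]])
Hu = np.array([[0.0, 0.0], [0.0, 0.0], [0.0703, 0.0], [0.3297, 0.0]])
Hf = np.array([[0.0, 0, 0, 0], [0, 1, 0, 0], [0.0703, 0, 0, 0], [0.3297, 0, 0, 1]])
Hd = np.zeros((4, 2))                          # B_d = D_d = 0 in the example

W = parity_space(O, Hd)
PT = W.T @ np.hstack([np.eye(4), -Hu])         # P^T = W^T [I, -H_u] in (1)

x0, U = np.array([0.3, -0.7]), np.array([1.0, -2.0])
Y = O @ x0 + Hu @ U                            # no noise, no fault
r_nofault = PT @ np.concatenate([Y, U])

m, F2 = 2.0, np.array([0.0, 1.0, 0.0, 1.0])    # constant sensor offset f_2
r_fault2 = PT @ np.concatenate([Y + m * Hf @ F2, U])
```

The fault-free residual vanishes regardless of the initial state and input, while the sensor offset produces the mean value $m W^T H_f F_2$ predicted by (11a).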
The maximal dimension of the residual vector is given by

$$L(n_y - n_d) - n_x \le \max_W n_r \le L n_y - n_x. \tag{12}$$

The inequalities become equalities in case $n_d = 0$, that is, no disturbance. Equality with the lower bound holds if the matrix $[O \;\, H_d]$ has full column rank. This shows that a parity space always exists ($\max_W n_r > 0$) if there are more measurements than disturbances ($n_y > n_d$) and $L$ is chosen large enough.
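The count behind (12) is $n_r = L n_y - \operatorname{rank}([O \;\, H_d])$. The sketch below evaluates it for the DC motor model ($n_y = n_x = 2$) for $L = 1, \dots, 4$, first without disturbance and then with a hypothetical scalar disturbance entering like the input ($B_d = B_u$, $D_d = 0$), which is not part of the paper's example. In both cases the dimension stays within the bounds (12) and grows with $L$.

```python
import numpy as np


def toeplitz_blocks(A, Bs, C, Ds, L):
    # Matrix H_s in (4)
    ny, ns = C.shape[0], Bs.shape[1]
    H = np.zeros((L * ny, L * ns))
    for i in range(L):
        for j in range(i + 1):
            blk = Ds if i == j else C @ np.linalg.matrix_power(A, i - j - 1) @ Bs
            H[i * ny:(i + 1) * ny, j * ns:(j + 1) * ns] = blk
    return H


def residual_dimension(A, C, Bd, Dd, L):
    # n_r = L n_y - rank([O H_d]), the dimension of the parity space
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(L)])
    Hd = toeplitz_blocks(A, Bd, C, Dd, L)
    return L * C.shape[0] - np.linalg.matrix_rank(np.hstack([O, Hd]))


A = np.array([[1.0, 0.3297], [0.0, 0.6703]])
C = np.eye(2)
zero = np.zeros((2, 1))
Bd_hyp = np.array([[0.0703], [0.3297]])        # hypothetical disturbance input

dims_no_dist = [residual_dimension(A, C, zero, zero, L) for L in range(1, 5)]
dims_dist = [residual_dimension(A, C, Bd_hyp, zero, L) for L in range(1, 5)]
```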
Another approach, not pursued here, is to apply fault decoupling, where each residual is designed separately by the condition $W_i^T [O \;\, H_d \;\, H_f F_i] = 0$. Here $F_i$ is a fault vector that excites all faults except for fault $i$. The advantage is that the transient shown in the upper plot in Figure 1 will disappear. The disadvantage is that more measurements are needed ($n_y \ge n_d + n_f$) and that one projection $W_i$ is needed for each fault. We will not use fault decoupling in the sequel, although the same principles are applicable to this case as well.
3.2 Kalman filter based residuals
Generally, a linear state estimator can be written

$$\hat x_{t-L+1} = K (Y_t - H_u U_t).$$

The estimator is unbiased if $KO = I$, which of course is the case for (8). It generates a vector of model errors as

$$\begin{aligned} \varepsilon_t &= Y_t - \hat Y_t = Y_t - O \hat x_{t-L+1} - H_u U_t && \text{(13a)} \\ &= (I - OK)(Y_t - H_u U_t) && \text{(13b)} \\ &= (I - OK)(O x_{t-L+1} + H_d D_t + H_v V_t + E_t + H_f m F_i) && \text{(13c)} \\ &= (I - OK)(H_d D_t + H_v V_t + E_t + H_f m F_i). && \text{(13d)} \end{aligned}$$

In the last equality, the unbiasedness property $KO = I$ of the state estimate is used, since it gives $(I - OK)O = 0$.
From (13a) we see that the covariance of the model errors is minimized using the minimum variance Kalman filter estimate, so this is the only state estimator discussed in the sequel. With $K = (O^T S^{-1} O)^{-1} O^T S^{-1}$, the Kalman filter model errors in (13) have mean and covariance

$$\operatorname{E}(\varepsilon^{KF}_t) = \big(I - O(O^T S^{-1} O)^{-1} O^T S^{-1}\big)(H_d D_t + H_f m F_i),$$
$$\operatorname{Cov}(\varepsilon^{KF}_t) = S - O(O^T S^{-1} O)^{-1} O^T.$$

The model error generating matrix $I - O(O^T S^{-1} O)^{-1} O^T S^{-1}$ is a projection matrix, so the covariance matrix of $\varepsilon^{KF}_t$ is singular. That is, there are many linear combinations of $\varepsilon^{KF}_t$ that are always zero, independently of the data. More precisely, the rank of the covariance matrix is

$$\begin{aligned} \operatorname{rank}\big(\operatorname{Cov}(\varepsilon^{KF}_t)\big) &= \operatorname{rank}\big(I - O(O^T S^{-1} O)^{-1} O^T S^{-1}\big) && \text{(14)} \\ &= \operatorname{rank}(I) - \operatorname{rank}\big(O(O^T S^{-1} O)^{-1} O^T S^{-1}\big) && \text{(15)} \\ &= \operatorname{rank}(I) - \operatorname{rank}(O) \ge L n_y - n_x, && \text{(16)} \end{aligned}$$

with equality if and only if the system is observable ($\operatorname{rank}(O) = n_x$). By introducing a basis $W_{KF}$ for the range of the transpose of this data projection matrix, so that $W_{KF}^T O = 0$, we get a residual generator

$$r_t = W_{KF}^T (Y_t - H_u U_t), \qquad W_{KF} = R_{(I - O(O^T S^{-1} O)^{-1} O^T S^{-1})^T}. \tag{17}$$

If the system is observable, then the dimension of the residual in (17) is

$$n_r = L n_y - n_x. \tag{18}$$
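To illustrate (14)-(18) numerically, the sketch below builds the model error generator $I - OK$ for the DC motor with $L = 2$, using $S = H_v (I \otimes Q) H_v^T + I \otimes R$ from (6) with the $B_v$, $Q$ and $R$ of Section 5 (and $D_v = 0$). It takes an orthonormal basis of the range of $(I - OK)^T$ as $W_{KF}$ and checks that this residual has dimension $L n_y - n_x$, annihilates $O$, and spans the same subspace as the parity space, in line with Section 3.3. NumPy and the helper name are assumptions of this sketch.

```python
import numpy as np


def kf_residual_basis(O, S, tol=1e-10):
    """Basis W_KF for the range of (I - O K)^T, with K the gain in (8b)."""
    Si = np.linalg.inv(S)
    K = np.linalg.solve(O.T @ Si @ O, O.T @ Si)    # (O^T S^-1 O)^-1 O^T S^-1
    Pm = np.eye(O.shape[0]) - O @ K                # model error generator (13b)
    U, s, _ = np.linalg.svd(Pm.T)
    rank = int(np.sum(s > tol * s[0]))
    return U[:, :rank], Pm


O = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.3297], [0.0, 0.6703]])
Hv = np.array([[0.0, 0.0], [0.0, 0.0], [0.08, 0.0], [0.16, 0.0]])
Q, R = 0.01 ** 2, 0.1 ** 2
S = Hv @ (Q * np.eye(2)) @ Hv.T + R * np.eye(4)   # (6)

W_kf, Pm = kf_residual_basis(O, S)

# Orthonormal basis of the parity space (null space of O^T) for comparison
U_O, _, _ = np.linalg.svd(O)
N_O = U_O[:, 2:]
```

Taking the range of the transpose, rather than of $I - OK$ itself, is what guarantees $W_{KF}^T O = 0$ when $S$ is not a multiple of the identity.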
3.3 Comparison
The parity space and Kalman filter prediction errors are related as follows:
• The observer and the Kalman filter can be used to compute a model error that can be reduced to a residual with non-singular covariance matrix for the case of no disturbance, $D_t = 0$, where the latter gives minimum variance residuals.
• Since $r_t = W_{KF}^T (Y_t - H_u U_t)$ has the same size as the parity space residual defined in (9) (namely $L n_y - n_x$) and does not depend on the initial state, it belongs by definition to the parity space.
• The Kalman filter innovation can be transformed to a parity space where also the disturbance is decoupled (besides the initial state), by another projection
$$\tilde r_t = N_{W_{KF}^T H_d}^T r_t.$$
That is, these two design methods are more or less equivalent, so in the sequel we will just refer to the parity space residual.
4 Diagnosis
We here detail an algorithm for parity space detection and isolation which minimizes the risk of incorrect isolation, and discuss the improvements to the structured parity space approach.
4.1 Residual normalization
The distribution of the residual in (11) will in the design and analysis be assumed Gaussian,

$$(r_t \mid m f_i) \in \mathcal{N}\big(m \underbrace{W^T H_f F_i}_{\beta_i},\; W^T S W\big), \tag{19}$$

where $F_i = 1_L \otimes f_i$ stacks the unit fault vector $f_i$ over the window. This can be motivated in two ways:
• It is Gaussian if both $V_t$ and $E_t$ are Gaussian.
• It is approximately Gaussian by the central limit theorem when $\dim r_t \ll \dim V_t + \dim E_t$, which happens if the data window $L$ is large enough. That is, asymptotically in $L$, it is Gaussian.
It follows from (19) that each fault is mapped onto a vector $\beta_i = W^T H_f F_i$ with a covariance matrix $W^T S W$. We can normalize the residual distribution as follows, which will enable probability calculations in Section 4.2.
Definition 2 (Normalized parity space) The normalized parity space is defined as

$$\bar r_t = \bar W^T (Y_t - H_u U_t), \qquad \bar W^T = (W^T S W)^{-1/2} W^T, \tag{20}$$

for any parity space $W^T$, where $S = \operatorname{Cov}(Y_t - H_u U_t)$ is defined in (6). The normalized parity space is unique up to a multiplication with a unitary matrix. We call

$$\bar\beta_i = \bar W^T H_f F_i = (W^T S W)^{-1/2} W^T H_f F_i$$

the Fault to Noise Ratio (FNR).
The bar on $r$, $\beta$ and $W$ is here and in the sequel used to denote normalized variables. The normalized residual satisfies (asymptotically)

$$\begin{aligned} (\bar r_t \mid m f_i) &= \bar W^T (H_v V_t + E_t + m H_f F_i) && \text{(21a)} \\ &\in \mathcal{N}\big(m \underbrace{\bar W^T H_f F_i}_{\bar\beta_i},\, I\big) = \mathcal{N}(m \bar\beta_i, I). && \text{(21b)} \end{aligned}$$

The FNR $\bar\beta_i$ explicitly reveals how much each fault contributes to the residual relative to Gaussian unit noise.
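The normalization in (20) and the FNR vectors in (21b) take a few lines to compute. The sketch below, for the DC motor with $L = 2$ and $S$ from (6) as in Section 5, checks that $\bar W^T S \bar W = I$. It also checks that the inner products between the FNR vectors are unchanged when the parity space basis $W$ is replaced by $WT$ for an invertible $T$ (here a fixed example matrix), which illustrates why the diagnosis probabilities do not depend on the chosen parity space. Helper names are illustrative.

```python
import numpy as np


def inv_sqrtm_sym(M):
    # M^{-1/2} for a symmetric positive definite M
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T


def normalize(W, S):
    # (20): W_bar = W (W^T S W)^{-1/2}
    return W @ inv_sqrtm_sym(W.T @ S @ W)


O = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.3297], [0.0, 0.6703]])
Hf = np.array([[0.0, 0, 0, 0], [0, 1, 0, 0], [0.0703, 0, 0, 0], [0.3297, 0, 0, 1]])
Hv = np.array([[0.0, 0.0], [0.0, 0.0], [0.08, 0.0], [0.16, 0.0]])
S = Hv @ (0.01 ** 2 * np.eye(2)) @ Hv.T + 0.1 ** 2 * np.eye(4)

L, nf = 2, 2
F = np.kron(np.ones((L, 1)), np.eye(nf))       # columns F_i = 1_L (x) f_i

U_O, _, _ = np.linalg.svd(O)
W = U_O[:, 2:]                                 # some parity space basis
W_bar = normalize(W, S)
betas = W_bar.T @ Hf @ F                       # columns are the FNR vectors

T = np.array([[2.0, 1.0], [-0.5, 3.0]])        # an invertible change of basis
betas_T = normalize(W @ T, S).T @ Hf @ F
```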
One interpretation of this definition is that the parity space residual is whitened spatially and temporally. We stress that a transformation of the residual space affects what the fault vectors look like, but not the ability to make a diagnosis. The point to keep in mind is that there are many obtainable parity spaces: the sliding window size $L$ affects their dimension $n_r$, and the weighting matrix $W$ their stochastic properties. The structured residual is a common choice in the literature on fault detection.
Definition 3 (Structured parity space) Normalize $W$ so that the fault vectors $\beta_i$ point in perpendicular directions. The most common choices of residual pattern are $[\beta_1, \beta_2, \dots, \beta_{n_f}] = I$ and $[\beta_1, \beta_2, \dots, \beta_{n_f}] = \mathbf{1}\mathbf{1}^T - I$, both defining a set of corners on a unit cube. This approach presumes $n_f = n_r$. The design is done by solving

$$[\beta_1, \beta_2, \dots, \beta_{n_f}] = T W^T H_f (1_L \otimes I_{n_f}) \tag{22}$$

for $T$ and taking $W^T_{struc} = T W^T$. Here $\otimes$ denotes the Kronecker product.
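With $n_f = n_r$, (22) is a square linear system in $T$. The sketch below solves it for the DC motor ($L = 2$, target pattern $I$), starting from an SVD-based parity space basis chosen for illustration. The resulting $W^T_{struc}$ maps each constant unit fault onto a unit vector while still annihilating $O$; it is one solution of (22) for this pattern and this starting basis.

```python
import numpy as np

O = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.3297], [0.0, 0.6703]])
Hf = np.array([[0.0, 0, 0, 0], [0, 1, 0, 0], [0.0703, 0, 0, 0], [0.3297, 0, 0, 1]])
L, nf = 2, 2

U_O, _, _ = np.linalg.svd(O)
W = U_O[:, 2:]                                   # any parity space basis, n_r = 2

ones_kron = np.kron(np.ones((L, 1)), np.eye(nf))  # 1_L (x) I_nf
G = W.T @ Hf @ ones_kron                         # W^T H_f (1_L (x) I_nf)
target = np.eye(nf)                              # desired [beta_1, ..., beta_nf]
T = target @ np.linalg.inv(G)                    # solves target = T G, cf. (22)
W_struc_T = T @ W.T                              # W_struc^T = T W^T
```

The inverse exists here because no combination of the two constant fault signatures lies in the range of $O$; with more faults than residual dimensions the system has no solution, as noted below.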
Figure 2 illustrates some fundamental differences between structured and normalized parity spaces:
• Figure 2(a) shows one example of a structured residual. In a noise-free setting, diagnosis is simple, but in the noisy case, the decision regions become quite complicated non-linear surfaces.
• Figure 2(b) shows normalized residuals. Here, the stochastic uncertainty is a unit sphere, and the decision regions are straight lines. The price paid is non-perpendicular fault vectors $\bar\beta_i$.
Another important difference concerns the residual dimension:
• $n_f < n_r$: The structured residual is truncated in some way, and information is lost.
• $n_f = n_r$: This is the case in Figure 2.
• $n_f > n_r$: The structured residual concept does not work, while isolation is still possible, as outlined in the algorithm below, as long as only single faults are considered.
[Figure 2: (a) "Structured Residual, L=2" and (b) "Normalized Structured Residual L=2"; axes $r_1$ and $r_2$, with points labeled 0, 1 and 2.]
Fig. 2. Structured and normalized residual fault pattern with uncertainty ellipsoids for faults 1 and 2, respectively. Solid line is for unnormalized residuals, and dashed line after normalization. The dashed line is the optimal decision region.
4.2 Algorithm
Since $(\bar r_t \mid f = 0) \in \mathcal{N}(0, I)$, we have $(\bar r_t^T \bar r_t \mid f = 0) \in \chi^2(n_r)$. The $\chi^2$ test provides a threshold $h$ for detection, and fault isolation is performed by taking the closest fault vector in the sense of smallest angle difference (since the magnitude $m$ of the fault is unknown).
Algorithm 1 On-line diagnosis
1. Compute a normalized parity space $\bar W$, e.g. (20).
2. Compute recursively:
   Residual: $\bar r_t = \bar W^T (Y_t - H_u U_t)$
   Detection: $\bar r_t^T \bar r_t > h$
   Isolation: $\hat i = \arg\min_i \left\| \dfrac{\bar r_t}{\|\bar r_t\|} - \dfrac{\bar\beta_i}{\|\bar\beta_i\|} \right\|^2 = \arg\min_i \operatorname{angle}(\bar r_t, \bar\beta_i)$
where $\bar r_t^T \bar r_t \in \chi^2(n_r)$ under no fault, and $\operatorname{angle}(\bar r_t, \bar\beta_i)$ denotes the angle between the two vectors $\bar r_t$ and $\bar\beta_i$. A detection may be rejected if no suitable isolation is found ($\min_i \operatorname{angle}(\bar r_t, \bar\beta_i)$ is too large) to improve the false alarm rate.
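A minimal sketch of one recursion of Algorithm 1 is given below, assuming the normalized residual $\bar r_t$ and the normalized fault vectors $\bar\beta_i$ are already available. It takes the detection threshold on $\bar r_t^T \bar r_t$ as a parameter and, for $n_r = 2$, uses the $\chi^2(2)$ quantile $-2\log P_{FA}$. Isolation maximizes the cosine of the angle, which is equivalent to minimizing the angle; like the algorithm, it assumes a positive fault magnitude. The fault vectors and residuals in the usage lines are made up for illustration.

```python
import math

import numpy as np


def diagnose(r_bar, betas, threshold):
    """One step of Algorithm 1.

    r_bar: normalized residual (n_r,); betas: (n_r, n_f) normalized fault vectors.
    Returns 0 if there is no alarm, otherwise the 1-based index of the isolated fault.
    """
    if r_bar @ r_bar <= threshold:               # detection: r^T r > h
        return 0
    norms = np.linalg.norm(betas, axis=0) * np.linalg.norm(r_bar)
    cosines = (betas.T @ r_bar) / norms
    return int(np.argmax(cosines)) + 1           # smallest angle to beta_i


p_fa = 0.01
threshold = -2.0 * math.log(p_fa)                # chi^2(2) quantile for n_r = 2
betas = np.array([[1.0, 0.6], [0.0, 0.8]])       # illustrative fault directions

d_noise = diagnose(np.array([0.5, 0.5]), betas, threshold)
d_fault1 = diagnose(np.array([5.0, 0.0]), betas, threshold)
d_fault2 = diagnose(4.0 * betas[:, 1] + np.array([0.1, -0.1]), betas, threshold)
```

For $n_r > 2$ the threshold would be the corresponding $\chi^2(n_r)$ quantile, for example from a statistics library.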
For diagnosability of single faults, the only requirement is that all faults are mapped to different directions $\bar\beta_i$.
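In a two-dimensional residual space, as in the example in Figure 2, the probability of false alarm $P_{FA}$ (incorrect detection) for the test $\|\bar r_t\| > h$ can be computed explicitly using polar coordinates as

$$P_{FA} = \int_{\|\bar r\| > h} \frac{1}{2\pi} e^{-\frac{\bar r^T \bar r}{2}}\, d\bar r = \int_0^{2\pi} \int_h^\infty \frac{x}{2\pi} e^{-\frac{x^2}{2}}\, dx\, d\varphi = e^{-\frac{h^2}{2}},$$

which means that the threshold design is to choose $P_{FA}$ and then let $h = \sqrt{-2 \log(P_{FA})}$; equivalently, the threshold on $\bar r_t^T \bar r_t$ is $h^2 = -2 \log(P_{FA})$. Note that the true false alarm rate may be lower if we reject alarms where $\min_i \operatorname{angle}(\bar r_t, \bar\beta_i)$ is too large. A more precise analysis is given below.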
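The threshold rule can be checked by simulation. The sketch below computes $h = \sqrt{-2\log P_{FA}}$ and estimates the false alarm rate of the test $\|\bar r_t\| > h$ from simulated fault-free normalized residuals $\bar r_t \sim \mathcal{N}(0, I_2)$; the value of $P_{FA}$, the sample size and the seed are arbitrary choices, and `norm_threshold` is a hypothetical helper name.

```python
import math

import numpy as np


def norm_threshold(p_fa):
    # Radius h with P(||r|| > h) = exp(-h^2 / 2) = p_fa for r ~ N(0, I_2)
    return math.sqrt(-2.0 * math.log(p_fa))


p_fa = 0.05
h = norm_threshold(p_fa)
rng = np.random.default_rng(0)
r = rng.standard_normal((200_000, 2))            # fault-free normalized residuals
empirical_pfa = np.mean(np.linalg.norm(r, axis=1) > h)
```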
4.3 Analysis
We can interpret the diagnosis step as a classification problem, and compare it to modulation in digital communication. Performance depends on the SNR, which here corresponds to the FNR $m \bar\beta_i$. In modulation theory, using an additive Gaussian error assumption, it is straightforward to compute the risk of incorrect symbol detection. We will here extend these expressions from regular 2D (complex plane) patterns to general vectors in $\mathbb{R}^{n_r}$.
The risk of incorrect diagnosis can be computed exactly in the case of only two faults as follows. It relies on the symmetric distribution of $\bar r_t$, where the decision region becomes a line, as illustrated by the dashed lines in Figure 2(b). The first step is a change of coordinates to one where one axis is perpendicular to the decision plane. Because of the normalization, the Jacobian of this transformation equals one. The second step is to marginalize all dimensions except the one perpendicular to the decision plane. All these marginals integrate to one. The third step is to evaluate the Gaussian error function. Here we use the definition

$$\operatorname{erfc}(x) = 2 \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt,$$

which equals the standard (Matlab) complementary error function evaluated at $x/\sqrt{2}$. The result in $\mathbb{R}^2$ (cf. Figure 2) can be written

$$P(\text{diagnosis } i \mid \text{fault } m f_j) = \frac{1}{2} \operatorname{erfc}\Big(m \|\bar\beta_j\| \sin\big(\tfrac{\alpha}{2}\big)\Big),$$

where $\alpha$ is the angle between $\bar\beta_i$ and $\bar\beta_j$.
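The relation between the $\operatorname{erfc}$ defined above and the standard complementary error function, and the two-fault formula, can be sketched with Python's standard library. The function names are hypothetical; `math.erfc` implements the standard definition $\frac{2}{\sqrt\pi}\int_x^\infty e^{-t^2}\,dt$, and a trapezoidal integral of the Gaussian density serves as an independent check.

```python
import math


def erfc_paper(x):
    # 2 * int_x^inf N(0,1) density = standard erfc evaluated at x / sqrt(2)
    return math.erfc(x / math.sqrt(2.0))


def p_confuse_2d(m, beta_norm, alpha):
    # P(diagnosis i | fault m f_j) for two faults separated by the angle alpha
    return 0.5 * erfc_paper(m * beta_norm * math.sin(alpha / 2.0))


def erfc_paper_numeric(x, upper=12.0, n=200_000):
    # Trapezoidal approximation of 2 * int_x^upper exp(-t^2/2) / sqrt(2 pi) dt
    dt = (upper - x) / n
    f = [math.exp(-((x + k * dt) ** 2) / 2.0) for k in range(n + 1)]
    return 2.0 * dt * (sum(f) - 0.5 * (f[0] + f[-1])) / math.sqrt(2.0 * math.pi)


p = p_confuse_2d(m=1.0, beta_norm=2.0, alpha=math.pi / 2)
```

In the usage line, $m\|\bar\beta_j\|\sin(\alpha/2) = 2\sin(\pi/4) = \sqrt{2}$, so the confusion probability equals half the standard $\operatorname{erfc}(1)$.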
In the general case, the decision line becomes a plane, and the distance to it is given by the projection distance to the intermediate line $\bar\beta_1 + \bar\beta_2$ as

$$m \left\| \bar\beta_1 - \frac{(\bar\beta_1,\, \bar\beta_1 + \bar\beta_2)}{(\bar\beta_1 + \bar\beta_2,\, \bar\beta_1 + \bar\beta_2)} (\bar\beta_1 + \bar\beta_2) \right\|,$$

where $(a, b) = a^T b$ denotes a scalar product, and we get the following algorithm:
Algorithm 2 Off-line diagnosis analysis
1. Compute a normalized parity space $\bar W$, e.g. (20).
2. Compute the normalized fault vectors $\bar\beta_i$ in the parity space as in (21b).
3. The probability of incorrect diagnosis is approximately
$$P(\text{diagnosis } i \mid \text{fault } m f_j) = \frac{1}{2} \operatorname{erfc}\left( m \left\| \bar\beta_j - \frac{(\bar\beta_j,\, \bar\beta_j + \bar\beta_i)}{(\bar\beta_j + \bar\beta_i,\, \bar\beta_j + \bar\beta_i)} (\bar\beta_j + \bar\beta_i) \right\| \right). \tag{23}$$
Here $m$ denotes the magnitude of the fault. If this is not constant, we replace $\bar\beta_i = \bar W^T H_f F_i$ in (21b) with $\bar\beta_i = \bar W^T H_f (M_t F_i)$.
For more than two faults, this expression is an approximation but, as in modulation theory, generally quite a good one. The approximation becomes worse when there are several conflicting faults, which means that there are three or more fault vectors in about the same direction.
We can now define the diagnosability matrix $P$ as

$$P_{(i,j)} = P(\text{diagnosis } i \mid \text{fault } f_j), \quad i \ne j, \qquad P_{(j,j)} = 1 - \sum_{i \ne j} P_{(i,j)}. \tag{24}$$

It tells us everything about fault association probabilities for normalized faults $m = 1$, and the off-diagonal elements are monotonically decreasing functions of the fault magnitude $m$.
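Steps 2 and 3 of Algorithm 2 and the matrix (24) can be sketched as follows, given the normalized fault vectors as columns of an array. This is an illustrative NumPy implementation of (23) as stated, with the paper's $\operatorname{erfc}(x)$ evaluated as the standard `math.erfc(x / sqrt(2))`; the fault vectors in the usage lines are invented. For two fault vectors of equal norm, the intermediate line $\bar\beta_i + \bar\beta_j$ is the bisector, so (23) then reduces to the $\mathbb{R}^2$ formula with $\sin(\alpha/2)$, which the checks use.

```python
import math

import numpy as np


def p_confuse(beta_i, beta_j, m=1.0):
    """(23): approximate P(diagnosis i | fault m f_j)."""
    s = beta_j + beta_i                              # intermediate line
    dist = np.linalg.norm(beta_j - (beta_j @ s) / (s @ s) * s)
    return 0.5 * math.erfc(m * dist / math.sqrt(2.0))


def diagnosability_matrix(betas, m=1.0):
    """(24): P[i, j] = P(diagnosis i | fault j); each column sums to one."""
    nf = betas.shape[1]
    P = np.zeros((nf, nf))
    for j in range(nf):
        for i in range(nf):
            if i != j:
                P[i, j] = p_confuse(betas[:, i], betas[:, j], m)
        P[j, j] = 1.0 - P[:, j].sum()
    return P


a = math.pi / 3
betas = 2.0 * np.array([[1.0, math.cos(a)], [0.0, math.sin(a)]])   # equal norms
P1 = diagnosability_matrix(betas, m=1.0)
P2 = diagnosability_matrix(betas, m=2.0)
```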
Furthermore, in the classification we should allow the non-faulty class (0), where $f = 0$, to decrease the false alarm rate by neglecting residual vectors that, though having large amplitude, are far from the known fault vectors. Consider for instance the residual $\bar r_t = (1, 1)^T$ in Figure 2(b). This would most likely be caused by noise, not a fault. The missed detection probabilities are computed in a similar way as

$$P(\text{diagnosis } 0 \mid \text{fault } f_j) = \frac{1}{2} \operatorname{erfc}\left(\frac{m \|\bar\beta_j\|}{\sqrt{2}}\right), \tag{25a}$$
$$P_{(0,0)} = 1 - \sum_j P_{(0,j)} < P_{FA}. \tag{25b}$$
5 Example: DC motor
Consider a sampled state space model of a DC motor with continuous time transfer function

$$G(s) = \frac{1}{s(s+1)} = \frac{1}{s^2 + s}.$$

The state variables are the angle ($x_1$) and angular velocity ($x_2$) of the motor. Assume the fault is either an input voltage disturbance ($f_1$) (equivalent to a torque disturbance) or a velocity sensor offset ($f_2$).
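The derivation of the corresponding state space model is straightforward, and can be found in any textbook in control theory. Sampling with sample interval $T_s = 0.4$ s gives

$$A = \begin{pmatrix} 1 & 0.3297 \\ 0 & 0.6703 \end{pmatrix},\quad B_u = \begin{pmatrix} 0.0703 \\ 0.3297 \end{pmatrix},\quad B_v = \begin{pmatrix} 0.08 \\ 0.16 \end{pmatrix},\quad Q = 0.01^2,$$
$$B_d = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\quad B_f = \begin{pmatrix} 0.0703 & 0 \\ 0.3297 & 0 \end{pmatrix},\quad C = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
$$D_u = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\quad D_d = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\quad D_f = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},\quad R = 0.1^2 I.$$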
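The sampled matrices above can be reproduced under an assumed continuous-time realization of $G(s)$, namely $\dot x_1 = x_2$, $\dot x_2 = -x_2 + u$ (our choice, consistent with $x_1$ being the angle and $x_2$ the angular velocity). The sketch below uses zero-order-hold sampling via the exponential of an augmented matrix, with a truncated Taylor series standing in for a general matrix exponential routine, which is adequate because the matrix norm is small here.

```python
import numpy as np


def expm_taylor(M, terms=30):
    # Truncated Taylor series of exp(M); adequate here because ||M|| is small
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    return E


def zoh(Ac, Bc, Ts):
    # exp([[Ac, Bc], [0, 0]] Ts) = [[A, B], [0, I]] (zero-order hold)
    n, m = Bc.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = Ac
    M[:n, n:] = Bc
    E = expm_taylor(M * Ts)
    return E[:n, :n], E[:n, n:]


Ac = np.array([[0.0, 1.0], [0.0, -1.0]])      # x1 = angle, x2 = angular velocity
Bc = np.array([[0.0], [1.0]])
A, Bu = zoh(Ac, Bc, Ts=0.4)
```

In closed form this gives $A = \begin{pmatrix} 1 & 1 - e^{-T_s} \\ 0 & e^{-T_s} \end{pmatrix}$ and $B_u = \begin{pmatrix} T_s - 1 + e^{-T_s} \\ 1 - e^{-T_s} \end{pmatrix}$, which match the four-digit values above for $T_s = 0.4$.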
It is assumed that both $x_1$ and $x_2$ are measured. The matrices in the sliding window model become, for $L = 2$:

$$O = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0.3297 \\ 0 & 0.6703 \end{pmatrix},\quad H_u = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0.0703 & 0 \\ 0.3297 & 0 \end{pmatrix},\quad H_f = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0.0703 & 0 & 0 & 0 \\ 0.3297 & 0 & 0 & 1 \end{pmatrix},$$

and

$$\bar W^T = N^T_{[O \; H_d]} = \begin{pmatrix} 0.6930 & 0.1901 & -0.6930 & 0.0572 \\ 0.0405 & -0.5466 & -0.0405 & 0.8354 \end{pmatrix}. \tag{26}$$
The residual space with structured residuals, as shown in Figure 2, is

$$W^T_{struc} = \begin{pmatrix} -1 & -0.3297 & 1 & 0 \\ 0 & -0.6703 & 0 & 1 \end{pmatrix}. \tag{27}$$

The difference between the parity spaces generated by (26) and (27), respectively, is illustrated in Figure 2. The faults in the normalized parity space are not orthogonal, but on the other hand the decision region is particularly simple.
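A quick numerical check of (26) and (27): both should satisfy $W^T O = 0$ up to the four-digit rounding of the printed entries, and since $n_r = 2$ they must span the same parity space, so the gap metric (7) between them should be close to zero. The sketch below, assuming NumPy, uses the entries of (26) and (27) directly.

```python
import numpy as np

O = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.3297], [0.0, 0.6703]])
W26 = np.array([[0.6930, 0.1901, -0.6930, 0.0572],
                [0.0405, -0.5466, -0.0405, 0.8354]]).T
W27 = np.array([[-1.0, -0.3297, 1.0, 0.0],
                [0.0, -0.6703, 0.0, 1.0]]).T


def proj(A):
    # Orthogonal projection onto the range space of A
    return A @ np.linalg.solve(A.T @ A, A.T)


residual_26 = np.abs(W26.T @ O).max()
residual_27 = np.abs(W27.T @ O).max()
gap_26_27 = np.linalg.norm(proj(W26) - proj(W27), "fro")
```

The two bases differ only by an invertible $2 \times 2$ transformation, which is why the probability matrix below does not depend on which of them is used.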
The probability matrix (24) is here

$$P_{(1:2,1:2)} = \begin{pmatrix} 0.995 & 0.005 \\ 0.005 & 0.995 \end{pmatrix}.$$

Note that this matrix is independent of the choice of original parity space, (26) or (27), and of whether the Kalman filter approach (17) is used. By increasing the length of the sliding window to $L = 3$, we get a much better performance, with a probability matrix that is very close to diagonal and a very small missed detection probability. The confidence circles of the structured residuals in Figure 3 are more separated than the ones in Figure 2.
[Figure 3: (a) "Structured Residual, L=3" and (b) "Normalized Structured Residual L=3"; axes $r_1$ and $r_2$, with points labeled 0, 1 and 2.]
Fig. 3. Similar to Fig. 2, but with $L$ increased from 2 to 3. The circles are now more separated, decreasing the risk of incorrect decisions.
Figure 4 shows a systematic evaluation of the design parameter $L$. A larger $L$ means that it takes a longer time to get a complete window with faulty data, so the delay for detection should increase with $L$. On the other hand, the misclassification probabilities decrease quickly in $L$.
[Figure 4: probabilities P(1|2), P(0|1) and P(0|2) on a logarithmic axis from $10^{-20}$ to $10^{0}$, versus sliding window length $L$ from 2 to 6.]
Fig. 4. Misclassification probabilities in diagnosis as a function of sliding window length.
As a final illustration, one can investigate how much we lose in performance using a cheaper velocity sensor with variance 10 instead of 1, and the result is

$$P_{(1:2,1:2)} = \begin{pmatrix} 0.95 & 0.05 \\ 0.05 & 0.95 \end{pmatrix}.$$

The ten times larger misclassification probabilities can be compensated for by sacrificing a short delay for detection and using a longer sliding window.
6 Data-driven approaches to compute the parity space
We will in this section briefly outline alternative approaches to compute a counterpart to the parity space residual in case no model is available a priori. To simplify, no state disturbance will be included in the comparison. It is sufficient to obtain any residual in the parity space; normalization can then be applied afterwards.
To implement Algorithm 1, only $W$, $S$ and $\beta_i$ are needed. For later comparison, we first give a general approach to fault detection that only depends on $W$, no matter how $W$ is computed. First, the residuals are normalized by their estimated covariance matrix. The matrix $S$ in (20) can be computed analytically when the model is known, but for conformity we use the same method for all approaches.
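From a fault-free data set $Z_t$ with $N$ samples, we take

$$\begin{aligned} r_t &= W^T Z_t, && \text{(28a)} \\ \hat R &= \frac{1}{N - L} \sum_{t=L+1}^{N} r_t r_t^T, && \text{(28b)} \\ \bar W &= W \hat R^{-1/2}, && \text{(28c)} \\ \bar r_t &= \hat R^{-1/2} r_t. && \text{(28d)} \end{aligned}$$

Here, $\hat R$ corresponds to $W^T S W$.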
For diagnosis, a data set $Z^i_t$ of length $N_i$ for each fault mode is needed. Usually, these data sets are quite short. The fault vector is estimated by averaging the residuals,

$$\hat\beta_i = \frac{1}{N_i - L} \sum_{t=L+1}^{N_i} \bar r^i_t. \tag{29}$$
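Equations (28) and (29) need only a parity space projection $W$ and recorded data. The sketch below applies them to synthetic data, with random vectors standing in for the stacked $Z_t$ (an assumption for illustration, not data from the paper), and shows that the normalized residuals have identity sample covariance by construction. The constant `offset` is a hypothetical fault signature, and the helper names are illustrative.

```python
import numpy as np


def inv_sqrtm_sym(M):
    # M^{-1/2} for a symmetric positive definite M
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T


def normalize_from_data(W, Z_free):
    """(28): Z_free holds one stacked fault-free vector Z_t per row."""
    r = Z_free @ W                                  # rows are r_t^T = (W^T Z_t)^T
    R_hat = r.T @ r / r.shape[0]                    # (28b)
    return W @ inv_sqrtm_sym(R_hat)                 # (28c)


def fault_vector(W_bar, Z_fault):
    """(29): average normalized residual over data recorded during one fault."""
    return (Z_fault @ W_bar).mean(axis=0)


rng = np.random.default_rng(0)
W = rng.standard_normal((6, 2))                     # any 6x2 projection, for illustration
Z_free = rng.standard_normal((1000, 6)) @ np.diag([1.0, 2.0, 0.5, 1.0, 0.3, 0.1])
W_bar = normalize_from_data(W, Z_free)
r_bar = Z_free @ W_bar                              # (28d), one residual per row

offset = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 0.0])   # hypothetical constant fault signature
beta_hat = fault_vector(W_bar, Z_free[:200] + offset)
```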
The approaches are sorted in descending order of required model knowledge.
6.1 System identification
The following cases of unknown model are plausible:
• If the model (2) is partially given, where certain subsystems and integrators are known, the data set $Z_t$ can be used to estimate the free parameters.
• If only the structure of the model (2) is known, a subspace identification algorithm can be used to estimate the state space matrices, followed by a prediction error method to refine the model.
In either case, a function like pem in the System Identification Toolbox in Matlab can be applied off-line to a fault-free data set $Z_t$ [15]. For diagnosis, the faulty data sets $Z^i_t$ collected during fault $i$ are used to estimate $\beta_i$ by averaging the residual.
The on-line residual is then computed as

$$r_t = N^T_{O(\hat A, \hat C)} \big(I, -H_u(\hat A, \hat B_u, \hat C)\big) Z_t. \tag{30}$$
6.2 Subspace identification
If the state space model is only instrumental for diagnosis, then one can instead estimate the parity space directly, using a certain subspace identification algorithm [5]. This yields

$$r_t = \hat N^T_O (I, -\hat H_u) Z_t. \tag{31}$$
The key step is a principal component analysis (PCA) of a product of $L \times n_x$ block Hankel matrices of past and future data:

$$Z_f Z_p^T = \begin{pmatrix} Y_f \\ U_f \end{pmatrix} \begin{pmatrix} Y_p^T & U_p^T \end{pmatrix} = P T + \tilde P \tilde T,$$

where

$$Y_f = \begin{pmatrix} y(t) & y(t+1) & \cdots & y(t+n_x-1) \\ y(t+1) & y(t+2) & \cdots & y(t+n_x) \\ \vdots & \vdots & & \vdots \\ y(t+L-1) & y(t+L) & \cdots & y(t+L+n_x-2) \end{pmatrix},$$

$$Y_p = \begin{pmatrix} y(t-L) & y(t-L+1) & \cdots & y(t-L+n_x-1) \\ y(t-L+1) & y(t-L+2) & \cdots & y(t-L+n_x) \\ \vdots & \vdots & & \vdots \\ y(t-1) & y(t) & \cdots & y(t+n_x-2) \end{pmatrix},$$

and similarly for $U_f$ and $U_p$. The data window is in this notation a bit different from before, in that $L$ past (index $p$) and $L + n_x$ future (index $f$) data are used, rather than just $L$ past data.
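The projection matrices are then computed from $\tilde P$ as

$$\tilde P = \begin{pmatrix} \tilde P_y \\ \tilde P_u \end{pmatrix}, \qquad \hat N_{O_s}^T = \tilde P_y^T, \qquad \hat N_{O_s}^T \hat H_u = -\tilde P_u^T,$$

from which we can take

$$\hat N_O = \tilde P_y, \tag{32}$$
$$\hat H_u = -(\tilde P_y \tilde P_y^T)^{-1} \tilde P_y \tilde P_u^T, \tag{33}$$

and these estimates are plugged into the residual generator (31). A fault-free data set $Z_t$ provides an estimate of $S$, while the faulty data sets $Z^i_t$ can be used to estimate $\beta_i$.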
6.3 PCA
The model-free approach is to use principal component analysis (PCA) [6,17] to split up the data into two parts, model $\hat Z_t$ and residual $\tilde Z_t$:

$$Z_t = \begin{pmatrix} Y_t \\ U_t \end{pmatrix} = \hat Z_t + \tilde Z_t = P_x x_t + P_r r_t. \tag{34}$$

The notation has been chosen to show the resemblance with the model-based approach: the model part depends on the state $x_t$, and the other part is due to the residual $r_t$. We first describe how to compute this representation, and then comment on properties, relations and applications.
A singular value decomposition (SVD) is applied to the estimated covariance matrix of $Z_t$ as follows:

$$\hat R_Z = \frac{1}{N - L} \sum_{t=L+1}^{N} Z_t Z_t^T = P D P^T. \tag{35}$$

Here $P$ is a square unitary matrix, that is $P^T P = P P^T = I$, and $D$ is a diagonal matrix containing the singular values of $\hat R_Z$. We will split the SVD into two parts as

$$P = \begin{pmatrix} P_x & P_r \end{pmatrix}, \qquad D = \begin{pmatrix} D_x & 0 \\ 0 & D_r \end{pmatrix}. \tag{36}$$

The split assigns the $n_x$ largest singular values to the model, and the other $n_r$ singular values are assumed to belong to the residual space. By construction, we have $P_x^T P_x = I_{n_x}$, $P_x^T P_r = 0$, $P_r^T P_x = 0$, $P_r^T P_r = I_{n_r}$ and $P_x P_x^T + P_r P_r^T = I_{n_x + n_r}$.
Using these properties, the split in (34) is computed by

$$\hat Z_t = P_x P_x^T Z_t, \tag{37a}$$
$$\tilde Z_t = P_r P_r^T Z_t. \tag{37b}$$

For fault identification, we take the residuals

$$r_t = P_r^T Z_t, \tag{38}$$
$$\bar r_t = D_r^{-1/2} P_r^T Z_t, \tag{39}$$

where the transformation implies $\operatorname{Cov}(\bar r_t) = I$ in the limit $N \to \infty$.
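The PCA split (35)-(39) can be sketched in NumPy as below, with one stacked data vector $Z_t$ per row. The synthetic data (a four-dimensional latent signal plus small noise in six dimensions) is an arbitrary stand-in chosen for illustration, and `pca_split` is a hypothetical helper name. The checks confirm the orthogonality relations listed after (36), that (37a) and (37b) add up to $Z_t$, and that the normalized residuals (39) have identity sample covariance.

```python
import numpy as np


def pca_split(Z, nx):
    """(35)-(39) for data Z with one vector Z_t per row."""
    R_Z = Z.T @ Z / Z.shape[0]                      # (35)
    P, d, _ = np.linalg.svd(R_Z)                    # R_Z = P diag(d) P^T, d descending
    Px, Pr, Dr = P[:, :nx], P[:, nx:], d[nx:]       # (36)
    Z_model = Z @ Px @ Px.T                         # (37a), row-wise
    Z_resid = Z @ Pr @ Pr.T                         # (37b)
    r = Z @ Pr                                      # (38)
    r_bar = r / np.sqrt(Dr)                         # (39): D_r^{-1/2} P_r^T Z_t
    return Px, Pr, Z_model, Z_resid, r_bar


rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 4))                  # latent "model" part
M = rng.standard_normal((6, 4))
Z = X @ M.T + 0.01 * rng.standard_normal((2000, 6))

Px, Pr, Z_model, Z_resid, r_bar = pca_split(Z, nx=4)
```

Here `nx` counts all model directions in $Z_t$; in the DC motor example of Section 7 this corresponds to the choice $n_x = 4$ (two states plus two input samples).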
What is the relation to the parity space? To answer this, first use the model (3) in (34):

$$Z_t = \begin{pmatrix} Y_t \\ U_t \end{pmatrix} = \begin{pmatrix} O \\ 0 \end{pmatrix} x_{t-L+1} + \begin{pmatrix} H_f & H_u & H_v & I \\ 0 & I & 0 & 0 \end{pmatrix} \begin{pmatrix} F_t \\ U_t \\ V_t \\ E_t \end{pmatrix} = P_x x_t + P_r r_t. \tag{40}$$
We conclude the following:
• The split of eigenvalues should give $\operatorname{rank}(P_x) = n_x$.
• The inputs in the data are revealed by zero rows in $P_x$, so causality is resolved.
• The range of $P_x$ is the same as the range of $O$, if these zero rows are omitted.
• The residual part must also explain dynamics in the input data, and changes in input dynamics can be mixed up with system changes.
• It cannot be guaranteed that the eigenvalues of the system are larger than the other ones, so the PCA split based on sorted eigenvalues can be dubious.
Despite the last two points, the examples to follow demonstrate excellent performance, though these points should be kept in mind.
7 Example: DC motor revisited
Let us return to the DC motor example in Section 5, where the parity space approach was investigated. There we obtained the null space (26), which gives the following data projection matrix:
$$N_O (I, -H_u) = \begin{pmatrix} 0.6930 & 0.1901 & 0.6930 & 0.0572 & 0.0299 & 0 \\ 0.0405 & 0.5466 & 0.0405 & 0.8354 & 0.2726 & 0 \end{pmatrix} \tag{41}$$
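For reference, the parity space and a data projection of the form (41) can be built from $(A, B_u, C, D_u)$ by stacking the extended observability matrix $O$ and the block Toeplitz matrix $H_u$, and taking an orthonormal basis of the left null space of $O$. The sketch below assumes the residual $N_O(Y_t - H_u U_t)$ with $Y_t$ and $U_t$ stacked oldest sample first, as in (40); `parity_projection` and the example system are hypothetical. An orthonormal null-space basis is only unique up to an orthogonal transformation, so the rows need not match (41) entry by entry.

```python
import numpy as np


def parity_projection(A, B, C, D, L):
    """Return (N_O (I, -H_u), N_O, O, H_u) for a window of length L (sketch).

    Assumes Y_t = O x_{t-L+1} + H_u U_t with Y_t, U_t stacked oldest sample first.
    """
    n_y, n_u = C.shape[0], B.shape[1]
    # Extended observability matrix O = [C; CA; ...; CA^(L-1)]
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(L)])
    # Lower block-triangular Toeplitz matrix of Markov parameters
    H_u = np.zeros((L * n_y, L * n_u))
    for i in range(L):
        for j in range(i + 1):
            block = D if i == j else C @ np.linalg.matrix_power(A, i - j - 1) @ B
            H_u[i * n_y:(i + 1) * n_y, j * n_u:(j + 1) * n_u] = block
    # Orthonormal basis of the left null space of O (rows of N_O)
    U, s, _ = np.linalg.svd(O)
    rank = int(np.sum(s > 1e-10 * s[0]))
    N_O = U[:, rank:].T
    return N_O @ np.hstack([np.eye(L * n_y), -H_u]), N_O, O, H_u


# Hypothetical two-state system with two outputs and one input, L = 2
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.005], [0.1]])
C = np.eye(2)
D = np.zeros((2, 1))
proj, N_O, O, H_u = parity_projection(A, B, C, D, L=2)
print(proj.shape)  # (2, 6)
```

With $D_u = 0$, the newest input cannot influence the window, which shows up as a zero last column, just as in (41).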
7.1 Identification approach
The state space matrices $(A, B_u, C)$ are estimated from fault-free data, and the parity space is then computed from these. The numerical result is
$$N_O (I, -H_u) = \begin{pmatrix} 0.7059 & 0.0358 & 0.7066 & 0.0320 & 0.0017 & 0 \\ 0.0009 & 0.6664 & 0.0008 & 0.7456 & 0.0721 & 0 \end{pmatrix} \tag{42}$$
which is close to the analytical projection in (41).
7.2 Subspace identification approach
The result should be identical to the one in the previous subsection if the same subspace algorithm is used in both cases. The main difference is that the state space matrices are never estimated explicitly.
7.3 PCA approach
The SVD (35) of the estimated data covariance matrix $\mathrm{Cov}(Z_t)$ gives the singular values
$$\mathrm{diag}(D) = (1.1208,\ 0.8136,\ 0.1860,\ 0.0475,\ 0.0105,\ 0.0088)$$
and the projection matrix
$$P = \begin{pmatrix} P_x & P_r \end{pmatrix} = \begin{pmatrix}
0.0035 & 0.0687 & 0.7008 & 0.0560 & 0.6109 & 0.3575 \\
0.0092 & 0.0510 & 0.0043 & 0.7203 & 0.2995 & 0.6235 \\
0.0028 & 0.0650 & 0.7070 & 0.0468 & 0.6106 & 0.3478 \\
0.0594 & 0.0169 & 0.0101 & 0.6886 & 0.4037 & 0.5992 \\
0.7137 & 0.6940 & 0.0651 & 0.0073 & 0.0359 & 0.0589 \\
0.6979 & 0.7117 & 0.0682 & 0.0412 & 0.0072 & 0.0042
\end{pmatrix}$$
The question is how to split between model and residual, that is, how many columns $n_x$ belong to $P_x$. This choice of $n_x$ is not clear-cut, since there is no obvious threshold for the singular values; $n_x = 2$, 3 or 4 are all plausible choices. One might first try $n_x = 4$, in the light of the parity space approach above and the theoretical dimension of the residual in (12). This would be the direct counterpart to the parity space. We then take
$$W = P_r D_r^{-1/2}.$$
In Section 7.5, we investigate what happens for the choice $n_x = 2$.
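The ambiguity can be made concrete by applying two simple rules of thumb to the singular values above. Neither rule is proposed in this paper; both are assumptions used only for illustration: the largest ratio between consecutive singular values, and the smallest $n_x$ whose singular values make up 95% of their sum.

```python
import numpy as np

d = np.array([1.1208, 0.8136, 0.1860, 0.0475, 0.0105, 0.0088])  # diag(D) above

# Rule 1 (assumed): split at the largest ratio between consecutive values
ratios = d[:-1] / d[1:]
n_x_gap = int(np.argmax(ratios)) + 1

# Rule 2 (assumed): smallest n_x capturing 95% of the sum of singular values
energy = np.cumsum(d) / np.sum(d)
n_x_energy = int(np.searchsorted(energy, 0.95)) + 1

print(np.round(ratios, 2))   # [1.38 4.37 3.92 4.52 1.19]
print(n_x_gap, n_x_energy)   # 4 3
```

The two rules disagree, and the ratios that would point to $n_x = 2$ and $n_x = 4$ are almost equal (4.37 versus 4.52), in line with the observation that several choices are plausible.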
7.4 Comparison
As mentioned, only the choice of $W$ differs between the approaches. To quantify the similarity of two approaches, we measure the closeness of two subspaces $W_1$ and $W_2$ using the gap metric, a generalization of the angle between vectors.
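A common way to compute the distance between two subspaces of equal dimension is the 2-norm of the difference of their orthogonal projectors, $\|\Pi_1 - \Pi_2\|_2$, which lies in $[0, 1]$ and reduces to $\sin\theta$ for two lines at angle $\theta$ (see Golub and van Loan [9]). The sketch below assumes this is the gap metric intended here; the function name is hypothetical and $W_1$, $W_2$ are assumed to have full column rank.

```python
import numpy as np


def gap(W1, W2):
    """Distance ||Pi_1 - Pi_2||_2 between range(W1) and range(W2).

    Pi_k is the orthogonal projector onto range(W_k). For subspaces of equal
    dimension the value lies in [0, 1]; W1, W2 are assumed full column rank.
    """
    Q1, _ = np.linalg.qr(W1)            # orthonormal basis of range(W1)
    Q2, _ = np.linalg.qr(W2)
    return float(np.linalg.norm(Q1 @ Q1.T - Q2 @ Q2.T, 2))


theta = 0.3
u = np.array([[1.0], [0.0]])
v = np.array([[np.cos(theta)], [np.sin(theta)]])
print(round(gap(u, v), 4), round(float(np.sin(theta)), 4))  # the two values agree
```

Because only the projectors enter, the measure ignores the particular basis and scaling of $W$, so a normalized $W = P_r D_r^{-1/2}$ and an unnormalized $P_r$ give gap zero.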
[Figure 5: (a) Parity space residuals; (b) PCA residuals; axes $r_1$ and $r_2$; groups labeled 0, 1, 2.]
Fig. 5. Convex hull and covariance of the two-dimensional residuals generated by the parity space (a) and by PCA with $n_x = 4$ (b), when no fault, fault 1 and fault 2 are present, respectively.
We fix the false alarm rate (FAR) to 0.05 and compute the threshold $h$ such that
$$\#(g_t > h) = N \cdot \mathrm{FAR}$$
on the fault-free data $\{z^0_t\}_{t=1}^{N_0}$. We can then evaluate the detection performance experimentally as
$$p_i(m) = P(g_t > h \mid \text{fault } i \text{ of magnitude } m)$$
on the data sets $\{z^i_t\}_{t=1}^{N_i}$. The thresholds and achieved FAR are summarized in Table 1. Figure 5 shows the residuals from the parity space and PCA designs, respectively.
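Threshold selection and the empirical detection probability can be sketched as follows. The test statistic is assumed here to be $g_t = r_t^T r_t$ for a normalized residual, which under no fault is $\chi^2(n_r)$ distributed; the simulated data, the fault direction and the helper names are hypothetical. With many samples, the empirical threshold for FAR = 0.05 approaches the theoretical $\chi^2(2)$ value 5.99 quoted in Table 1.

```python
import numpy as np


def threshold_for_far(g0, far=0.05):
    """Empirical threshold h with #(g_t > h) close to N * far on fault-free data."""
    return float(np.quantile(g0, 1.0 - far))


def detection_rate(g, h):
    """Empirical P(g_t > h) on one data set."""
    return float(np.mean(g > h))


# Hypothetical normalized residuals r_t ~ N(m e_1, I) with n_r = 2,
# and the assumed test statistic g_t = r_t^T r_t.
rng = np.random.default_rng(0)
N, n_r = 20000, 2
g0 = np.sum(rng.standard_normal((N, n_r)) ** 2, axis=1)
h = threshold_for_far(g0, far=0.05)

magnitudes = [0.0, 1.0, 2.0, 3.0, 4.0]
p = []
for m in magnitudes:
    r = rng.standard_normal((N, n_r))
    r[:, 0] += m                       # fault of magnitude m along the first axis
    p.append(detection_rate(np.sum(r ** 2, axis=1), h))

print(round(h, 2), [round(x, 3) for x in p])
```

The list `p` traces one curve of the kind plotted in Figure 6: it starts near the FAR at $m = 0$ and increases toward one with the fault magnitude.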
Figure 6 shows $p_i(m)$. The curves are quite similar and, as can be expected, more prior knowledge gives better performance, although the difference is minor.
Table 1
Comparison of parameters. The theoretical $\chi^2(n_r)$ thresholds are 5.99 ($n_r = 2$) and 9.49 ($n_r = 4$), respectively.

Method                   Gap metric   Threshold   False alarm rate
True parity space        0            5.76        0.052
System identification    0.0066       5.79        0.052
PCA 2D residual          0.0386       6.02        0.052
PCA 4D residual          -            14.1        0.062
[Figure 6: panels (a) and (b) each plot P(detection | fault 1) and P(detection | fault 2) against fault magnitude (0 to 5) for True parity space, System identification and PCA.]
Fig. 6. Empirical detection probability $p_i(m)$ as a function of the fault magnitude $m$ for fault 1 and fault 2. For PCA, $n_x = 4$ in (a) and $n_x = 2$ in (b), see (36).
7.5 Extending the dimension of the PCA residual
In the PCA approach, the split into model and residual was not clear-cut. Choosing $n_x = 2$ yields a four-dimensional residual, and this reveals a very interesting fact. According to Figure 6(b), the model-free PCA approach outperforms the model-based parity space approach! The only explanation for this is that there are subspaces in the data that are almost, but not completely, in the parity space. The design of less conservative parity spaces might therefore be an interesting research area. That is, one should check the singular values of the observability matrix $O_s$ and include almost singular directions as well. As a consequence, under the no-fault assumption the residuals will normally be somewhat larger (so the threshold has to be increased to keep the false alarm rate), but the detectability increases. The size of the almost parity space should be optimized to maximize isolation performance.
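One way to realize such an almost parity space, sketched under our own assumptions: take the SVD of $O_s$ and keep, in addition to the exact left null space, the left singular vectors whose singular values fall below a relative tolerance. The function name and the parameter `rel_tol` are hypothetical design knobs, not part of the method described above; a larger tolerance gives more residual components but also lets the state leak into the no-fault residual.

```python
import numpy as np


def almost_parity_space(O, rel_tol=1e-10):
    """Rows: left singular vectors of O with singular value <= rel_tol * max.

    The m - n directions without a singular value (O is m x n, m >= n assumed)
    form the exact left null space and are always kept. rel_tol is a
    hypothetical design parameter trading no-fault residual size for
    detectability.
    """
    m, n = O.shape
    U, s, _ = np.linalg.svd(O)                        # U is m x m, s has length n
    s_all = np.concatenate([s, np.zeros(m - n)])      # pad the null directions
    keep = s_all <= rel_tol * s[0]
    return U[:, keep].T


# Hypothetical 6 x 3 "observability matrix" with one nearly singular direction
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((6, 6)))
V0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
O = U0[:, :3] @ np.diag([1.0, 0.5, 1e-3]) @ V0.T
print(almost_parity_space(O).shape, almost_parity_space(O, 1e-2).shape)  # (3, 6) (4, 6)
```

The extra row annihilates the state only approximately (here up to a factor $10^{-3}$), which is why the threshold must then be raised to keep the false alarm rate.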
8 Simulation example: F16 vertical dynamics
The fault detection algorithm is applied to a model of the vertical dynamics of an F-16 aircraft. The model is taken from [10], which is a sampled version of a model in [16]. Preliminary results are also reported in [13]. The involved signals and their generation in the simulations are summarized in Table 2. Inputs, state noise and measurement noise are all simulated as independent Gaussian variables, with the variances given in the same table.
We have the following numerical values for the matrices in the model (2):
Table 2
Signals in the F16 simulation study. Size means the variance for the inputs, the measurement noise variance for the outputs, the state noise variance for the states and the constant magnitude for the faults, respectively.

Signal     Notation   Meaning                          Size
Inputs     $u_1$      spoiler angle (0.1 deg)          1
           $u_2$      forward acceleration (m/s^2)     1
           $u_3$      elevator angle (deg)             1
Outputs    $y_1$      relative altitude (m)            $10^{-4}$
           $y_2$      forward speed (m/s)              $10^{-6}$
           $y_3$      pitch angle (deg)                $10^{-6}$
Disturb.   $d_1$      speed disturbance                -
States     $x_1$      altitude (m)                     $10^{-4}$
           $x_2$      forward speed (m/s)              $10^{-4}$
           $x_3$      pitch angle (deg)                $10^{-4}$
           $x_4$      pitch rate (deg/s)               $10^{-4}$
           $x_5$      vertical speed (deg/s)           $10^{-4}$
Faults     $f_1$      spoiler angle actuator           0.5
           $f_2$      forward acceleration actuator    0.1
           $f_3$      elevator angle actuator          1
           $f_4$      relative altitude sensor         1
           $f_5$      forward speed sensor             1
           $f_6$      pitch angle sensor               1
$$A = \begin{pmatrix}
1 & 0.0014 & 0.1133 & 0.0004 & 0.0997 \\
0 & 0.9945 & 0.0171 & 0.0005 & 0.0070 \\
0 & 0.0003 & 1.0000 & 0.0957 & 0.0049 \\
0 & 0.0061 & 0.0000 & 0.9130 & 0.0966 \\
0 & 0.0286 & 0.0002 & 0.1004 & 0.9879
\end{pmatrix} \tag{43a}$$
$$B_u = \begin{pmatrix}
0.0078 & 0.0000 & 0.0003 \\
0.0115 & 0.0997 & 0.0000 \\
0.0212 & 0.0000 & 0.0081 \\
0.4150 & 0.0003 & 0.1589 \\
0.1794 & 0.0014 & 0.0158
\end{pmatrix} \tag{43b}$$
$$B_d = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \end{pmatrix}^T \tag{43c}$$
$$B_f = \begin{pmatrix}
0.0078 & 0.0000 & 0.0003 & 0 & 0 & 0 \\
0.0115 & 0.0997 & 0.0000 & 0 & 0 & 0 \\
0.0212 & 0.0000 & 0.0081 & 0 & 0 & 0 \\
0.4150 & 0.0003 & 0.1589 & 0 & 0 & 0 \\
0.1794 & 0.0014 & 0.0158 & 0 & 0 & 0
\end{pmatrix} \tag{43d}$$
$$C = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix}, \qquad
D_f = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \tag{43e}$$
$D_u$ and $D_d$ are zero matrices of appropriate dimensions.
Residuals were computed for the fault-free case and for the six single faults described in Table 2, according to Algorithm 1 (the stochastic parity space approach). The time window $L$ was set to 3. This gives a four-dimensional residual ($n_r = L n_y - n_x = 3 \cdot 3 - 5 = 4$), which is illustrated in Figure 7.
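The residual dimension $n_r = L n_y - n_x$ follows from the rank of the extended observability matrix over the window. A short NumPy check, using $A$ from (43a) and $C$ from (43e); only the rank of $O$ matters for this count.

```python
import numpy as np

# A from (43a) and C from (43e)
A = np.array([
    [1.0, 0.0014, 0.1133, 0.0004, 0.0997],
    [0.0, 0.9945, 0.0171, 0.0005, 0.0070],
    [0.0, 0.0003, 1.0000, 0.0957, 0.0049],
    [0.0, 0.0061, 0.0000, 0.9130, 0.0966],
    [0.0, 0.0286, 0.0002, 0.1004, 0.9879],
])
C = np.hstack([np.eye(3), np.zeros((3, 2))])
L = 3
n_y, n_x = C.shape

# Extended observability matrix over the window, size (L * n_y) x n_x = 9 x 5
O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(L)])
n_r = L * n_y - np.linalg.matrix_rank(O)   # dimension of the left null space of O
print(n_r)  # 4
```

Since $O$ has full column rank, the left null space, and hence the parity space, has dimension $9 - 5 = 4$.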
It is clear from Figure 7 that some of the faults are easy to detect and isolate, while others (whose residuals lie closer to the origin) are harder. Fault $f_4$, a fault in the relative altitude sensor, gives a zero residual, so it cannot be detected. The threshold is chosen as $h = 9.3$ to get a false alarm rate of 0.05. In this simulation and for this threshold, the probabilities of correct isolation are 1, 1, 0.96, 0.05, 0.72 and 1, respectively. That is, fault 4 can be neither detected nor isolated ($P_D = P_{FA} = 0.05$).
Note that the fault size, as well as the noise level, affects the detectability and isolability of the faults. This can be analyzed using Algorithm 2.
The probability of incorrect diagnosis, Equation (23), can be calculated analytically. The matrix below contains these probabilities, where
$$P_{(i,j)} = \mathrm{prob}(\text{diagnosis } i \mid \text{fault } j). \tag{44}$$
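The paper computes (44) analytically via (23). As an independent sanity check one can also estimate such a matrix by simulation. The sketch below is a Monte Carlo approximation under assumptions of our own: Gaussian normalized residuals with a fault-dependent mean, a minimum-angle isolation rule (the decision lines actually used in Figure 8 may differ) and no detection threshold. The function name and the example means are hypothetical.

```python
import numpy as np


def diagnosis_matrix_mc(F, n_samples=20000, seed=0):
    """Monte Carlo estimate of P[i, j] = prob(diagnosis i | fault j).

    Assumptions (hypothetical, for illustration only):
    * under fault j the normalized residual is r_t ~ N(F[:, j], I);
    * diagnosis picks the fault direction with the largest normalized
      projection F[:, i]^T r_t / ||F[:, i]||;
    * the detection threshold is ignored.
    Faults with a zero direction are undiagnosable; their row and column stay 0.
    """
    rng = np.random.default_rng(seed)
    n_r, n_f = F.shape
    norms = np.linalg.norm(F, axis=0)
    idx = np.flatnonzero(norms > 1e-12)          # diagnosable faults
    F_unit = F[:, idx] / norms[idx]              # unit fault directions
    P = np.zeros((n_f, n_f))
    for j in idx:
        r = F[:, j] + rng.standard_normal((n_samples, n_r))
        winner = idx[np.argmax(r @ F_unit, axis=1)]
        P[:, j] = np.bincount(winner, minlength=n_f) / n_samples
    return P


# Hypothetical residual means: two well-separated faults and one invisible fault
F = np.array([[6.0, 0.0, 0.0],
              [0.0, 6.0, 0.0],
              [0.0, 0.0, 0.0]])
P_mc = diagnosis_matrix_mc(F)
print(np.round(P_mc, 3))
```

An invisible fault produces an all-zero row and column, mirroring the role of $f_4$ in (45); nearly parallel fault directions would instead show up as large symmetric off-diagonal entries.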
[Figure 7: (a) Parity space residuals, $r_1$ versus $r_2$; (b) Parity space residuals, $r_3$ versus $r_4$; groups labeled 0 to 6.]
Fig. 7. Illustration of the four-dimensional residuals from the parity space for no fault (0) and faults 1-6, respectively. The mean value, estimated covariance matrix and convex hull of each group of residuals are illustrated. Fault 4 is obviously not diagnosable, and residual $r_4$ contains almost no information.
[Figure 8: (a) Parity space theoretical residuals, $r_1$ versus $r_2$; (b) Parity space theoretical residuals, $r_3$ versus $r_4$; groups labeled 0 to 6.]
Fig. 8. Illustration of the residuals from the parity space for no fault (0) and faults 1-6, respectively, but here in another basis. This confirms that fault 4 is not diagnosable. The decision lines for fault isolation are indicated.
[Figure 9: (a) PCA residuals, $r_1$ versus $r_3$; (b) PCA residuals, $r_2$ versus $r_4$; groups labeled 0 to 6.]
Fig. 9. Illustration of the residuals from PCA for no fault (0) and faults 1-6, respectively. The mean value, estimated covariance matrix and convex hull of each group of residuals are illustrated. These cannot, however, be compared directly to the residual components in Figures 7 and 8, since the bases are different. Again, fault 4 is not diagnosable, and here residual $r_1$ contains little information.

The residual for fault $f_4$ is zero: the relative altitude fault cannot be detected simply because absolute height is not measured. This means that the probabilities of incorrect as well as correct diagnosis involving $f_4$ can all be considered zero ($P_{(i,4)}$ and $P_{(4,i)}$).
$$P = \begin{pmatrix}
1.0000 & 0.0000 & 0.0000 & 0 & 0.0000 & 0.0000 \\
0.0000 & 0.5980 & 0.0000 & 0 & 0.4020 & 0.0001 \\
0.0000 & 0.0000 & 0.9999 & 0 & 0.0001 & 0.0000 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0.0000 & 0.4020 & 0.0001 & 0 & 0.5415 & 0.0564 \\
0.0000 & 0.0001 & 0.0000 & 0 & 0.0564 & 0.9436
\end{pmatrix} \tag{45}$$
The probability of incorrect diagnosis is very small in most cases. The case that poses the most problems is distinguishing faults $f_2$ and $f_5$. These two faults are also very close in Figure 8, in the sense that their directions are almost parallel. Yet, interestingly, more faults than residuals can actually be isolated.
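The most problematic pair can also be read off (45) mechanically by searching for the largest off-diagonal entry. The sketch below copies the matrix from (45), with row index $i$ for the diagnosis and column index $j$ for the true fault, as in (44).

```python
import numpy as np

# Diagnosis probability matrix (45): P[i, j] = prob(diagnosis i+1 | fault j+1)
P = np.array([
    [1.0000, 0.0000, 0.0000, 0, 0.0000, 0.0000],
    [0.0000, 0.5980, 0.0000, 0, 0.4020, 0.0001],
    [0.0000, 0.0000, 0.9999, 0, 0.0001, 0.0000],
    [0,      0,      0,      0, 0,      0     ],
    [0.0000, 0.4020, 0.0001, 0, 0.5415, 0.0564],
    [0.0000, 0.0001, 0.0000, 0, 0.0564, 0.9436],
])
off = P - np.diag(np.diag(P))              # keep misdiagnosis probabilities only
i, j = np.unravel_index(np.argmax(off), off.shape)
print(f"most confused: diagnosis f{i + 1} given fault f{j + 1}, p = {off[i, j]:.4f}")
# most confused: diagnosis f2 given fault f5, p = 0.4020
```

The matrix is symmetric in this pair, so $f_5$ is equally often diagnosed as $f_2$.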
Simulations of PCA are shown in Figure 9. The dimension $n_r$ of the residuals (the number of columns of $P_r$ in Equation (36)) is set to 4 to facilitate a comparison with the parity space approach. Note that the residual components are not the same as in the parity space approach in Figure 7, since the residual space is expressed in another basis. The threshold is chosen as $h = 9.7$ to get a false alarm rate of 0.05. In this simulation and for this threshold, the probabilities of correct isolation are 1, 1, 0.96, 0.05, 0.67 and 1, respectively. That is, they are almost the same as for the parity space approach; only the isolation of fault 5 is slightly worse.
The residual component $r_1$ from the PCA method is very small for all faults. This suggests that it contains no information about the faults and that the residual space is effectively only three-dimensional. From the simulations and analysis of the stochastic parity space approach, it appears that the residual component $r_4$ plays a similar role and contains very little information for fault isolation.
9 Conclusions
We have here introduced the normalized parity residual space for additive faults in linear stochastic systems. It was shown how this parity space can be derived in a Kalman filter framework. We have derived explicit formulas for incorrect diagnosis probabilities, and these depend critically on the fault-to-noise ratio. An example illustrated how the diagnosability matrix can be used as a design tool with respect to sensor quality and design parameters.
Further, several approaches to fault detection and isolation were compared, of which the parity space approach and principal component analysis (PCA) are conceptually the most interesting. A detailed interpretation of PCA in parity space notation was given. The assumptions, advantages and drawbacks of these approaches are summarized below:
• The parity space approach starts with a state space model of the system. The use of prior model knowledge improves the performance compared to PCA. With a partially known model, system identification techniques can be applied. Generally, the more prior structural knowledge, the better the performance. Another advantage is that a priori probabilities of incorrect diagnosis can be calculated.
• PCA requires absolutely no prior knowledge, not even causality (which of the known signals in $z_t$ are inputs $u_t$ and which are outputs $y_t$). Its performance has been demonstrated to be only slightly worse than in the case of perfect model knowledge. Determining the state dimension is a critical step in PCA, and it is based on the singular values of the data correlation matrix. Over-estimating the state dimension gives too few residuals, which decreases performance. Under-estimating the state dimension can give very good performance, in that new residuals almost belonging to the parity space are used for detection and diagnosis. One major risk here is that when the system enters a new operating point that was never reached in the training data, such a residual might increase in magnitude and cause a false alarm.
References
[1] M. Basseville and I.V. Nikiforov. Detection of abrupt changes: theory and application. Information and system science series. Prentice Hall, Englewood Cliffs, NJ, 1993.
[2] L.H. Chiang, E.L. Russell, and R.D. Braatz. Fault Detection and Diagnosis in Industrial Systems. Springer, 2001.
[3] E.Y. Chow and A.S. Willsky. Analytical redundancy and the design of robust failure detection systems. IEEE Transactions on Automatic Control, 29(7):603-614, 1984.
[4] X. Ding, L. Guo, and T. Jeinsch. A characterization of parity space and its application to robust fault detection. IEEE Transactions on Automatic Control, 44(2):337-343, 1999.
[5] R. Dunia and S.J. Qin. Joint diagnosis of process and sensor faults using principal component analysis. Control Engineering Practice, 6:457-469, 1998.
[6] R. Dunia, S.J. Qin, T.F. Edgar, and T.J. McAvoy. Use of principal component analysis for sensor fault identification. Computers & Chemical Engineering, 20(971):S713-S718, May 1996.
[7] J. Gertler. Fault detection and isolation using parity relations. Control Engineering Practice, 5(5):653-661, 1997.
[8] J.J. Gertler. Fault Detection and Diagnosis in Engineering Systems. Marcel Dekker, Inc., 1998.
[9] G.H. Golub and C.F. van Loan. Matrix Computations. Johns Hopkins University Press, third edition, 1996.
[10] F. Gustafsson. Adaptive filtering and change detection. John Wiley & Sons, Ltd, 2000.
[11] F. Gustafsson. Stochastic observability and fault diagnosis of additive changes in state space models. In IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2833-2836, Salt Lake City, UT, 2001.
[12] F. Gustafsson. Stochastic fault diagnosability in parity spaces. In International Federation of Automatic Control (IFAC) World Congress, Barcelona, July 2002.
[13] A. Hagenblad, F. Gustafsson, and I. Klein. A comparison of two methods for stochastic fault detection: the parity space approach and principal component analysis. In IFAC Symposium on System Identification, Rotterdam, NL, 2003.
[14] J.Y. Keller. Fault isolation filter design for linear stochastic systems. Automatica, 35(10):1701-1706, 1999.
[15] L. Ljung. System identification, Theory for the user. Prentice Hall, Englewood Cliffs, NJ, second edition, 1999.
[16] J.M. Maciejowski. Multivariable feedback design. Addison Wesley, 1989.
[17] S.J. Qin and W. Li. Detection, identification and reconstruction of faulty sensors with maximized sensitivity. AIChE Journal, 45:1963-1976, 1999.