Amaury B. Andre (amaury.andre@semeq.com.br)
Eduardo Beltrame (eduardo@semeq.com.br)
SEMEQ
Av. Laranjeiras, 2392
13485-254 Limeira - SP
Brazil

Jacques Wainer (wainer@ic.unicamp.br)
University of Campinas
Computing Institute
Av. Albert Einstein 1251
13083-852 Campinas - SP
Brazil

Corresponding author:
Jacques Wainer
wainer@ic.unicamp.br
University of Campinas
Av. Albert Einstein 1251
13083-852 Campinas - SP - Brazil
Phone: 55 (19) 3521-5871
Fax: 55 (19) 3521-5847
Statement: this paper has not been published elsewhere and has not been
submitted simultaneously for publication elsewhere.
1 Introduction
discard the indication of failure. So, in this work, a very low false negative rate
is considerably more important than a high accuracy rate.
1.1 Related work
In related work, machine learning techniques such as support vector machines (SVM) and artificial neural networks (ANN) have been applied to detect faults in different types of equipment, such as induction motors (Casimir et al., 2006), rolling element bearings (Samanta et al., 2003; Samanta and Nataraj, 2009), gear couplings (Samanta, 2004), turbine blades (Kuo, 1995), and compressors (Yang et al., 2005), among others.
Samanta et al. (2003) compared three ANNs, a multilayer perceptron, a radial basis function network, and a probabilistic neural network, combined with a genetic algorithm for feature selection, to detect faults in bearings. The networks were capable of detecting bearing faults with 99.83%, 87.5%, and 96.31% accuracy, respectively, but no differentiation was made between false positives and false negatives. The algorithms were tested on a small number of cases (72 samples for training and 72 samples for testing).
Samanta (2004) worked with gear couplings and presented a comparison of ANN and SVM. In the paper, 266 samples were used for training both classifiers (ANN and SVM), and 126 samples for testing the accuracy of classification. The accuracy was close to 100%, but again no differentiation was made between false positives and false negatives. Also, the main focus of the work was the results obtained by the application of genetic algorithms for feature selection.
Samanta and Nataraj (2009) also dealt with bearing conditions using two classifiers, ANN and SVM, but a particle swarm optimization algorithm was used for feature selection. The training and testing data sets used were larger than in the previous works. The false negative rate was 4% for the SVM and 11% for the ANN.
1.2
This paper is organized as follows. Section 2 presents the fundamental concepts; section 3 presents the details of our technique for the classification task.
Section 4 presents the results of the application of the classifier, and section 5
discusses the limitations and contributions of this work.
2 Fundamental concepts

2.1 Vibration Analysis
The concept of vibration signature analysis is simple: machines in good condition generally tend to have a fairly stable vibration pattern, which can be
considered as a signature (Randall, 1975). Changes in the internal conditions
are often reflected as changes in the vibration pattern.
Piezoelectric transducers (accelerometers) are used to measure vibration.
The result of data acquisition is the acceleration time wave.
For any machine, many acquisitions are made: in general, each piece of equipment is measured along three directions (horizontal, vertical, and axial), on both the coupling and non-coupling sides. For example, an induction motor is usually measured at the five positions detailed in figure 1.
Vibration analysis is commonly done by examining the individual frequencies present in the signal. These frequencies correspond to certain mechanical components (for example, the various pieces that make up a rolling-element bearing).
2.2 Feature Extraction
As mentioned above, each measure point generates three spectra: the velocity, the acceleration, and the envelope spectrum. It is infeasible to use the whole spectrum for classification; thus, one usually computes single numbers that summarize characteristics of part of the spectrum or of the whole spectrum. These numbers are called features.
In this research we used the following features, computed from the time domain and from the frequency domain data of each spectrum:

- the root mean square (from time domain data) - rms
- the peak-to-peak value (from time domain data) - pkpk
- the spectrum average (from frequency domain data) - avg
- the spectrum standard deviation (from frequency domain data) - std
- the number of high peaks, that is, peaks higher than the average plus three times the standard deviation (from frequency domain data) - hp
- the cumulative sum of the high peaks (from frequency domain data) - cshp
In some cases, the detection of a fault requires the comparison of the current spectra with previously collected data. In this research we used both the data collected in the previous month and the reference for that spectrum. The reference is the data collected when the machine was known to be in good condition, either because it was new or because it had just undergone maintenance.
The following features are calculated as the difference between the current feature value and the previous month's value, and between the current value and the reference value:

- the difference from the previous month's rms - rmsa
- the difference from the previous month's pkpk - pkpka
- the difference from the previous month's number of high peaks - hpa
- the difference from the previous month's cumulative sum of the high peaks - cshpa
- the difference from the reference rms - rmsr
- the difference from the reference pkpk - pkpkr
- the difference from the reference number of high peaks - hpr
- the difference from the reference cumulative sum of the high peaks - cshpr
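As an illustration, the sketch below shows one way the features above could be computed with NumPy. The function names are ours, and the signal processing pipeline that produces the spectra (windowing, FFT settings) is not described in the text, so only the feature definitions themselves are reproduced.

```python
import numpy as np

def time_domain_features(signal):
    """rms and peak-to-peak value of the acceleration time wave."""
    rms = np.sqrt(np.mean(signal ** 2))
    pkpk = np.max(signal) - np.min(signal)
    return rms, pkpk

def frequency_domain_features(spectrum):
    """avg, std, number of high peaks (hp) and their cumulative sum (cshp).

    A high peak is a spectral line above the average plus three times
    the standard deviation, as defined above.
    """
    avg = np.mean(spectrum)
    std = np.std(spectrum)
    high = spectrum[spectrum > avg + 3.0 * std]
    return avg, std, high.size, np.sum(high)

def difference_features(current, previous_month, reference):
    """Differences of a feature value against last month and the reference."""
    return current - previous_month, current - reference
```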
2.3 SVM classifier
The support vector machine (SVM) is a statistical learning technique used in various classification problems (Cristianini and Shawe-Taylor, 2000). The standard SVM deals with binary classification problems, and its basic idea is to determine a linear boundary between the two classes. The boundary (or separating hyperplane) is oriented so that its distance to the nearest data points in each class is maximized. These nearest data points are known as support vectors. Figure 2 illustrates the SVM: the boundary is represented by the solid line and the support vectors are emphasized. The margin of the SVM is the region between the hyperplanes that contain the support vectors on each side (dashed lines). The separating hyperplane is defined by:
$$ w \cdot x + b = 0, \qquad (1) $$
where $w$ is the vector that defines the boundary, $x$ is the input vector of dimension $N$, and $b$ is a scalar threshold. The distance from the boundary to the margins is:

$$ d = \frac{1}{\|w\|} \qquad (2) $$
The optimal linear boundary that separates the two classes is calculated by solving the following optimization problem:

$$ \text{minimize } \|w\| \quad \text{subject to } y_i (w \cdot x_i + b) \geq 1 $$

To tolerate misclassified training points, slack variables $\xi_i$ are introduced, weighted by a cost parameter $C$:

$$ \text{minimize } \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} \xi_i \quad \text{subject to } y_i (w \cdot x_i + b) \geq 1 - \xi_i \qquad (3) $$

Non-linear boundaries are obtained through a kernel function; in this work we use the RBF kernel:

$$ K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right) \qquad (4) $$
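For concreteness, a minimal sketch of training a soft-margin SVM with the RBF kernel of equation (4) is given below, using scikit-learn's SVC (an assumption for illustration; the reference list cites LIBSVM, which SVC wraps). Note that SVC parameterizes the kernel as exp(-gamma * ||x_i - x_j||^2), so gamma corresponds to 1/(2 * sigma^2).

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic feature vectors for illustration only: one row per spectrum,
# labels +1 = defective, -1 = normal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))
y = np.where(X[:, 0] + X[:, 6] > 0.0, 1, -1)

sigma = 1.0                                # RBF width, equation (4)
svm = SVC(C=10.0,                          # cost C of equation (3)
          kernel="rbf",
          gamma=1.0 / (2.0 * sigma ** 2))
svm.fit(X, y)
print(svm.predict(X[:5]))
```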
For this research, the cost of a false negative is far higher than the cost of a false positive. We call this situation the unbalanced cost of errors.

The standard SVM technique assumes balanced costs. Equation 3 makes no distinction between the two forms of misclassification: a positive example being classified as negative (false negative) and the other way around are both treated equally (the $C \xi_i$ terms).
We extended the standard SVM in two different ways to account for the unbalanced cost. The first way is to alter equation 3 to add different costs for false negatives and false positives. Thus, we make the cost of a false positive $C$, and of a false negative $\lambda C$, where $\lambda > 1$. The new equation is:
$$ \text{minimize } \frac{1}{2}\|w\|^2 + C \sum_{i \text{ is a false positive}} \xi_i + \lambda C \sum_{j \text{ is a false negative}} \xi_j \quad \text{subject to } y_i (w \cdot x_i + b) \geq 1 - \xi_i \qquad (5) $$
This change makes it likely that the false negatives, if they exist, will be closer to the separating hyperplane than the false positives, since their cost is higher.
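This first extension can be sketched with per-class penalties, which is how LIBSVM-style implementations (here scikit-learn's SVC, an assumption) expose different misclassification costs: slack on defective examples (label +1, whose misclassification is a false negative) is weighted by lambda * C, while normal examples keep cost C. The value of lambda below is arbitrary, for illustration only.

```python
from sklearn.svm import SVC

C = 10.0      # base cost, applied to errors on normal examples (false positives)
lam = 10.0    # multiplier lambda > 1 for errors on defective examples (false negatives)

# class_weight multiplies C per class, so the slack of defective (+1) examples
# costs lam * C, as in equation (5).
weighted_svm = SVC(C=C, kernel="rbf", gamma=0.5,
                   class_weight={1: lam, -1: 1.0})
# weighted_svm.fit(X, y)   # X, y as in the previous sketch
```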
3.1 SVM+kNN
Our second extension to the standard SVM is to consider more carefully the points that fall close to the separating boundary, since that is where the false negatives are likely to appear.

The k-nearest-neighbors (kNN) algorithm can be described as follows: a new sample is classified according to a function of the classifications of its k closest neighbors. This function can be, for example, a majority vote or a consensus criterion. The main parameter of the algorithm is the number of neighbors k that should be checked to decide a classification.
For the data points inside a region close to the hyperplane, defined as the region whose distance to the hyperplane is at most $t \cdot d$, where $d$ is the margin of the SVM (equation 2), we apply a kNN consensus criterion, in which a new data point is classified as normal if and only if all of its k neighbors in the training set are also normal. If at least one of the k nearest neighbors is classified as defective, the new data point also receives the defective classification.
Figure 3 illustrates the algorithm. The star is a new data point. Since it lies close to the hyperplane, that is, within the dotted lines, it does not follow the standard SVM decision procedure and does not receive the classification of normal (white circles). We instead use the kNN consensus criterion. Let us assume that k=5, that is, we poll the 5 nearest neighbors of the new data point (the points within the circle); if any of them is classified as defective (filled circles), the new data point receives a defective classification.
This strategy combines the main advantages of each algorithm: the generalization efficiency of the SVM, especially for data points far away from the hyperplane, and the complex boundary that the kNN algorithm creates close to the separating hyperplane.
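A minimal sketch of the combined decision rule is given below, assuming scikit-learn estimators and the label convention +1 = defective, -1 = normal. Since the canonical SVM decision function f(x) equals plus or minus 1 on the margin hyperplanes, the condition "distance to the boundary at most t * d" reduces to |f(x)| <= t. The class name and parameter defaults are ours, not the paper's.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import NearestNeighbors

class SvmKnnSketch:
    """SVM decision far from the boundary, kNN consensus inside |f(x)| <= t."""

    def __init__(self, C=10.0, gamma=0.5, lam=10.0, k=5, t=0.5):
        self.svm = SVC(C=C, kernel="rbf", gamma=gamma,
                       class_weight={1: lam, -1: 1.0})
        self.k, self.t = k, t

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.svm.fit(X, y)
        self.neighbors = NearestNeighbors(n_neighbors=self.k).fit(X)
        self.y_train = y
        return self

    def predict(self, X):
        X = np.asarray(X)
        labels = self.svm.predict(X)
        f = self.svm.decision_function(X)
        near = np.abs(f) <= self.t       # points inside the band around the boundary
        if np.any(near):
            _, idx = self.neighbors.kneighbors(X[near])
            # consensus criterion: normal only if every one of the k neighbors is normal
            all_normal = np.all(self.y_train[idx] == -1, axis=1)
            labels[near] = np.where(all_normal, -1, 1)
        return labels
```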
[Figure 3 legend: defective (filled circles), normal (white circles).]
4 Results

4.1 Feature and parameter selection
Even though we do not work with the whole spectrum but with the 14 features described above, not all of these features may be useful for the classification task. In fact, features that are not useful for the classification task may hinder it (Guyon and Elisseeff, 2003). Thus, selecting which of the features are relevant is an important step in any automatic classification task. Many techniques have been developed for feature selection (for example, Guyon and Elisseeff (2003) and other papers in that JMLR special issue on the topic). In this research we used a rather traditional method of forward selection by cross-validation.
The method starts by selecting a subset of the data and a quality metric for comparison between the different classifiers. Then, using only one of the features at a time, we compute the quality metric for each single-feature classifier using cross-validation. The feature that maximizes the metric is selected. In the next iteration, all the features that were not included in the previous iterations are added, one at a time, and the feature that again maximizes the metric is kept. The process repeats until the addition of any of the other features decreases the metric. The final set of features is then the set of features of the previous iteration.
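The loop just described can be sketched as follows; `cv_quality` stands for the cross-validated quality metric computed on a classifier trained with a given feature subset (its definition is given further below) and is an assumed interface, not the paper's code.

```python
def forward_selection(all_features, cv_quality):
    """Greedy forward feature selection by cross-validation (sketch).

    `all_features` is the list of candidate feature names; `cv_quality(subset)`
    returns the cross-validated quality metric for that subset.
    """
    selected, best_so_far = [], float("-inf")
    while True:
        remaining = [f for f in all_features if f not in selected]
        if not remaining:
            break
        # try adding each remaining feature, one at a time
        quality, feature = max((cv_quality(selected + [f]), f) for f in remaining)
        if quality <= best_so_far:
            break                       # no improvement: keep the previous set
        selected.append(feature)
        best_so_far = quality
    return selected
```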
Besides selecting the features, our SVM+kNN classifier has the following parameters whose values need to be determined:

- $C$, from equation (3)
- $\sigma$, from the RBF kernel function (equation (4))
- $\lambda$, the multiplying parameter for the cost of false negatives
- $k$, from the kNN algorithm
- $t$, the proportion of the margin in which the kNN algorithm is used
The determination of the parameters of the classifier was done by brute-force exploration of a few grid points. For each combination of features, using cross-validation, we determined the quality metric of the classifier for all combinations of five different values for each parameter.
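The brute-force exploration could look like the sketch below. The actual five values tried for each parameter are not reported in the text, so the grids here are placeholders, and `cv_quality(params)` is again an assumed interface returning the cross-validated quality metric for one parameter combination.

```python
import itertools

# Placeholder grids: five candidate values per parameter (the values actually
# explored in the paper are not given here).
grid = {
    "C":     [0.1, 1.0, 10.0, 100.0, 1000.0],
    "sigma": [0.1, 0.5, 1.0, 2.0, 5.0],
    "lam":   [1.0, 2.0, 5.0, 10.0, 20.0],
    "k":     [1, 3, 5, 7, 9],
    "t":     [0.1, 0.25, 0.5, 0.75, 1.0],
}

def best_parameters(grid, cv_quality):
    """Evaluate every combination of parameter values and keep the best one."""
    best_score, best_params = float("-inf"), None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = cv_quality(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```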
For the feature selection algorithm we used a set of 1000 spectra from induction motors, equally divided into normal and defective. For the quality metric we used a variation of accuracy: if there were any false negatives, the quality of the classifier was taken to be 0; if there were no false negatives, we measured the accuracy, that is, the number of correct classifications divided by the total number of examples.
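This metric can be written directly as a small function (labels follow the convention of the earlier sketches, +1 = defective, -1 = normal):

```python
import numpy as np

def quality_metric(y_true, y_pred):
    """Accuracy, vetoed to 0 if any defective example is labeled normal."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    false_negatives = np.sum((y_true == 1) & (y_pred == -1))
    if false_negatives > 0:
        return 0.0
    return float(np.mean(y_true == y_pred))
```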
The results of the rounds of feature selection are displayed in Table 1. Each row in the table represents the results for one round of the feature selection algorithm, and the number marked with an asterisk is the best result of that round. Some classifiers have 0% accuracy because they generated false negatives. The first column indicates the features already selected. The iterative procedure stopped at the sixth iteration because the accuracy dropped with the addition of any new feature.
Table 1: Accuracy (%) of each candidate feature at each round of the forward selection (* marks the best result of the round; - marks a feature already selected).

Features selected | 1 rms | 2 pkpk | 3 hp | 4 cshp | 5 avg | 6 dev | 7 rmsa | 8 pkpka | 9 hpa | 10 cshpa | 11 rmsr | 12 pkpkr | 13 hpr | 14 cshpr
(none)            | 78.3* | 62.7   | 49   | 0      | 43    | 0     | 0      | 71      | 68.7  | 0        | 0       | 0        | 0      | 0
1                 | -     | 77.7*  | 75.7 | 76.7   | 0     | 62.3  | 77.7   | 77.3    | 76    | 68       | 0       | 62.7     | 66.3   | 0
1,2               | -     | -      | 78   | 78.8*  | 0     | 68.3  | 76     | 78.7    | 71.7  | 75       | 78.7    | 68       | 71     | 74.3
1,2,4             | -     | -      | 71.7 | -      | 73.7  | 74.3  | 79.7*  | 76.7    | 70    | 72.3     | 78      | 74.7     | 70.7   | 76.3
1,2,4,7           | -     | -      | 78.7 | -      | 73    | 69.7  | -      | 77.7    | 78.3  | 75       | 81.3*   | 75.7     | 66     | 75.3
1,2,4,7,11        | -     | -      | 73.3 | -      | 77.7  | 70.3  | -      | 80.7*   | 75.7  | 70.3     | -       | 72.7     | 76.7   | 78

4.2

In this section we present the results of applying the SVM+kNN classifier with the features and parameters defined above. These results are a sample of the
SVM+kNN classifier in real-life use. We used data sampled from 11 months of operation at SEMEQ to train the classifier, and applied the classifier to all data from the 12th month to evaluate the false negative rate and the accuracy. The data is separated by type of machine and by type of spectrum. The training set was composed of the same number of normal and defective spectra (for each machine and spectrum type). The testing set was the whole month's data.
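As a sketch, the evaluation protocol described above (a balanced training set sampled from the first eleven months, the whole twelfth month as test set) could be implemented as follows; the array layout and the random sampling are our assumptions.

```python
import numpy as np

def month_split(X, y, month, test_month=12, seed=0):
    """Balanced training sample from months before `test_month`, full test month.

    X, y, month are parallel arrays; labels are +1 = defective, -1 = normal.
    """
    rng = np.random.default_rng(seed)
    train = month < test_month
    defective = np.flatnonzero(train & (y == 1))
    normal = np.flatnonzero(train & (y == -1))
    n = min(defective.size, normal.size)   # same number of normal and defective spectra
    chosen = np.concatenate([rng.choice(defective, n, replace=False),
                             rng.choice(normal, n, replace=False)])
    test = np.flatnonzero(month == test_month)
    return X[chosen], y[chosen], X[test], y[test]
```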
Table 2 shows the results for induction motors. The first column indicates the type of spectrum; the second, the number of spectra used in the training set, sampled from the eleven months of operation and equally divided into normal and defective; the third, the number of spectra used in the test set; the fourth, the number of false positives, that is, spectra incorrectly labeled as defective by the classifier; the fifth, the number of false negatives; and the sixth, the accuracy of the classifier (the number of correctly labeled examples divided by the number of examples in the test set).
Table 2: Results for induction motors (the last row is the average accuracy).

Spectrum     | Training set | Testing set | False positives | False negatives | Accuracy (%)
velocity     | 1000         | 1223        | 329             | 0               | 73.1
acceleration | 1000         | 1058        | 211             | 0               | 80.1
envelope     | 1000         | 1051        | 288             | 0               | 72.6
average      |              |             |                 |                 | 75.3
There has been some tradition in the use of statistical learning techniques
to automatically detect indications of a developing failure from spectra data
Spectrum     | Training set | Testing set | False positives | False negatives | Accuracy (%)
velocity     | 791          | 790         | 188             | 0               | 76.2
acceleration | 629          | 876         | 171             | 0               | 80.5
envelope     | 320          | 626         | 89              | 0               | 85.8
average      |              |             |                 |                 | 80.8
Spectrum     | Training set | Testing set | False positives | False negatives | Accuracy (%)
velocity     | 40           | 1003        | 84              | 0               | 91.6
acceleration | 774          | 1023        | 144             | 0               | 85.9
envelope     | 232          | 1009        | 260             | 0               | 74.2
average      |              |             |                 |                 | 83.9
Spectrum     | Training set | Testing set | False positives | False negatives | Accuracy (%)
velocity     | 749          | 1028        | 184             | 0               | 82.1
acceleration | 910          | 1037        | 318             | 0               | 69.3
envelope     | 710          | 1025        | 207             | 0               | 79.8
average      |              |             |                 |                 | 77.1
in order to more precisely associate the fault in the machine with one or a few of its spectra would require the help of a specialist. We have not yet determined whether the cost of such an enterprise is worthwhile for the project.

At the time of writing, the system described has been in daily operation at SEMEQ for one year.
References
Casimir, R., Boutleux, E., Clerc, G., and Yahoui, A. (2006). The use of features selection and nearest neighbors rule for faults diagnostic in induction motors. Engineering Applications of Artificial Intelligence, 19(2):169-177.

Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge University Press, Cambridge.

Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157-1182.

Kuo, R. J. (1995). Intelligent diagnosis for turbine blade faults using artificial neural networks and fuzzy logic. Engineering Applications of Artificial Intelligence, 8(1):25-34.

Mobley, R. K. (2002). An Introduction to Predictive Maintenance. Butterworth-Heinemann.

Randall, R. B. (1975). Vibration signature analysis - techniques and instrument systems. Noise Control and Vibration Reduction, 6:81-89.

Rojas, A. and Nandi, A. K. (2006). Practical scheme for fast detection and classification of rolling-element bearing faults using support vector machines. Mechanical Systems and Signal Processing, 20(7):1523-1536.

Samanta, B. (2004). Gear fault detection using artificial neural networks and support vector machines with genetic algorithms. Mechanical Systems and Signal Processing, 18(3):625-644.

Samanta, B., Al-Balushi, K. R., and Al-Araimi, S. A. (2003). Artificial neural networks and support vector machines with genetic algorithm for bearing fault detection. Engineering Applications of Artificial Intelligence, 16(7-8):657-665.

Samanta, B. and Nataraj, C. (2009). Use of particle swarm optimization for machinery fault detection. Engineering Applications of Artificial Intelligence, 18(2):308-316.

Widodo, A. and Yang, B.-S. (2007). Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors. Expert Systems with Applications, 33(1):241-250.

Yang, B.-S., Hwang, W.-W., Kim, D.-J., and Tan, A. C. (2005). Condition classification of small reciprocating compressor for refrigerators using artificial neural networks and support vector machines. Mechanical Systems and Signal Processing, 19(2):371-390.