10 1016@j Measurement 2019 107377

Measurement 152 (2020) 107377
Contents lists available at ScienceDirect
Measurement
journal homepage: www.elsevier.com/locate/measurement
Machinery fault diagnosis with imbalanced data using deep generative

adversarial networks
Wei Zhang a,b, Xiang Li b,c,d,⇑, Xiao-Dong Jia d, Hui Ma b,e, Zhong Luo b,e, Xu Li f
a
School of Aerospace Engineering, Shenyang Aerospace University, Shenyang 110136, China
b
Key Laboratory of Vibration and Control of Aero-Propulsion System Ministry of Education, Northeastern University, Shenyang 110819, China
c
College of Sciences, Northeastern University, Shenyang 110819, China
d
NSF I/UCR Center for Intelligent Maintenance Systems, Department of Mechanical Engineering, University of Cincinnati, Cincinnati 45221, USA
e
School of Mechanical Engineering and Automation, Northeastern University, Shenyang 110819, China
f
State Key Laboratory of Rolling and Automation, Northeastern University, Shenyang 110819, China
a r t i c l e i n f o a b s t r a c t
Article history: Despite the recent advances of intelligent data-driven fault diagnosis methods on rotating machines, bal-
Received 21 October 2019 anced training data for different machine health conditions are assumed in most studies. However, the
Received in revised form 21 November 2019 signals in machine faulty states are usually difficult and expensive to collect, resulting in imbalanced
Accepted 9 December 2019
training dataset in most cases. That significantly deteriorates the effectiveness of the existing data-
Available online 14 December 2019
driven approaches. This paper proposes a deep learning-based fault diagnosis method to address the
imbalanced data problem by explicitly creating additional training data. Generative adversarial networks
Keywords:
are firstly used to learn the mapping between the distributions of noise and real machinery temporal
Fault diagnosis
Imbalanced data
vibration data, and additional realistic fake samples can be generated to balance and further expand
Deep learning the available dataset afterwards. Through experiments on two rotating machinery datasets, it is validated
Rotating machines that the data-driven methods can significantly benefit from the data augmentation, and the proposed
Generative adversarial networks method offers a promising tool on fault diagnosis with imbalanced training data.
Ó 2019 Elsevier Ltd. All rights reserved.
1. Introduction this assumption usually does not hold in real industries, since it is
generally difficult and expensive to collect data in machine faulty
In the recent years, with the development of the advanced conditions, despite the easy data collection in machine healthy
rotating machines in modern industries such as intelligent manu- state. Consequently, the imbalanced training data are available in
facturing, aerospace industry etc., the conventional physical most cases, which poses negative effect on the data-driven fault
model-based methods are becoming less capable of providing reli- diagnosis model.
able fault diagnostic results, and the intelligent data-driven Fig. 1 illustrates the influence of the imbalanced training data
approaches are emerging to offer promising tools for accurate on the model performance. Using balanced training data, effective
machine condition assessment [1–6]. Generally, data-driven fault discriminative features of different classes can be learned by the
diagnosis models are established through exploration of the statis- data-driven methods, which generalize well on the testing sam-
tics of the supervised training data, which are assumed to cover a ples. However, when the training data are imbalanced, the model
wide variety of machine health conditions. Therefore, the effective- is inclined to be over-trained by the majority classes, and the deci-
ness of the fault diagnosis method is highly dependent on the qual- sion boundaries of the minority classes tend to shrink, resulting in
ity and quantity of the training data. degraded generalization on the testing samples. Therefore, the
In the current literature, the data-driven fault diagnosis studies data-driven models are generally less confident of identifying
are mostly carried out under the assumption that balanced training machine faults in such scenarios.
data can be obtained, indicating similar amount of labeled samples In order to address the imbalanced data issue, different sam-
in different machine conditions can be used for training. However, pling methods have been proposed, which basically fall into two
categories, i.e. under-sampling the majority classes and over-
sampling the minority classes. While the under-sampling methods
⇑ Corresponding author at: College of Sciences, Northeastern University, She- generally lead to information losses, the over-sampling approaches
nyang 110819, China. have been much preferred in the latest studies [7–9]. Especially,
E-mail address: xiangli@mail.neu.edu.cn (X. Li).
https://doi.org/10.1016/j.measurement.2019.107377
0263-2241/Ó 2019 Elsevier Ltd. All rights reserved.
2 W. Zhang et al. / Measurement 152 (2020) 107377
Fig. 1. Data-driven fault diagnosis performances with balanced and imbalanced data, and the effect of synthetic over-sampling methods.
the synthetic over-sampling methods have been promising on the 2. Related works and preliminaries
imbalanced data problems, which create additional training sam-
ples based on the available data as shown in Fig. 1. For instance, 2.1. Synthetic over-sampling approaches
the popular synthetic minority oversampling technique (SMOTE)
[10] aims to interpolate a new data sample between a certain real Generally, synthetic over-sampling approaches aim at generat-
sample and one of its nearest neighbors. In this way, additional ing new data samples of minority classes to alleviate the imbal-
samples can be generated to expand and balance the imbalanced anced data problem. The synthetic minority oversampling
training dataset, leading to improvements in the model perfor- technique (SMOTE) is one of the most widely used methods, which
mance. While a number of synthetic over-sampling methods have generates new samples through interpolation of the real data [10].
been proposed in the past years, most studies are based on data Some variants of SMOTE were also proposed to improve the data
interpolation, and less effective in creating data variants subject generation effect, including MSMOTE [7] which is modified to
to the underlying distribution of the real data. Therefore, the exist- eliminate noise samples by adaptive mediation, borderline-
ing methods still suffer from overfitting the limited minority data. SMOTE [16] where only the minority examples near the borderline
Recently, deep learning is emerging as a highly efficient data- are over-sampled etc. The adaptive synthetic sampling approach
driven technique, which is characterized by the deep neural (ADASYN) was proposed by He et al. [17] where a weighted distri-
network architecture with multiple linear and nonlinear data bution is used for different minority class examples based on their
transformation operations. A number of tasks including intelligent level of difficulty in learning, and more synthetic data are gener-
fault diagnosis have largely benefited from the development of ated for the minority class examples that are harder to learn.
deep learning in the past years [11–14]. Besides the successful Bunkhumpornpat et al. proposed the Safe-Level-SMOTE method
application in establishing the relationship between measured sig- [18], where each minority class instance is assigned a safe level,
nal and machine health conditions, deep learning has also shown and SMOTE is implemented to generate new data only in the safe
promising effects in data generation. For instance, in image pro- region.
cessing tasks, realistic images have been artificially generated While a wide variety of over-sampling methods have been
using generative adversarial neural networks [15]. Therefore, deep developed, data interpolation is mostly used which fails to learn
generative adversarial neural networks hold the potential to effec- the underlying distribution of the real data, and incorrect samples
tively address the imbalanced data problem by creating additional may possibly be generated in this way. Recently, deep neural net-
reliable data samples for training. works have shown promising ability on data generation [15], and
In this paper, a deep learning-based synthetic over-sampling have been successfully applied to address the data imbalance prob-
method is proposed for machinery fault diagnosis with imbalanced lem. The conditional generative adversarial network (CGAN) was
data. Two stages are included in the proposed method. In the first proposed by Mirza et al. [9], where the network can be trained to
stage, generative adversarial networks are adopted to learn the dis- generate data conditioned on class labels. Additional samples of
tributions of the real data samples, which can be used to balance the minority classes can be explicitly created for data augmenta-
and further expand the training dataset by creating additional real- tion. The balancing generative adversarial network (BAGAN) was
istic fake samples. In the second stage, a deep convolutional neural proposed by Mariani et al. as a data augmentation tool to restore
network is employed afterwards for fault diagnosis, which is balance in imbalanced image datasets [8]. The generative neural
trained using the enlarged dataset. Experiments on two rotating network learns useful features from the majority classes, which
machinery datasets are carried out to validate the proposed facilitate data generation of the minority classes. High-quality
method, and promising results are obtained in different tasks with images can be created with BAGAN using imbalanced data.
imbalanced data.
The remainder of this paper starts with the related works and 2.2. Data-driven fault diagnosis methods
preliminaries in Section 2. The proposed fault diagnosis method
is presented in Section 3, and experimentally validated and inves- In the past years, machinery fault diagnosis has been popularly
tigated in Section 4. We close the paper with conclusions in investigated in the literature [19,20]. A combined polynomial chir-
Section 5. plet transform and synchroextracting technique was proposed by
W. Zhang et al. / Measurement 152 (2020) 107377 3
Yu et al. [21] for analyzing non-stationary signals of rotating inputs. The prior on the input noise distribution is denoted as
machinery, which is helpful for fault diagnosis. Jiang et al. [22] pro- pz ðzÞ, and the generated data are thus Gðz; hG Þ where hG denotes
posed a coarse-to-fine decomposing strategy for weak fault detec- the parameters in G. The discriminator D with parameters hD takes
tion of rotating machines, where the variational mode the real data and the generated fake data Gðz; hG Þ as inputs, and the
decomposition (VMD) is adopted to analyze different kinds of sig- output Dðx; hD Þ denotes the probability that the input comes from
nals [23]. Especially, deep learning has been widely used in the real data rather than the generated data. Adversarial training is
machinery fault diagnosis tasks due to the great merits of reliable implemented that the discriminator is updated to accurately clas-
condition assessment, easy model establishment, low requirement sify the real and fake data, while the generator is optimized to gen-
of special expertise etc. Promising fault diagnosis results have been erate realistic samples which can not be distinguished by the
obtained in a number of studies using different kinds of deep neu- discriminator. In summary, the network optimization can be for-
ral networks [24–28] such as convolutional neural network (CNN) mulated as [15],
[29–31], recurrent neural network (RNN) [32] etc. Lei et al. [33]
minmaxVðG; DÞ ¼ Expdata ðxÞ ½log DðxÞ þ Ezpz ðzÞ ½logð1 DðGðzÞÞÞ;
proposed a two-stage learning method for intelligent fault diagno- G D
sis, where sparse filtering and a two-layer neural network are used ð1Þ
for feature extraction, and softmax regression is adopted for
machine health condition classification. Lu et al. [29] utilized a where pdata denotes the distribution of the real data. Through adver-
stacked denoising auto-encoder for machinery fault identifications sarial training between G and D, the noise distribution pz can be
with signals containing ambient noise and operating condition projected to be similar with pdata by the generator, and additional
fluctuations. Sparse auto-encoders were proposed by Sun et al. realistic fake data can be thus generated.
[34] in unsupervised deep neural network for induction motor In the current literature, some initial studies [42–44] have been
fault diagnosis. Partial corruption of the auto-encoder input is carried out on using generative neural networks to address the
added to enhance the robustness of the learned feature data imbalance problem. However, most of the existing methods
representation. generally focus on feature-based data generation. Despite the
While most fault diagnosis studies are carried out with bal- promising performance, expertise on signal processing and fault
anced data, the imbalanced data problem has also attracted much diagnosis is basically required. In this study, the raw machinery
attention [35]. An online sequential prediction method was pro- vibration data are focused on, and realistic vibration acceleration
posed by Mao et al. [36] for imbalanced fault diagnosis problem, data are generated to enhance the training dataset. In this way, lit-
where extreme learning machine is used, and the principal curve tle prior knowledge is required, which largely facilitates the appli-
and granulation division are introduced to simulate the machine cation in the real industries.
faulty data distributions. Martin-Diaz et al. [37] proposed a super-
vised classification approach for fault diagnosis based on the adap- 3. Proposed method
tive boosting algorithm with an optimized sampling technique
dealing with the imbalanced dataset. In [38], a new synthetic 3.1. Overview
over-sampling approach called weighted minority over-sampling
(WMO) was devised to balance the data distribution with imbal- Fig. 2 shows the overview of the proposed method on fault diag-
anced dataset. A deep auto-encoder was used afterwards for fea- nosis with imbalanced data, including 2 stages i.e. data generation
ture extraction, and a decision tree was adopted for fault and fault classification. First, deep neural networks are used to
classification. To enlarge the training dataset, Li et al. [39] proposed learn the underlying distributions of the machinery vibration data
an data augmentation method using different signal processing in different health conditions using the available imbalanced data-
techniques. The results show the deep learning-based fault diagno- set. The projection from noise to real data distribution is estab-
sis methods can largely benefit from the expanded dataset with lished, suggesting additional realistic fake samples can be
more valid instances. generated from noise. In this way, the original imbalanced dataset
In the latest studies, generative neural networks have been can be balanced and further expanded with the generated samples,
applied to generate machinery data samples. Khan et al. [40] uti- which can be used for developing an effective data-driven fault
lized generative adversarial networks (GAN) for modeling the bear- diagnosis method afterwards.
ing degradation behavior. The future trajectory of the bearing
health indicator can be generated for prognostics. GAN was also 3.2. Stage 1: data generation
adopted in the planetary gearbox fault pattern recognition task
with imbalanced data by Wang et al. [41]. The fault diagnosis mod- The generative adversarial networks are used to learn the distri-
ule is integrated in the adversarial training scheme, which can be butions of the machinery vibration data, and generate fake realistic
optimized using both the real and fake data. In this paper, a samples to expand the training dataset. In this study, fault diagno-
GAN-based method is proposed to address the fault diagnosis task sis of multiple machine health conditions is investigated, and mul-
with imbalanced data. Different from most existing studies which tiple networks are adopted for distribution learning. Fig. 3 shows
implicitly explore the fake data. Additional realistic samples are the scheme of this stage and the network architecture.
generated first to explicitly expand the training dataset, which N im generation modules are employed and each one aims to
are further used to improve the performance of data-driven fault learn the data distribution of each machine health condition,
diagnosis methods. respectively. N im denotes the number of the classes that need to
be enhanced. In each generation module, a generative adversarial
2.3. Generative adversarial networks network is adopted, including a generator and a discriminator as
introduced in Section 2.3. The prior distribution of the input noise
Generative adversarial networks (GAN) have been successfully z 2 RN z is assumed to be standard Gaussian distribution, and N z is
developed in the recent years with the promising performance the dimensionality of z.
on realistic data generation. Generally, two modules are adopted, In the generator, one fully-connected layer with N input neurons
i.e. generator G and discriminator D, which are both parameterized are first used, and three convolutional layers are then adopted
as deep neural networks. GAN aims to learn the distribution of the whose filter numbers are 128, 64 and 32, respectively. After the
generator pg over the target data x, using noise variables z as flatten layer, two fully-connected layers are further used with
Fig. 2. Overview of the proposed method.
1024 and N input neurons, and the output is thus the generated sam-
ple from the noise input. In the discriminator, two convolutional
layers are adopted with filter numbers of 64 and 32, respectively. Fig. 3. Data generation scheme. LR: Leaky ReLU activation function. BN: Batch
After the flatten layer, two fully-connected layers with 128 and 1 normalization.
neurons are used. Throughout the network, the leaky rectified lin-
ear unit (leaky ReLU) activation functions are generally adopted ^hG ¼ arg maxDðGðz; hG Þ; ^hD Þ;
[45], batch normalization is applied in the generator to accelerate hG
model training [46], and the filter size of 10 is used for the convo- ð2Þ
^hD ¼ arg minDðGðz; ^hG Þ; hD Þ; maxDðx; hD Þ ;
lutional layers. hD hD
In network optimization, adversarial training is implemented, z Nð0; IÞ; x 2 Sitrain ;
where the discriminator is trained to distinguish the real and fake
data, while the generator is trained to generate realistic samples
where ^
hG and ^
hD denote the optimal values of hG andhD , respectively.
which can not be identified by the discriminator. For the i-th
generation module, the network parameters are optimized to Sitrain is the set of the available samples in the i-th concerned
achieve, machine condition.
The popular stochastic gradient descent (SGD) algorithm can be 4. Experimental study
readily used to solve Eq. (2). Specifically, a 2-step optimization is
applied in each training epoch. First, the parameters hD in the dis- 4.1. Dataset descriptions
criminator are fixed, and the parameters hG in the generator are
optimized as, 4.1.1. CWRU dsataset
The CWRU rolling bearing dataset is provided by the Bearing
@Lg
hG hG þ d ; ð3Þ Data Center of Case Western Reserve University [47]. The dataset
@hG is publicly available and has been widely used on fault diagnosis.
1 X
nbatch
The vibration data used in this study were collected from the drive
Lg ¼ DðGðzi ; hG Þ; hD Þ;
nbatch end of the motor under the rotating speed of 1797 rpm, and on four
i¼1
health conditions: 1) healthy (H), 2) outer race fault (OF), 3) inner
where Lg denotes the objective for the generated data, evaluated by race fault (IF) and 4) ball fault (BF). Different fault severities are
nbatch randomly sampled instances of noise from Gaussian distribu- considered with fault diameters of 7, 14 and 21 mils, respectively.
tion, nbatch is the size of the mini-batch, and d represents the learn- Therefore, 10 bearing conditions are diagnosed.
ing rate. Next, hD is updated while hG remains constant as,
4.1.2. Bogie dataset
@Lg @Ld
hD hD d ; ð4Þ The Bogie dataset is collected from an experimental setup of
@hD @hD
high-speed multi-unit train bogie bearing system shown in Fig. 5.
1 X
nbatch
Ld ¼ Dðxi ; hD Þ; The accelerometer is placed on the load module for vibration data
nbatch i¼1 collection with sampling frequency of 5 kHz. The rotating speed of
1950 rpm is implemented, corresponding with the train speed of
where Ld denotes the objective for the real data, evaluated by nbatch 320 km/h. Three kinds of faulty bearings are generated, i.e. outer
randomly selected real samples from the dataset. Through itera- race fault (OF), roller fault (RF) and inner race fault (IF). Three
tions of the 2-step optimization in Eqs. (3) and (4), the generated levels of fault severities are also considered, i.e. incipient, medium
fake samples become more and more realistic, building parameter- and severe faults, resulting in 10 bearing conditions including the
ized relationship between the noise and real data distributions. healthy state (H). The detailed information of the two datasets is
presented in Table 1.
3.3. Stage 2: fault classification
4.2. Compared approaches
Fig. 4 shows the network architecture in the fault classification
stage. Generally, the proposed network follows the typical super- In this study, different methods for the imbalanced data problem
vised learning scheme. Three convolutional and max-pooling lay- are implemented for comparisons [48]. As the baseline, the Imbal-
ers are first adopted with filter numbers of 64, 32 and 16 anced approach is carried out where only the imbalanced training
respectively, followed by a flatten layer. Two fully-connected lay- dataset is used with no data augmentation technique. The
ers are used next, with 1024 and N c neurons respectively, where UnderSampling method performs random under-sampling for the
N c denotes the number of the machine health conditions. The soft- majority classes, and the ADASYN method uses the adaptive syn-
max function is adopted for classification, and the cross-entropy thetic sampling approach [17]. The popular SMOTE method [10],
loss Ls is minimized to reduce the empirical classification error, i.e. the synthetic minority oversampling technique, is also
which is defined as, implemented.
In the proposed framework of data augmentation, the Pro-
1 X aug X
n Nc
exc;i;j Balanced method denotes the scenario where additional samples
Ls ¼ 1fyi ¼ jg log PNc x ; ð5Þ
naug i¼1 j¼1 k¼1 e
c;i;k of the minority classes are added to balance the dataset.
Furthermore, since the data-driven fault diagnosis methods gener-
where xc;i;j denotes the j-th element of the output vector, taking the ally benefit from larger training dataset, the proposed method is
i-th sample as input, and yi represents the corresponding machine used to generate additional data for all the machine conditions,
condition label. naug is the number of the samples in the expanded resulting in a significantly expanded dataset, that is denoted as the
training dataset, including both the real and generated fake data. Pro-Expand method. In order to evaluate the generation effect, the
Fault Classification
CNN Max CNN Max CNN Max Flat. FC FC Softmax

Samples 64 Pooling 32 Pooling 16 Pooling 512 Nc
ReLU ReLU ReLU ReLU
BN
Fig. 4. Fault classification network architecture.

Fig. 5. Test rig and bearing faults in the Bogie dataset.
AllFake method is carried out where only the generated fake data are Specifically, six tasks are evaluated on each dataset. With
used. In addition, the RealBalanced method is used as a reference for respect to the CWRU dataset, three tasks are implemented with
evaluations of different methods, where the balanced dataset con- different imbalance ratio, i.e. C1, C2 and C3. In order to examine
taining only the real samples in different classes is considered. the robustness of the proposed method against environmental
All the methods are carried out to prepare the training datasets, noise, additional Gaussian noise is added to the testing data for
which are used to develop the data-driven fault diagnosis evaluation, and the noisy data are generated based on different
approaches afterwards. For simplicity, the fault classification signal-to-noise ratio (SNR), which is defined as,
model in the proposed method as presented in Section 3.3 is
shared by all the compared methods. SNRðdBÞ ¼ 10log10 ðPsignal =Pnoise Þ; ð6Þ
4.3. Experimental results and performance analysis where Psignal and P noise denote the powers of the original signal and
the additional Gaussian noise, respectively. In this way, each task
In this section, the proposed method is experimentally vali- is also evaluated using noisy data, that makes C1-Noise, C2-Noise
dated using the two bearing datasets. Multiple tasks with different and C3-Noise tasks for the CWRU dataset respectively. Similarly,
imbalance ratios are investigated, as presented in Table 2. B1, B2 and B3 tasks are implemented on the Bogie dataset, and
Table 1
Information of the two datasets. Inc., Med. and Sev. denote incipient, medium and severe, respectively.
Dataset Class Label 1 2 3 4 5 6 7 8 9 10

CWRU Fault Location N/A (H) IF IF IF BF BF BF OF OF OF
Fault Size (mil) 0 7 14 21 7 14 21 7 14 21
Bogie Fault Location N/A (H) IF IF IF RF RF RF OF OF OF
Fault Severity N/A Inc. Med. Sev. Inc. Med. Sev. Inc. Med. Sev.
Table 2
Descriptions of the fault diagnosis tasks.
Dataset Task Imbalance Ratio (Class Label) Noise

1 2 3 4 5 6 7 8 9 10 (SNR)
C1 1 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 +1
C1-Noise 1 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 2
CWRU C2 1 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 +1
C2-Noise 1 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 2
C3 1 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 +1
C3-Noise 1 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 2
B1 1 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 +1
B1-Noise 1 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 2
Bogie B2 1 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 +1
B2-Noise 1 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 2
B3 1 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 +1
B3-Noise 1 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 2
the B1-Noise, B2-Noise and B3-Noise tasks are under noisy environ- observed that the performance of the data-driven fault diagnosis
ments respectively. methods is significantly influenced by the imbalanced training
In this study, different methods are evaluated for comparisons. data. Lower testing accuracies are generally obtained by the same
For both the two datasets, it is assumed that the sample dimension method with smaller imbalance ratio. The proposed methods out-
is 512. By default, 200 labeled samples in the machine healthy con- perform the compared approaches in most cases, showing the
dition are available in the training dataset, and the number of the effectiveness and superiority of the proposed data augmentation
samples in the other conditions is determined by the imbalance method.
ratio in different tasks. The testing data include 2000 samples, with Specifically, the Pro-Balanced method achieves higher testing
each class containing 200 samples respectively. The Pro-Expand accuracies than the other existing approaches in different tasks,
method indicates the training dataset is expanded to contain and its performance improvements are more significant in the sce-
1000 samples of each class including the real data, and the AllFake narios with smaller imbalance ratio and additional noise. Further-
method includes 1000 generated fake samples of each class. more, better performance can be mostly achieved by the Pro-
In network training, the back-propagation (BP) algorithm is Expand method with expanded dataset, and its testing accuracy
applied for the updates of all the network parameters, and the is even higher than that of the RealBalanced method in the noisy
Adam optimization method [49] is used. The reported experimen- environments. That shows the proposed method significantly
tal results are generally averaged by 10 trials to reduce the effect of enhances the data robustness of the fault diagnosis methods
randomness. The model parameters are presented in Table 3. They against additional noise. It is also noted that the AllFake method
are mostly determined from the validation results in the task C1, where the diagnostic model is trained only using the generated
which is a relatively easy task containing less imbalanced training fake data, still obtains promising diagnostic results. That suggests
data and environmental noise is not applied. high similarity between the generated fake samples and the real
data, thus showing the effectiveness of the proposed data genera-
4.3.1. Diagnostic results tion scheme.
The fault diagnostic results in different tasks using It should be pointed out that different amount of data in the
different methods are presented in Table 4. It can be minority classes are explored by the proposed method for data
generation in different tasks. Since the performance of the genera-
Table 3 tive adversarial network generally depends on the size of the train-
Parameters used in this paper.
ing data, lower testing accuracies are mostly obtained with smaller
Parameter Value Parameter Value imbalance ratio for each bearing dataset. However, the proposed
Epochs (Stage 1) 2e5 d 1e5 method still achieves promising results using limited available
Epochs (Stage 2) 5e3 N input 512 data in tasks C3 and B3, further showing its effectiveness in data
Nz 256 augmentation.
Table 4
Average testing accuracies in different tasks using different methods (%).
Method C1 C1-Noise C2 C2-Noise C3 C3-Noise B1 B1-Noise B2 B2-Noise B3 B3-Noise

Imbalanced 90.3 72.7 51.7 37.8 22.6 18.2 65.4 44.5 28.7 23.7 15.7 12.6
UnderSampling 91.5 71.7 52.5 38.3 33.6 29.8 66.3 51.3 42.7 30.6 24.4 20.1
SMOTE 92.7 74.4 63.4 48.3 40.4 32.9 80.1 62.6 45.2 38.3 26.3 24.7
ADASYN 94.1 72.3 65.2 55.7 35.0 30.4 82.0 63.7 50.0 39.5 25.8 22.6
RealBalanced 99.9 74.2 99.9 73.8 99.9 74.0 99.9 86.4 99.9 86.2 99.9 85.7
AllFake 76.5 51.6 73.4 48.6 52.0 44.6 83.6 58.3 56.5 43.7 48.7 36.4
Pro-Balanced 90.5 74.6 81.6 72.2 61.3 52.8 88.6 68.2 60.4 45.3 47.6 38.5
Pro-Expand 90.3 79.2 82.5 78.7 61.2 58.4 95.5 72.6 70.1 48.5 50.2 42.7
Fig. 6. Effects of the data augmentation strength on the model testing performance in different tasks.
Fig. 7. Performances of different methods under different signal-to-noise ratio of the testing data.
4.3.2. Performance analysis are contaminated with Gaussian noise of different signal-to-noise
In this section, the performance of the proposed method is ratio, and the results are presented in Fig. 7. It clearly shows that
investigated in different scenarios with imbalanced data. Fig. 6 the additional noise remarkably deteriorates the model perfor-
shows the effects of the data augmentation strength on the model mance in the imbalanced data problems, and smaller signal-to-
testing performance. Concretely, the augmentation strength noise ratio results in lower testing accuracy. The proposed meth-
denotes the size of the expanded dataset. For instance, strength 1 ods generally obtain better results than the other compared
is the same with the Pro-Balanced method, and strength 5 means approaches in different scenarios. The Pro-Expand method further
the dataset is enlarged to be 5 times of that with strength 1. achieves higher testing accuracies than the Pro-Balanced method,
It can be observed that generally, larger augmentation strength especially in the cases with strong noise. Therefore, the relation-
leads to higher testing accuracy in different tasks. The improve- ship between the noise and real data distributions is effectively
ments are more significant in the tasks with additional noise, captured by the data generation scheme, and the proposed method
and close to 10% increase can be mostly obtained. The results is further validated.
indicate the proposed method is able to learn the underlying
distribution of the real data, and the noisy data distribution can 4.3.3. Data visualizations
be effectively covered through data augmentation. The proposed method aims to explicitly generate realistic fake
Next, the robustness of the proposed method against samples for data augmentation. Fig. 8 shows examples of the gen-
additional noise is further investigated, where the testing data erated instances in different machine health conditions in the task
Fig. 8. Examples of the real and generated fake data samples in different bearing health conditions in the task C1, as well as the corresponding frequency spectra.
C1, and the corresponding real data samples are also presented. It 5. Conclusion
can be observed that the temporal vibration patterns of the real
and fake data are similar with each other. Furthermore, the fast In this paper, a deep learning-based fault diagnosis method is
Fourier transformation is applied on the samples, and the fre- proposed to address the imbalanced data problem using generative
quency spectra of the real and fake data also have high similarity. adversarial networks. Multiple generation modules are adopted for
Moreover, in order to further show the consistence of the gen- data augmentation of the minority classes. Through adversarial
erated data and the real data, more fake samples are presented training between the generator and discriminator, the mapping
in Figs. 9–11, which correspond with the inner race fault, ball fault between the distributions of noise and real data can be established,
and outer race fault conditions respectively. It can be observed that which can be used to generate additional fake samples to balance
the generated data are generally pretty similar with the real vibra- and further expand the training dataset. Based on the experimental
tion signals. The results show the proposed method is able to effec- validations on two rotating machinery datasets, the data-driven
tively learn the underlying distributions of the real data, and fault diagnostic model can significantly benefit from the generated
realistic fake samples can be generated for data augmentation. fake samples, that suggests the proposed data augmentation
Fig. 9. Examples of the real and multiple generated fake data samples in the inner race fault condition.
Fig. 10. Examples of the real and multiple generated fake data samples in the ball fault condition.
Fig. 11. Examples of the real and multiple generated fake data samples in the outer race fault condition.
method is promising for fault diagnostic tasks with imbalanced Instead, multiple generators are preferred in the fault diagnosis
data. task on the vibration data, with each generator focusing on one
It should be pointed out that while some model such as the con- machine health condition respectively. Furthermore, the same
ditional GAN is able to generate data of different kinds at the same operating condition of the rotating machines is considered in this
time, it is not suggested in this study based on the experiments. study. In practice, the rotating speeds usually change in different
cases, that results in cross-domain fault diagnosis problems. While [17] H. Haibo, B. Yang, E.A. Garcia, L. Shutao, ADASYN Adaptive synthetic sampling
approach for imbalanced learning, in: Proceedings of IEEE International Joint
it is beyond the scope of this study on the data imbalance problem,
Conference on Neural Networks, 2008, pp. 1322–1328.
it is straight-forward and promising to integrate the well devel- [18] C. Bunkhumpornpat, K. Sinapiromsaran, C. Lursinsap, Safe-Level-SMOTE: Safe-
oped transfer learning techniques in the proposed framework to level-synthetic minority over-sampling technique for handling the class
address the data imbalance and cross-domain fault diagnosis imbalanced problem, in: Proceedings of Advances in Knowledge Discovery
and Data Mining, Berlin, Heidelberg, 2009, pp. 475–482.
issues simultaneously. [19] H. Ma, J. Zeng, R. Feng, X. Pang, Q. Wang, B. Wen, Review on dynamics of
Despite the improvements in the testing performances, the cracked gear systems, Eng. Fail. Anal. 55 (2015) 224–245.
main drawback of the proposed method lies in the relatively large [20] Z. Luo, J. Wang, R. Tang, D. Wang, Research on vibration performance of the
nonlinear combined support-flexible rotor system, Nonlinear Dyn. 98 (1)
model, consisting of multiple GANs for the minority classes. Fur- (2019) 113–128.
ther research works will be carried out on the optimization of [21] K. Yu, T.R. Lin, H. Ma, H. Li, J. Zeng, A combined polynomial chirplet transform
the network structure. and synchroextracting technique for analyzing nonstationary signals of
rotating machinery, IEEE Trans. Instrum. Meas. (2019), 1–1.
[22] X. Jiang, J. Wang, J. Shi, C. Shen, W. Huang, Z. Zhu, A coarse-to-fine
decomposing strategy of VMD for extraction of weak repetitive transients in
Declaration of Competing Interest fault diagnosis of rotating machines, Mech. Syst. Signal Process. 116 (2019)
668–692.
The authors declare that they have no known competing finan- [23] X. Jiang, C. Shen, J. Shi, Z. Zhu, Initial center frequency-guided VMD
for fault diagnosis of rotating machines, J. Sound Vib. 435 (2018) 36–55.
cial interests or personal relationships that could have appeared
[24] C. Shen, Y. Qi, J. Wang, G. Cai, Z. Zhu, An automatic and robust features learning
to influence the work reported in this paper. method for rotating machinery fault diagnosis based on contractive
autoencoder, Eng. Appl. Artif. Intell. 76 (2018) 170–184.
[25] X. Li, W. Zhang, Q. Ding, J.-Q. Sun, Multi-layer domain adaptation method for
Acknowledgements rolling bearing fault diagnosis, Signal Processing 157 (2019) 180–197.
[26] L. Wen, L. Gao, X. Li, A new deep transfer learning based on sparse
auto-encoder for fault diagnosis, IEEE Trans. Syst., Man, Cybern.: Syst. (99)
The material in this paper is based on work supported by grants (2017) 1–9.
(N180703018, N170503012, N180708009 and N170308028) from [27] X. Li, W. Zhang, N. Xu, Q. Ding, Deep learning-based machinery fault
the Fundamental Research Funds for the Central Universities, grant diagnostics with domain adaptation across sensors at different places, IEEE
Trans. Industr. Electron. (2019), 1–1.
(11902202) from the National Natural Science Foundation of [28] X. Li, W. Zhang, Q. Ding, X. Li, Diagnosing rotating machines with weakly
China, grant (VCAME201906) from the Key Laboratory of Vibration supervised data using deep transfer learning, IEEE Trans. Industr. Inf. (2019),
and Control of Aero-Propulsion System Ministry of Education, 1–1.
[29] C. Lu, Z.Y. Wang, W.L. Qin, J. Ma, Fault diagnosis of rotary machinery
Northeastern University, and grant (2019-BS-184) from Liaoning components using a stacked denoising autoencoder-based health state
Provincial Department of Science and Technology. identification, Signal Processing 130 (2017) 377–388.
[30] W. Sun, R. Zhao, R. Yan, S. Shao, X. Chen, Convolutional discriminative feature
learning for induction motor fault diagnosis, IEEE Trans. Industr. Inf. 13 (3)
References (2017) 1350–1359.
[31] X. Li, W. Zhang, Q. Ding, Cross-domain fault diagnosis of rolling element
bearings using deep generative neural networks, IEEE Trans. Industr. Electron.
[1] X. Guo, L. Chen, C. Shen, Hierarchical adaptive deep convolution neural
66 (7) (2019) 5525–5534.
network and its application to bearing fault diagnosis, Measurement 93 (2016)
[32] L. Guo, N. Li, F. Jia, Y. Lei, J. Lin, A recurrent neural network based health
490–502.
indicator for remaining useful life prediction of bearings, Neurocomputing 240
[2] H. Ren, W. Liu, M. Shan, X. Wang, A new wind turbine health condition
(2017) 98–109.
monitoring method based on VMD-MPE and feature-based transfer learning,
[33] Y. Lei, F. Jia, J. Lin, S. Xing, S.X. Ding, An intelligent fault diagnosis method using
Measurement 148 (2019) 106906.
unsupervised feature learning towards mechanical big data, IEEE Trans.
[3] S.X. Ding, Data-driven design of observer-based fault diagnosis systems, in: S.
Industr. Electron. 63 (5) (2016) 3137–3147.
X. Ding (Ed.), Data-driven Design of Fault Diagnosis and Fault-tolerant Control
[34] W. Sun, S. Shao, R. Zhao, R. Yan, X. Zhang, X. Chen, A sparse auto-encoder-
Systems, Springer, London, 2014, pp. 175–200.
based deep neural network approach for induction motor faults classification,
[4] X. Zhang, Y. Liang, J. Zhou, Y. Zang, A novel bearing fault diagnosis model
Measurement 89 (2016) 171–178.
integrated permutation entropy, ensemble empirical mode decomposition and
[35] F. Jia, Y. Lei, N. Lu, S. Xing, Deep normalized convolutional neural network for
optimized SVM, Measurement 69 (2015) 164–179.
imbalanced fault classification of machinery and its understanding via
[5] R. Razavi-Far, E. Hallaji, M. Farajzadeh-Zanjani, M. Saif, A semi-supervised
visualization, Mech. Syst. Signal Process. 110 (2018) 349–367.
diagnostic framework based on the surface estimation of faulty distributions,
[36] W. Mao, L. He, Y. Yan, J. Wang, Online sequential prediction of bearings
IEEE Trans. Industr. Inf. 15 (3) (2019) 1277–1286.
imbalanced fault diagnosis by extreme learning machine, Mech. Syst. Signal
[6] C. Li, R.-V. Sanchez, G. Zurita, M. Cerrada, D. Cabrera, R.E. Vásquez, Multimodal
Process. 83 (2017) 450–473.
deep support vector classification with homologous features and its
[37] I. Martin-Diaz, D. Morinigo-Sotelo, O. Duque-Perez, R.J. Romero-Troncoso,
application to gearbox fault diagnosis, Neurocomputing 168 (2015) 119–127.
Early fault detection in induction motors using AdaBoost with imbalanced
[7] S. Hu, Y. Liang, L. Ma, Y. He, MSMOTE: Improving classification performance
small data and optimized sampling, IEEE Trans. Ind. Appl. 53 (3) (2017)
when training data is imbalanced, in: Proceedings of Second International
3066–3075.
Workshop on Computer Science and Engineering, vol. 2, 2009, pp. 13–17.
[38] Y. Zhang, X. Li, L. Gao, L. Wang, L. Wen, Imbalanced data fault diagnosis of
[8] G. Mariani, F. Scheidegger, R. Istrate, C. Bekas, C. Malossi, BAGAN: data
rotating machinery using synthetic oversampling and feature learning, J.
augmentation with balancing GAN, arXiv preprint arXiv:1803.09655.
Manuf. Syst. 48 (2018) 34–50.
[9] M. Mirza, S. Osindero, Conditional generative adversarial nets, arXiv preprint
[39] X. Li, W. Zhang, Q. Ding, J.-Q. Sun, Intelligent rotating machinery fault
arXiv:1411.1784.
diagnosis based on deep learning using data augmentation, J. Intell. Manuf.
[10] N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, SMOTE: synthetic
https://doi.org/10.1007/s10845-018-1456-1.
minority over-sampling technique, J. Artif. Intell. Res. 16 (2002) 321–357.
[40] S.A. Khan, A.E. Prosvirin, J. Kim, Towards bearing health prognosis using
[11] T. Han, C. Liu, W. Yang, D. Jiang, Learning transferable features in deep
generative adversarial networks: Modeling bearing degradation, in:
convolutional neural networks for diagnosing unseen machine conditions, ISA
Proceedings of International Conference on Advancements in Computational
Transactions.
Sciences, 2018, pp. 1–6.
[12] L. Guo, Y. Lei, S. Xing, T. Yan, N. Li, Deep convolutional transfer learning
[41] Z. Wang, J. Wang, Y. Wang, An intelligent diagnosis scheme based on
network: a new method for intelligent fault diagnosis of machines with
generative adversarial learning deep neural networks and its application to
unlabeled data, IEEE Trans. Industr. Electron. (2018), 1–1.
planetary gearbox fault pattern recognition, Neurocomputing 310 (2018) 213–
[13] B. Yang, Y. Lei, F. Jia, S. Xing, An intelligent fault diagnosis approach based on
222.
transfer learning from laboratory bearings to locomotive bearings, Mech. Syst.
[42] W. Mao, Y. Liu, L. Ding, Y. Li, Imbalanced fault diagnosis of rolling bearing
Signal Process. 122 (2019) 692–706.
based on generative adversarial network: a comparative study, IEEE Access 7
[14] X. Li, W. Zhang, Q. Ding, A robust intelligent fault diagnosis method for rolling
(2019) 9515–9530.
element bearings based on deep distance metric learning, Neurocomputing
[43] D. Cabrera, F. Sancho, J. Long, R. Sánchez, S. Zhang, M. Cerrada, C. Li, Generative
310 (2018) 77–95.
adversarial networks selection approach for extremely imbalanced fault
[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A.
diagnosis of reciprocating machinery, IEEE Access 7 (2019) 70643–70653.
Courville, Y. Bengio, Generative Adversarial Nets, Curran Associates Inc, 2014.
[44] F. Zhou, S. Yang, H. Fujita, D. Chen, C. Wen, Deep learning fault diagnosis
[16] H. Han, W.-Y. Wang, B.-H. Mao, Borderline-SMOTE: A new over-sampling
method based on global optimization GAN for unbalanced data, Knowledge-
method in imbalanced data sets learning, in: Proceedings of Advances in
Based Syst.
Intelligent Computing, Berlin, Heidelberg, 2005, pp. 878–887..
[45] A.L. Maas, A.Y. Hannun, A.Y. Ng, Rectifier nonlinearities improve neural [47] W.A. Smith, R.B. Randall, Rolling element bearing diagnostics using the Case
network acoustic models, in: Proceedings of 30th International Conference on Western Reserve University data: a benchmark study, Mech. Syst. Signal
Machine Learning, vol. 28, 2013. Process. 64–65 (2015) 100–131.
[46] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network [48] G. Lema, F. Nogueira, C.K. Aridas, Imbalanced-learn: a python toolbox to tackle
training by reducing internal covariate shift, in: Proceedings of 32nd the curse of imbalanced datasets in machine learning, J. Mach. Learn. Res. 18
International Conference on Machine Learning, vol. 1, Lile, France, 2015, pp. (1) (2017) 559–563.
448–456. [49] D. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint
arXiv:1412.6980.

10 1016@j Measurement 2019 107377

Uploaded by

Copyright:

Available Formats

10 1016@j Measurement 2019 107377

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

10 1016@j Measurement 2019 107377

Uploaded by

Copyright:

Available Formats

Measurement 152 (2020) 107377

Contents lists available at ScienceDirect

Machinery fault diagnosis with imbalanced data using deep generative

Fig. 2. Overview of the proposed method.

CNN Max CNN Max CNN Max Flat. FC FC Softmax

Fig. 4. Fault classification network architecture.

Fig. 5. Test rig and bearing faults in the Bogie dataset.

Dataset Class Label 1 2 3 4 5 6 7 8 9 10

Dataset Task Imbalance Ratio (Class Label) Noise

Method C1 C1-Noise C2 C2-Noise C3 C3-Noise B1 B1-Noise B2 B2-Noise B3 B3-Noise

You might also like