Deep Transfer Learning in Mechanical Intelligent Fault Diagnosis: Application and Challenge
https://doi.org/10.1007/s11063-021-10719-z
Chenhui Qian1 · Junjun Zhu1 · Yehu Shen1 · Quansheng Jiang1 · Qingkui Zhang1
Abstract
Mechanical intelligent fault diagnosis is an important method to accurately identify the health
status of mechanical equipment and ensure its safe operation. With the advent of the “big
data” era, it has become an inevitable tendency to choose different deep network models to
improve the ability of data processing and classify faults. Meanwhile, in order to improve the
generalization performances of fault diagnosis methods in different diagnosis scenarios, some
fault diagnosis algorithms based on deep transfer learning have been developed. This paper
introduces the concepts of deep transfer learning and explains the motivation for this review.
Recent advances in intelligent fault diagnosis using instances-based deep transfer learning,
network-based deep transfer learning, mapping-based deep transfer learning and
adversarial-based deep transfer learning are then summarized. Finally, we discuss the open
problems and development trends of deep transfer learning for intelligent fault diagnosis.
This review is intended to support the use of deep transfer learning methods in mechanical
fault diagnosis.
Quansheng Jiang: qschiang@163.com
Qingkui Zhang: zhangqk@usts.edu.cn
1 School of Mechanical Engineering, Suzhou University of Science and Technology, Suzhou 215009, PR China
1 Introduction
With the growing scale of the equipment under diagnosis, the increasing number of measurement
points installed on each machine, higher data sampling frequencies, and the resulting
explosive growth of fault data, the field of fault diagnosis has entered the "big data" era
[1–3]. However, faced with such a large volume of fault data, traditional fault diagnosis
methods struggle to extract deep information effectively when diagnosing large equipment.
Therefore, stronger requirements are put forward for the real-time
online fault detection of large-scale mechanical equipment in the production process [4, 5].
To address the various challenges in fault diagnosis, researchers have developed many
diagnosis methods and obtained good results. As shown in Fig. 1, according to their underlying
mechanisms, fault diagnosis methods can be divided into three categories: model-based methods,
signal-processing-based methods and data-driven methods. Model-based methods analyze faults by
modeling how fault characteristics arise and evolve in different mechanical systems and
operating conditions [6, 7]. Signal-processing-based methods mainly use time–frequency domain
analysis [8–11], wavelet packet transform [12–15], envelope spectrum analysis [16, 17],
high-order statistics analysis and other techniques [18–21]. By analyzing and decomposing the
signal, these methods reduce noise and interference, enhance the relevant signal features,
extract the corresponding fault frequencies, and achieve efficient fault diagnosis.
Data-driven methods do not need a complex mathematical model. Instead, they construct a
feature extraction model and a classifier, and fit the model using a large amount of
historical data collected from the faulty equipment. Because they are simple to operate and
perform well, such methods are suitable for diagnosing complex machinery. Among them, deep
learning, inspired by brain-like cognitive mechanisms, builds a network with multiple hidden
layers to learn feature representations and hidden structures from the input data [22–24].
Compared with other intelligent fault diagnosis methods, deep-learning-based methods meet the
need for adaptive processing of massive high-dimensional fault data. Therefore, deep learning
has become one of the most promising research directions in the field of fault diagnosis.
With the significant development of artificial intelligence and big data analysis technology,
research on deep learning for fault diagnosis has achieved outstanding results [25–27].
However, due to environmental interference, the coupling of complex faults, changing working
conditions, redundancy in fault data and other factors in the fault diagnosis process [28],
current intelligent diagnosis algorithms still have shortcomings in practical applications.
For example, they perform excellently on specific fault diagnosis tasks, but their
classification performance declines seriously when the diagnosis task changes [29].
Moreover, under realistic conditions, it is difficult to collect sufficient high-quality fault
data to train a reliable model. In addition, the contradictions between big data and few
labeled samples, between big data and weak computing power, and between universal models and
individual needs, together with the needs of specific applications [30, 31], make it difficult
for traditional intelligent fault diagnosis algorithms to meet expectations in practice. As
one effective way to solve these problems, applying deep transfer learning to fault diagnosis
has become an active research direction [32].
Since the 1995 NIPS workshop "Learning to Learn", transfer learning, as a cross-domain
and cross-task learning method, has attracted increasing attention in the field of machine
learning. Its basic idea is to train a model with high generalization ability for the target
task by making full use of knowledge from auxiliary data that differ from, but are related to,
the target domain, thereby improving the performance of intelligent fault diagnosis algorithms
on the target task [33, 34]. In practice, it is very expensive or even impossible to label
data for every domain, and transfer learning can alleviate the scarcity of labels in the
target task [35]. Therefore, to improve the feasibility and accuracy of intelligent diagnosis
methods, it is effective and often necessary to use transfer learning to label the target
domain data and transfer the source domain knowledge.
Feature extraction is the key procedure of intelligent fault diagnosis methods, and its
effectiveness directly determines the accuracy of the diagnosis results [36]. However,
current fault diagnosis methods based on transfer learning still rely on traditional manual
feature extraction, which limits the potential of transfer learning in fault diagnosis. As a
breakthrough in modern artificial intelligence, deep learning stacks multiple layers of
nonlinear information processing modules to automatically learn useful low-dimensional
representations from high-dimensional input data. These representations can replace
traditional hand-crafted feature sets and achieve better fault diagnosis results [37]. On the
other hand, the construction of the classifier also directly determines the result of fault
classification. Some researchers combine traditional classifiers with ensemble learning
strategies to classify bearing faults robustly [38–41], but this is not yet available in
current fault diagnosis algorithms based on transfer learning. Therefore, combining deep
learning and transfer learning in fault diagnosis is a promising direction. Compared with
traditional transfer learning, the advantages of deep transfer learning are mainly reflected
in two aspects: more expressive features and an "end-to-end" transfer process [42]. On the one
hand, the deep network extracts information from the input data layer by layer, and its deep
architecture allows it to generate higher-level data representations. This more effective
feature representation helps improve transfer learning performance. Moreover, adaptive feature
extraction avoids the errors introduced by manual extraction and does not require the
researcher to have the corresponding domain knowledge. On the other hand, the deep network
architecture lets transfer learning run as an "end-to-end" process, which is beyond the
ability of classic transfer learning [43].
To sum up, deep transfer learning provides a new way to solve the problem of fault
diagnosis. This review divides deep transfer learning methods for intelligent fault diagnosis
into four categories: instances-based deep transfer learning, network-based deep transfer
learning, mapping-based deep transfer learning and adversarial-based deep transfer learning.
The review is structured as follows: Sect. 2 describes the current state of fault diagnosis
with deep transfer learning and the motivation for this review. Section 3 covers the
applications of deep transfer learning in fault diagnosis, together with the current problems
and common features of these methods. Section 4 discusses the weaknesses of deep transfer
learning methods for fault diagnosis and future development directions. Finally, conclusions
are given in Sect. 5.
2 Research Motivation
A data-driven fault diagnosis algorithm commonly consists of the following steps: (1) obtain
data from the mechanical equipment; (2) partition the collected datasets; (3) build the
feature extraction model and the fault recognition classifier; (4) train the network model
with the training datasets; (5) test the trained model with the test datasets.
For traditional intelligent fault diagnosis algorithms, a model trained on sufficient
high-quality fault data can achieve good diagnosis results. However, the samples obtained
under real conditions rarely meet the needs of training a sound model, which degrades the
performance of traditional intelligent fault diagnosis algorithms [2]. We attribute this
phenomenon to two problems: small and imbalanced data (S&I-IFD) and changes in data
distribution.
In actual production and manufacturing, machinery spends far more of its running time in a
healthy state than in a fault state. This results in a long-tailed distribution of the data
obtained under different health conditions; that is, the number of fault samples is far
smaller than the number of healthy samples. It is difficult to train a convincing fault
classification model on such imbalanced data. S&I-IFD is the first problem that should be
considered when fault diagnosis is applied in practice, and it is also one of the most
serious problems affecting fault diagnosis.
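To make the imbalance concrete, the following minimal sketch counts a hypothetical long-tailed label set and derives inverse-frequency class weights, a common baseline for imbalanced classification. The reweighting scheme, the function name `inverse_frequency_weights` and the 95/5 split are illustrative assumptions and are not taken from the reviewed works.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class frequency.

    Weights are normalised so that a perfectly balanced dataset
    yields a weight of 1.0 for every class.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical long-tailed dataset: 95 healthy records, 5 fault records.
labels = ["healthy"] * 95 + ["fault"] * 5
weights = inverse_frequency_weights(labels)
```

The rare fault class receives a much larger weight than the healthy class, so a loss weighted in this way does not simply favour the majority state.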
The problem of changing data distributions can also be called the domain adaptation problem
(DA-IFD). During the operation of mechanical equipment, the operating state changes with
working conditions and time, and the characteristics of the data change accordingly. This
makes model construction in intelligent fault diagnosis (IFD) algorithms more difficult and
makes IFD harder to implement in practical diagnosis tasks [44]. DA-IFD is an urgent problem
in the field of fault diagnosis.
These two problems put traditional intelligent fault diagnosis algorithms in a dilemma in
practical tasks. Furthermore, as the goals of an intelligent system change, further problems
arise, such as a shortage of data for the application scenario, the contradiction between task
personalization and customization, and insufficient model robustness, all of which hinder the
application of machine learning. Transfer learning emerged to meet this need: it extracts
knowledge from one or more application scenarios to improve learning performance in the target
scenario. Unlike conventional machine learning, which focuses on generalizing the
commonalities between datasets, transfer learning aims to generalize the commonalities between
different tasks or domains [45, 46]. Deep transfer learning combines the feature extraction
ability of deep architectures with the "knowledge" transfer of transfer learning [47, 48],
which helps address the above problems. This mechanism gives it a natural advantage when
dealing with small sample problems. At the same time, transferring "knowledge" also avoids
much unnecessary repetitive work in training models.
Fig. 2 A typical procedure of the deep transfer learning fault diagnosis approach
As shown in Fig. 2, deep transfer learning can be divided into four categories:
instances-based deep transfer learning, network-based deep transfer learning, mapping-based
deep transfer learning and adversarial-based deep transfer learning. The operating mechanisms
of these four methods are different. Instances-based deep transfer learning is a sample
adjustment strategy, which is suitable for S&I-IFD. When the trained model cannot handle the
target task because target samples are insufficient, instances-based deep transfer learning
uses data adjustment algorithms or data generation models to increase the number of source
domain samples that are similar to the target domain.
Another way to address S&I-IFD is network-based deep transfer learning. It uses the source
domain data to pre-train the model. After pre-training, the target domain data are used to
fine-tune the network. Note that fine-tuning relies only on the target data, so the model in
this method has a single branch and performs no distribution adjustment. Therefore, it cannot
be applied to scenarios with a large discrepancy in data distribution between the source
domain and the target domain.
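The pre-train and fine-tune idea can be sketched with NumPy alone. In this minimal illustration, `W_pre` is a random placeholder standing in for front-layer weights learned on source data, and only a softmax classification layer is trained on a small hypothetical target set. The shapes, learning rate and names are assumptions made for the example, not a method from any cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for front-layer weights pre-trained on source-domain data.
# They are random here; in practice they would come from source training.
W_pre = rng.normal(size=(16, 8))


def extract_features(x):
    """Frozen front layers: W_pre is never updated during fine-tuning."""
    return np.tanh(x @ W_pre)


def fine_tune_classifier(x_target, y_target, n_classes, lr=0.5, epochs=200):
    """Train only a softmax classification layer on the target samples."""
    feats = extract_features(x_target)
    W_cls = np.zeros((feats.shape[1], n_classes))
    onehot = np.eye(n_classes)[y_target]
    for _ in range(epochs):
        logits = feats @ W_cls
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = feats.T @ (probs - onehot) / len(x_target)  # cross-entropy gradient
        W_cls -= lr * grad
    return W_cls


# Hypothetical small target set: 20 samples, 16 raw features, 2 health states.
x_t = rng.normal(size=(20, 16))
y_t = (x_t[:, 0] > 0).astype(int)
W_pre_before = W_pre.copy()
W_cls = fine_tune_classifier(x_t, y_t, n_classes=2)
```

Because only `W_cls` is updated, the number of trainable parameters stays small, which is why a handful of target samples can be enough when the two domains are similar.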
To deal with DA-IFD, mapping-based deep transfer learning has been proposed. The underlying
assumption is that the source domain and the target domain can be mapped into a
high-dimensional reproducing kernel Hilbert space (RKHS), where the distance between the two
domains can be calculated with a distance metric. The distribution discrepancy is then reduced
by minimizing this distance. The transfer process usually occurs in the last few layers of the
model: after the model extracts features from the fault data, a domain transfer layer performs
the domain alignment.
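One widely used distance of this kind is the maximum mean discrepancy (MMD). The sketch below gives a biased estimate of the squared MMD between two feature batches using an RBF kernel. The kernel choice, the bandwidth `gamma` and the synthetic batches are assumptions for illustration only.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Pairwise RBF kernel values exp(-gamma * ||a_i - b_j||^2)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(xs, xt, gamma=1.0):
    """Biased estimate of squared MMD between source and target batches."""
    return (rbf_kernel(xs, xs, gamma).mean()
            + rbf_kernel(xt, xt, gamma).mean()
            - 2.0 * rbf_kernel(xs, xt, gamma).mean())


rng = np.random.default_rng(0)
xs = rng.normal(size=(50, 4))            # hypothetical source features
xt_near = rng.normal(size=(50, 4))       # target drawn from the same distribution
xt_far = rng.normal(size=(50, 4)) + 3.0  # target with a shifted distribution

d_near = mmd2(xs, xt_near)
d_far = mmd2(xs, xt_far)
```

In a mapping-based model, a term like `mmd2` computed on the outputs of the domain transfer layer would be added to the classification loss, so that training pulls the two feature distributions together.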
Inspired by generative adversarial networks (GAN), adversarial-based deep transfer learning
embeds an adversarial transfer strategy to automatically learn an implicit measure of the
discrepancy between the source domain and the target domain. The feature extractor is regarded
as the generator and the distribution measurement function as the discriminator.
The distribution discrepancy is reduced by adversarial training.
A fault diagnosis algorithm based on deep transfer learning commonly consists of the
following steps: (1) obtain the fault data of the target mechanical equipment; (2) select
auxiliary samples related to the target task as the source domain data; (3) build a feature
extraction model and a fault classifier; (4) design an appropriate transfer strategy
according to the target task; (5) train the model with the source domain data and the target
domain data; (6) test the trained model with the target test datasets. To obtain cleaner
fault characteristics, signal processing methods such as time–frequency analysis, principal
component analysis and empirical mode decomposition may be used. Because source domain samples
are used in the transfer learning process, these auxiliary data greatly affect the model: the
similarity between the auxiliary data and the target domain data determines the actual
performance of the trained model. In addition, an appropriate transfer strategy, chosen for
the specific problem, is embedded in the deep model. This is the core of deep transfer
intelligent fault diagnosis algorithms. Current research on intelligent fault diagnosis
focuses on bearings, gearboxes, motors and other mechanical equipment. Publications on
applications of deep transfer learning to fault diagnosis are listed in Table 1. We introduce
the failure modes corresponding to these methods and summarize their advantages.
Research on deep transfer learning for fault diagnosis has achieved fruitful results, but a
review of the latest achievements is still lacking, and future research directions are not
yet clear. Therefore, in this paper we review the applications of deep transfer learning to
the problems of data imbalance and domain adaptation in fault diagnosis, and summarize
some future perspectives.
As a weight adjustment strategy, TrAdaBoost [49] aims to balance the effect of the source
domain and the target domain on the classifier. Its mechanism is shown in Fig. 3.
After the source samples and target samples are classified, the sample weights are adjusted:
the weights of misclassified samples in the source domain are decreased, and the weights of
misclassified samples in the target domain are increased. As a result, the classification
boundary moves toward the correct classification of the target data, which improves IFD
methods built on TrAdaBoost.
The concrete algorithm is shown in Table 2.
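The weight adjustment can be illustrated with a single boosting round. The sketch below follows the commonly stated TrAdaBoost update, in which misclassified source samples are down-weighted by a fixed factor and misclassified target samples are up-weighted by a factor derived from the weighted target error. The function name, the clipping of the error and the toy inputs are assumptions for this example; Table 2 of the paper remains the reference for the full algorithm.

```python
import math

import numpy as np


def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One TrAdaBoost-style weight update.

    err_src and err_tgt are 0/1 arrays that mark misclassified samples.
    """
    n_src = len(w_src)
    # Fixed shrink factor for source samples (always below 1).
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_src) / n_rounds))
    # Weighted error on the target samples, kept inside (0, 0.5).
    eps = np.sum(w_tgt * err_tgt) / np.sum(w_tgt)
    eps = float(np.clip(eps, 1e-12, 0.499))
    beta_tgt = eps / (1.0 - eps)
    new_src = w_src * beta_src ** err_src     # misclassified source: weight shrinks
    new_tgt = w_tgt * beta_tgt ** (-err_tgt)  # misclassified target: weight grows
    return new_src, new_tgt


# Toy example: four source and four target samples, one error in each domain.
w_src = np.ones(4)
w_tgt = np.ones(4)
err_src = np.array([1.0, 0.0, 0.0, 0.0])
err_tgt = np.array([1.0, 0.0, 0.0, 0.0])
new_src, new_tgt = tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_rounds=10)
```

Source samples that disagree with the current classifier are treated as less relevant to the target task, while hard target samples receive more attention in the next round.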
Shen et al. [50] designed a similarity threshold between auxiliary data and target domain
data based on singular value decomposition (SVD) and changed the weights of the TrAdaBoost
algorithm during training, so as to obtain more effective auxiliary data and improve the
accuracy of the algorithm. The spectral centroid is an effective index of the frequency-domain
information in fault vibration data. To enhance the matching between auxiliary data and the
target domain, Shen et al. [51] designed a spectral centroid algorithm to eliminate inferior
samples in the source domain. Chen et al. [52] used recurrence quantification analysis (RQA)
to extract the recurrence rate, determinism and recurrence entropy as characteristic
parameters that effectively describe the dynamic characteristics of vibration time series, and
constructed a special collection to improve the transfer effect.
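For reference, the spectral centroid of a signal is the magnitude-weighted mean frequency of its spectrum. The sketch below computes it with NumPy; the sampling rate and the 50 Hz test tone are hypothetical, and the sample-selection rule built on top of the centroid in [51] is not reproduced here.

```python
import numpy as np


def spectral_centroid(signal, fs):
    """Magnitude-weighted mean frequency (Hz) of a real-valued signal."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return float(np.sum(freqs * mag) / np.sum(mag))


# Hypothetical check: a pure 50 Hz tone sampled at 1 kHz for one second.
fs = 1000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 50 * t)
centroid = spectral_centroid(tone, fs)  # close to 50 Hz
```

Source samples whose centroid lies far from that of the target data could then be flagged as poorly matched candidates.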
Method: Instances-based DTL
  Advantages: Can solve the problem of S&I-IFD; independent of specific deep models
  Bearings: Shen et al. [50], Chen et al. [52] and Wu et al. [53, 54]
  Gears: Shen et al. [51], Qian et al. [55, 56] and Cao et al. [67]
Method: Network-based DTL
  Advantages: Reduce the computational complexity of the model; shorten training time; suitable for deeper network models; the algorithm is simple to implement; strong robustness and good generalization ability
  Bearings: Wen et al. [66], Zhang et al. [68], Yu et al. [70], Sun et al. [71] and Hasan et al. [76]
  Motors: Shao et al. [69]
  Others: Zhang et al. [63]
Method: Mapping-based DTL
  Advantages: Reduce the difference of data distribution adaptively; can be combined with arbitrary deep models
  Bearings: Ren et al. [72], He et al. [73, 75], Guo et al. [86], Ainapure et al. [87], Yang et al. [88], Xu et al. [90], Wang et al. [92], Xu et al. [93], Han et al. [97] and Liu et al. [98]
  Gears: He et al. [74], Li et al. [85] and Wang et al. [91]
  Motors: Xiao et al. [89]
  Others: Sun et al. [94]
Method: Adversarial-based DTL
  Advantages: Reduce the distribution discrepancy of data adaptively by the principle of generative adversarial training; fine-grained matching of faults in different domains; better learning of transferable characteristics
  Bearings: Jin et al. [100], Li et al. [101], Li et al. [102], Jiao et al. [103, 105] and Li et al. [104]
The most direct way to solve the problem of data imbalance is to explicitly increase the
number of transferable auxiliary samples. The data characteristics of the source and target
domains, obtained by data mining, are used to construct similarity indexes between samples of
the two domains and, combined with TrAdaBoost, to select more suitable samples. These methods
effectively alleviate data imbalance, but traditional manual feature extraction needs prior
information and professional domain knowledge, and the extracted features are mostly
low-level features.
Unlike traditional feature extraction methods, Wu et al. [53] used a bi-directional long
short-term memory (Bi-LSTM) model to learn the mapping relationships between the two datasets
and thus generate more similar auxiliary samples. To further reduce the distribution
discrepancy between auxiliary data and target data, domain classifiers were used.
Experiments showed that this method is clearly effective when few labeled data are available
in the target domain. Building on this, and considering the difference in the conditional
distributions of different health states between fault sample sets, Wu et al. [54] constructed
a long short-term memory recurrent neural network to generate similar auxiliary sample sets.
To find the shared characteristics of source domain data and target domain data, joint
distribution adaptation (JDA) was used to reduce the discrepancy of the conditional
probability distributions between auxiliary data and target data, so that knowledge in the
source domain can better assist the target domain task. The results showed that auxiliary data
with better adaptability give a more obvious transfer effect. Compared with traditional
machine learning methods, this method can adaptively construct a large number of high-quality
samples suitable for training the target model, thus effectively overcoming the influence of
an insufficient number of fault samples. The method has achieved some results in addressing
data imbalance in fault diagnosis and still has great potential.
In industrial fault diagnosis, acquiring labeled data is a problem that cannot be ignored.
To obtain high-quality labeled data, some traditional intelligent diagnosis methods expand
the datasets by applying resampling, generative adversarial networks and computer simulation
to the collected data samples. For example, Qian et al. [55, 56], Zhang et al. [57], Wang
et al. [58], Wu et al. [59], Xie et al. [60] and others have done relevant work in this field.
Although these methods can also produce labeled data, shortcomings remain: the generated data
often suffer from poor diversity and low credibility, and the excessive computation time and
cost are often ignored.
As a data adjustment strategy, instances-based deep transfer learning can deal with this
problem well because it focuses on the similarity of individual data instances and improves
the transferability between the two domains at the instance level. The theory of this method
is relatively mature, its generalization upper bound is easy to derive, and it is simple to
operate, so it is widely used in fault diagnosis. Most importantly, this method only adjusts
the datasets and does not touch feature extraction or fault recognition, so it can be combined
with advanced deep models.
However, because fault data are redundant and highly coupled, instance enhancement alone
cannot meet the needs of fault diagnosis in engineering; that is, it cannot obtain the deep
information in fault data that is needed for fault identification. Moreover, this method
depends on the datasets and places high demands on the auxiliary datasets, so the degree of
matching between the auxiliary training set and the target domain must also be considered.
Some studies show that a neural network performs iterative and continuous abstraction [61,
62]: the front layers of the network can be regarded as a feature extractor, and the low-level
features they extract are universal. Based on this, another way to address the data imbalance
problem, from the perspective of the deep model, is to reuse the shallow layers that extract
fault features in the source task, transfer them to the target task, and then use a small
number of target domain samples to fine-tune the classification layer parameters, as shown in
Fig. 4. This greatly saves the time and cost of training the model. At the same time, the
transferred network starts from trained parameters, which avoids the extra computational cost
caused by poorly chosen initial parameters.
To obtain more fault information, Zhang et al. [63] constructed a fused frequency spectrum by
superposing multiple groups of frequency spectra in parallel. An instance adjustment strategy
was used to select the source domain data most related to the target data and train multiple
stacked autoencoder (SAE) models. Finally, an adaptive ensemble model was constructed
according to the results of SVM. Experiments showed that both instance selection and ensemble
learning can improve the fault diagnosis accuracy of the transfer learning model. Other
researchers [64] carried out further research on ensemble transfer learning and reported good
results in image classification tasks. Combining the conventional ideas of ensemble CNNs, Xia
et al. [65] proposed three ensemble TCNNs and introduced transfer learning for image
classification tasks.
As a powerful CNN model, VGG-19 has been successfully applied in image classification,
pattern recognition and speech recognition. The theory behind the VGG-19 model is relatively
complete, and it is feasible to apply VGG-19 to mechanical fault diagnosis. However, compared
with image and voice data, mechanical fault data are scarce and not sufficient to train a deep
network from scratch. Wen et al. [66] proposed a deep transfer learning method (TranVGG-19)
based on a pre-trained VGG-19 for fault diagnosis. First, the time domain signal was
transformed into an RGB image. Then, the pre-trained VGG-19 was used as the feature extractor
to extract the features of the image. Finally, a softmax classifier was trained on the
features of the target domain data. To deal with early diagnosis in gear transmission, Cao
et al. [67] proposed a deep transfer learning method based on CNN, which needs neither manual
feature extraction nor a large number of training samples. Zhang et al. [68] proposed a
transfer learning method based on neural networks to improve the performance of fault
classification under different working conditions. Shao et al. [69] proposed a high-precision
framework for detecting the working state of mechanical systems based on a deep CNN, which
transformed the original data into images through wavelet transform to obtain the
time–frequency distribution. Then, a pre-trained network was used to extract initial features,
and the labeled time–frequency images were used to fine-tune the higher levels of the neural
network structure.
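The signal-to-image step can take several forms. As a minimal sketch of one simple option, the code below min-max scales a one-dimensional segment, reshapes it into a square grey-level image and replicates it across three channels so that it matches the input layout of an image network. This particular conversion, the function name `signal_to_image` and the 64 x 64 size are assumptions for illustration; the exact procedures in [66] and [69] (for example, the wavelet time–frequency images) may differ.

```python
import numpy as np


def signal_to_image(segment, size):
    """Map the first size*size samples of a 1-D signal to a size x size x 3 uint8 image."""
    seg = np.asarray(segment, dtype=float)
    if seg.size < size * size:
        raise ValueError("segment is shorter than size * size samples")
    seg = seg[: size * size]
    lo, hi = seg.min(), seg.max()
    # Min-max scale to [0, 1]; a constant segment maps to all zeros.
    scaled = np.zeros_like(seg) if hi == lo else (seg - lo) / (hi - lo)
    gray = np.round(scaled * 255).astype(np.uint8).reshape(size, size)
    return np.stack([gray, gray, gray], axis=-1)  # replicate into 3 channels


# Hypothetical vibration segment of 4096 samples.
segment = np.sin(np.linspace(0.0, 20.0, 4096))
image = signal_to_image(segment, size=64)
```

The resulting array can be passed to a pre-trained image model after whatever resizing and normalisation that model expects.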
Compared with the existing methods, this method has the advantages of short training time
and high accuracy. Yu et al. [70] used bi-directional long short-term memory (Bi-LSTM) to
adaptively learn from bearing fault data, mine the temporal characteristics in the data, and,
combined with transfer learning, identify faults under multiple working conditions. Compared
with other methods, Bi-LSTM improves temporal feature extraction, which makes it well suited
to fault diagnosis of mechanical systems.
To cope with insufficient data labeling in the target domain, Sun et al. [71] proposed an
optimized deep transfer learning algorithm that exploits the fact that a stacked autoencoder
(SAE) can reduce the burden of labeling sample data during training. It reduced the complexity
of the algorithm and increased the data reconstruction ability and model robustness. As a
variant of the autoencoder, the denoising autoencoder (DAE) reconstructs high-dimensional data
by "encoding" and "decoding" to obtain useful information, and it is very effective in
processing high-noise data. Based on this, Ren et al. [72] proposed a hybrid model based on
stacked DAE and SVR to address low efficiency and accuracy under high-dimensional noise and
small sample data, and reconstructed the vibration signal through a deep DAE to obtain a more
robust feature representation. To solve the problem of pre-training a model on a small amount
of labeled source domain data by combining the advantages of different deep learning models,
He et al. [73] proposed an enhanced convolutional neural network (ECNN) method transferred
from a convolutional autoencoder (CAE) for intelligent fault diagnosis of rotor bearing
systems.
During the training of a deep network, gradient explosion and vanishing gradients often
occur. The traditional Sigmoid and Tanh activation functions tend to make the gradient vanish
and increase the computational complexity. He et al. [74] proposed a deep transfer
multiwavelet autoencoder model to address this problem. The model exploits the properties of
the multiwavelet neural network, which converges faster when approximating non-stationary
signals and better captures the characteristics of non-stationary components in vibration
signals. In addition, to obtain sufficient information from the collected signals, He et al.
[75] proposed an enhanced deep autoencoder model, in which the activation function is replaced
by a scaled exponential linear unit to improve the quality of the mapping of the collected
data, and a non-negative function is used to modify the constraint term of the loss function
to improve data reconstruction.
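The scaled exponential linear unit (SELU) mentioned above has a simple closed form. The sketch below implements it with the standard constants and contrasts its slope with the sigmoid slope for a large input. It only illustrates the activation function and is not the enhanced autoencoder of [75]; the helper names are chosen for this example.

```python
import numpy as np

# Standard SELU constants (Klambauer et al.).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    negative_part = SELU_ALPHA * np.expm1(np.minimum(x, 0.0))
    return SELU_SCALE * np.where(x > 0, x, negative_part)


def sigmoid_slope(x):
    """Derivative of the sigmoid; it shrinks toward zero for large |x|."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))
    return s * (1.0 - s)


values = selu(np.array([-50.0, 0.0, 1.0]))
slope_at_10 = float(sigmoid_slope(10.0))  # very small: the sigmoid saturates
```

For positive inputs the SELU slope stays at `SELU_SCALE`, whereas the sigmoid slope at the same point is already close to zero, which is the saturation behaviour behind vanishing gradients.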
Compared with the one-dimensional original vibration signal, spectrum maps obtained by
spectrum analysis make it easier to extract features and detect distributed faults with clear
band edges. To better describe the latent information of the fault signal, Hasan et al. [76]
converted the one-dimensional AE signal into a two-dimensional image for transfer learning,
which added a new dimension to bearing fault diagnosis. Wen et al. [77] proposed a deep
learning framework that used transfer learning neural networks to achieve high-precision
fault classification. The original sensor data were transformed into images by wavelet
transform, the pre-trained network was used to extract the low-level features, and the labeled
time–frequency images were used to fine-tune the high-level network structure.
These network-based transfer learning methods relax the requirements on the target domain by
using parameter transfer. In addition, to improve the classification performance of IFD,
researchers often use deeper network models to meet the accuracy requirements of fault
diagnosis, but progress is limited by the number of layers that can be trained. Network-based
transfer methods can effectively alleviate this problem, and they are simple and reliable.
A remaining concern is that these methods are also seriously affected by the distribution of
target samples in the actual diagnosis process. How to obtain an auxiliary dataset similar to
the data distribution of the target domain in industrial scenarios is a problem that needs to
be considered.
The instance-based deep transfer learning method and the model-based deep transfer learning
method do not consider the problem of inter-domain adaptation. When the distributions of
the data in the source domain and target domain are significantly different, the performance
of some algorithms will degrade, which seriously affects the stability of the intelligent fault
diagnosis algorithm in practical applications [78]. As shown in Fig. 5, mapping-based deep
transfer learning maps the raw data to a constructed high-dimensional feature space,
uses a domain adaptation layer to reduce the distance between the source domain and the
target domain, and then learns a low-dimensional mapping that meets the objective function.
The metric is the core of mapping-based deep transfer learning, and different metrics have
different matching effects on data. The existing metrics include maximum mean discrepancy
(MMD), Kullback–Leibler divergence (KL divergence) and Wasserstein distance,
etc. Among them, MMD is the most widely used metric function. The methods of domain
adaptation using MMD include domain adaptive neural network (DaNN) [79], deep domain
confusion (DDC) [80] and domain adaptation network (DAN) [81], etc.
MMD is a kernel learning method that projects the data of two domains into a high-dimensional
reproducing kernel Hilbert space and measures the distribution difference between them. It is
calculated as follows:
$$\mathrm{MMD}(X_S, X_T) = \left\| \frac{1}{|X_S|} \sum_{x_s \in X_S} \phi(x_s) - \frac{1}{|X_T|} \sum_{x_t \in X_T} \phi(x_t) \right\|_{\mathcal{H}}^{2} \tag{1}$$
In Eq. (1), $x_s \in X_S$ represents the data of the source domain, $x_t \in X_T$ represents the
data of the target domain, and $\phi(\cdot)$ is the mapping with which the MMD distance is
calculated. $\mathcal{H}$ represents the high-dimensional reproducing kernel Hilbert space.
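Eq. (1) can be evaluated directly once a feature map is chosen. The sketch below is a minimal illustration, assuming an explicit and purely hypothetical map `quadratic_features` (phi(x) = [x, x^2]) in place of a kernel-induced one; the synthetic Gaussian data only serve to show that the value grows when the target distribution is shifted.

```python
import numpy as np

def mmd_squared(Xs, Xt, phi):
    """Empirical squared MMD: || mean(phi(Xs)) - mean(phi(Xt)) ||^2."""
    diff = phi(Xs).mean(axis=0) - phi(Xt).mean(axis=0)
    return float(diff @ diff)

def quadratic_features(X):
    """Hypothetical explicit feature map phi(x) = [x, x^2] (an assumption for illustration)."""
    return np.hstack([X, X ** 2])

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 3))
target_same = rng.normal(0.0, 1.0, size=(200, 3))      # same distribution as the source
target_shifted = rng.normal(1.0, 1.0, size=(200, 3))   # mean-shifted distribution

mmd_same = mmd_squared(source, target_same, quadratic_features)
mmd_shifted = mmd_squared(source, target_shifted, quadratic_features)
```

In a domain adaptation layer, a quantity of this kind is added to the training loss so that the learned features of both domains have matching means in the mapped space.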
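To calculate the MMD, the kernel matrices $K$ and $L$ are used in DAN:

$$K = \begin{bmatrix} K_{S,S} & K_{S,T} \\ K_{T,S} & K_{T,T} \end{bmatrix}, \qquad
L_{i,j} = \begin{cases} \dfrac{1}{|X_S|^{2}}, & x_i, x_j \in X_S \\[4pt] \dfrac{1}{|X_T|^{2}}, & x_i, x_j \in X_T \\[4pt] -\dfrac{1}{|X_S||X_T|}, & \text{otherwise} \end{cases} \tag{2}$$

$$\min_{K \succeq 0} \mathrm{MMD}(X_S, X_T) = \min_{K \succeq 0} \; \mathrm{tr}(KL) - \gamma\, \mathrm{tr}(K) \tag{3}$$

where $\gamma \ge 0$ is a trade-off parameter. The kernel matrix $K$ can be decomposed as follows:

$$K = \left(K K^{-1/2}\right)\left(K^{-1/2} K\right) \tag{4}$$

Through a matrix $\tilde{W} \in \mathbb{R}^{(|X_S|+|X_T|) \times m}$, the kernel matrix can be simplified into

$$\tilde{K} = \left(K K^{-1/2} \tilde{W}\right)\left(\tilde{W}^{T} K^{-1/2} K\right) = K W W^{T} K \tag{5}$$

where $W = K^{-1/2} \tilde{W} \in \mathbb{R}^{(|X_S|+|X_T|) \times m}$. Then, the distance can be simplified as:

$$\mathrm{Dist}(X_S, X_T) = \mathrm{tr}\left(\left(K W W^{T} K\right) L\right) = \mathrm{tr}\left(W^{T} K L K W\right) \tag{6}$$

With $\gamma\, \mathrm{tr}(W^{T} W)$ added as a regularization term for $W$, the kernel learning
is simplified as follows:

$$\min_{W} \; \mathrm{tr}\left(W^{T} K L K W\right) + \gamma\, \mathrm{tr}\left(W^{T} W\right) \quad \text{s.t.} \quad W^{T} K H K W = I_m \tag{7}$$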
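The role of the matrices K and L can be checked numerically: with the usual coefficient matrix L, tr(KL) equals the kernel form of the squared MMD. The sketch below is an illustration under assumptions: an RBF kernel with a hypothetical bandwidth `gamma`, synthetic data, and the standard cross-domain entry -1/(|X_S||X_T|) for L.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2) (kernel choice is an assumption)."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def mmd_matrices(Xs, Xt, gamma=0.5):
    """Build the joint kernel matrix K and the MMD coefficient matrix L."""
    ns, nt = len(Xs), len(Xt)
    X = np.vstack([Xs, Xt])
    K = rbf_kernel(X, X, gamma)
    # e has 1/ns on source rows and -1/nt on target rows, so that
    # outer(e, e) gives 1/ns^2, 1/nt^2 and -1/(ns*nt) in the three blocks of L.
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)
    return K, L

rng = np.random.default_rng(1)
Xs = rng.normal(0.0, 1.0, size=(30, 4))
Xt = rng.normal(0.5, 1.0, size=(40, 4))
K, L = mmd_matrices(Xs, Xt)

mmd_trace = float(np.trace(K @ L))
mmd_direct = float(rbf_kernel(Xs, Xs).mean() + rbf_kernel(Xt, Xt).mean()
                   - 2.0 * rbf_kernel(Xs, Xt).mean())
```

Writing the distance as a trace is what makes the later steps possible: once the MMD is linear in K, the low-rank parameterization through W turns the problem into a trace optimization.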
Long et al. further improved the generalization ability of deep transfer learning networks
for big data by adding probability distribution adaptation layers to the network [82–84].
Experiments showed that the methods are very effective in image classification. Unlike
two-dimensional image data, mechanical fault data are often time-correlated, coupled and
redundant. On this basis, Li et al. [85] used the particle swarm
optimization (PSO) algorithm and L2 regularization to optimize the regional adaptive
neural network for classifying early gear pitting faults under multiple
working conditions, improving the stability and accuracy of mechanical fault diagnosis. Guo
et al. [86] adjusted the data distribution in the source domain by maximizing the domain
recognition error and minimizing the probability distribution distance. Consequently, the
model was able to serve well in both the source and target domains.
Ainapure et al. [87] used multiple convolutional networks to automatically extract bearing
data features and optimized the distribution of the learned features with the maximum mean
discrepancy to achieve high diagnostic accuracy. Because the fault data available in actual
machine fault diagnosis are often insufficient to train a model, Yang et al. [88] designed an
intelligent fault diagnosis method based on transfer
learning from laboratory bearings to locomotive bearings. A convolutional neural network
was used to extract transferable features from the raw vibration data, and then a
multi-layer domain adaptation regularization term and a pseudo-label training method were
used. The pseudo labels impose constraints on the parameters of the domain-shared CNN to
reduce the distribution differences and the distance between classes of the transferred
features. At the same time, experiments showed that, compared with the single-layer domain
adaptation method, the multi-layer domain adaptation method can learn transferable features
with smaller distribution
difference. The regularization term based on multi-layer domain adaptation can
significantly improve the accuracy of fault diagnosis, but it only uses the weights selected by
back-propagation (BP), without considering the selection of different weights and their impact
on the diagnosis performance. Xiao et al. [89] improved on this by adding
a new optimization objective that reweights these parameters in the objective loss function, so
that low-level and high-level features receive corresponding weights. Unlike Xiao et al.,
Xu et al. [90] and Wang et al. [91] used factor analysis to solve the problem of parameter
weighting.
Because most unsupervised transfer learning algorithms do not consider
the conditional distribution, which results in poor robustness, Wang et al. [92] analyzed
multi-scale low-level features to reduce the low-level loss and obtain high-level feature
expressions. Xu et al. [93] added the scaled exponential linear unit activation function to a
convolutional network. The method
adopted a two-branch network structure with weight sharing, including a deep feature extraction
network for fused feature extraction, a transfer learning network for the domain adaptation
problem and a feature recognition network that determines feature similarity for unbalanced
samples. To better predict tool life, Sun et al. [94] proposed a deep transfer learning (DTL)
network based on a sparse autoencoder (SAE). The off-line data were used
to train the SAE, and the parameters of the SAE were transferred to the new model. Then,
the discrepancy between the features of the source domain and the target domain was minimized
through KL divergence. Finally, online prediction of tool life was achieved. Li et al. [95]
and Lu et al. [96]
regularized the weights of the trained deep model during the transfer process to minimize
the feature distribution mismatch, so as to learn the invariant features of the source domain
and prevent the trained model from overfitting. Han [97] combined joint distribution adaptation
(JDA) with a deep network to complete the adaptation. Compared with other methods, its
convergence is smoother.
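As a rough illustration of how KL divergence can quantify the mismatch between source and target features, the sketch below compares histograms of a single feature. It is not the formulation of [94]: the histogram binning, the smoothing constant `eps` and the helper names are assumptions made for this example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as non-negative weights."""
    p = np.asarray(p, dtype=float) + eps   # eps avoids log(0) and division by zero
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def feature_histogram(values, bins=20, value_range=(-5.0, 5.0)):
    """Histogram counts of one feature on a fixed grid shared by both domains."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    return counts.astype(float)

rng = np.random.default_rng(2)
src = feature_histogram(rng.normal(0.0, 1.0, size=2000))
tgt_same = feature_histogram(rng.normal(0.0, 1.0, size=2000))
tgt_shifted = feature_histogram(rng.normal(1.5, 1.0, size=2000))

kl_same = kl_divergence(src, tgt_same)
kl_shifted = kl_divergence(src, tgt_shifted)
```

Note that KL divergence is asymmetric, so the order of the source and target arguments is itself a design choice.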
The current deep transfer fault diagnosis methods based on feature mapping adjust the
marginal distribution and conditional distribution of the source domain and the target domain
by adding a domain adaptation layer, so that the source domain samples adapt much better to
the target domain tasks and a sound pre-trained model can help complete the target tasks.
Constructing an effective domain adaptation layer to reduce the distance between the source
domain and the target domain is the core of the method. This domain alignment method
improves transfer learning at the level of sample set distributions, which greatly improves
the performance of fault diagnosis.
The domain alignment method can solve the problem of inconsistent data distribution, but
it still has the following problems. Firstly, how to choose the best distance measure to
minimize the distribution difference during domain adaptation remains to be studied.
Secondly, it is not enough to use a distance measure to reduce the discrepancy between the
marginal and conditional distributions of datasets. It is also necessary to consider how the
relative weights of the conditional and marginal distributions influence mechanical equipment
fault diagnosis, as well as whether within-class adaptation of the fault data is effective.
Some researchers add generative adversarial networks to the deep network to study how
adaptively learned weights influence the effect of domain adaptation.
Generative adversarial networks [98] are one of the hottest directions in the field of artificial
intelligence. A generative adversarial network consists of two parts: a generator and a
discriminator. Through the game between the generator and the discriminator, it completes
adversarial training to obtain samples sufficiently similar to those of the target domain.
Based on this idea, adversarial training is used to
solve the problem of sample feature change in fault diagnosis. Its process is shown in Fig. 6.
The features of the source domain data are extracted as one input of the domain discriminator,
and the source feature extractor and the source feature space are fixed. Then, the
target features are obtained as the other input of the discriminator. Thereafter, the
discriminator learns to determine which feature space the inputs belong to. The
domain-adversarial neural network (DANN) [99] is
a typical method. Its loss function consists of two parts: (1) the label classification loss
on the source domain; (2) the domain classification loss on the source domain and target domain.
The source domain classification loss items are as follows:
$$\min_{W,b,V,c} \left[ \frac{1}{n} \sum_{i=1}^{n} L_y^{i}(W, b, V, c) + \lambda \cdot R(W, b) \right] \tag{8}$$
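The domain regularizer $R(W, b)$ in Eq. (8) is defined as:

$$R(W, b) = \max_{u,z} \left[ -\frac{1}{n} \sum_{i=1}^{n} L_d^{i}(W, b, u, z) - \frac{1}{n'} \sum_{i=n+1}^{N} L_d^{i}(W, b, u, z) \right] \tag{9}$$

In Eq. (9), $(W, b)$ is a matrix-vector pair, $(u, z)$ is a vector-scalar pair, $n$ and
$n' = N - n$ are the numbers of source and target samples, and $L_d$ can be expressed as:

$$L_d\left(G_d(G_f(x_i)), d_i\right) = d_i \log \frac{1}{G_d(G_f(x_i))} + (1 - d_i) \log \frac{1}{1 - G_d(G_f(x_i))} \tag{10}$$

In Eq. (10), $G_d$ is a domain regressor, $G_f$ is a hidden layer, and $d_i$ denotes the binary
variable (domain label) of the $i$th sample, which indicates whether $x_i$ comes from the source
distribution ($d_i = 0$) or the target distribution ($d_i = 1$).
The final objective function of DANN is as follows:

$$E(W, V, b, c, u, z) = \frac{1}{n} \sum_{i=1}^{n} L_y^{i}(W, b, V, c) - \lambda \left( \frac{1}{n} \sum_{i=1}^{n} L_d^{i}(W, b, u, z) + \frac{1}{n'} \sum_{i=n+1}^{N} L_d^{i}(W, b, u, z) \right) \tag{11}$$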
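The domain loss of Eq. (10) and the objective of Eq. (11) can be evaluated for given network outputs as follows. This is a minimal numerical sketch, not a training loop: the per-sample label losses and domain predictions are supplied by hand, each domain term is averaged over its own sample count (an assumption about the normalization), and the function names are ours.

```python
import numpy as np

def domain_loss(p_domain, d):
    """Eq. (10): p_domain is the predicted probability that a sample is from the target (d = 1)."""
    p = np.clip(np.asarray(p_domain, dtype=float), 1e-12, 1.0 - 1e-12)
    d = np.asarray(d, dtype=float)
    return d * np.log(1.0 / p) + (1.0 - d) * np.log(1.0 / (1.0 - p))

def dann_objective(label_losses, p_src, p_tgt, lam):
    """Eq. (11): mean label loss minus lam times the summed mean domain losses."""
    p_src = np.asarray(p_src, dtype=float)
    p_tgt = np.asarray(p_tgt, dtype=float)
    ld_src = domain_loss(p_src, np.zeros_like(p_src)).mean()  # source samples have d = 0
    ld_tgt = domain_loss(p_tgt, np.ones_like(p_tgt)).mean()   # target samples have d = 1
    return float(np.mean(label_losses) - lam * (ld_src + ld_tgt))

# A fully confused domain classifier (p = 0.5) versus a sharp one.
E_confused = dann_objective([0.2, 0.4], [0.5, 0.5], [0.5, 0.5, 0.5], lam=0.1)
E_sharp = dann_objective([0.2, 0.4], [0.01, 0.01], [0.99, 0.99, 0.99], lam=0.1)
```

The feature extractor is trained to lower E, which favors features on which the domain classifier is confused, while the domain classifier parameters (u, z) are trained to raise E.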
When fault diagnosis methods are applied to industrial scenarios, one of the most common
problems is that the fault data characteristics of mechanical equipment are time-varying, which
leads to poor classification performance of the trained model. Jin et al. [100] treated
bearing state recognition under variable working conditions as a domain adaptation
problem. After the features of the fault signal were extracted, a domain classifier was trained
adversarially to obtain target domain extractor parameters matched to the source domain
extractor. To improve the speed of model training, Li
et al. [101] used a learning-rate grid search strategy and pretraining steps to quickly reach
the optimal equilibrium point and accelerate the convergence of the model. Li et al.
[102] adjusted the weights of transferable features when the target domain labels are aligned
with the source domain labels, to enhance the positive transfer of shared categories and ignore
the source outliers. As the number of layers increases, deep convolutional networks
are exposed to degradation, vanishing gradient and exploding gradient problems. To
solve these problems, Jiao et al. [103] combined a residual network with a generative adversarial
network for more effective intelligent fault diagnosis.
At the same time, to improve the effect of domain adaptation, the joint maximum
mean discrepancy was introduced to reduce the distance between the source domain and the target
domain. To obtain a more effective feature representation, Li et al. [104] proposed an
adversarial transfer learning method based on stacked autoencoders to solve fault
classification under different working conditions. Jiao et al. [105] used the Wasserstein distance
to measure the difference between classifiers; this distance considers the underlying geometric
properties of the probability space and can compare distributions that do not share the same
support.
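To make the geometric character of the Wasserstein distance concrete, the sketch below computes the first-order Wasserstein distance between two one-dimensional empirical samples of equal size, where it reduces to the mean absolute difference of the sorted values. This is a simplified special case chosen for illustration and is not the classifier-discrepancy formulation of [105]; the function name is ours.

```python
import numpy as np

def wasserstein_1d(a, b):
    """First-order Wasserstein distance between two equal-size 1-D empirical samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes samples of equal size")
    # In 1-D the optimal transport plan matches the i-th smallest points of each sample.
    return float(np.mean(np.abs(a - b)))

shift_distance = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

The value scales with how far the mass has to move, so two samples with disjoint ranges still yield a finite, informative distance, whereas KL divergence would be undefined or unbounded in that situation.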
Unlike the feature mapping method, which embeds a metric function in the network to reduce
the difference between domains, the deep transfer method
based on GAN uses a discriminator to train the target domain feature extractor, which reduces the
influence of the data on the training parameters to a certain extent and enhances the robustness of
the fault diagnosis algorithm. These adversarial-based deep transfer learning methods for fault
diagnosis train on the source domain and target domain through a "generator" and a "discriminator"
to achieve inter-domain and intra-class adaptation. According to previous research, this
kind of method achieves the best data feature matching, but with the addition of the generative
adversarial network, the computing time is greatly extended.
The instance adjustment strategy and the parameter transfer strategy can be used to deal with
S&I-IFD. The feature mapping strategy and the adversarial transfer strategy can be used to deal
with DA-IFD. The running mechanisms of the four deep transfer learning methods
differ from each other, which means that these transfer strategies can be combined
with each other. An intelligent fault diagnosis algorithm that combines multiple transfer
strategies is suitable for fault diagnosis research in more complex application scenarios.
4.1 The Weakness of Deep Transfer Learning Methods for Fault Diagnosis
In the field of fault diagnosis, research on deep transfer learning fault diagnosis algorithms
has achieved some results. However, some problems remain:
From the perspective of samples, deep transfer learning for fault diagnosis
requires sufficient source domain samples with high similarity to the target
data. At present, most research on mechanical equipment fault diagnosis
assumes the same signal mode for source domain data and target domain data. In
practical application, in order to better study the fault diagnosis of mechanical equipment, it is
important to obtain fault signals with obvious features. Therefore, researchers must select
appropriate sensors to collect fault signals. Sometimes, the signal modes of the target
domain data and the source domain data collected by the sensors are different. Input data
with different modes will inevitably lead to inconsistent and asymmetric representations. This
makes it difficult to obtain and transfer knowledge from fault data under different working
conditions, and makes the intelligent fault diagnosis model harder to train.
At present, this problem has not been fully
studied. Deep transfer learning can utilize auxiliary samples to help complete the target
task, but its transfer effect is limited to a certain extent by the distribution discrepancy
between the target domain samples and the source domain samples. In other words, to
get a positive effect on the target tasks after the transfer process, the degree of similarity
between the source domain data and the target domain data must exceed a certain threshold.
Otherwise, negative transfer will occur. Some signal processing methods
are used to eliminate the differences between data in different domains and screen out more
universal fault features, which helps to further improve the performance of transfer
learning fault diagnosis algorithms.
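The similarity requirement above suggests screening candidate source datasets before transfer. The sketch below is one possible way to do this, not a procedure from the surveyed works: it scores each candidate with a correlation alignment (CORAL) style covariance distance and keeps those below a threshold. The threshold value, the dataset names and the synthetic data are all hypothetical.

```python
import numpy as np

def coral_distance(Xs, Xt):
    """Squared Frobenius distance between feature covariances, scaled by 1 / (4 d^2)."""
    d = Xs.shape[1]
    cov_s = np.cov(Xs, rowvar=False)   # rows are samples, columns are features
    cov_t = np.cov(Xt, rowvar=False)
    return float(np.sum((cov_s - cov_t) ** 2) / (4.0 * d * d))

def screen_sources(candidates, Xt, threshold):
    """Keep candidate source datasets whose distance to the target is at most `threshold`."""
    scores = {name: coral_distance(Xs, Xt) for name, Xs in candidates.items()}
    kept = [name for name, score in scores.items() if score <= threshold]
    return kept, scores

rng = np.random.default_rng(3)
target = rng.normal(size=(500, 4))
candidates = {
    "similar_rig": rng.normal(size=(500, 4)),            # same second-order statistics
    "dissimilar_rig": 3.0 * rng.normal(size=(500, 4)),   # much larger feature variance
}
kept, scores = screen_sources(candidates, target, threshold=0.1)  # threshold is an assumption
```

A low distance does not guarantee positive transfer, since only second-order statistics are compared here; it is merely an inexpensive first filter.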
From the perspective of application scenario, the fault type generated in the operation of
mechanical equipment is unknown under real conditions. During the operation of mechanical
equipment, the relationship between the collected data and the health status of the
equipment is difficult to observe. At the same time, the uncertainty of mechanical equipment
operation makes it impossible to immediately obtain the complete fault data and labels of the
diagnosed object that would help complete the fault diagnosis task. Such unsupervised
fault classification tasks are therefore unavoidable. However, in some deep transfer learning
algorithms, it is very important to obtain the label information of the target domain data.
Sufficient label information can improve the classification accuracy of the model. When the label
information of the target domain is lacking, the trained intelligent diagnosis models are often
unable to achieve good classification results. It is therefore necessary to
develop effective deep transfer fault diagnosis algorithms for unsupervised conditions. In
addition, in the actual industrial production process, the faults of mechanical equipment are
highly coupled. The existing deep transfer fault diagnosis algorithms perform well on samples
with single fault characteristics, but once they are applied to classification tasks with
multiple coupled complex faults, the classification effect is uncertain.
From the perspective of model, different deep learning models have different advantages
and disadvantages. Exploring the working principles of different models and combining them
reasonably when studying specific fault diagnosis problems can improve the feature extraction
performance and fault classification accuracy of the model. This is important work in
developing intelligent fault diagnosis algorithms based on deep transfer learning. Training a
model that fits the data distributions of both the source domain and the target domain is the
goal of deep transfer learning, but it is difficult to achieve. Therefore, it is necessary to
analyze the error upper bound of the model in the target domain and design a general model more
suitable for transfer learning. However, similar to deep learning, deep transfer learning also
has a "black box" phenomenon, and the interpretability of the model is poor. This leads to
security and algorithm reliability problems in some specific mechanical equipment fault diagnosis
tasks. Consequently, when deep transfer learning algorithms are developed, it is necessary
to provide an interpretable basis that makes the "black box" more transparent. Some
researchers have studied the interpretability of deep networks and achieved some results. For
instance, the mechanism of concept whitening (CW) was introduced to understand how a
network learns layer by layer [106]. The traditional network was modified into
an understandable CNN. By this means, Dai et al. [107] analyzed the high-level information
coding of the CNN and further evaluated the decision-making of the model. Despite this
progress, the interpretability of deep learning is still a problem
worthy of study.
For deep transfer learning in the field of fault diagnosis, there are several future
development directions.
(1) Developing better distance metrics for domain adaptation. Auxiliary datasets with more
similar distributions can improve the effect of deep transfer learning methods. It is very
important to calculate the similarity of datasets when choosing appropriate fault samples.
Therefore, it is necessary to establish an effective similarity evaluation index to
screen datasets. At the level of data adaptation, the distance metric is essential. Although
there are some loss functions to measure distance, such as maximum mean discrepancy
(MMD), joint distribution adaptation (JDA) and correlation alignment (CORAL), when
these loss functions are used for domain adaptation, there is still the problem of imbalanced
category matching within the domain. Therefore, developing more effective distance metrics
is a research direction.
(2) Developing fault diagnosis across signal modes. Some studies use preprocessing
and manual feature extraction to unify the signal mode and dimension. Such
processing can damage signal features, which can prevent the most
effective fault feature information from being acquired during diagnosis. Signal analysis of
cross-modal data is an important research direction for applying transfer learning to
practical fault diagnosis tasks. Therefore, it is necessary to develop an effective deep
transfer learning method for cross-signal-mode fault diagnosis.
(3) Developing fault diagnosis based on transfer-reinforcement learning. Considering the
absence of labels, it is of practical significance to develop unsupervised learning
algorithms for different mechanical equipment. In recent years, as one of the paradigms
and methodologies of machine learning, reinforcement learning (RL) has been widely
used in different fields. It continuously interacts with the environment to learn strategies
and achieves specific goals by maximizing rewards. In particular, it has no specific
requirements for sample labels. Combining transfer learning
and reinforcement learning is a promising way to solve the problem of scarce labels in
practical fault diagnosis.
(4) Building a better deep learning hybrid model to adapt to transfer learning. Different
deep learning methods extract data features through different processing mechanisms, so
the fault data samples they suit are also different. Some of the proposed diagnostic
methods use only one model. These models have limited feature extraction ability and
perform poorly on complex samples. In reality, however, the mechanical
equipment in the workshop is affected by the environment, working conditions and
human factors, and the data samples collected by sensors are highly complex. A
single deep learning model cannot meet these needs. Therefore, for different application
scenarios, building a better hybrid model that exploits the complementary advantages of
different deep models to help extract the invariant features in the transferred data is
important work for intelligent fault diagnosis algorithms based on deep transfer learning.
(5) Establishing fault diagnosis databases with evaluation significance. The effectiveness
of the intelligent diagnosis algorithms proposed by researchers is reflected
on fault datasets. Generally speaking, the better the performance in the experiment,
the more significant the effectiveness of the algorithm. However, the datasets used in the
research are varied. Different datasets have different experimental conditions, which leads to
uneven difficulty of the transfer tasks between datasets. There is no uniform index to evaluate
the performance of different algorithms. It is necessary to develop public datasets with
evaluation significance to evaluate the proposed transfer learning algorithms.
5 Conclusions
With the deepening of research on intelligent fault diagnosis approaches, deep transfer
learning methods have attracted widespread attention in academia and industry. Owing to its
feature extraction ability and knowledge transfer characteristics, deep transfer learning
can handle typical problems such as small samples, variable working conditions, and
environmental noise interference in the field of mechanical fault diagnosis. At the same time,
the "end-to-end" learning process of deep transfer learning greatly reduces the complexity of
fault diagnosis algorithms and meets the application requirements for intelligent fault
diagnosis algorithms. Firstly, this article introduces the basic knowledge of deep transfer
learning. Secondly, we summarize its latest progress in the field of fault diagnosis. Then, we
analyze the current research status of deep transfer learning in the field of fault diagnosis.
Finally, we discuss the current problems and development trends in the field of diagnosis.
Deep transfer learning has shown great potential for fault diagnosis and is worthy of
further research.
Acknowledgements This research was supported by the National Natural Science Foundation of China (Grant
No. 51975394), the Natural Science Foundation of Jiangsu Province (No. BK20211336) and Postgraduate
Research & Practice Innovation Program of Jiangsu Province (Grant No. KYCX20_2752). The authors
thank the anonymous reviewers and the editor for their valuable comments.
References
1. Lei Y, Jia F, Kong D, Lin J, Xing S (2018) Opportunities and challenges of machinery intelligent fault
diagnosis in big data era. J Mech Eng 54(05):94–104
2. Hoang D, Kang H (2019) A survey on deep learning based bearing fault diagnosis. Neurocomputing
335:327–335
3. Wang Y, Wang Z (2021) Data-driven model-free adaptive fault-tolerant control for a class of discrete-time
systems. IEEE Trans Circuits Syst II: Express Briefs 1–1
4. Zhang QH, Qin A, Lei S (2015) Vibration sensor based intelligent fault diagnosis system for large
machine unit in petrochemical industries. Int J Distrib Sens Netw 13:239405
5. Sun G, Zhang Q, Shao L (2013) The build of a new non-dimensional indicator for fault diagnosis in
rotating machinery. Int J Wirel Mobile Comput 6(3):271–276
6. Zhang H, Ma J, Li X, Xiao S, Gu F, Ball A (2021) Fluid-asperity interaction induced random vibration
of hydrodynamic journal bearings towards early fault diagnosis of abrasive wear. Tribol Int 160:107028
7. Liu R, Jing L, Meng X, Lyu B (2021) Mixed elastohydrodynamic analysis of a coupled journal-thrust
bearing system in a rotary compressor under high ambient pressure. Tribol Int 159:1–18
8. Zhang X, Liu Z, Wang J, Wang J (2019) Time-frequency analysis for bearing fault diagnosis using
multiple q-factor gabor wavelets. ISA Trans 87:225–234
9. Feng Z, Zhu W, Zhang D (2019) Time-Frequency demodulation analysis via vold-kalman filter for
wind turbine planetary gearbox fault diagnosis under nonstationary speeds. Mech Syst Signal Process
128:93–109
10. Chen X, Feng Z (2019) Time-frequency space vector modulus analysis of motor current for planetary
gearbox fault diagnosis under variable speed conditions. Mech Syst Signal Process 121:636–654
11. Sun R, Yang Z, Chen X, Tian S, Xie Y (2018) Gear fault diagnosis based on the structured sparsity
time-frequency analysis. Mech Syst Signal Process 102:346–363
12. Wang L, Liu Z, Cao H, Zhang X (2020) Subband averaging kurtogram with dual-tree complex wavelet
packet transform for rotating machinery fault diagnosis. Mech Syst Signal Process 142:106755
13. Zhou R, Bao W, Li N, Huang X, Yu D (2010) Mechanical equipment fault diagnosis based on redundant
second generation wavelet packet transform. Digital Signal Process 20(1):276–288
14. Rajeswari C, Sathiyabhama B, Devendiran S (2014) Bearing fault diagnosis using wavelet packet trans-
form, hybrid pso and support vector machine. Procedia Eng 97:1772–1783
15. Qu J, Zhang Z, Gong T (2016) A novel intelligent method for mechanical fault diagnosis based on dual-
tree complex wavelet packet transform and multiple classifier fusion. Neurocomputing 171:837–853
16. Feng Z, Chen X, Liang M (2016) Joint envelope and frequency order spectrum analysis based on iterative
generalized demodulation for planetary gearbox fault diagnosis under nonstationary conditions. Mech
Syst Signal Process 76–77:242–264
17. Dibaj A, Hassannejad R, Ettefagh M, Ehghaghi M (2020) Incipient fault diagnosis of bearings based
on parameter-optimized VMD and envelope spectrum weighted kurtosis index with a new sensitivity
assessment threshold. ISA Trans 114:413–433
18. Li L, Yao L, Wang H, Gao Z (2021) Iterative learning fault diagnosis and fault tolerant control for
stochastic repetitive systems with Brownian motion. ISA Trans
19. Zhou Y, Yan S, Ren Y, Liu S (2021) Rolling bearing fault diagnosis using transient-extracting transform
and linear discriminant analysis. Measurement 178:109298
20. Chen B, Song D, Zhang W, Cheng Y, Wang Z (2021) A performance enhanced time-varying morpho-
logical filtering method for bearing fault diagnosis. Measurement 176:109163
21. Tiwari P, Upadhyay S (2021) Novel self-adaptive vibration signal analysis: concealed component decom-
position and its application in bearing fault diagnosis. J Sound Vib 502:116079
22. Zhao K, Shao HD (2020) Intelligent fault diagnosis of rolling bearing using adaptive deep gated recurrent
unit. Neural Process Lett 51(1):1165–1184
23. Wang FT, Liu XF, Deng G, Yu XG, Li HK (2019) Remaining life prediction method for rolling bearing
based on the long short-term memory network. Neural Process Lett 50:2437–2454
24. Pandey SK, Janghel RR (2019) Recent deep learning techniques, challenges and its applications for
medical healthcare system: a review. Neural Process Lett 50:1907–1935
25. Wen C, Feiya LV (2020) Review on deep learning based fault diagnosis. Acta Electron Sin
42(01):234–248
26. Tan G, Wang Z (2021) Reachable set estimation of delayed Markovian jump neural networks based on
an improved reciprocally convex inequality. IEEE Trans Neural Netw Learn Syst 1–6
27. Tan G, Wang Z, Shi Z (2021) Proportional-integral state estimator for quaternion-valued neural networks
with time-varying delays. IEEE Trans Neural Netw Learn Syst 23:2162–2388
28. Chen Z, Chen X, José, Li C (2019) Application of deep learning in equipment prognostics and health
management. Acta Instrum Sin 40(09):206–226
29. Weiss K, Khoshgoftaar T, Wang D (2016) A survey of transfer learning. J Big Data 3(1):1–40
30. Li C, Zhang S, Qin Y, Estupinan E (2020) A systematic review of deep transfer learning for machinery
fault diagnosis. Neurocomputing 407:121–135
31. Zhang T, Chen J, Li F (2021) Intelligent fault diagnosis of machines with small & imbalanced data: A
state-of-the-art review and possible extensions. ISA Trans 119:152–171
32. Lei Y, Yang B, Jiang X (2020) Applications of machine learning to machine fault diagnosis: a review
and roadmap. Mech Syst Signal Process 138:106587
33. Zhuang F, Luo P, Qing H, Shi Z (2015) Survey on transfer learning research. Acta Softw Sin 26(01):26–39
34. Zhuang F, Qi Z, Duan K (2020) A comprehensive survey on transfer learning. Proc IEEE 99:1–34
35. Mao W, Feng W (2021) A new deep auto-encoder method with fusing discriminant information for
bearing fault diagnosis. Mech Syst Signal Process 150(12):107233
36. Li X, Jiang H, Niu M (2020) An enhanced selective ensemble deep learning method for rolling bearing
fault diagnosis with beetle antennae search algorithm. Mech Syst Signal Process 142:106752
37. Zhao M, Kang M, Tang B (2018) Deep residual networks with dynamically weighted wavelet coefficients
for fault diagnosis of planetary gearboxes. IEEE Trans Ind Electron 65(5):4290–4300
38. Cococcioni M, Lazzerini B, Volpi S (2013) Robust diagnosis of rolling element bearings based on
classification techniques. IEEE Trans Ind Inf 9(4):2256–2263
39. Xia S, Chen B, Wang G (2021) mCRF and mRD: two classification methods based on a novel multiclass
label noise filtering learning framework. IEEE Trans Neural Netw Learn Syst 99:1–15
40. Wang Z, Huang H, Wang Y (2020) Fault diagnosis of planetary gearbox using multi-criteria feature
selection and heterogeneous ensemble learning classification. Measurement 173(5):108654
41. Xia S, Wang G, Chen Z, Duan Y, Liu Q (2018) Complete random forest based class noise filtering
learning for improving the generalizability of classifiers. IEEE Trans Knowl Data Eng 31:1–1
42. Qin Y, Wang X, Zou J (2019) The optimized deep belief networks with improved logistic sigmoid units
and their application in fault diagnosis for planetary gearboxes of wind turbines. IEEE Trans Ind Electron
66(5):3814–3824
43. Zuo L, Jing M, Li J (2020) Challenging tough samples in unsupervised domain adaptation. Pattern
Recognit 110:107540
44. Wang CJ, Xu ZL (2021) An intelligent fault diagnosis model based on deep neural network for few-shot
fault diagnosis. Neurocomputing 0925-2312
45. Che CC, Wang HW, Ni XM, Fu Q (2020) Domain adaptive deep belief network for rolling bearing fault
diagnosis. Comput Ind Eng 143:106427
46. Souza RM, Nascimento EG, Miranda UA, Silva WJ, Lepikson HA (2021) Deep learning for diagnosis
and classification of faults in industrial rotating machinery. Comput Ind Eng 153:107060
47. Chen S, Ge H, Li H (2021) Hierarchical deep convolution neural networks based on transfer learning
for transformer rectifier unit fault diagnosis. Measurement 167:108257
48. Li F, Tang TJ, He QY (2021) Deep convolution domain-adversarial transfer learning for fault diagnosis
of rolling bearings. Measurement 169:108339
49. Dai W, Yang Q, Xue G (2007) Boosting for transfer learning. In: Proceedings of the 24th international
conference on Machine learning, pp 193–200
50. Shen F, Chen C, Yan R (2017) Application of SVD and transfer learning strategy on motor fault diagnosis.
J Vib Eng 30(01):118–126
51. Shen F, Chen C, Xu J, Yan R (2019) Application of spectral centroid transfer in bearing fault diagnosis
under varying working conditions. Acta Instrum Sin 40(05):99–108
52. Chen C, Shen F, Yan R (2017) Enhanced least squares support vector machine-based transfer learning
strategy for bearing fault diagnosis. Acta Instrum Sin 38(01):33–40
53. Wu Z, Jiang H, Lu T, Zhao K (2020) A deep transfer maximum classifier discrepancy method for rolling
bearing fault diagnosis under few labeled data. Knowl Based Syst 196:105814
54. Wu K, Jiang H, Zhao K, Li X (2020) An adaptive deep transfer learning method for bearing fault
diagnosis. Measurement 151:107227
55. Qian W, Li S, Jiang X (2019) Deep transfer network for rotating machine fault analysis. Pattern Recognit
96:106993
56. Qian W, Li S, Yi P, Zhang K (2019) A novel transfer learning method for robust fault diagnosis of
rotating machines under variable working conditions. Measurement 138:514–525
57. Zhang W, Li X, Jia X, Ma H, Luo Z, Li X (2019) Machinery fault diagnosis with imbalanced data using
deep generative adversarial networks. Measurement 152:107377
58. Wang W, Sun C, Wang L, Chen B (2020) Research on fault diagnosis technology of planetary gearbox
using deep learning generative. Mech Sci Technol 39(01):117–123
59. Wu C, Feng F, Wu S, Chen T, Jiang P (2019) An effective method for imbalanced sample generation
and its application in fault diagnosis of planetary gearbox. Acta Ordnance Eng 40(07):1349–1357
60. Xie P, Zhang Z (2001) A new approach to conform feature samples for fault diagnosis classifiers. Syst
Eng Electron 23(11):35–35
61. Liu R, Yang B, Zio E, Chen X (2018) Artificial intelligence for fault diagnosis of rotating machinery: a
review. Mech Syst Signal Process 108:33–47
62. Wang J, Ma Y, Zhang L, Gao R, Wu D (2018) Deep learning for smart manufacturing: methods and
applications. J Manuf Syst 48:144–156
63. Zhang L, Guo L, Gao H, Dong D, Fu G, Hong X (2020) Instance-based ensemble deep transfer learning
network: a new intelligent degradation recognition method and its application on ball screw. Mech Syst
Signal Process 140:106681
64. Murugan K, Kishore G, Inam (2021) Hyperspectral image classification using ensemble transfer learning.
J Phys Conf Ser 1916(1):012082
65. Xia S, Xia Y, Yu H (2019) Transferring ensemble representations using deep convolutional neural
networks for small-scale image classification. IEEE Access 7:168175–168186
66. Wen L, Li X, Li X, Gao L (2019) A new transfer learning based on VGG-19 network for fault diagnosis.
In: Proceedings of IEEE 23rd international conference on computer supported cooperative work in
design (CSCWD), Porto, Portugal, pp 205–209
67. Cao P, Zhang S, Tang J (2018) Preprocessing-free gear fault diagnosis using small datasets with deep
convolutional neural network-based transfer learning. IEEE Access 6:26241–26253
68. Zhang R, Tao H, Wu L, Guan Y (2017) Transfer learning with neural networks for bearing fault diagnosis
in changing working conditions. IEEE Access 5:14347–14357
69. Shao S, McAleer S, Yan R (2019) Highly accurate machine fault diagnosis using deep transfer learning.
IEEE Trans Ind Inf 15(4):2446–2455
70. Yu Y, He M, Liu B, Chen C (2019) Research on acoustic emission signal recognition of bearing fault
based on TL-LSTM. Chin J Sci Instrum 40(05):51–59
71. Sun M, Wang H, Liu P, Huang S, Fan P (2019) A sparse stacked denoising autoencoder with optimized
transfer learning applied to the fault diagnosis of rolling bearings. Measurement 146:305–314
72. Ren J, Hu X, Zhu F (2017) Effectiveness prediction of weapon equipment system-of-systems based on
deep learning feature transfer. Syst Eng Electron Technol 39(12):2745–2749
73. He Z, Shao H, Zhong X, Yang Y, Cheng J (2020) An intelligent fault diagnosis method for rotor-bearing
system using small labeled infrared thermal images and enhanced CNN transferred from CAE. Adv Eng
Inform 46:101150
74. He Z, Shao H, Wang P (2020) Deep transfer multi-wavelet auto-encoder for intelligent fault diagnosis
of gearbox with few target training samples. Knowl Based Syst 191:105313
75. He Z, Shao H, Jing L, Cheng J, Yang Y (2020) Transfer fault diagnosis of bearing installed in different
machines using enhanced deep auto-encoder. Measurement 152:107393
76. Hasan M, Islam M, Kim J (2019) Acoustic spectral imaging and transfer learning for reliable bearing
fault diagnosis under variable speed conditions. Measurement 138:620–631
77. Wen L, Gao L, Dong Y, Zhu Z (2019) A negative correlation ensemble transfer learning method for fault
diagnosis based on convolutional neural network. Math Biosci Eng 16(5):3311–3330
78. Wang M, Deng W (2018) Deep visual domain adaptation: a survey. Neurocomputing 312:135–153
79. Ghifary M, Kleijn W, Zhang M (2014) Domain adaptive neural networks for object recognition. In:
Pacific rim international conference on artificial intelligence. Springer, pp 898–904
80. Tzeng E, Hoffman J, Zhang N (2014) Deep domain confusion: maximizing for domain invariance.
arXiv:1412.3474
81. Long M, Wang J (2015) Learning transferable features with deep adaptation networks. In: International
conference on machine learning (ICML)
82. Long M, Cao Y, Cao Z, Wang J, Jordan MI (2019) Transferable representation learning with deep adaptation
networks. IEEE Trans Pattern Anal Mach Intell 41(12):3071–3085
83. Long M, Wang J, Cao Y, Sun J, Yu PS (2016) Deep learning of transferable representation for scalable
domain adaptation. IEEE Trans Knowl Data Eng 28(8):2027–2040
84. Long M, Wang J, Jordan M (2016) Deep transfer learning with joint adaptation networks. In: ICML, pp
2208–2217
85. Li J, Li X, He D, Qu Y (2020) A domain adaptation model for early gear pitting fault diagnosis based
on deep transfer learning network. Proc Inst Mech Eng Part O J Risk Reliab 234(1):168–182
86. Guo L, Lei Y, Xing S (2019) Deep convolutional transfer learning network: a new method for intelligent
fault diagnosis of machines with unlabeled data. IEEE Trans Ind Electron 66(9):7316–7325
87. Ainapure A, Li X, Singh J (2020) Enhancing intelligent cross-domain fault diagnosis performance on
rotating machines with noisy health labels. Procedia Manuf 48:940–946
88. Yang B, Lei Y, Jia F, Xing S (2019) An intelligent fault diagnosis approach based on transfer learning
from laboratory bearings to locomotive bearings. Mech Syst Signal Process 122:692–706
89. Xiao D, Huang Y, Zhao L (2019) Domain adaptive motor fault diagnosis using deep transfer learning.
IEEE Access 7:80937–80949
90. Xu J, Huang J, Zhao Y, Zhou L (2020) A robust intelligent fault diagnosis method for rolling bearings
based on deep convolutional neural network and domain adaptation. Procedia Comput Sci 174:400–405
91. Wang J, Xie J, Zhang L (2016) A factor analysis based transfer learning method for gearbox diagnosis
under various operating conditions. In: International symposium on flexible automation. IEEE, pp 81–86
92. Wang X, Shen C, Xia M (2020) Multi-scale deep intra-class transfer learning for bearing fault diagnosis.
Reliab Eng Syst Saf 202:107050
93. Xu K, Li S, Wang J (2019) A novel convolutional transfer feature discrimination network for imbalanced
fault diagnosis under variable rotational speed. Meas Sci Technol 30(10):105107
94. Sun C, Ma M, Zhao Z (2018) Deep transfer learning based on sparse auto-encoder for remaining useful
life prediction of tool in manufacturing. IEEE Trans Ind Inf 15(4):2416–2425
95. Li W, Liang Y (2020) Deep transfer learning based diagnosis for machining process lifecycle. Procedia
CIRP 90:642–647
96. Lu W, Liang B, Cheng Y (2017) Deep model based domain adaptation for fault diagnosis. IEEE Trans
Ind Electron 64(99):2296–2305
97. Han T, Liu CM, Yang W (2020) Deep transfer network with joint distribution adaptation: a new intelligent
fault diagnosis framework for industry application. ISA Trans 97:269–281
98. Liu SW, Jiang HK, Wu ZH, Li XQ (2021) Rolling bearing fault diagnosis using variational autoencoding
generative adversarial networks with deep regret analysis. Measurement 168:108371
99. Ganin Y, Ustinova E, Ajakan H (2017) Domain-adversarial training of neural networks. J Mach Learn
Res 17(1):189–209
100. Jin Y, Liu X, Yao M, Huang F (2019) Fault diagnosis model of rolling bearing under variable condition
based on domain adversarial migration. Process Autom Instrum 40(12):55–60
101. Li Q, Shen CQ, Chen L, Zhu ZK (2021) Knowledge mapping-based adversarial domain adaptation: a
novel fault diagnosis method with high generalizability under variable working conditions. Mech Syst
Signal Process 147:107095
102. Li X, Zhang W, Ma H, Luo Z, Li X (2020) Partial transfer learning in machinery cross-domain fault
diagnostics using class-weighted adversarial networks. Neural Netw 129:313–322
103. Jiao J, Zhao M, Lin J, Liang K (2020) Residual joint adaptation adversarial network for intelligent
transfer fault diagnosis. Mech Syst Signal Process 145:106962
104. Li J, Huang R, Li W (2020) Intelligent fault diagnosis for bearing dataset using adversarial transfer
learning based on stacked auto-encoder. Procedia Manuf 49:75–80
105. Jiao J, Lin J, Zhao M, Liang K (2020) Double-level adversarial domain adaptation network for intelligent
fault diagnosis. Knowl Based Syst 205:106236
106. Chen Z, Bei Y, Rudin C (2020) Concept whitening for interpretable image recognition. Nat Mach Intell
2(12):772–782
107. Dai D, Tang C, Wang G, Xia S (2021) Building partially understandable convolutional neural networks
by differentiating class-related neural nodes. Neurocomputing 452:169–181
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.