Review

Remaining Useful Life Prediction Based on Deep Learning: A Survey

1 School of Information Engineering, Wuhan College, Wuhan 430212, China
2 College of Computer, National University of Defense Technology, Changsha 410073, China
3 National Key Laboratory of Science and Technology on Vessel Integrated Power System, Naval University of Engineering, Wuhan 430033, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(11), 3454; https://doi.org/10.3390/s24113454
Submission received: 21 April 2024 / Revised: 22 May 2024 / Accepted: 24 May 2024 / Published: 27 May 2024
(This article belongs to the Section Fault Diagnosis & Sensors)

Abstract

Remaining useful life (RUL) is a metric of health state for essential equipment. It plays a significant role in health management. However, RUL is often random and unknown. One type of physics-based method builds a mathematical model for RUL using prior principles, but this is a tough task in real-world applications. Another type of method estimates RUL from available information through condition and health monitoring; this is known as the data-driven method. Traditional data-driven methods require significant human effort in designing health features to represent performance degradation, yet the prediction accuracy is limited. With breakthroughs in various application scenarios in recent years, deep learning techniques provide new insights into this problem. Over the past few years, deep-learning-based RUL prediction has attracted increasing attention from the academic community. Therefore, it is necessary to conduct a survey on deep-learning-based RUL prediction. To ensure a comprehensive survey, the literature is reviewed from three dimensions. Firstly, a unified framework is proposed for deep-learning-based RUL prediction and the models and approaches in the literature are reviewed under this framework. Secondly, detailed estimation processes are compared from the perspective of different deep learning models. Thirdly, the literature is examined from the perspective of specific problems, such as scenarios where the collected data consist of limited labeled data. Finally, the main challenges and future directions are summarized.

1. Introduction

Remaining useful life (RUL) is the useful life left in an asset at a particular time of operation. Its prediction is a crucial technology in health management. Accurate RUL prediction provides guidance for system design, production, and maintenance. Maintenance costs constitute a large portion of the operating and overhead expenses in industry. Hence, both academia and industry are committed to advancing RUL prediction technology for better maintenance strategies.
Generally, there are three strategies for performing maintenance [1,2]. The first strategy is to perform maintenance after the occurrence of failure. Under this strategy, assets experience the least number of maintenance events during their entire life cycle. However, it often shortens the life span of assets due to irreversible damage caused by failure. The second strategy is to perform maintenance periodically, known as time-based maintenance (TBM). This preventive maintenance strategy assumes that the estimated mean time between failures is statistically or experientially known. The third strategy is called condition-based maintenance (CBM). It utilizes component-specific sensors to monitor the health information of a component. A component’s degradation state is evaluated based on deviations from normal running conditions. However, the deviations may not necessarily equate to failure in case of operating condition changes or acceptable degradation levels. By establishing a relationship between the reliability characteristics of a component’s population and sensor signals, a more accurate assessment of the degradation process can be made. Unlike the periodical maintenance of TBM, the CBM strategy aims to perform maintenance at the optimal time point based on collected condition information. Considering the advantages of CBM, it is currently a hot topic in the literature.
When talking about CBM, diagnostics and prognostics are two important aspects that should be distinguished. Diagnostics deals with fault detection, isolation, and identification when abnormalities occur, while prognostics deals with fault and degradation prediction before they occur. As diagnostics has already been widely studied and applied in industry, this survey focuses on prognostics, which has not made significant progress. It is an active area of research and development in the aerospace, automotive, nuclear, process controls, and national defense fields [3]. Among all prognostic technologies, remaining useful life (RUL) estimation is a key technology, which is also useful in other fields, such as resource recycling [4].
Methods for precisely estimating remaining useful life are usually divided into two categories: model-based methods and data-driven methods. Because it is usually a tough task to accurately build a mathematical model for a physical system using prior principles in real-world applications, the data-driven approach is currently the main method being researched. It discovers the law of performance degradation from historical trajectories and forms the RUL model.
Traditional data-driven RUL prediction methods include the least-squares predictor model and the autoregressive moving-average (ARMA) model. They usually consist of two steps. Firstly, features are extracted from the input data. Then, these features are mapped to an RUL value. The bottleneck of this kind of method lies in designing suitable health features, which is highly labor-intensive yet yields limited prediction accuracy. To address this issue, shallow neural networks [5,6,7] are adopted, but their performance still does not meet the demands of real-world applications. Traditional data-driven RUL prediction methods can be classified into two broad types: models based on directly observed state processes and those based on unobservable state processes [8]. The former includes regression, Brownian motion, Gamma process, and Markovian-based models. The latter includes stochastic-filtering-based models, covariate-based hazard models, and methods based on hidden Markov models (HMM) and hidden semi-Markov models (HSMM). Analysis of traditional data-driven methods reveals four common shortcomings and challenges: (i) the reliance on physics-based models, (ii) data fusion where multidimensional input data must be handled, (iii) modeling the influence of external environmental variables, and (iv) the development of a model that can handle multiple failure modes.
In recent years, deep learning technology has made great progress in image identification, machine translation, autopilot, and other areas. This provides new ideas for RUL prediction. Deep learning technology enables the establishment of end-to-end models from historical data to RUL value, avoiding the complex feature extraction process. Moreover, the working environment of assets whose RUL is to be predicted is usually complex and noisy. The deep learning method is able to handle these problems automatically without field knowledge.
Another significant factor behind this trend is the big data paradigm. Traditional data-driven RUL prediction methods cannot cope with the big data scenario, while deep learning methods are naturally developed for the big data paradigm. When applying deep learning to big data processing, it is able to learn multi-scale, multi-level, and hierarchical representation knowledge. This makes it inherently feasible for data-driven RUL prediction.
Taking a look at the technical state, it is observed that all key aspects are prepared for adopting deep learning methods for RUL prediction. Currently, we are experiencing the fourth industrial revolution, where accurate RUL prediction is an urgent need. A typical characteristic of “Industry 4.0” is intelligentization, which relies on big data, computational capability, and algorithm models. These three aspects are intertwined and develop together, as shown in Figure 1.
In the context of the industrial internet of things or Industry 4.0, massive sensor data collected forms the big data for RUL prediction. Since the 1980s, the ability to store information per capita worldwide has doubled every 40 months [9]. According to a report by IDC, the scale and growth rate of data in the world is shown as the data volume curve in Figure 1. This phenomenon has spawned the fourth paradigm of scientific research, known as data-intensive science in the era of big data [10].
Deep learning has gained a new opportunity for rapid development in this big data context. After several peaks and valleys, the development of deep learning technology has reached a stage of widespread use. It originated from simple models such as the MP model and Hebb’s rule, and the first truly meaningful two-layer neural network perceptron was published in 1958. Benefiting from new discoveries about the human visual system, the first deep learning system of a feedforward multi-layer neural network was trained using the Group Method of Data Handling (GMDH) method [11] in 1968. Then, the neural network entered the first “ice age”. In 1982, John Hopfield proposed a recurrent neural network that simulates human memory, restoring the vitality of the neural network to some extent. Four years later, the back-propagation method solved the problem of computational power that caused the first “ice age”, marking the second rise of deep learning [12]. In 2006, Professor Hinton first proposed a deep belief network, defining the method of combined learning by multi-layer neural networks as deep learning [13]. This marked the third rise of deep learning. Since then, deep learning has made major breakthroughs in fields such as speech recognition and image recognition.
The history of deep learning reveals that computational complexity is an important factor limiting the development of deep learning. From the high performance computing Top500 (www.top500.org) website, it is observed that we have entered the exascale computing era (E-level) after the terascale (T-level) and petascale (P-level) calculations, as shown in Figure 1. This promotes the development of deep learning. On the other hand, it also provides the basis for big data analysis.
Hence, the problem of RUL prediction is expected to achieve breakthroughs with deep learning methods. Some acronyms used are summarized in Table 1.

2. Related Works

In the literature, there are already several surveys on related works. In 2007, Schwabacher et al. [14] carried out a survey on artificial intelligence methods for prognostics. It was tied to a real-world project on NASA’s on-board Integrated Systems Health Management (ISHM) system. As described above, RUL prediction methods are classified into two broad types: model-based and data-driven methods. In 2010, Si et al. [8] reviewed statistical data-driven approaches. They suggested that past recorded failure data may be scarce because critical assets are not allowed to run to failure. However, a broad sense of any data, which they named CM data, is a more important source of information. They further classified the observed CM data into direct CM and indirect CM data. Keeping this idea in mind, they surveyed direct and indirect CM data-driven approaches, respectively. Liu et al. [15] reviewed artificial intelligence approaches for the fault diagnosis of rotating machinery. Nash et al. [16] surveyed deep learning technology in the study of materials degradation. They classified the detection of degradation into direct and indirect detection. Zhao et al. [17] provided a review of fault diagnosis and prognostic methods based on deep learning technology. They focused on deep neural network (DNN), deep belief network (DBN) and convolutional neural network (CNN) models, and summarized some potential future research issues. Khan et al. [18] presented a systematic review of deep-learning-based health management systems. They focused on three deep learning architectures: auto-encoder, convolutional neural network, and recurrent neural network. Zhao et al. [19] reviewed the application of deep learning technology to machine health monitoring from the perspective of auto-encoder, CNN, and RNN models. Moreover, experiments were conducted to study the performance of these approaches. Surveys that focus on deep-learning-based RUL prediction were conducted by Remadna et al. [20] and Wang et al. [21]. However, a limited number of works in the literature were reviewed in [20], and only lithium-ion batteries are considered in [21].
An overall comparison between related surveys is summarized in Table 2. They are compared along two dimensions: goal and method. The goal dimension is divided into PHM and RUL prediction. The goal of PHM is wider. It contains not only prognostics but also diagnostics. In a PHM system, selected parameters are monitored. Once an anomaly is detected, the diagnostic process is carried out to locate it. Prognosis, however, is a process aimed at predicting and estimating the RUL of a system. In the method dimension, related surveys are classified into three types: TDD, SNN, and DNN. TDD denotes the traditional data-driven method, while SNN and DNN denote the shallow neural network method and the deep neural network method, respectively. It is important to note that both the SNN method and the DNN method belong to the data-driven category. This survey focuses only on the DNN method for the RUL prediction problem of prognostics. Throughout this survey, it is observed that RUL prediction and deep learning technology are both hot research topics. RUL prediction is promoted by industry needs, and deep learning is considered a revolutionary technology. Most works that focus on using deep learning methods in RUL prediction were carried out in the past several years. At present, it is necessary to conduct a survey on the challenges, approaches, and future of deep-learning-based RUL prediction.
The following are four principles that guide this survey:
  • The literature is reviewed under unified problem formulation and framework for deep-learning-based RUL prediction.
  • Different from deep neural network surveys, related works are surveyed mainly from the perspective of the RUL prediction problem.
  • Details of deep learning technology are not introduced in the paper. It is assumed that readers already have this knowledge or can acquire it from other sources.
  • In order to study the general methodology of deep-learning-based RUL prediction, specific application fields or test datasets will not be detailed.

3. Unified Framework and Models

3.1. Problem Formulation

Currently, there is no clear definition of remaining useful life (RUL). It is even difficult to define the failure time of a system. Intuitively, the remaining useful life is generally defined as the length of time from the current time point to the failure time point.
 Definition 1. 
T: Let T be the time of the system failure.
 Definition 2. 
t: Let t be the time until which the system has survived.
From the perspective of statistics [22], the remaining useful life (RUL) can be defined as:
$$\mathit{rul} = T - t$$
Then, the deep-learning-based RUL prediction method constructs a deep neural network that calculates the RUL value from raw input data in an end-to-end manner.

3.2. Unified Framework

A unified framework of deep-learning-based RUL prediction is shown in Figure 2. It covers most related works in the literature. A general deep-learning-based RUL prediction method is made up of three stages. At the data preprocessing stage, collected raw data are preprocessed by filtering, normalization, and splitting, so that the preprocessed data are suitable as the input of a deep neural network. The health indicator generation stage aims to generate features that strongly represent the health state. The last stage performs prediction and outputs the final RUL value. In a specific scenario, it is not always necessary to combine all stages. Some works generate RUL values directly at the health indicator fusion and smoothing step. Furthermore, some methods directly feed collected data into RUL prediction deep neural networks.

3.2.1. Data Preprocessing

Currently, systems are usually composed of various interacting components, in which the collective actions are difficult to deduce from those of the individual elements. This leads to the result that predictability is usually nonlinear. For condition monitoring, there are usually multiple sensors that receive information about system health due to incomplete understanding of the multidimensional failure mechanisms. Along with data noise, these factors render current methods ineffective. Moreover, real-world systems usually operate under a variety of operating conditions. All these situations make the data preprocessing step important for accurate RUL prediction.
In deep-learning-based RUL prediction, data preprocessing involves three aspects: data filtering, data normalization, and data splitting.
(1) Data filtering
In real applications, operations and working conditions are usually complicated. It is hard to obtain an optimized sensor deployment strategy. Deep learning models may not fully grasp the underlying physical processes solely by using raw input data. The successful prognosis of remaining useful life requires proper filtering of raw data. Generally speaking, selected data should possess diagnostic ability, sensitivity, and consistency. The diagnostic ability is the basic metric that determines the prediction accuracy of selected data. The sensitivity metric is highly correlated with prediction performance. Furthermore, the consistency metric is related to the confidence of the predicted result.
The collected data usually vary in sensitivity across both the spatial and temporal dimensions. Taking the bearing vibration signal as an example, time domain features are insensitive to small changes but sensitive to noise. They have an advantage in representing the middle stage of the bearing degradation process, while the frequency domain features are sensitive to the earlier and later stages [23]. The filtering of input signals requires a comprehensive consideration of degradation pattern representational ability, information redundancy, and complexity of signal features.
In order to enhance the diagnostic ability of collected data, Wu et al. [24] designed a strategy to prepare additional input data before training, which is referred to as dynamic differential data. These data are calculated as the forward difference between the current sensor monitoring value and the previous value under the same operation mode. The dynamic differential data inherently contain a great amount of information about the performance degradation of an asset. Based on this strategy, they proposed a vanilla LSTM neural network for RUL prediction.
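The forward-difference idea described above can be sketched as follows. This is only an illustration under assumptions: the function name `dynamic_differential`, the single-channel list input, and the choice to emit 0.0 for the first sample seen in each operation mode are hypothetical and are not taken from [24].

```python
def dynamic_differential(readings, modes):
    """Difference between each reading and the previous reading
    recorded under the same operation mode.

    readings: time-ordered floats for one sensor channel
    modes:    operation-mode label for each reading (same length)
    The first reading seen in a mode has no predecessor, so it is
    assigned 0.0 here (an assumption made for this sketch).
    """
    if len(readings) != len(modes):
        raise ValueError("readings and modes must have the same length")
    last_seen = {}  # mode label -> most recent reading in that mode
    diffs = []
    for value, mode in zip(readings, modes):
        previous = last_seen.get(mode)
        diffs.append(0.0 if previous is None else value - previous)
        last_seen[mode] = value
    return diffs


if __name__ == "__main__":
    sensor = [10.0, 20.0, 10.5, 21.0, 11.5]
    mode = ["a", "b", "a", "b", "a"]
    print(dynamic_differential(sensor, mode))
```

The resulting series could be appended to the raw sensor channels as an extra input feature before training.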
(2) Data normalization
In many cases, raw inputs differ in scale. Directly feeding these data into a deep learning network will lead to unequally weighted input features and will slow down the learning and convergence of model training. Therefore, raw inputs are usually converted into a common scale. The most common technique is z-score normalization [25]. Bektas et al. [26] proposed a component-wise multi-regime normalization method for data preprocessing to provide a common scale across all the complex features.
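As a minimal sketch of the scaling step, the code below applies z-score normalization to one feature, first globally and then separately within each operating regime. The per-regime variant only illustrates the general idea of regime-aware scaling; it is an assumption of this sketch and is not the component-wise method of [26]. The function names are hypothetical, and the population standard deviation is used.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Scale one feature to zero mean and unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0.0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def z_score_per_regime(values, regimes):
    """Apply z-score scaling separately inside each operating regime."""
    groups = {}
    for v, r in zip(values, regimes):
        groups.setdefault(r, []).append(v)
    stats = {r: (fmean(g), pstdev(g)) for r, g in groups.items()}
    scaled = []
    for v, r in zip(values, regimes):
        mean, std = stats[r]
        scaled.append(0.0 if std == 0.0 else (v - mean) / std)
    return scaled


if __name__ == "__main__":
    print(z_score([1.0, 2.0, 3.0]))
    print(z_score_per_regime([1.0, 3.0, 100.0, 300.0], [0, 0, 1, 1]))
```

In practice the mean and standard deviation would be estimated on the training set only and then reused for the test set.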
(3) Data splitting
When splitting data for a deep learning model, there are three levels to consider.
Firstly, datasets are split into training datasets and test datasets. Special data splitting strategies can optimize the hyperparameters of deep neural networks and calculate the uncertainty of the deep learning model as well. Labeled data and unlabeled data are prepared for supervised learning and unsupervised learning separately. They can also be combined as semi-supervised learning.
Secondly, the full life cycle of an asset is usually split into the health stage and the deterioration stage. Furthermore, the performance of prediction models is sensitive to the selection of the deterioration start point. The performance of most assets usually degrades after a certain period of usage after being produced, but does not follow a linear trend. The beginning part of a data sequence is designated as the healthy part. However, it is hard to choose the degrading point, which is a common issue in the literature. Most research in the literature assumes that the degradation is not noticeable at first, and that the RUL is piecewise continuous, such that the RUL is constant at the beginning and then decreases linearly. However, this assumption often makes these models impractical for real-world tasks. Heimes et al. [27] proposed a multi-layer perceptron (MLP) neural network classifier, trained by the extended Kalman filter (EKF) [28], to distinguish between degraded and healthy status data. Li et al. [29] proposed an approach to determine the first prediction time (FPT) based on kurtosis [30]. Based on this feature, the prediction process is separated into two stages. The first stage is to predict the degradation state of the system, and the sensitive features from the first stage are then fed into the second stage to predict the remaining useful life [31,32].
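The "constant at the beginning, then linearly decreasing" target described above can be sketched as a capped linear label. The function name and the cap parameter `max_rul` are hypothetical; the cap plays the role of the assumed deterioration start point, and its value would have to be chosen per dataset.

```python
def piecewise_linear_rul(run_length, max_rul):
    """RUL targets for one run-to-failure trajectory.

    run_length: number of recorded cycles; the last cycle has RUL 0
    max_rul:    cap applied during the assumed healthy stage
    """
    if run_length <= 0:
        raise ValueError("run_length must be positive")
    if max_rul < 0:
        raise ValueError("max_rul must be non-negative")
    # Linear RUL counts down to 0 at the final cycle; early values are capped.
    return [min(max_rul, run_length - 1 - cycle) for cycle in range(run_length)]


if __name__ == "__main__":
    print(piecewise_linear_rul(run_length=6, max_rul=3))
```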
Thirdly, in accordance with different data splitting methods, there are three mapping modes from input data to output value. They are the point-to-point (P2P) mode, the sequence-to-point (S2P) mode, and the sequence-to-sequence (S2S) mode. The P2P calculates an RUL value for every input record. The S2P generates an RUL value from a sequence input record, while the S2S will obtain more than one output value. For the RUL prediction scenario, P2P and S2P are usually adopted. Lim et al. [33] studied the effect of splitting time window size on prediction accuracy. Elsheikh et al. [34] chunked training sequences into consecutive overlapping windows of observation sequences. Each of these input sequences is fed into a bidirectional handshaking LSTM (BHSLSTM), and the output is the corresponding RUL. These windows allow the network to experience observation sequences from systems starting at different initial health states, which avoids the aforementioned problems.
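A sequence-to-point (S2P) split with overlapping windows might look like the sketch below. It is a generic illustration, not the procedure of [33] or [34]; the function name, the `stride` parameter, and the convention of labeling each window with the RUL at its last record are assumptions.

```python
def sliding_windows(sequence, rul, window, stride=1):
    """Build S2P samples from one trajectory.

    Each sample pairs `window` consecutive records with the RUL
    value at the last record of that window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(sequence) != len(rul):
        raise ValueError("sequence and rul must have the same length")
    samples = []
    for start in range(0, len(sequence) - window + 1, stride):
        end = start + window
        samples.append((sequence[start:end], rul[end - 1]))
    return samples


if __name__ == "__main__":
    records = [0.1, 0.2, 0.3, 0.4, 0.5]
    targets = [4, 3, 2, 1, 0]
    for inputs, target in sliding_windows(records, targets, window=3):
        print(inputs, "->", target)
```

With `window=1` the same function degenerates to the point-to-point (P2P) mode; trajectories shorter than the window produce no samples.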
A remaining common question is how to prepare input data for deep neural networks. There are three general categories: point-wise, segment-wise, and temporal-wise methods. A variation of the segment-wise method is the time window sliding approach [25]. The goal of an efficient data splitting mechanism is to enhance the generalization power and training performance of a model. Wang et al. [35] proposed a hybrid long–short sequences method for engine RUL prediction. An LSTM architecture is used for RUL prediction on long temporal sequences. Furthermore, gradient-boosting regression (GBR) is used to predict the RUL of time window sequences. The RUL outputs of both LSTM and GBR are fed into a back-propagation neural network to obtain the final prediction.

3.2.2. Health Indicator Generation

The health of a system is complicated. There usually does not exist a deterministic indicator to express the health state. In order to calculate the RUL value, a health index (HI) is generated first to represent the health state. Three key properties of an HI are monotonicity, trendability, and prognosability. Monotonicity requires that the equipment does not undergo self-healing, which would result in non-monotonic trends. Trendability indicates the degree to which the evolution of the health indicator has the same shape and can be described by the same functional form. Prognosability measures the variance of the HI values at failure time. It is expected that failures occur at the same value of the HI.
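Two of these properties can be scored numerically. The sketch below uses one commonly seen formulation: monotonicity as the normalized imbalance between positive and negative steps, and prognosability as an exponential of the spread of failure values relative to the mean HI range. These formulas are an assumption of this sketch and are not necessarily the ones used by the works surveyed here; trendability is omitted because it requires aligning trajectories of different lengths.

```python
from math import exp
from statistics import fmean, pstdev


def monotonicity(hi):
    """|#positive steps - #negative steps| / #steps for one HI trajectory.

    1.0 means strictly monotonic; values near 0 mean no consistent direction.
    """
    diffs = [b - a for a, b in zip(hi, hi[1:])]
    if not diffs:
        raise ValueError("need at least two HI values")
    positive = sum(d > 0 for d in diffs)
    negative = sum(d < 0 for d in diffs)
    return abs(positive - negative) / len(diffs)


def prognosability(trajectories):
    """exp(-std(failure values) / mean(|start - failure|)) over several units.

    1.0 means every unit fails at the same HI value.
    """
    finals = [t[-1] for t in trajectories]
    ranges = [abs(t[0] - t[-1]) for t in trajectories]
    mean_range = fmean(ranges)
    if mean_range == 0.0:
        return 0.0  # flat indicators carry no prognostic information
    return exp(-pstdev(finals) / mean_range)


if __name__ == "__main__":
    print(monotonicity([1.0, 0.8, 0.6, 0.3]))
    print(prognosability([[1.0, 0.5, 0.0], [1.0, 0.4, 0.1]]))
```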
Because the health index cannot be measured directly, datasets for deep neural networks do not have health indicator training labels. This stage is usually implemented using unsupervised learning methods.
The health indicator generation stage can be further divided into two steps: feature transformation and HI fusion.
(1) Feature transformation and selection
Unlike feature extraction in traditional data-driven methods, where features are designed based on a physical model, feature transformation is implemented using deep learning methods. It maps input data to feature spaces with strong correlations to the health state. In this process, noise reduction is also performed.
(2) HI fusion and regression
Extracted features usually contribute different weights to the final health indicator. Compared with traditional health indicator generation methods, deep-learning-based methods can adaptively complete health indicator fusion. Moreover, deep learning methods are able to discover the knowledge of health indicators at different scales. After HI fusion and regression, the generated health indicator is prepared to predict the RUL value.

3.2.3. RUL Prediction

This stage performs the final prediction using deep learning methods. It is critical to choose a suitable deep neural network model. The deep neural network is trained and optimized for high prediction performance and generalization ability.
(1) Deep neural network model
In the application of RUL prediction, the two most widely used models are the recurrent neural network (RNN) and the convolutional neural network (CNN). The first one is able to discover temporal features, and the second one is able to discover different scale features. In order to discover more features, different deep neural networks can also be combined to predict the RUL value.
(2) Deep neural network training
When training a deep neural network, common problems include over-fitting, under-fitting, exploding gradients, and vanishing gradients. Accordingly, the solution is to design proper activation functions, loss functions, and regularization strategies. Even so, there are situations that general training methods cannot handle. Special methods, such as cross-validation and transfer learning, may be adopted.
(3) Deep neural network optimization
The main optimization target here is the set of hyperparameters of the deep neural network. It is a complicated problem, whose objectives include enhancing prediction performance and improving convergence for predicting RUL using deep learning methods. A general method is to treat it as a multi-objective optimization problem and design a proper algorithm to search for globally optimized parameters.

4. Application of Deep Learning in RUL Prediction

Based on the proposed unified framework in Section 3, the mechanism of how deep learning technology is applied for RUL prediction is reviewed. The applications are categorized into four types: deep-learning-based health indexing, deep neural network model towards applications, deep learning method applications, and ad hoc deep learning applications for RUL prediction. As the deep learning method aims to implement RUL prediction in an end-to-end manner, without relying on human labor for feature design, a new approach is needed to establish health indices. Hence, the first problem is working out how to utilize deep learning technology to implement health indexing. Then, the application of deep learning to RUL prediction is reviewed from the perspective of different deep neural network models. In order to complete the final step of application in a production environment, deep learning methods such as transfer learning, hybrid deep learning, and ensemble learning are adopted. Additionally, ad hoc deep learning approaches are devised to address common challenges encountered in practice.

4.1. Deep-Learning-Based Health Indexing

Conceptually, health indexing involves obtaining metrics that can quantitatively evaluate the health status of assets. Efficient health indexing metrics should possess properties of monotonicity, trendability, and prognosability [36,37]. Generally, health indicators (HIs) are categorized into physical HIs (PHIs) and fused HIs (FHIs). PHIs are obtained by extracting physically significant information from the acquired data using statistics or signal processing algorithms. FHIs, also known as synthesized HIs or virtual HIs, are typically constructed using multiple PHIs or multiple sensor information via data fusion algorithms. In real-world applications, PHIs are usually difficult to obtain due to limited knowledge about physical properties. Recently, synthesized HIs have received significant attention [38].
FHIs are usually constructed by manually fusing multidimensional statistical features. They have already been reported to have achieved good results in the literature. However, synthesized HIs still have three drawbacks. Firstly, statistical features have different ranges, and these features do not contribute equally to the construction of HIs. Secondly, it is difficult to determine a definite failure threshold because HI values have a large variation range among different assets at the time of failure. Thirdly, statistical features vary in their sensitivity to faults. Furthermore, there are often multiple components that interact with each other in a complex system, resulting in intricate dependencies between sensor data.
Feature transformation and fusion are two essential steps in obtaining a health indexing model that satisfies three properties of a health indicator. The essence of feature transformation is to reduce the dimension of raw features. Then, a denoising process is adopted to smooth local random fluctuations and enhance the monotonicity of the health indicator. Table 3 summarizes health indexing technologies used in the literature.

4.1.1. Feature Transformation and Selection

There are two functionalities of feature transformation. The first is to transform raw data into features that are most related to health status [46] and are feasible for subsequent processing steps. The other is to transform features in one space to features in another space. The objective of space transformation is dimension reduction. To implement the first functionality, some common methods include Fourier transform (FT), Wigner–Ville distribution (WVD), wavelet transform (WT), and continuous wavelet transform (CWT) [45] in time, frequency, and time–frequency domains. Because the handcrafted feature extraction approach is highly dependent on expert prior knowledge and labor resources, deep learning methods are leveraged to address this problem. Generally, it is impossible to obtain ground-truth values of features, so deep learning in this step usually takes the form of unsupervised learning. The auto-encoder (AE) [49], including its variations of enhanced auto-encoder (EAE) [42], stacked denoising auto-encoder (SDA) [43,44], and stacked sparse auto-encoder (SAE) [45], is able to automatically extract highly abstracted features. For feature extraction, convolutional neural networks (CNNs) are able to take advantage of the spatial structure of input data, especially in vibration signal processing scenarios, through local connections and pooling [40]. In order to take advantage of temporal characteristics of input data when performing feature transformation, the RNN can be adopted. To give the RNN model denoising capability, it is extended into an RNN encoder–decoder (RNN-ED). Furthermore, the masking vector and delta vector techniques can be adopted to handle missing-value problems [47,48].
For the second functionality of feature space transformation, a classical technology is principal component analysis (PCA) [41]. PCA is a linear technique that maps original features into a new feature space with fewer dimensions and extracts principal components. Building on PCA, kernel principal component analysis (KPCA) adds a step before PCA that employs nonlinear kernel functions to map the original features into a new high-dimensional feature space [51]. The self-organizing map (SOM) is a neural-network-based clustering method that is able to map features in a higher-dimensional space into features in a lower-dimensional space without changing the topological structure. This capability makes it suitable for feature transformation [42].
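As a minimal sketch of the linear mapping that PCA performs, the following example projects a feature matrix onto its leading principal components using a singular value decomposition. The function name `pca_reduce` and the synthetic three-feature data are assumptions made for illustration; they are not taken from the cited works.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project (n_samples, n_features) data onto the top principal components."""
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Sample variance captured by each retained component.
    explained_variance = singular_values[:n_components] ** 2 / (x.shape[0] - 1)
    return centered @ components.T, explained_variance

# Hypothetical data: three strongly correlated features driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
raw = np.hstack([latent, 2.0 * latent, -latent]) + 0.01 * rng.normal(size=(200, 3))

reduced, variance = pca_reduce(raw, n_components=1)
total_variance = np.var(raw, axis=0, ddof=1).sum()
print(reduced.shape, variance[0] / total_variance)
```

Because the three synthetic features share one latent factor, a single component retains almost all of the variance, which is the situation in which dimension reduction loses little information.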
As transformed features can be various and correlated, only part of them will be selected. The selection criteria should satisfy the requirement of an efficient health indicator [46]. As mentioned previously, there are three properties of a health indicator. The selection of feasible features can be taken as a multi-objective optimization problem [39]. It is important to note that the selection is strongly correlated with the second stage of fusion, where the aim is to select features that are ultimately beneficial for generating an efficient health indicator.

4.1.2. Health Indicator Fusion and Regression

After transformation and selection, features are fused into a single health indicator (HI). The premise is that features that individually do not have the characteristics of health indicators may provide useful information when considered jointly. The straightforward idea is to calculate the time residual from the current time point to the failure point. Then, the health indicator fusion problem is transformed into a multidimensional distance calculation problem. Solutions such as the auto-associative kernel regression (AAKR) approach can be adopted [39].
Another idea for health indicator fusion is to implement it as a weighted average value, and the key technology is to design an efficient weighting mechanism. To address this problem, global search methods, such as the grey wolf optimizer (GWO) algorithm, can be adopted to calculate weighting coefficients [42]. For most deep-learning-based feature extraction and selection methods, the features are mapped to feed-forward single- or multiple-layer perceptron networks as a weighting approach to calculate the fusion health indicator [40].
Convolutional neural networks (CNNs) are a form of deep learning model that are able to implement feature transformation, selection, and fusion into a single process [52]. A typical CNN comprises three layers: the convolutional layer, the subsampling layer, and the fully connected layer. The structure of a CNN is a key factor that affects the accuracy of the generated health indicator [53]. Because the convolutional operation is able to handle multidimensional input, the time–frequency domain features extracted using continuous wavelet transform (CWT) are inherently feasible for CNNs. The recurrent neural network (RNN) is a deep neural network that is able to take advantage of mutual information from selected features. This makes it an option for health indicator fusion [46]. The long short-term memory (LSTM) is an RNN that is able to keep track of historical behavior features and combine those features with present measured features to fuse the final health indicator [45].
The problem that arises after fusion is health indicator regression, which is required to predict future health indicators. Theoretically, the health indicator changes monotonically. In practice, there may be outlier regions that affect the performance of the health indicator. The first step is to detect outliers, also known as trend burrs. Solutions include machine learning, information theory, and statistical methods. In the statistical approach, the HI is treated as a normally distributed random variable, and a sample is considered an outlier according to statistical rules, such as the 3σ rule [40]. To smooth the health indicator, the moving average (MA) is a common approach for coping with time series trajectories [49]. It takes both the current observation and historical observations into account to estimate the current value. The exponentially weighted moving average (EWMA) [54] is a variation of MA that assigns exponentially decaying weights to historical observations according to their distance from the current step.
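The two steps described above, outlier detection with the 3σ rule followed by EWMA smoothing, can be sketched as follows. This is an illustrative example only: the function names, the smoothing factor `alpha`, the synthetic HI series, and the choice to replace an outlier with its preceding value are all assumptions rather than details from the cited works.

```python
from statistics import mean, pstdev

def three_sigma_outliers(values):
    """Indices of samples lying more than 3 standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]

def ewma(values, alpha=0.3):
    """Exponentially weighted moving average.

    Each output is alpha * current + (1 - alpha) * previous output, so the
    weight of an observation decays exponentially with its age.
    """
    smoothed = []
    for v in values:
        if smoothed:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
        else:
            smoothed.append(v)
    return smoothed

# Hypothetical HI trajectory: slow linear decline with one spike (a "trend burr").
hi = [1.0 - 0.01 * i for i in range(30)]
hi[12] = 5.0

outliers = three_sigma_outliers(hi)
# Assumed repair strategy: replace each outlier with the preceding sample.
cleaned = list(hi)
for i in outliers:
    cleaned[i] = cleaned[i - 1] if i > 0 else cleaned[i + 1]
smoothed_hi = ewma(cleaned, alpha=0.3)
print(outliers)
```

A smaller `alpha` gives a smoother but more delayed indicator, so the value is a trade-off between noise suppression and responsiveness.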
Another more general problem is the nonlinear characteristic of health degradation. In this situation, deep neural networks such as hierarchical gated recurrent unit (GRU) are suitable for health indicator regression [41]. The Gaussian process regression (GPR) algorithm is a kernel-based machine learning technique that can conveniently specify high-dimensional and flexible nonlinear regression. It is used to estimate the future failure health indicator up to the current time [52]. It is important to note that an application does not have to follow a single regression model. On the one hand, the failure types and degradation patterns differ between different assets. In this scenario, clustering algorithms, such as K-means, can be adopted to separate assets and environments into different models [52]. On the other hand, the degradation patterns differ at different stages of the same asset. The regression models may differ at different stages [43,44].

4.2. Deep Neural Network Models for RUL Prediction

In this section, the application of deep learning to RUL prediction is reviewed from the perspective of different deep neural network models, including the auto-encoder, the restricted Boltzmann machine, the recurrent neural network, and the convolutional neural network models.

4.2.1. Auto-Encoder Model

The most common usage of the auto-encoder (AE) model is feature learning in an unsupervised manner. In real applications, data often contain sparse features that are noisy and redundant in raw signals. Auto-encoders are naturally suitable for addressing these problems. To discover more information from raw input data, auto-encoders can be stacked to explore features in different scale layers. Then, the problem is how to train a stacked auto-encoder. Lin et al. [55] proposed the HELM [56] approach to solve the problem for the turbofan engines scenario, which is evaluated as being more efficient than back-propagation training algorithms.
Considering that collected data consist of time series, an accurate prediction requires a reasonable fusion of damage tendency and current states. Integrated deep denoising auto-encoders (IDDAs) can be combined to address this issue. The idea behind this design is that distant records are used to simulate the damage trend, while recent records are used to simulate the smoothing process of recent changes. Yan et al. [57] implemented an IDDA architecture with two deep denoising auto-encoders (DDAs) and a linear regression analysis. They split time series data into distant records and recent records at each time moment during the training phase. These two types of records are fed into the two DDAs, respectively. Based on this technology, they presented a concept of device electrocardiogram (DECG) to replace the traditional factory information system (FIS).

4.2.2. Restricted Boltzmann Machine Model

Restricted Boltzmann machine is also a type of unsupervised machine learning algorithm. It is a generative stochastic artificial neural network that learns the probability distribution over input datasets. It is typically used for feature compression and denoising. To obtain RUL, it is usually followed by a supervised learning model [58]. Liao et al. [59] proposed an RUL prediction method following the two-stage paradigm. In the first stage, they developed an enhanced restricted Boltzmann machine to automatically generate features. To avoid over-fitting and stabilize the learning process of the enhanced RBM, they did not adopt traditional weight decay and sparsity regularization. Instead, they first designed a slope regularization term to represent the trendability. Then, the SOM algorithm [60] maps the high-dimensional output from the RBM to a one-dimensional health indicator. Based on the health indicator, RULs are predicted in a similarity-based manner (Table 4).
Through stacking multiple layers of RBMs, a deep belief network (DBN) is formed, which is able to learn the deep representation of the input data and is suitable for time series prediction problems [64]. The training of an unsupervised DBN is usually processed in a greedy, layer-wise manner. When applied to classification tasks, another discriminative learning procedure is implemented after the generative pretraining process of the DBN. This model is named the discriminative deep belief network (DDBN). The discriminative fine-tuning is carried out by adding a final layer of variables that represent the desired label samples. Ma et al. [61] proposed a discriminative deep belief network and ant colony optimization (DDBN-ACO) method for health status assessment for bearing and turbine engine machines. The ant colony optimization (ACO) technique is adopted to obtain optimal hyperparameters of the DBN. Wang et al. [62] investigated a deep-learning-based approach for material removal rate (MRR) prediction in polishing. Stacked RBMs are trained to represent input data in multiple high-dimensional spaces. Then, the output of stacked RBMs and the input are both fed to a feed-forward three-layer neural network for MRR regression. The network structure and learning rate are optimized through a particle swarm optimization (PSO) algorithm. The idea behind these two applications is the same, while the major difference lies in the structure. The first model takes the output of the last RBM as input for the fine-tuning network, while the second model takes outputs of multiple stacked RBMs as joint input for the feed-forward neural network.
One problem with the standard RBM with stochastic binary units is that it tends to model discrete data instead of continuous data. The continuous deep belief network (CDBN) is constructed with continuously valued stochastic units to solve this problem. Shao et al. [63] proposed a novel CDBN method for predicting rolling bearing performance degradation. It first uses locally linear embedding to quantify health degradation. Then, a continuous deep belief network (CDBN) is constructed based on a series of continuous restricted Boltzmann machines (CRBMs) [65] to learn the hidden nonlinear relationships. To optimize the learning rate and the number of units in the CDBN, they adopted the genetic algorithm (GA) as an intelligent optimization method.

4.2.3. Recurrent Neural Network Model

Neural networks can be divided into two major categories: feed-forward neural networks, which do not have feedback connections from a layer to a previous layer, and recurrent neural networks, which do. Generally speaking, recurrent neural networks are naturally suitable for time series prediction [66,67,68]. Hence, they are also the most widely researched models for RUL prediction applications. RNN methods in those applications are in turn classified into four types according to the structures of the RNN models (Table 5).
(1) Standard RNN methods
The most straightforward RNN structure is to link the output of an FNN to an extra input node, forming a feedback loop. Tse et al. [69] implemented this structure for time series prediction at the early stage. The experimental results claimed that FNN and RNN methods have better performance than conventional autoregressive models. Tian et al. [71] integrated two context layers from the Elman network and the Jordan network separately to construct an extended RNN model, which was confirmed to be more accurate than the Jordan network and the Elman network individually. Liu et al. [72] developed an adaptive RNN model for lithium-ion battery remaining useful life estimation. The predictor is constructed based on a feed-forward multi-layer neural network with adaptive and recurrent feedback links from output nodes and hidden nodes separately. The adaptive feedback links represent temporal information spatially, while the recurrent feedback links deal with time explicitly. Experimental results show that their method outperforms the classical RNN and the RNF.
For the training of RNN models, a primary issue is the choice between incremental and batch training mechanisms. For the long-term RUL prediction problem using the RNN model, Malhi et al. [70] proposed using competitive learning to cluster input data before RNN training. After that, the recurrent neural network is batch-trained to calculate the initial weights. Then, it is incrementally trained for the final cluster. It is confirmed that batch training is useful for trend prediction and avoids the over-fitting associated with incremental training. To obtain an accurate RNN model, truncated back-propagation through time (BPTT) [27] can be adopted to compute the gradients. Furthermore, the extended Kalman filter [28] training method can be utilized to update the weights of the network. Moreover, the differential evolution (DE) [102] process can be used to enhance the generalization power of RNNs and mitigate over-fitting.
(2) ESN methods
The echo state network (ESN) is a paradigm of RNN that randomly establishes a large sparse reservoir to replace the hidden layers of RNNs. One of the main advantages of ESN lies in its training procedure, which is based on simple linear regression. It can be trained with little computational effort, while still providing the high generalization capabilities of RNN models. The ESN method was used to predict the degradation evolution of the stack for the proton exchange membrane fuel cell (PEMFC) [74,103]. When setting up the architecture of an ESN network, it is difficult to determine the optimal parameters. The multi-objective differential evolution method was adopted to search for globally optimized parameters for RUL prediction in ESN networks [104].
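To illustrate why ESN training is cheap, the sketch below builds a fixed random sparse reservoir and fits only the linear readout with ridge regression. All hyperparameters (reservoir size, density, spectral radius, ridge penalty, washout length) and the sine-wave input are assumed values chosen for illustration, not settings reported in the cited works.

```python
import numpy as np

def fit_esn(series, n_reservoir=100, density=0.1, spectral_radius=0.9,
            ridge=1e-6, washout=50, seed=0):
    """Fit an ESN for one-step-ahead prediction of a scalar series."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)

    # Random input and reservoir weights; these are never trained.
    w_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    w *= rng.random((n_reservoir, n_reservoir)) < density  # keep ~10% of links
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))

    # Drive the reservoir with the input and record its states.
    state = np.zeros(n_reservoir)
    states = np.empty((len(series) - 1, n_reservoir))
    for t, u in enumerate(series[:-1]):
        state = np.tanh(w_in * u + w @ state)
        states[t] = state

    # Discard the initial transient, then solve the ridge-regression readout.
    s, y = states[washout:], series[1 + washout:]
    w_out = np.linalg.solve(s.T @ s + ridge * np.eye(n_reservoir), s.T @ y)
    return w_out, s @ w_out, y

signal = np.sin(0.1 * np.arange(500))  # hypothetical degradation-like signal
w_out, fitted, target = fit_esn(signal)
rmse = float(np.sqrt(np.mean((fitted - target) ** 2)))
print(rmse)
```

The only learned quantity is `w_out`, obtained from a single linear solve, which is the source of the low computational cost mentioned above.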
To improve estimation precision for complex datasets with different features, an intuitive idea is to combine multiple ESNs, each matched to a different part of the data. Subsequently, the Kalman filter technique is adopted to cope with the multiple outputs of these ESNs. Peng et al. [73] proposed another approach to combining ESNs. Their approach classifies input data units into groups based on the condition data features and trains a separate ESN model for each group, establishing an ESN model library.
(3) LSTM methods
The most serious problem of the standard RNN is that early information cannot be retained through the network. To address this problem, long short-term memory (LSTM) [105], the most widely used RNN variant, is able to learn long-term patterns [91]. Moreover, the problems of gradient vanishing and exploding are also effectively addressed in LSTM compared to traditional RNN models. The application of LSTM to RUL prediction on the C-MAPSS data and PHM 08 Challenge data can be found in [76]. Other application fields include PEMFC [77], hard disk drives (HDDs) [90], lithium-ion batteries [78,79,80,81], jet engines [84], aero engines [85], and rolling bearings [86].
As the input data pass through a neural network, the information generated in each layer can be regarded as a representation of the input in a specific dimensional space. Adding additional layers could potentially unveil deeper relationships between the inputs and outputs. Hsu et al. [89] proposed a two-layer LSTM architecture for RUL prediction. Zhang et al. [91] and Zheng et al. [92] developed deep LSTM approaches by stacking multiple LSTM layers. Zhao et al. [93] also proposed a deep LSTM approach for tool wear prediction by stacking multiple LSTM layers.
The bidirectional LSTM network is another improvement of LSTM. Zhang et al. developed a bidirectional LSTM network for machine remaining life prediction [94]. Firstly, a one-layer perceptron network is designed for health indexing from raw input data. The deep neural network consists of two hidden LSTM layers, each of which is bidirectional. Both the forward path and backward path are computed independently, and their outputs are concatenated. The forward direction aims to discover the system variation pattern, while the backward direction is designed to smooth the predictions.
Other deep learning networks can also be adopted to improve the performance of LSTM. Convolutional operations were conducted on both the input-to-state and state-to-state transitions of the LSTM [106]. Different from the hybrid of CNN and RNN discussed in a later section, this method not only preserved the advantages of LSTM but also incorporated time–frequency features [87]. Furthermore, there are other enhancement technologies for LSTM. Chen et al. [88] integrated both handcrafted features and automatically learned features for RUL prediction, and an attention mechanism was introduced to highlight the importance of different features.
In production environments, there are some special issues to address. To counter the over-fitting problem in LSTM training, dropout techniques can be utilized [100]. The LSTM model should be adaptively optimized using techniques such as the root mean square propagation (RMSprop) method [83]. Maintenance in real-world applications is usually based on the RUL probability density function (PDF). However, the standard LSTM model does not provide uncertainty estimates, and this problem is not thoroughly researched in the literature. The Monte Carlo simulation approach can be used to generate prediction uncertainties and feed those uncertainties into the LSTM model for training [83]. There are also cases in which only part-life-cycle observation data are available. An online learning model was proposed for improved LSTM-based RUL prediction [82]. It uses online observation data to update parameters in real time.
(4) GRU methods
The gated recurrent unit (GRU) is a gating mechanism in the recurrent neural network model [107]. It is a variant of LSTM with fewer parameters, because it merges the input gate and forget gate of LSTM into a single update gate and has no separate output gate. When adopted for RUL prediction, it can make full use of the historical state information of limited samples and can effectively slow down the forgetting of important trend information [95].
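For reference, one step of a GRU in its commonly used formulation can be sketched as below. It has an update gate `z`, which plays the role of the LSTM input and forget gates, and a reset gate `r`. The parameter names, the sizes, and the random initialization are assumptions for illustration. Note that some libraries swap the roles of `z` and `1 - z` in the final interpolation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU time step; p maps names to weight matrices and bias vectors."""
    # Update gate: how much of the candidate state replaces the old state.
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev + p["b_z"])
    # Reset gate: how much of the old state feeds the candidate state.
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev) + p["b_h"])
    # Interpolate between old state and candidate (sign convention varies).
    return (1.0 - z) * h_prev + z * h_tilde

def init_params(n_in, n_hidden, seed=0):
    """Small random weights for the three blocks (hypothetical initialization)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("z", "r", "h"):
        p["W_" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["U_" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b_" + gate] = np.zeros(n_hidden)
    return p

params = init_params(n_in=3, n_hidden=4)
h = np.zeros(4)
for x in np.ones((5, 3)):  # five time steps of a three-sensor input
    h = gru_step(x, h, params)
print(h.shape)
```

When the update gate is close to zero, the hidden state is carried forward almost unchanged, which is the mechanism that slows the forgetting of trend information.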
The GRU can be integrated with other neural networks to form a deep neural network, enabling it to extract features at different scale layers. Ren et al. [96] proposed a multi-scale dense gated recurrent unit network (MDGRU) for bearing remaining useful life prediction. A restricted Boltzmann machine network makes up the first two layers of the MDGRU and reduces the dimension of the original features. Then, the output is fed into the following multi-scale time layer. This layer implements the embedding function that prevents the loss of feature information. Several skip-GRU layers employ a dropout strategy, the ReLU activation function, and the Adam optimization algorithm to enhance the traditional GRU network. Finally, three dense layers implement ensemble learning and predict the RUL value.
When constructing the input–output structure of the GRU network, there are two options: the sequence-to-sequence method and the sequence-to-one method. Chen et al. [97] developed a two-step solution for RUL prediction. In the first step, they applied kernel principal component analysis (KPCA) to extract nonlinear features. Then, they designed a gated recurrent unit (GRU) to predict RUL. Their experimental results show that the sequence-to-one method is more suitable in their scenario, and that the GRU has advantages in both training performance and prediction accuracy.
The bidirectional RNN model is a common idea to extract features from time series [98]. Zhao et al. [99] proposed a local-feature-based gated recurrent unit (LFGRU) network, which combines handcrafted feature design with automatic feature learning. The raw sensory input is first divided into segments with a fixed window length. Then, tridomain features, including time, frequency, and time–frequency, are extracted from each local window. Bidirectional GRU is fed with the generated local feature sequence to learn representation features. Considering that information in the middle range of the sequence might be lost in a bidirectional GRU, they introduced a weighted feature averaging approach to highlight the impact of the middle local features. Two stacked fully connected dense layers, taking middle local features and learned representation features as input separately, formed the supervised learning of RUL prediction. A temporal self-attention mechanism was introduced into a novel bidirectional gated recurrent unit to predict RUL, where each considered time instance was assigned a self-learned weight [101].

4.2.4. Convolutional Neural Network Model

Traditional machine learning methods are usually suitable for lower-dimensional features, while deep learning methods are suitable for high-dimensional features under the big data context. The CNN model is good at dealing with high-dimensional features using fewer parameters to achieve the same functionality and precision (Table 6). Therefore, it is a good choice for high-dimensional data, which is an outstanding characteristic in the context of RUL prediction.
The first attempt to apply CNNs for RUL prediction was conducted by Babu et al. [108]. Convolution and pooling filters in their approach are applied along the temporal dimension. Their model is constructed with two pairs of convolution layers and pooling layers, and one normal fully connected multi-layered perceptron network. To ensure equal contribution from all features across all operating conditions, a custom normalization mechanism was designed. Ren et al. [111] proposed a CNN architecture consisting of eight layers: three convolution layers, three pooling layers, one flatten layer, and one output layer. Considering the lack of time domain features, they designed a new feature extraction method called spectrum-principal-energy-vector to obtain the feature vector. As the RUL output of the CNN may not be continuous, they also proposed a linear regression method to smooth the forward prediction result.
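The idea of applying convolution and pooling along the temporal dimension of a multi-sensor window can be sketched as follows. This is an illustrative example only: the ReLU activation, average pooling, and all shapes are assumptions and do not reproduce the architecture of [108].

```python
import numpy as np

def conv1d_time(window, kernels):
    """Valid convolution along the time axis, followed by ReLU.

    window:  (time_steps, n_sensors) segment of sensor readings
    kernels: (n_filters, kernel_len, n_sensors); each filter spans all sensors
    """
    n_filters, kernel_len, _ = kernels.shape
    steps = window.shape[0] - kernel_len + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        # Dot product of every filter with the current time slice.
        out[t] = np.tensordot(kernels, window[t:t + kernel_len],
                              axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)

def avg_pool_time(feature_map, size):
    """Non-overlapping average pooling along the time axis."""
    steps = feature_map.shape[0] // size
    trimmed = feature_map[:steps * size]
    return trimmed.reshape(steps, size, -1).mean(axis=1)

# Hypothetical input: a 10-step window from 3 sensors and two 3-step filters.
window = np.ones((10, 3))
kernels = np.ones((2, 3, 3))
features = avg_pool_time(conv1d_time(window, kernels), size=2)
print(features.shape)
```

Sliding the filters only along time means each filter sees all sensors at once, so the learned features describe short temporal patterns across the whole sensor set.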
An advantage of the CNN model is that it can gain better knowledge of the input data from additional multi-scale feature information. Zhu et al. [109] proposed a multi-scale convolutional neural network (MSCNN) method for bearing RUL prediction. The wavelet transform (WT) is adopted to transform the original data into a time–frequency representation (TFR). Then, bilinear interpolation is used to reduce the data dimension. After that, the data are fed into the MSCNN for model training. Li et al. [110] proposed a deep convolutional neural network method for aero-engine RUL prediction. To select raw features, sensor data are processed using the min–max normalization method, and samples are prepared in a time window manner. Based on this attempt, they further proposed a multi-scale CNN method to predict the RUL of bearings [29]. The training and testing samples are prepared from measured data using the short-time Fourier transform (STFT) technique. The dropout technique is adopted to avoid the over-fitting problem. Furthermore, the leaky rectified linear unit (leaky ReLU) technique [113] is adopted to mitigate gradient vanishing and gradient diffusion problems. The difference between these two MSCNNs lies in the structure of the deep neural network. In the first MSCNN, a mixed layer is designed to accept features from the third convolutional layer and the second pooling layer. The output of this mixed layer is then fed into the regression layer to perform RUL prediction. In the second MSCNN, three consecutive convolutional layers are designed for feature extraction. The features generated by the three layers are concatenated to preserve the diverse information and obtain features of multiple scales. Then, the concatenated features are fed into another convolutional layer to reduce dimensionality, followed by a fully connected layer. Li et al. [114] improved the efficiency of the MSCNN by setting different convolution kernel sizes in parallel at the same layer.
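The sample preparation mentioned for [110], min–max normalization followed by time-window slicing, might look like the sketch below. The [0, 1] target range, the window length, the helper names, and the choice of the RUL at the last step of each window as the label are assumptions for illustration.

```python
def min_max_normalize(rows):
    """Scale each sensor column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def sliding_windows(rows, rul, window):
    """Pair every run of `window` consecutive rows with the RUL at its last step."""
    samples = []
    for end in range(window, len(rows) + 1):
        samples.append((rows[end - window:end], rul[end - 1]))
    return samples

# Hypothetical run-to-failure record: 4 cycles, 2 sensors (the second is constant).
readings = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0], [20.0, 10.0]]
remaining = [3, 2, 1, 0]

normalized = min_max_normalize(readings)
samples = sliding_windows(normalized, remaining, window=2)
print(len(samples))
```

Each sample is a short multi-sensor segment, so a trajectory of N cycles yields N - window + 1 training samples instead of one.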
To effectively identify the distinctions among different sensor data, a multi-scale convolutional attention network (MSCAN) [115] was proposed, in which self-attention modules are first constructed to effectively fuse the input multisensor data.
Another important concept in CNNs is the channel. Jiang et al. [112] proposed an enhanced CNN (ECNN) method for predicting the RUL of turbofan engines. The time series input of the CNN has two channels, with the first channel being the earlier data and the second channel being the later data. This design takes advantage of the temporal relationship between the channels. To train the networks, the data are preprocessed using mean removal and variance scaling normalization methods. For health indexing, the degradation point of an engine is chosen as half of the largest time cycle. After that, it degrades linearly. The network parameters are trained using an adaptive moment estimation (Adam) optimizer.
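One possible reading of the health indexing rule above is a piecewise-linear target: constant until the midpoint of the engine's life, then decreasing linearly to zero at failure. The sketch below encodes that interpretation; the function name and the per-engine definition of the midpoint are assumptions and may differ from the exact labeling in [112].

```python
def piecewise_linear_rul(total_cycles):
    """RUL label per cycle: capped before the midpoint, linear afterwards."""
    knee = total_cycles // 2        # assumed degradation point (half of life)
    cap = total_cycles - knee       # label value held before the knee
    return [min(total_cycles - t, cap) for t in range(1, total_cycles + 1)]

labels = piecewise_linear_rul(10)
print(labels)  # [5, 5, 5, 5, 5, 4, 3, 2, 1, 0]
```

Capping the early labels reflects the assumption that a healthy engine shows no measurable degradation, so the network is not asked to distinguish between early cycles that look alike.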

4.3. Deep Learning Methods for RUL Prediction

4.3.1. Transfer Learning Method

A major challenge in data-driven prognostics is that it is often difficult to obtain a large number of failure samples. This situation arises for several reasons: (1) running until failure is not permitted for critical assets; (2) many failures occur slowly and follow a degradation path, which might take months or even years. To address the issue of data scarcity, transfer learning can be adopted by taking advantage of datasets in related domains or simulated datasets [116]. It has already made great progress in image, audio, and text processing scenarios. The basic idea is to train models on different but related source datasets first, and then tune the trained models using the target dataset. However, the data distribution in the source dataset may be different from that of the target dataset. In such cases, effective knowledge transfer is the key challenge in improving learning performance and reducing dependence on the volume of failure samples. Based on the availability of sample labels, transfer learning can be divided into three categories: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning [117]. Inductive transfer learning is employed when labeled data are available in the target domain. Transductive transfer learning is suitable for scenarios where labeled data are only available in the source domain. Furthermore, unsupervised transfer learning is leveraged to address problems where there are no labeled data in either the source or the target domain. When implementing a transfer learning approach, transfer can be conducted at four levels, according to the objects to be transferred: instance transfer, feature representation transfer, parameter transfer, and relational knowledge transfer.
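The basic train-then-tune idea can be illustrated with a deliberately small example of parameter transfer in the inductive setting: a linear model is fitted on a plentiful source dataset, and its parameters are then used as the starting point for a few gradient steps on a small labeled target dataset. The linear model, the synthetic data, the learning rate, and the step count are all assumptions for illustration; real applications would transfer the weights of a deep network.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def mse(params, xs, ys):
    slope, intercept = params
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fine_tune(params, xs, ys, lr=0.1, steps=50):
    """Gradient descent on the target data, starting from the source parameters."""
    slope, intercept = params
    n = len(xs)
    for _ in range(steps):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        slope -= lr * 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        intercept -= lr * 2 * sum(residuals) / n
    return slope, intercept

# Source domain: many samples of y = 2x + 1 (hypothetical).
source_x = [i / 10 for i in range(11)]
source_y = [2 * x + 1 for x in source_x]
# Target domain: only three labeled samples of a shifted relation y = 2x + 1.5.
target_x = [0.0, 0.5, 1.0]
target_y = [2 * x + 1.5 for x in target_x]

source_params = fit_line(source_x, source_y)
error_before = mse(source_params, target_x, target_y)
tuned_params = fine_tune(source_params, target_x, target_y)
error_after = mse(tuned_params, target_x, target_y)
print(error_before, error_after)
```

Starting from the source parameters means the few target samples only need to correct the distribution shift rather than learn the whole relation from scratch.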
Zhang et al. [118] proposed a transfer learning approach with bidirectional LSTM deep neural networks for RUL prediction. It belongs to inductive transfer learning and parameter transfer. They conducted a series of experiments on the C-MAPSS datasets [119] under different operating conditions. The experimental results showed that transfer learning is effective in most cases, except when transferring knowledge from a dataset with multiple operating conditions to a dataset with a single operating condition, which points to the open problem of negative transfer. Sun et al. [120] presented a deep transfer learning (DTL) method. The deep learning model, trained on historical failure data, consists of stacked sparse auto-encoders (SAE) and one nonlinear regression layer. Then, three transfer strategies (weight transfer, feature transfer learning, and weight update) are used to transfer the SAE architecture to a new object scenario for RUL prediction. Da Costa et al. [121] proposed a new data-driven approach for domain adaptation in prognostics using long short-term memory (LSTM) networks, where the domains were composed of data with different fault modes and operating conditions. Zhu et al. [122] and Cao et al. [123] proposed transductive transfer learning for bearing RUL prediction under different conditions, which was conducted at the feature representation transfer level. Cheng et al. [124] implemented the same idea for bearing RUL prediction under multiple failure behaviors. By adding failure behavior judgment, more refined transfer was implemented [125].

4.3.2. Hybrid Deep Learning Method

To take advantage of different deep learning models, multiple models are integrated into a hybrid model with higher accuracy. The most widely used paradigm is a two-stage hybrid model (Table 7). In the first stage, features are preprocessed using a deep neural network such as a sparse auto-encoder. In the second stage, deep neural networks are trained to calculate the output from the features generated in the first stage [126,127].
For the RUL prediction problem, Gao et al. [128] adopted stacked denoising auto-encoders (SDAE) to extract deep features. Then, the deep features are fed into a support vector machine (SVM) to predict the RUL of integrated modular avionics (IMAs). Song et al. [129] proposed an auto-encoder–BLSTM hybrid method. It takes advantage of the feature extraction capability of the auto-encoder and the temporal modeling power of BLSTM. Ren et al. [135] combined a deep auto-encoder and deep neural networks for bearing RUL prediction. They proposed a novel feature vector based on time–frequency–wavelet combined features to represent the bearing degradation process. Then, a deep auto-encoder was presented to compress the combined features without increasing the scale of the following nine-layer prediction DNN. This prediction DNN features a special RUL normalization, named experience–max normalization. Then, the forward linear smoothing method is used to smooth the prediction results. Deutsch et al. [136] presented a DBN-FNN approach for predicting the RUL of gears and bearings. The deep belief network, which has the advantage of self-taught feature learning capability, is made up of stacked RBMs. An FNN architecture is used for supervised fine-tuning of RUL prediction. The optimal hyper-parameters of the DBN-FNN structure were determined using a grid search method.
Unlike the general two-stage hybrid model, deep learning models can also be mixed in more complicated ways. The hybrid of CNN and RNN is a common approach, where the CNN is designed to extract local features and the RNN is designed to encode the temporal information [130,131,132,133]. Hinchi et al. [134] implemented the same idea, except that the bidirectional LSTM is replaced with a unidirectional LSTM model. Daroogheh et al. [137] proposed a novel hybrid architecture by integrating multiple models and neural network techniques. A mode-based particle filters (PFs) method is used to predict the propagation of health indicators. Then, a neural-network-based model is adopted to enhance the accuracy of the overall particle-filtering-based strategy. They considered three neural network paradigms: (1) MLP, (2) RNNs, and (3) WNNs. Experimental results show that all three paradigms are able to obtain reliable and accurate predictions. Furthermore, the performance does not depend on the type of neural network structure, which validates both the accuracy and extendibility of the proposed hybrid method. Li et al. [138] designed a hybrid Elman–LSTM model for battery RUL prediction. They first implemented an empirical mode decomposition (EMD) algorithm to reconstruct the battery capacity series. Then, the LSTM and Elman neural networks were combined to capture the long-term degradation of battery capacity with increasing cycle number and to represent the short-term capacity recovery at certain cycles.

4.3.3. Ensemble Learning Method

Ensemble learning can be considered as a special type of hybrid learning. The performance of a deep learning model may be influenced by various factors, such as environmental uncertainties, different operational conditions, and the number of available sensors. A single model trained on data under certain situations may not generalize well to other situations. Ensemble learning provides a solution for this problem. An ensemble of multiple models is able to leverage the advantages of each individual model, and improve generalization capability.
Two main aspects must be considered when applying an ensemble learning method: (1) how to design the ensemble models; (2) how to fuse the ensemble model outputs to generate the final RUL value (Table 8).
The particle filter (PF) is often used as a prognostic technique for estimating the evolution of the degradation state. It relies on analytical models of both the degradation state evolution and the measurement. The probabilistic relation between them is then used, within a Bayesian framework, to update the prediction of the degradation evolution whenever a new measurement is acquired. Baraldi et al. [143] implemented a PF approach in which the measurement model is built using ANNs.
Hu et al. [142] proposed an ensemble method composed of five models: three similarity-based interpolation approaches, one extrapolation-based approach, and one recurrent neural network approach. The K-fold cross validation (CV) technique was adopted to calculate the accuracy of the prediction models. Then, three weighted-sum formulation mechanisms were designed. First, the accuracy-based weighting mechanism is based on the CV accuracy. Second, the diversity-based weighting mechanism assigns higher weights to models with higher prediction diversity. Third, the proposed optimization-based weighting mechanism combines the advantages of the first two and was shown to perform better in three cases: the 2008 IEEE PHM challenge problem, a power transformer problem, and an electric cooling fan problem.
Peel et al. [139] showed that the combination and filtering of models can yield remarkable prediction performance. Their method won the IEEE GOLD category of the PHM’08 Data Challenge. The ensemble consisted of three models: a radial basis function (RBF) model with 15 hidden nodes and two MLP models, one with 75 hidden nodes and one with 100 hidden nodes. The ensemble models were selected using a tournament heuristic algorithm. Then, a Kalman filter was adopted to fuse the predictions of the multiple neural network models. The limitation of this approach is that it assumes the health of the system degrades linearly with usage, whereas in real-world applications the lifetime may be divided into a nonlinear health stage and a degradation stage. A switching Kalman filter (SKF) approach [140,141] was proposed to overcome this limitation.
Selecting the most suitable models for an ensemble can be viewed as a multi-objective optimization problem. The two main objectives are accuracy and diversity. The accuracy objective quantifies the similarity between the output of a model and the real value, while the diversity objective measures the discrepancy between the outputs of multiple models. The accuracy objective improves the performance of each model, and the diversity objective improves the generalization performance of the ensemble.
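The predict, update, and resample cycle of a particle filter can be sketched as follows. This is only a minimal illustration on a scalar degradation state: the linear-drift state model, the identity measurement model, the noise levels, and the function names are all assumptions made here, and they are not the models of Baraldi et al. [143], who build the measurement model with ANNs.

```python
import math
import random


def gaussian_likelihood(residual, sigma):
    """Unnormalized Gaussian likelihood of a measurement residual."""
    return math.exp(-0.5 * (residual / sigma) ** 2)


def particle_filter(measurements, n_particles=500, drift=1.0,
                    process_sigma=0.2, meas_sigma=0.5, seed=0):
    """Bootstrap particle filter for a scalar degradation state.

    Assumed state model:       x_k = x_{k-1} + drift + process noise
    Assumed measurement model: z_k = x_k + measurement noise
    Returns the posterior-mean state estimate after each measurement.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        # Prediction: propagate every particle through the state model.
        particles = [p + drift + rng.gauss(0.0, process_sigma)
                     for p in particles]
        # Update: weight each particle by the likelihood of the measurement.
        weights = [gaussian_likelihood(z - p, meas_sigma) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # Degenerate case: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resampling: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


estimates = particle_filter([1.0, 2.1, 2.9, 4.0, 5.2])
```

In a prognostic setting, the RUL would then be obtained by propagating the particles forward until they cross a failure threshold; that step is omitted here.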
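To make the accuracy-based weighted-sum idea concrete, the sketch below assigns each member model a weight proportional to the inverse of its cross-validation error and then fuses the member predictions. The inverse-error rule, the function names, and the numbers are illustrative assumptions; the exact formulation used by Hu et al. [142] may differ.

```python
def accuracy_based_weights(cv_errors):
    """Weights proportional to inverse CV error, normalized to sum to one."""
    if any(e <= 0 for e in cv_errors):
        raise ValueError("cross-validation errors must be positive")
    inverse = [1.0 / e for e in cv_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuse_rul(predictions, weights):
    """Weighted-sum fusion of the member models' RUL predictions."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical CV errors of three member models (lower is better).
cv_errors = [10.0, 20.0, 40.0]
weights = accuracy_based_weights(cv_errors)
# Hypothetical RUL predictions of the same three models for one unit.
fused = fuse_rul([100.0, 110.0, 130.0], weights)
```

With these numbers the most accurate model receives the largest weight, so the fused value stays closer to its prediction than a plain average would.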
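The fusion of several model outputs with a Kalman filter can be illustrated with a scalar filter whose state is the RUL itself. The sketch assumes, as the limitation above describes, that the RUL decreases linearly by one unit per cycle, and it treats each model prediction as an independent noisy measurement. The function name, the noise variances, and the sequential update are assumptions for illustration, not the implementation of Peel et al. [139].

```python
def kalman_fuse(model_predictions, meas_vars, init_rul,
                init_var=100.0, process_var=1.0, step=1.0):
    """Fuse per-cycle RUL predictions from several models.

    model_predictions: one list per cycle, holding one prediction per model.
    meas_vars: assumed measurement-noise variance of each model.
    Returns the fused RUL estimate after each cycle.
    """
    rul, var = init_rul, init_var
    fused = []
    for preds in model_predictions:
        # Predict: linear degradation, the RUL drops by `step` each cycle.
        rul -= step
        var += process_var
        # Update: fold in each model's prediction as a noisy measurement.
        for z, r in zip(preds, meas_vars):
            gain = var / (var + r)
            rul += gain * (z - rul)
            var *= (1.0 - gain)
        fused.append(rul)
    return fused


# Two hypothetical models that disagree at a single cycle.
fused = kalman_fuse([[90.0, 110.0]], meas_vars=[4.0, 4.0], init_rul=100.0)
```

A model with a smaller assumed variance pulls the fused estimate more strongly toward its own prediction.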
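The two objectives can be written down in a few lines. In this sketch, accuracy is measured as the root-mean-square error (RMSE) against the true RUL and diversity as the mean pairwise root-mean-square difference between member outputs. Both choices, and the function names, are assumptions made for illustration; the surveyed works use their own definitions.

```python
import math


def accuracy_objective(prediction, target):
    """RMSE between one model's RUL predictions and the true values."""
    n = len(target)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(prediction, target)) / n)


def diversity_objective(member_predictions):
    """Mean pairwise RMS difference between the members' outputs."""
    pairs = []
    for i in range(len(member_predictions)):
        for j in range(i + 1, len(member_predictions)):
            a, b = member_predictions[i], member_predictions[j]
            diff = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
            pairs.append(math.sqrt(diff))
    # A single model has no partner to differ from.
    return sum(pairs) / len(pairs) if pairs else 0.0


true_rul = [50.0, 40.0, 30.0]
members = [[52.0, 41.0, 29.0], [47.0, 38.0, 33.0]]
errors = [accuracy_objective(m, true_rul) for m in members]
diversity = diversity_objective(members)
```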
Zhang et al. [147] proposed an ensemble learning method named multi-objective deep belief networks ensemble (MODBNE). It combines a multi-objective evolutionary algorithm with the traditional DBN technique to train a DBN ensemble. To obtain a population of candidate DBNs, they employed the widely used MOEA/D approach [148]. The two conflicting objectives are maximizing accuracy and maximizing diversity. The solution space is defined in terms of the DBN’s structural parameters and training parameters. After the candidate DBNs are trained, they are combined to generate the final RUL prediction. The combination weights are optimized by the differential evolution (DE) algorithm [149]. Rigamonti et al. [144] proposed an ensemble of echo state networks (ESNs) for RUL prediction. The member ESN models are constructed using a multi-objective DE method. The two complementary objectives are the cumulative relative accuracy (CRA) and the alpha–lambda (α–λ) accuracy prognostic metrics. The CRA provides an average estimate of the relative error of the RUL prediction and, being a relative measure, tends to enlarge errors made at the end of the system’s life. The α–λ accuracy, on the other hand, indicates how often, on average, the RUL prediction falls within two relative confidence bounds. ESN outputs are fused using a dynamic local aggregation approach. Moreover, they also investigated the estimation uncertainty problem using the MVE method, which is considered easy to embed within a local ensemble framework.
Unlike most works, which take operation conditions into consideration, Li et al. [145] proposed an ensemble-learning-based prognostic approach with degradation-dependent weights for RUL prediction. They employed an empirical model to determine the performance degradation stages. Then, the fusing weights for the member algorithms are optimized in a stage-wise manner.
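The two prognostic metrics mentioned above can be sketched as follows, using one commonly cited formulation: the relative accuracy at each step is one minus the relative error, the CRA is its average, and the α–λ accuracy is the fraction of predictions that fall within ±α of the true RUL. The function names, the unweighted average, and the value α = 0.2 are assumptions for illustration and may differ from the exact definitions used in [144].

```python
def relative_accuracy(true_rul, pred_rul):
    """One minus the relative error of a single RUL prediction."""
    if true_rul <= 0:
        raise ValueError("true RUL must be positive")
    return 1.0 - abs(true_rul - pred_rul) / true_rul


def cumulative_relative_accuracy(true_ruls, pred_ruls):
    """Average relative accuracy over all prediction times."""
    scores = [relative_accuracy(t, p) for t, p in zip(true_ruls, pred_ruls)]
    return sum(scores) / len(scores)


def alpha_lambda_accuracy(true_ruls, pred_ruls, alpha=0.2):
    """Fraction of predictions inside [(1 - alpha), (1 + alpha)] * true RUL."""
    hits = sum((1 - alpha) * t <= p <= (1 + alpha) * t
               for t, p in zip(true_ruls, pred_ruls))
    return hits / len(true_ruls)


true_ruls = [100.0, 50.0, 10.0]
pred_ruls = [90.0, 55.0, 15.0]
cra = cumulative_relative_accuracy(true_ruls, pred_ruls)
al_acc = alpha_lambda_accuracy(true_ruls, pred_ruls)
```

In this example the last prediction is off by only 5 cycles, yet its relative accuracy is 0.5 because the true RUL is 10, which shows how a relative measure enlarges errors near the end of life.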

4.4. Ad Hoc Deep-Learning-Based RUL Prediction

4.4.1. Multiple Operational Conditions Application

In practice, similar or identical engineered systems are exposed to different operational conditions when performing different tasks. Different operational conditions may significantly accelerate or decelerate the degradation process, which affects the RUL of engineered systems [25,150,151]. Hence, the final step of applying deep-learning-based RUL prediction methods in a production environment remains challenging.
The most straightforward method is to design a mechanism that is able to normalize different operation conditions. Bektas et al. [26] proposed a multi-regime normalization method to provide a common scale across different operation conditions. Based on normalized features, they presented a similarity-based deep learning method for RUL prediction.
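A simple way to picture normalization across operating regimes is to standardize each sensor reading with the mean and standard deviation of its own regime. The sketch below does this for a single sensor, assuming the regime label of each reading is already known. It is an illustrative stand-in with hypothetical names and data, not the multi-regime normalization method of Bektas et al. [26].

```python
import statistics
from collections import defaultdict


def multi_regime_normalize(values, regimes):
    """Z-score each reading using the statistics of its own regime."""
    groups = defaultdict(list)
    for v, r in zip(values, regimes):
        groups[r].append(v)
    stats = {}
    for r, vals in groups.items():
        mean = statistics.mean(vals)
        std = statistics.pstdev(vals)
        # Guard against constant readings within a regime.
        stats[r] = (mean, std if std > 0 else 1.0)
    return [(v - stats[r][0]) / stats[r][1] for v, r in zip(values, regimes)]


# One sensor observed under two regimes with very different baselines.
readings = [10.0, 12.0, 100.0, 104.0]
labels = ["a", "a", "b", "b"]
normalized = multi_regime_normalize(readings, labels)
```

After normalization, readings from both regimes share a common scale, so a downstream model sees degradation trends rather than regime offsets.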
Another idea is to take the operation conditions as input to a deep learning model. Huang et al. [25] developed a novel prognostic method based on bidirectional LSTM (BLSTM) networks to address the multiple operation conditions learning problem. Because health conditions, which are monitored as sensor data, are complex, their hidden temporal features are first extracted by a BLSTM. Then, the operational conditions data and the extracted features are taken as the input to another BLSTM. The basic component of a BLSTM is the common LSTM. The dropout technique and the early stopping method were employed to relieve the over-fitting problem.

4.4.2. Insufficient Labeled Data Application

The insufficient labeled data problem refers to the situation where little historical data is recorded with labels indicating the exact remaining useful life. There are two main reasons for this phenomenon. First, labeling work is labor-intensive and time-consuming. Second, most assets work in normal situations during their lifetime in a production environment. In the case of mission-critical assets such as a nuclear power plant, a single failure means a disaster.
The general method is semi-supervised learning (SSL), which is able to make use of both labeled and unlabeled data. The unlabeled data are fed into an unsupervised learning model, and the labeled data are fed into a supervised model. Yoon et al. [152] proposed an unsupervised variational auto-encoder (VAE) to translate the original space into a lower dimensional space. Then, an RNN model, trained using labeled data, takes the output of the VAE as input and generates the RUL. Listou Ellefsen et al. [153] proposed a five-layer deep network. The first layer is a restricted Boltzmann machine performing unsupervised learning on raw unlabeled input data. It uses a rectified linear unit (ReLU) as the activation function. For labeled data, the learned abstract features are fed into the subsequent deep architecture of two LSTM layers, one FNN layer, and one output layer. The supervised deep network is trained through the truncated back-propagation through time (TBPTT) technique. Another method is to use sufficient simulation data as the training data [116].

4.4.3. Uncertainty of RUL Prediction

In real applications, the estimated RUL usually varies widely due to the model parameters and noisy sensor data. Deterministic prediction values may not be sufficient for RUL-based operation decisions such as maintenance. If the RUL prediction interval can be estimated, it provides more information for operation decisions.
The lack of uncertainty representation is currently a common disadvantage of deep learning methods. Zhao et al. [154] used a DBN-RVM fusing method to predict the RUL of lithium-ion batteries. Based on the restricted Boltzmann machine (RBM), DBN is powerful in feature extraction and data reduction but lacks the capability of uncertainty representation. RVM has the capability for uncertainty representation but has low accuracy and poor stability in long-term prediction. By making full use of the advantages of these two techniques, their method achieves both prediction accuracy and reliability. The bootstrap method was adopted to compute the interval of RUL values predicted by the LSTM–FNN architecture [155]. Variational inference was employed to quantify the uncertainty of recurrent convolutional neural networks in prognostics, and a probabilistic RUL prediction result is obtained using Monte Carlo dropout [156]. Yang et al. [157] developed an improved dropout method based on nonparametric kernel density to improve the estimation accuracy of RUL. Furthermore, the Gaussian process regression (GPR) model is utilized to fit the uncertainty caused by battery capacity regeneration when using LSTM to predict RUL [158]. Assuming that the predicted RUL follows a normal distribution, a normal distribution output layer was designed to quantify uncertainty [159].
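As a small illustration of how a normally distributed RUL output can be used, the sketch below computes the Gaussian negative log-likelihood that such an output layer would typically be trained with, a symmetric prediction interval from the predicted mean and standard deviation, and summary statistics from repeated stochastic predictions such as those produced by Monte Carlo dropout or bootstrap models. The function names, the 1.96 multiplier for a nominal 95% interval, and the sample values are assumptions for illustration and are not taken from [155,156,159].

```python
import math
import statistics


def gaussian_nll(true_rul, mean, std):
    """Negative log-likelihood of true_rul under N(mean, std^2)."""
    var = std ** 2
    return 0.5 * math.log(2 * math.pi * var) + (true_rul - mean) ** 2 / (2 * var)


def prediction_interval(mean, std, z=1.96):
    """Symmetric interval mean +/- z * std (z = 1.96 for a nominal 95%)."""
    return mean - z * std, mean + z * std


def summarize_samples(samples):
    """Mean and sample standard deviation of repeated RUL predictions."""
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical RUL predictions from repeated stochastic forward passes.
mean, std = summarize_samples([90.0, 100.0, 110.0])
low, high = prediction_interval(mean, std)
loss = gaussian_nll(95.0, mean, std)
```

A wide interval signals that the point estimate should be treated with caution when scheduling maintenance.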

5. Challenge and Future Directions

RUL prediction has always been important in industry and attracts a lot of attention in academia. Due to the complexity of the problem and the limits of the current state of the art, RUL prediction has still not been widely used in real systems. Recently, deep learning technology has made breakthrough progress and has been widely applied in multiple areas. Naturally, it has also attracted the attention of researchers in the RUL prediction field. This is evidenced by the recent dramatic increase in works that use deep learning methods to predict RUL. However, the field is still at a preliminary stage compared to other areas, such as speech recognition and large language models (LLMs). Figure 3 shows the overall research architecture of RUL prediction based on deep learning.
The research challenges and future directions of deep-learning-based RUL prediction are summarized as follows.

5.1. Unified Framework and Architecture

Most reviewed works are designed for specific assets or problems. Little is known about why and how these architectures and approaches have been designed and implemented. Hence, it is a challenge to design a general unified framework and provide a common paradigm to evaluate the validity of different methods. The future research directions are as follows.

5.1.1. General Paradigm and Benchmarking

It is necessary to provide a general deep learning paradigm for RUL prediction. The performance of a deep learning model for RUL prediction should be evaluated in multiple dimensions. General benchmarking is needed to identify the advantages and disadvantages of different methods precisely and to support the development of deep-learning-based RUL prediction technology.

5.1.2. Open-Source Dataset

Considering the high complexity and depth of deep learning models, the performance of RUL prediction is limited by the scale of the training datasets. Publishing large-scale open-source datasets would help promote the development of deep-learning-based RUL prediction technology.

5.2. Health Indicator Generation

A good health indicator (HI) often determines the accuracy and reliability of RUL prediction. Hence, it is the first challenge when considering the utilization of deep learning techniques for RUL prediction. The future research directions are as follows.

5.2.1. Deep-Learning-Based Health Indexing

To generate effective health indicators, deep learning methods can capture the key degradation information from raw signals without relying heavily on prior knowledge [160]. The dataset used to construct health indicators may be imbalanced and irregular under variable working conditions, so unsupervised deep learning techniques should be given more attention [161]. Furthermore, variants of deep learning models should be considered to mine the hidden stationary and nonstationary information [162].

5.2.2. Domain Knowledge Utilization

It is an advantage of deep learning that its performance does not strongly depend on domain knowledge. However, deep neural networks are still regarded as black-box models whose inner mechanisms cannot be explained. Domain knowledge can contribute to the success of applying deep learning to RUL prediction. It can be used to generate discriminative features, which makes it possible to reduce the scale of the subsequent deep learning model. It can also be integrated into deep neural network training, for example by designing regularization terms for higher generalization capability.

5.2.3. Feature Transformation and Regression

The goal of feature transformation and regression is to generate a health indicator satisfying the requirements of monotonicity, trendability, and prognosability. Extracting as much meaningful information as possible from raw signals is a complicated task. However, efficient feature transformation and regression can not only improve the accuracy of RUL prediction, but also reduce the costs of complex deep learning models.
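The three requirements can be quantified in several ways. The sketch below uses one simple set of formulations: monotonicity as the normalized difference between the numbers of positive and negative increments, trendability as the absolute Pearson correlation between the indicator and time, and prognosability as the exponential of the negative ratio between the spread of the final values and the mean indicator range across units. These formulas and function names are assumptions chosen for illustration; other definitions appear in the literature.

```python
import math
import statistics


def monotonicity(hi):
    """|#positive - #negative increments| / (n - 1), in [0, 1]."""
    diffs = [b - a for a, b in zip(hi, hi[1:])]
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    return abs(pos - neg) / len(diffs)


def trendability(hi):
    """Absolute Pearson correlation between the indicator and time."""
    t = list(range(len(hi)))
    mt, mh = statistics.mean(t), statistics.mean(hi)
    cov = sum((a - mt) * (b - mh) for a, b in zip(t, hi))
    st = math.sqrt(sum((a - mt) ** 2 for a in t))
    sh = math.sqrt(sum((b - mh) ** 2 for b in hi))
    # A constant indicator carries no trend.
    return abs(cov / (st * sh)) if st > 0 and sh > 0 else 0.0


def prognosability(his):
    """exp(-std of final values / mean |initial - final|) across units."""
    finals = [h[-1] for h in his]
    ranges = [abs(h[0] - h[-1]) for h in his]
    return math.exp(-statistics.pstdev(finals) / statistics.mean(ranges))


# Hypothetical health indicators of two run-to-failure units.
units = [[0.0, 1.0, 2.0], [0.0, 1.5, 2.0]]
scores = (monotonicity(units[0]), trendability(units[0]), prognosability(units))
```

All three scores lie in [0, 1], with values near one indicating an indicator that is easier to extrapolate to a failure threshold.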

5.3. Deep-Learning-Based RUL Prediction

RUL prediction is the essential process that calculates the RUL value from the health indicator or even raw input data. The first challenge is to evaluate the performance of RUL prediction with different deep neural networks and analyze the principle behind them. Then, different supervised learning methods and transfer learning technologies can be adopted as a complement to solve specific problems. When the models are chosen, the second challenge is to optimize them for efficient RUL prediction. The future research directions are as follows.

5.3.1. Hybrid Learning for RUL Prediction

There is a consensus that the performance of RUL prediction can be improved by hybrid learning methods. The most widely used hybrid learning method is ensemble learning. The models composing the ensemble generate RUL values in parallel. These RUL values are then weighted under a specific mechanism to produce the final RUL value. Another category of hybrid learning combines different deep neural networks in a more general manner. A classic combination uses a restricted Boltzmann machine or an auto-encoder at the health indexing stage, and a recurrent neural network or a convolutional neural network at the RUL prediction stage. Moreover, hybrid learning enables the extraction of more features from different deep neural network models and layers, which in turn creates the potential for more accurate RUL values.

5.3.2. Deep Neural Network Optimization for RUL Prediction

As with deep learning in general, training deep neural networks for RUL prediction is a challenging problem. Problems of traditional deep learning also arise in RUL prediction scenarios, including over-fitting, under-fitting, exploding gradients, and vanishing gradients. Hyperparameter tuning is another direction worthy of in-depth research. Currently, hyperparameters are mostly set manually, which depends heavily on experience and may not yield sufficient accuracy. They can instead be optimized by meta-heuristic algorithms such as the genetic algorithm and particle swarm optimization.
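A minimal particle swarm optimization loop for hyperparameter search might look like the following. The objective here is a hypothetical stand-in for a validation loss, with its minimum placed at a learning rate of 10^-3 and 64 hidden units. In practice, each evaluation would train and validate an RUL model. The swarm size, coefficients, bounds, and names are all assumptions for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic particle swarm optimization over box-bounded parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_validation_loss(params):
    """Hypothetical loss with its minimum at log10(lr) = -3, hidden = 64."""
    log_lr, hidden = params
    return (log_lr + 3.0) ** 2 + ((hidden - 64.0) / 32.0) ** 2


best_params, best_loss = pso_minimize(
    toy_validation_loss, bounds=[(-5.0, -1.0), (8.0, 256.0)])
```

Because every evaluation of a real objective requires a full training run, the swarm size and iteration count directly determine the tuning cost.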

5.4. Real-World Applications

The ultimate goal of prediction is to estimate the RUL of real-world assets. When dealing with real-world environments, the challenges primarily lie in three areas: source data collection, operating condition identification, and model deployment. Some potential future research directions related to these challenges are presented here.

5.4.1. Imbalanced Dataset

Imbalanced datasets are a widespread problem in production environments. The amount of fault data is much smaller than that of healthy data. Enhanced deep learning methods based on unsupervised learning, semi-supervised learning, and transfer learning can be developed to address this issue.

5.4.2. Multiple Operation Condition

Currently, most works assume that assets work under constant operation conditions, which does not hold in real production environments. The gap between ideal experimental working conditions and dynamic operation conditions should be bridged.

5.4.3. Cost of Deep Learning Method

Among the surveyed works, few address the costs, especially the computational costs, of the various deep learning models. This may stem from the difficulty of comparing models across different hardware and asset types. The costs may also be disregarded under the assumption of an ideal experimental environment. However, accurate RUL prediction must be carried out in a time-critical manner in order to realize the benefit of deep-learning-based RUL prediction.

5.4.4. Multi-Objective Deep-Learning-Based RUL Prediction

Generally, the goal of deep-learning-based RUL prediction is accuracy. However, it is hard to obtain the most accurate model, and the accuracy may change with the application scenario even for the same model. The task can therefore be formalized as a multi-objective optimization problem. Taking accuracy and diversity as two conflicting objectives makes it possible to obtain Pareto-optimal results.
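The notion of Pareto-optimal results can be illustrated with a small non-dominated filter over candidate models, each scored by a prediction error (to be minimized) and a diversity value (to be maximized). The candidate names and scores are hypothetical, and the helper is a sketch of the selection step only.

```python
def pareto_front(candidates):
    """Return the names of non-dominated candidates.

    candidates: list of (name, error, diversity) tuples, where error is
    minimized and diversity is maximized.
    """
    front = []
    for name, err, div in candidates:
        # A candidate is dominated if another is at least as good on both
        # objectives and strictly better on at least one.
        dominated = any(
            (e <= err and d >= div) and (e < err or d > div)
            for _, e, d in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidate models: (name, validation error, diversity).
models = [("A", 10.0, 0.2), ("B", 12.0, 0.5), ("C", 15.0, 0.4), ("D", 11.0, 0.1)]
selected = pareto_front(models)
```

Here model C is dominated by B and model D by A, so only A and B remain as trade-off candidates.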

6. Conclusions

Deep-learning-based RUL prediction is comprehensively surveyed. All related works are reviewed in three dimensions. Firstly, a unified framework is proposed to analyze related works and guide future method design. It divides the overall process into three stages: data preprocessing, health indicator generation, and RUL prediction. Secondly, the details of prediction are compared from the perspective of different deep learning models, including the auto-encoder, the restricted Boltzmann machine, the deep belief network, the recurrent neural network, and the convolutional neural network. Thirdly, the literature is reviewed from the perspective of specific problems, including transfer learning, the multiple operational conditions method, the insufficient labeled data method, hybrid deep learning, ensemble learning, and the uncertainty of RUL prediction. Based on the systematic review, the key technologies and some challenging problems are summarized.

Author Contributions

Conceptualization, F.W.; methodology, F.W.; validation, F.W.; formal analysis, F.W.; investigation, Q.W. and Y.T.; resources, X.X.; writing—original draft preparation, F.W.; writing—review and editing, X.X.; supervision, Q.W. and Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Guangdong Major Project of Basic and Applied Basic Research under Grant 2019B030302002.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Endrenyi, J.; Aboresheid, S.; Allan, R.; Anders, G.; Asgarpoor, S.; Billinton, R.; Chowdhury, N.; Dialynas, E.; Fipper, M.; Fletcher, R.; et al. The present status of maintenance strategies and the impact of maintenance on reliability. IEEE Trans. Power Syst. 2001, 16, 638–646.
  2. Ahmad, R.; Kamaruddin, S. An overview of time-based and condition-based maintenance in industrial application. Comput. Ind. Eng. 2012, 63, 135–149.
  3. Peng, Y.; Dong, M.; Zuo, M.J. Current status of machine prognostics in condition-based maintenance: A review. Int. J. Adv. Manuf. Technol. 2010, 50, 297–313.
  4. Mazhar, M.; Kara, S.; Kaebernick, H. Remaining life estimation of used components in consumer products: Life cycle data analysis by Weibull and artificial neural networks. J. Oper. Manag. 2007, 25, 1184–1193.
  5. Gebraeel, N.; Lawley, M.; Liu, R.; Parmeshwaran, V. Residual life predictions from vibration-based degradation signals: A neural network approach. IEEE Trans. Ind. Electron. 2004, 51, 694–700.
  6. Shao, Y.; Nezu, K. Prognosis of remaining bearing life using neural networks. Proc. IMechE Part I J. Syst. Control Eng. 2000, 214, 217–230.
  7. Gebraeel, N.Z.; Lawley, M.A. A neural network degradation model for computing and updating residual life distributions. IEEE Trans. Autom. Sci. Eng. 2008, 5, 154–163.
  8. Si, X.S.; Wang, W.; Hu, C.H.; Zhou, D.H. Remaining useful life estimation—A review on the statistical data driven approaches. Eur. J. Oper. Res. 2011, 213, 1–14.
  9. Hilbert, M.; López, P. The world’s technological capacity to store, communicate, and compute information. Science 2011, 332, 60–65.
  10. Hey, T.; Tansley, S.; Tolle, K. The Fourth Paradigm: Data-Intensive Scientific Discovery; Microsoft Research: Redmond, WA, USA, 2009; Volume 1.
  11. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
  12. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
  13. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
  14. Schwabacher, M.; Goebel, K. A Survey of Artificial Intelligence for Prognostics. In Proceedings of the AAAI Fall Symposium: Artificial Intelligence for Prognostics, Arlington, VA, USA, 9–11 November 2007; pp. 108–115.
  15. Liu, R.; Yang, B.; Zio, E.; Chen, X. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mech. Syst. Signal Process. 2018, 108, 33–47.
  16. Nash, W.; Drummond, T.; Birbilis, N. A review of deep learning in the study of materials degradation. NPJ Mater. Degrad. 2018, 2, 37.
  17. Zhao, G.; Zhang, G.; Ge, Q.; Liu, X. Research advances in fault diagnosis and prognostic based on deep learning. In Proceedings of the 2016 Prognostics and System Health Management Conference (PHM-Chengdu), Chengdu, China, 19–21 October 2016; pp. 1–6.
  18. Khan, S.; Yairi, T. A review on the application of deep learning in system health management. Mech. Syst. Signal Process. 2018, 107, 241–265.
  19. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237.
  20. Remadna, I.; Terrissa, S.L.; Zemouri, R.; Ayad, S. An overview on the deep-learning-based prognostic. In Proceedings of the 2018 International Conference on Advanced Systems and Electric Technologies (IC_ASET), Hammamet, Tunisia, 22–25 March 2018; pp. 196–200.
  21. Wang, S.; Jin, S.; Bai, D.; Fan, Y.; Shi, H.; Fernandez, C. A critical review of improved deep learning methods for the remaining useful life prediction of lithium-ion batteries. Energy Rep. 2021, 7, 5562–5574.
  22. Banjevic, D. Remaining useful life in theory and practice. Metrika 2009, 69, 337–349.
  23. Ren, L.; Cui, J.; Sun, Y.; Cheng, X. Multi-bearing remaining useful life collaborative prediction: A deep learning approach. J. Manuf. Syst. 2017, 43, 248–256.
  24. Wu, Y.; Yuan, M.; Dong, S.; Lin, L.; Liu, Y. Remaining useful life estimation of engineered systems using vanilla LSTM neural networks. Neurocomputing 2018, 275, 167–179.
  25. Huang, C.G.; Huang, H.Z.; Li, Y.F. A bidirectional LSTM prognostics method under multiple operational conditions. IEEE Trans. Ind. Electron. 2019, 66, 8792–8802.
  26. Bektas, O.; Jones, J.A.; Sankararaman, S.; Roychoudhury, I.; Goebel, K. A neural network filtering approach for similarity-based remaining useful life estimation. Int. J. Adv. Manuf. Technol. 2019, 101, 87–103.
  27. Heimes, F.O. Recurrent neural networks for remaining useful life estimation. In Proceedings of the 2008 International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008; pp. 1–6.
  28. Singhal, S.; Wu, L. Training multilayer perceptrons with the extended Kalman algorithm. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 27–30 November 1989; pp. 133–140.
  29. Li, X.; Zhang, W.; Ding, Q. Deep-learning-based remaining useful life estimation of bearings using multi-scale feature extraction. Reliab. Eng. Syst. Saf. 2019, 182, 208–218.
  30. Li, N.; Lei, Y.; Lin, J.; Ding, S.X. An improved exponential model for predicting remaining useful life of rolling element bearings. IEEE Trans. Ind. Electron. 2015, 62, 7762–7773.
  31. Shi, Z.; Chehade, A. A dual-LSTM framework combining change point detection and remaining useful life prediction. Reliab. Eng. Syst. Saf. 2021, 205, 107257.
  32. Pan, Z.; Meng, Z.; Chen, Z.; Gao, W.; Shi, Y. A two-stage method based on extreme learning machine for predicting the remaining useful life of rolling-element bearings. Mech. Syst. Signal Process. 2020, 144, 106899.
  33. Lim, P.; Goh, C.K.; Tan, K.C. A time window neural network based framework for Remaining Useful Life estimation. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 1746–1753.
  34. Elsheikh, A.; Yacout, S.; Ouali, M.S. Bidirectional handshaking LSTM for remaining useful life prediction. Neurocomputing 2019, 323, 148–156.
  35. Wang, S.; Zhang, X.; Gao, D.; Chen, B.; Cheng, Y.; Yang, Y.; Yu, W.; Huang, Z.; Peng, J. A remaining useful life prediction model based on hybrid long–short sequences for engines. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 1757–1762.
  36. Javed, K.; Gouriveau, R.; Zerhouni, N.; Nectoux, P. Enabling health monitoring approach based on vibration data for accurate prognostics. IEEE Trans. Ind. Electron. 2014, 62, 647–656.
  37. Boškoski, P.; Musizza, B.; Dolenc, B.; Juričić, Ð. Entropy Indices for Estimation of the Remaining Useful Life. In Advances in Technical Diagnostics, Proceedings of the 6th International Congress on Technical Diagnostic, ICDT2016, Gliwice, Poland, 12–16 September 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 373–384.
  38. Qiu, H.; Lee, J.; Lin, J.; Yu, G. Robust performance degradation assessment methods for enhanced rolling element bearing prognostics. Adv. Eng. Inform. 2003, 17, 127–140.
  39. Baraldi, P.; Bonfanti, G.; Zio, E. Differential evolution-based multi-objective optimization for the definition of a health indicator for fault diagnostics and prognostics. Mech. Syst. Signal Process. 2018, 102, 382–400.
  40. Guo, L.; Lei, Y.; Li, N.; Yan, T.; Li, N. Machinery health indicator construction based on convolutional neural networks considering trend burr. Neurocomputing 2018, 292, 142–150.
  41. Li, X.; Jiang, H.; Xiong, X.; Shao, H. Rolling bearing health prognosis using a modified health index based hierarchical gated recurrent unit network. Mech. Mach. Theory 2019, 133, 229–249.
  42. Zhao, L.; Wang, X. A deep feature optimization fusion method for extracting bearing degradation features. IEEE Access 2018, 6, 19640–19653.
  43. Xia, M.; Li, T.; Shu, T.; Wan, J.; De Silva, C.W.; Wang, Z. A two-stage approach for the remaining useful life prediction of bearings using deep neural networks. IEEE Trans. Ind. Inform. 2018, 15, 3703–3711.
  44. Xia, M.; Li, T.; Liu, L.; Xu, L.; Gao, S.; De Silva, C.W. Remaining useful life prediction of rotating machinery using hierarchical deep neural network. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 2778–2783.
  45. Senanayaka, J.S.L.; Van Khang, H.; Robbersmyr, K.G. Autoencoders and recurrent neural networks based algorithm for prognosis of bearing life. In Proceedings of the 2018 21st International Conference on Electrical Machines and Systems (ICEMS), Jeju, Republic of Korea, 7–10 October 2018; pp. 537–542.
  46. Guo, L.; Li, N.; Jia, F.; Lei, Y.; Lin, J. A recurrent neural network based health indicator for remaining useful life prediction of bearings. Neurocomputing 2017, 240, 98–109.
  47. Gugulothu, N.; Tv, V.; Malhotra, P.; Vig, L.; Agarwal, P.; Shroff, G. Predicting remaining useful life using time series embeddings based on recurrent neural networks. arXiv 2017, arXiv:1709.01073.
  48. Malhotra, P.; Tv, V.; Ramakrishnan, A.; Anand, G.; Vig, L.; Agarwal, P.; Shroff, G. Multi-sensor prognostics using an unsupervised health index based on LSTM encoder–decoder. arXiv 2016, arXiv:1608.06154.
  49. Hasani, R.M.; Wang, G.; Grosu, R. An automated auto-encoder correlation-based health-monitoring and prognostic method for machine bearings. arXiv 2017, arXiv:1703.06272.
  50. Chen, Y.; Peng, G.; Zhu, Z.; Li, S. A novel deep learning method based on attention mechanism for bearing remaining useful life prediction. Appl. Soft Comput. 2020, 86, 105919. [Google Scholar] [CrossRef]
  51. Sahu, A.; Apley, D.W.; Runger, G.C. Feature selection for noisy variation patterns using kernel principal component analysis. Knowl. Based Syst. 2014, 72, 37–47. [Google Scholar] [CrossRef]
  52. Yoo, Y.; Baek, J.G. A novel image feature for the remaining useful lifetime prediction of bearings based on continuous wavelet transform and convolutional neural network. Appl. Sci. 2018, 8, 1102. [Google Scholar] [CrossRef]
  53. Wen, L.; Li, X.; Gao, L.; Zhang, Y. A new convolutional neural-network-based data-driven fault diagnosis method. IEEE Trans. Ind. Electron. 2017, 65, 5990–5998. [Google Scholar] [CrossRef]
  54. Chitraganti, S.; Aberkane, S.; Aubrun, C. Statistical properties of exponentially weighted moving average algorithm for change detection. In Proceedings of the 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), Maui, HI, USA, 10–13 December 2012; pp. 574–578. [Google Scholar]
  55. Lin, Y.; Li, X.; Hu, Y. Deep diagnostics and prognostics: An integrated hierarchical learning framework in PHM applications. Appl. Soft Comput. 2018, 72, 555–564. [Google Scholar] [CrossRef]
  56. Tang, J.; Deng, C.; Huang, G.B. Extreme learning machine for multilayer perceptron. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 809–821. [Google Scholar] [CrossRef] [PubMed]
  57. Yan, H.; Wan, J.; Zhang, C.; Tang, S.; Hua, Q.; Wang, Z. Industrial big data analytics for prediction of remaining useful life based on deep learning. IEEE Access 2018, 6, 17190–17197. [Google Scholar] [CrossRef]
  58. Deutsch, J.; He, D. Using deep-learning-based approaches for bearing remaining useful life prediction. In Proceedings of the Annual Conference of the PHM Society, Denver, CO, USA, 3–6 October 2016; Volume 8.
  59. Liao, L.; Jin, W.; Pavel, R. Enhanced restricted Boltzmann machine with prognosability regularization for prognostics and health assessment. IEEE Trans. Ind. Electron. 2016, 63, 7076–7083.
  60. Kohonen, T. The self-organizing map. Proc. IEEE 1990, 78, 1464–1480.
  61. Ma, M.; Sun, C.; Chen, X. Discriminative deep belief networks with ant colony optimization for health status assessment of machine. IEEE Trans. Instrum. Meas. 2017, 66, 3115–3125.
  62. Wang, P.; Gao, R.X.; Yan, R. A deep-learning-based approach to material removal rate prediction in polishing. CIRP Ann. 2017, 66, 429–432.
  63. Shao, H.; Jiang, H.; Li, X.; Liang, T. Rolling bearing fault detection using continuous deep belief network with locally linear embedding. Comput. Ind. 2018, 96, 27–39.
  64. Kuremoto, T.; Kimura, S.; Kobayashi, K.; Obayashi, M. Time series forecasting using a deep belief network with restricted Boltzmann machines. Neurocomputing 2014, 137, 47–56.
  65. Chen, H.; Murray, A.F. Continuous restricted Boltzmann machine with an implementable training algorithm. IEE Proc. Vis. Image Signal Process. 2003, 150, 153–158.
  66. Lipton, Z.C.; Berkowitz, J.; Elkan, C. A critical review of recurrent neural networks for sequence learning. arXiv 2015, arXiv:1506.00019.
  67. Malhotra, P.; TV, V.; Vig, L.; Agarwal, P.; Shroff, G. TimeNet: Pre-trained deep recurrent neural network for time series classification. arXiv 2017, arXiv:1706.08838.
  68. Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 2018, 8, 6085.
  69. Tse, P.; Atherton, D. Prediction of Machine Deterioration Using Vibration Based Fault Trends and Recurrent Neural Networks. Trans. ASME 1999, 121, 355–362.
  70. Malhi, A.; Yan, R.; Gao, R.X. Prognosis of defect propagation based on recurrent neural networks. IEEE Trans. Instrum. Meas. 2011, 60, 703–711.
  71. Tian, Z.; Zuo, M.J. Health condition prognostics of gears using a recurrent neural network approach. In Proceedings of the 2009 Annual Reliability and Maintainability Symposium, Fort Worth, TX, USA, 26–29 January 2009; pp. 460–465.
  72. Liu, J.; Saxena, A.; Goebel, K.; Saha, B.; Wang, W. An Adaptive Recurrent Neural Network for Remaining Useful Life Prediction of Lithium-Ion Batteries; Technical Report; National Aeronautics and Space Administration Ames Research: Moffett Field, CA, USA, 2010.
  73. Peng, Y.; Wang, H.; Wang, J.; Liu, D.; Peng, X. A modified echo state network based remaining useful life estimation approach. In Proceedings of the 2012 IEEE Conference on Prognostics and Health Management, Denver, CO, USA, 18–21 June 2012; pp. 1–7.
  74. Morando, S.; Jemei, S.; Gouriveau, R.; Zerhouni, N.; Hissel, D. Fuel cells prognostics using echo state network. In Proceedings of the IECON 2013-39th Annual Conference of the IEEE Industrial Electronics Society, Vienna, Austria, 10–13 November 2013; pp. 1632–1637.
  75. Zhang, B.; Zhang, L.; Xu, J. Remaining useful life prediction for rolling element bearing based on ensemble learning. Chem. Eng. Trans. 2013, 33, 157–162.
  76. Zheng, S.; Ristovski, K.; Farahat, A.; Gupta, C. Long short-term memory network for remaining useful life estimation. In Proceedings of the 2017 IEEE International Conference on Prognostics and Health Management (ICPHM), Dallas, TX, USA, 19–21 June 2017; pp. 88–95.
  77. Liu, J.; Li, Q.; Chen, W.; Yan, Y.; Qiu, Y.; Cao, T. Remaining useful life prediction of PEMFC based on long short-term memory recurrent neural networks. Int. J. Hydrogen Energy 2019, 44, 5470–5480.
  78. Zhang, Y.; Xiong, R.; He, H.; Liu, Z. A LSTM-RNN method for the lithuim-ion battery remaining useful life prediction. In Proceedings of the 2017 Prognostics and System Health Management Conference (PHM-Harbin), Harbin, China, 9–12 July 2017; pp. 1–4.
  79. Chemali, E.; Kollmeyer, P.J.; Preindl, M.; Ahmed, R.; Emadi, A. Long short-term memory networks for accurate state-of-charge estimation of Li-ion batteries. IEEE Trans. Ind. Electron. 2017, 65, 6730–6739.
  80. Li, P.; Zhang, Z.; Xiong, Q.; Ding, B.; Hou, J.; Luo, D.; Rong, Y.; Li, S. State-of-health estimation and remaining useful life prediction for the lithium-ion battery based on a variant long short term memory neural network. J. Power Sources 2020, 459, 228069.
  81. Park, K.; Choi, Y.; Choi, W.J.; Ryu, H.Y.; Kim, H. LSTM-based battery remaining useful life prediction with multi-channel charging profiles. IEEE Access 2020, 8, 20786–20798.
  82. Zhou, F.; Hu, P.; Yang, X. RUL prognostics method based on real time updating of LSTM parameters. In Proceedings of the 2018 Chinese Control and Decision Conference (CCDC), Shenyang, China, 9–11 June 2018; pp. 3966–3971.
  83. Zhang, Y.; Xiong, R.; He, H.; Pecht, M.G. Long short-term memory recurrent neural network for remaining useful life prediction of lithium-ion batteries. IEEE Trans. Veh. Technol. 2018, 67, 5695–5705.
  84. Dong, D.; Li, X.Y.; Sun, F.Q. Life prediction of jet engines based on LSTM-recurrent neural networks. In Proceedings of the 2017 Prognostics and System Health Management Conference (PHM-Harbin), Harbin, China, 9–12 July 2017; pp. 1–6.
  85. Yuan, M.; Wu, Y.; Lin, L. Fault diagnosis and remaining useful life estimation of aero engine using LSTM neural network. In Proceedings of the 2016 IEEE International Conference on Aircraft Utility Systems (AUS), Beijing, China, 10–12 October 2016; pp. 135–140.
  86. Wang, F.; Liu, X.; Deng, G.; Yu, X.; Li, H.; Han, Q. Remaining life prediction method for rolling bearing based on the long short-term memory network. Neural Process. Lett. 2019, 50, 2437–2454.
  87. Xiang, S.; Qin, Y.; Luo, J.; Pu, H.; Tang, B. Multicellular LSTM-based deep learning model for aero-engine remaining useful life prediction. Reliab. Eng. Syst. Saf. 2021, 216, 107927.
  88. Chen, Z.; Wu, M.; Zhao, R.; Guretno, F.; Yan, R.; Li, X. Machine remaining useful life prediction via an attention-based deep learning approach. IEEE Trans. Ind. Electron. 2020, 68, 2521–2531.
  89. Hsu, C.S.; Jiang, J.R. Remaining useful life estimation using long short-term memory deep learning. In Proceedings of the 2018 IEEE International Conference on Applied System Invention (ICASI), Chiba, Japan, 13–17 April 2018; pp. 58–61.
  90. Lima, F.D.S.; Pereira, F.L.F.; Leite, L.G.; Gomes, J.P.P.; Machado, J.C. Remaining useful life estimation of hard disk drives based on deep neural networks. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–7.
  91. Zhang, J.; Wang, P.; Yan, R.; Gao, R.X. Deep learning for improved system remaining life prediction. Procedia CIRP 2018, 72, 1033–1038.
  92. Zheng, S.; Ristovski, K.; Gupta, C.; Farahat, A. Deep Long Short Term Memory Network for Estimation of Remaining Useful Life of the Components. U.S. Patent 11,288,577, 29 March 2019.
  93. Zhao, R.; Wang, J.; Yan, R.; Mao, K. Machine health monitoring with LSTM networks. In Proceedings of the 2016 10th International Conference on Sensing Technology (ICST), Nanjing, China, 11–13 November 2016; pp. 1–6.
  94. Zhang, J.; Wang, P.; Yan, R.; Gao, R.X. Long short-term memory for machine remaining life prediction. J. Manuf. Syst. 2018, 48, 78–86.
  95. Zhou, J.; Qin, Y.; Luo, J.; Zhu, T. Remaining useful life prediction by distribution contact ratio health indicator and consolidated memory GRU. IEEE Trans. Ind. Inform. 2022, 19, 8472–8483.
  96. Ren, L.; Cheng, X.; Wang, X.; Cui, J.; Zhang, L. Multi-scale dense gate recurrent unit networks for bearing remaining useful life prediction. Fut. Gener. Comput. Syst. 2019, 94, 601–609.
  97. Chen, J.; Jing, H.; Chang, Y.; Liu, Q. Gated recurrent unit based recurrent neural network for remaining useful life prediction of nonlinear deterioration process. Reliab. Eng. Syst. Saf. 2019, 185, 372–382.
  98. She, D.; Jia, M. A BiGRU method for remaining useful life prediction of machinery. Measurement 2021, 167, 108277.
  99. Zhao, R.; Wang, D.; Yan, R.; Mao, K.; Shen, F.; Wang, J. Machine health monitoring using local feature-based gated recurrent unit networks. IEEE Trans. Ind. Electron. 2017, 65, 1539–1548.
  100. Wu, J.; Hu, K.; Cheng, Y.; Zhu, H.; Shao, X.; Wang, Y. Data-driven remaining useful life prediction via multiple sensor signals and deep long short-term memory neural network. ISA Trans. 2020, 97, 241–250.
  101. Zhang, J.; Jiang, Y.; Wu, S.; Li, X.; Luo, H.; Yin, S. Prediction of remaining useful life based on bidirectional gated recurrent unit with temporal self-attention mechanism. Reliab. Eng. Syst. Saf. 2022, 221, 108297.
  102. Storn, R. On the usage of differential evolution for function optimization. In Proceedings of the North American Fuzzy Information Processing, Berkeley, CA, USA, 19–22 June 1996; pp. 519–523.
  103. Morando, S.; Jemei, S.; Hissel, D.; Gouriveau, R.; Zerhouni, N. Proton exchange membrane fuel cell ageing forecasting algorithm based on Echo State Network. Int. J. Hydrogen Energy 2017, 42, 1472–1480.
  104. Rigamonti, M.; Baraldi, P.; Zio, E.; Roychoudhury, I.; Goebel, K.; Poll, S. Echo state network for the remaining useful life prediction of a turbofan engine. In Proceedings of the European Conference of the PHM Society, Bilbao, Spain, 5–8 July 2016; pp. 255–270.
  105. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  106. Ma, M.; Mao, Z. Deep-convolution-based LSTM network for remaining useful life prediction. IEEE Trans. Ind. Inform. 2020, 17, 1658–1667.
  107. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
  108. Babu, G.S.; Zhao, P.; Li, X.L. Deep convolutional neural network based regression approach for estimation of remaining useful life. In Database Systems for Advanced Applications, Proceedings of the 21st International Conference, DASFAA 2016, Dallas, TX, USA, 16–19 April 2016, Part I; Springer: Berlin/Heidelberg, Germany, 2016; pp. 214–228.
  109. Zhu, J.; Chen, N.; Peng, W. Estimation of bearing remaining useful life based on multiscale convolutional neural network. IEEE Trans. Ind. Electron. 2018, 66, 3208–3216.
  110. Li, X.; Ding, Q.; Sun, J.Q. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng. Syst. Saf. 2018, 172, 1–11.
  111. Ren, L.; Sun, Y.; Wang, H.; Zhang, L. Prediction of bearing remaining useful life with deep convolution neural network. IEEE Access 2018, 6, 13041–13049.
  112. Jiang, J.R.; Kuo, C.K. Enhancing Convolutional Neural Network Deep Learning for Remaining Useful Life Estimation in Smart Factory Applications. In Proceedings of the 2017 International Conference on Information, Communication and Engineering (ICICE), Xiamen, China, 17–20 November 2017; pp. 120–123.
  113. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. Proc. ICML 2013, 30, 3.
  114. Li, H.; Zhao, W.; Zhang, Y.; Zio, E. Remaining useful life prediction using multi-scale deep convolutional neural network. Appl. Soft Comput. 2020, 89, 106113.
  115. Wang, B.; Lei, Y.; Li, N.; Wang, W. Multiscale convolutional attention network for predicting remaining useful life of machinery. IEEE Trans. Ind. Electron. 2020, 68, 7496–7504.
  116. Zhang, Y.; Feng, K.; Ji, J.; Yu, K.; Ren, Z.; Liu, Z. Dynamic model-assisted bearing remaining useful life prediction using the cross-domain transformer network. IEEE/ASME Trans. Mechatron. 2022, 28, 1070–1080.
  117. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
  118. Zhang, A.; Wang, H.; Li, S.; Cui, Y.; Liu, Z.; Yang, G.; Hu, J. Transfer learning with deep recurrent neural networks for remaining useful life estimation. Appl. Sci. 2018, 8, 2416.
  119. Ramasso, E.; Saxena, A. Performance Benchmarking and Analysis of Prognostic Methods for CMAPSS Datasets. Int. J. Progn. Health Manag. 2014, 5, 1–15.
  120. Sun, C.; Ma, M.; Zhao, Z.; Tian, S.; Yan, R.; Chen, X. Deep transfer learning based on sparse autoencoder for remaining useful life prediction of tool in manufacturing. IEEE Trans. Ind. Inform. 2018, 15, 2416–2425.
  121. da Costa, P.R.d.O.; Akçay, A.; Zhang, Y.; Kaymak, U. Remaining useful lifetime prediction via deep domain adaptation. Reliab. Eng. Syst. Saf. 2020, 195, 106682.
  122. Zhu, J.; Chen, N.; Shen, C. A new data-driven transferable remaining useful life prediction approach for bearing under different working conditions. Mech. Syst. Signal Process. 2020, 139, 106602.
  123. Cao, Y.; Jia, M.; Ding, P.; Ding, Y. Transfer learning for remaining useful life prediction of multi-conditions bearings based on bidirectional-GRU network. Measurement 2021, 178, 109287.
  124. Cheng, H.; Kong, X.; Chen, G.; Wang, Q.; Wang, R. Transferable convolutional neural network based remaining useful life prediction of bearing under multiple failure behaviors. Measurement 2021, 168, 108286.
  125. Dong, S.; Xiao, J.; Hu, X.; Fang, N.; Liu, L.; Yao, J. Deep transfer learning based on Bi-LSTM and attention for remaining useful life prediction of rolling bearing. Reliab. Eng. Syst. Saf. 2023, 230, 108914.
  126. Long, J.; Sun, Z.; Li, C.; Hong, Y.; Bai, Y.; Zhang, S. A novel sparse echo autoencoder network for data-driven fault diagnosis of delta 3-D printers. IEEE Trans. Instrum. Meas. 2019, 69, 683–692.
  127. Zhang, S.; Sun, Z.; Li, C.; Cabrera, D.; Long, J.; Bai, Y. Deep hybrid state network with feature reinforcement for intelligent fault diagnosis of delta 3-D printers. IEEE Trans. Ind. Inform. 2019, 16, 779–789.
  128. Gao, Z.; Ma, C.; Luo, Y. RUL prediction for IMA based on deep regression method. In Proceedings of the 2017 IWCIA, Hiroshima, Japan, 11–12 November 2017; pp. 25–31.
  129. Song, Y.; Shi, G.; Chen, L.; Huang, X.; Xia, T. Remaining useful life prediction of turbofan engine using hybrid model based on autoencoder and bidirectional long short-term memory. J. Shanghai Jiaotong Univ. 2018, 23, 85–94.
  130. Zhao, R.; Yan, R.; Wang, J.; Mao, K. Learning to monitor machine health with convolutional bidirectional LSTM networks. Sensors 2017, 17, 273.
  131. An, Q.; Tao, Z.; Xu, X.; El Mansori, M.; Chen, M. A data-driven model for milling tool remaining useful life prediction with convolutional and stacked LSTM network. Measurement 2020, 154, 107461.
  132. Liu, H.; Liu, Z.; Jia, W.; Lin, X. Remaining useful life prediction using a novel feature-attention-based end-to-end approach. IEEE Trans. Ind. Inform. 2020, 17, 1197–1207.
  133. Ren, L.; Dong, J.; Wang, X.; Meng, Z.; Zhao, L.; Deen, M.J. A data-driven auto-CNN-LSTM prediction model for lithium-ion battery remaining useful life. IEEE Trans. Ind. Inform. 2020, 17, 3478–3487.
  134. Hinchi, A.Z.; Tkiouat, M. Rolling element bearing remaining useful life estimation based on a convolutional long–short-term memory network. Procedia Comput. Sci. 2018, 127, 123–132.
  135. Ren, L.; Sun, Y.; Cui, J.; Zhang, L. Bearing remaining useful life prediction based on deep autoencoder and deep neural networks. J. Manuf. Syst. 2018, 48, 71–77.
  136. Deutsch, J.; He, D. Using deep-learning-based approach to predict remaining useful life of rotating components. IEEE Trans. Syst. Man Cybern. Syst. 2017, 48, 11–20.
  137. Daroogheh, N.; Baniamerian, A.; Meskin, N.; Khorasani, K. Prognosis and health monitoring of nonlinear systems using a hybrid scheme through integration of PFs and neural networks. IEEE Trans. Syst. Man Cybern. Syst. 2016, 47, 1990–2004.
  138. Li, X.; Zhang, L.; Wang, Z.; Dong, P. Remaining useful life prediction for lithium-ion batteries based on a hybrid model combining the long short-term memory and Elman neural networks. J. Energy Storage 2019, 21, 510–518.
  139. Peel, L. Data driven prognostics using a Kalman filter ensemble of neural network models. In Proceedings of the 2008 International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008; pp. 1–6.
  140. Lim, P.; Goh, C.K.; Tan, K.C.; Dutta, P. Estimation of Remaining Useful Life Based on Switching Kalman Filter Neural Network Ensemble; Technical Report; Rolls Royce Singapore: Singapore, 2014.
  141. Lim, P.; Goh, C.K.; Tan, K.C.; Dutta, P. Multimodal degradation prognostics based on switching Kalman filter ensemble. IEEE Trans. Neural Netw. Learn. Syst. 2015, 28, 136–148.
  142. Hu, C.; Youn, B.D.; Wang, P.; Yoon, J.T. Ensemble of data-driven prognostic algorithms for robust prediction of remaining useful life. Reliab. Eng. Syst. Saf. 2012, 103, 120–135.
  143. Baraldi, P.; Compare, M.; Sauco, S.; Zio, E. Ensemble neural-network-based particle filtering for prognostics. Mech. Syst. Signal Process. 2013, 41, 288–300.
  144. Rigamonti, M.; Baraldi, P.; Zio, E.; Roychoudhury, I.; Goebel, K.; Poll, S. Ensemble of optimized echo state networks for remaining useful life prediction. Neurocomputing 2018, 281, 121–138.
  145. Li, Z.; Wu, D.; Hu, C.; Terpenny, J. An ensemble learning-based prognostic approach with degradation-dependent weights for remaining useful life prediction. Reliab. Eng. Syst. Saf. 2019, 184, 110–122.
  146. Xia, T.; Song, Y.; Zheng, Y.; Pan, E.; Xi, L. An ensemble framework based on convolutional bidirectional LSTM with multiple time windows for remaining useful life estimation. Comput. Ind. 2020, 115, 103182.
  147. Zhang, C.; Lim, P.; Qin, A.K.; Tan, K.C. Multiobjective deep belief networks ensemble for remaining useful life estimation in prognostics. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2306–2318.
  148. Li, H.; Zhang, Q. Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II. IEEE Trans. Evol. Comput. 2008, 13, 284–302.
  149. Storn, R.; Price, K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359.
  150. Yan, H.; Liu, K.; Zhang, X.; Shi, J. Multiple sensor data fusion for degradation modeling and prognostics under multiple operational conditions. IEEE Trans. Reliab. 2016, 65, 1416–1426.
  151. Mi, J.; Li, Y.F.; Yang, Y.J.; Peng, W.; Huang, H.Z. Reliability assessment of complex electromechanical systems under epistemic uncertainty. Reliab. Eng. Syst. Saf. 2016, 152, 1–15.
  152. Yoon, A.S.; Lee, T.; Lim, Y.; Jung, D.; Kang, P.; Kim, D.; Park, K.; Choi, Y. Semi-supervised learning with deep generative models for asset failure prediction. arXiv 2017, arXiv:1709.00845.
  153. Ellefsen, A.L.; Bjørlykhaug, E.; Æsøy, V.; Ushakov, S.; Zhang, H. Remaining useful life predictions for turbofan engine degradation using semi-supervised deep architecture. Reliab. Eng. Syst. Saf. 2019, 183, 240–251.
  154. Zhao, G.; Zhang, G.; Liu, Y.; Zhang, B.; Hu, C. Lithium-ion battery remaining useful life prediction with Deep Belief Network and Relevance Vector Machine. In Proceedings of the 2017 IEEE International Conference on Prognostics and Health Management (ICPHM), Dallas, TX, USA, 19–21 June 2017; pp. 7–13.
  155. Liao, Y.; Zhang, L.; Liu, C. Uncertainty prediction of remaining useful life using long short-term memory network based on bootstrap method. In Proceedings of the 2018 IEEE International Conference on Prognostics and Health Management (ICPHM), Seattle, WA, USA, 11–13 June 2018; pp. 1–8.
  156. Wang, B.; Lei, Y.; Yan, T.; Li, N.; Guo, L. Recurrent convolutional neural network: A new framework for remaining useful life prediction of machinery. Neurocomputing 2020, 379, 117–129.
  157. Yang, J.; Peng, Y.; Xie, J.; Wang, P. Remaining useful life prediction method for bearings based on LSTM with uncertainty quantification. Sensors 2022, 22, 4549.
  158. Liu, K.; Shang, Y.; Ouyang, Q.; Widanage, W.D. A data-driven approach with uncertainty quantification for predicting future capacities and remaining useful life of lithium-ion battery. IEEE Trans. Ind. Electron. 2020, 68, 3170–3180.
  159. Wang, W.; Lei, Y.; Yan, T.; Li, N.; Nandi, A.K. Residual Convolution Long Short-Term Memory Network for Machines Remaining Useful Life Prediction and Uncertainty Quantification. J. Dyn. Monit. Diagn. 2021, 1, 2–8.
  160. Qin, Y.; Yang, J.; Zhou, J.; Pu, H.; Mao, Y. A new supervised multi-head self-attention autoencoder for health indicator construction and similarity-based machinery RUL prediction. Adv. Eng. Inform. 2023, 56, 101973.
  161. Qin, Y.; Zhou, J.; Chen, D. Unsupervised health indicator construction by a novel degradation-trend-constrained variational autoencoder and its applications. IEEE/ASME Trans. Mechatron. 2021, 27, 1447–1456.
  162. Zhou, J.; Qin, Y.; Luo, J.; Wang, S.; Zhu, T. Dual-thread gated recurrent unit for gear remaining useful life prediction. IEEE Trans. Ind. Inform. 2022, 19, 8307–8318.
Figure 1. The development background of RUL prediction based on deep learning.
Figure 2. Unified deep-learning-based RUL prediction framework.
Figure 3. Future directions.
Table 1. Acronyms used in the survey.
Notation | Description
TBM | Time-based maintenance
CBM | Condition-based maintenance
HI | Health indicator
DNN | Deep neural network
ANN | Artificial neural network
AE | Auto-encoder
SDA | Stacked denoising auto-encoder
SAE | Stacked sparse auto-encoder
EAE | Enhanced auto-encoder
RBM | Restricted Boltzmann machine
DBN | Deep belief network
CDBN | Continuous deep belief network
FNN | Feed-forward neural network
RNN | Recurrent neural network
RNN-ED | RNN encoder–decoder
ESN | Echo state network
LSTM | Long short-term memory
GRU | Gated recurrent unit
CNN | Convolutional neural network
MSCNN | Multi-scale convolutional neural network
PCA | Principal component analysis
KPCA | Kernel principal component analysis
MA | Moving average method
EWMA | Exponentially weighted moving average
GWO | Grey wolf optimizer
SOM | Self-organizing map
CWT | Continuous wavelet transform
GPR | Gaussian process regression
ELM | Extreme learning machine
HELM | Hierarchical extreme learning machine
ACO | Ant colony optimization
GA | Genetic algorithm
BPTT | Back-propagation through time
SSL | Semi-supervised learning
ReLU | Rectified linear unit
GBR | Gradient-boosting regression
Table 2. Comparison between related survey works.
PHMRULTDDSNNDNN
Schwabacher et al. [14]
Si et al. [8]
Liu et al. [15]
Nash et al. [16]
Zhao et al. [17]
Khan et al. [18]
Zhao et al. [19]
Remadna et al. [20]
Wang et al. [21]
Table 3. Health indexing technologies.
Reference | Transformation and Selection | Fusion and Regression
Baraldi et al. [39] | Binary Differential Evolution | Auto-Associative Kernel Regression
Guo et al. [40] | CNN | 3σ rule
Li et al. [41] | KPCA | Exponentially weighted moving average
Zhao et al. [42] | Enhanced autoencoder, SOM network | Grey wolf optimizer
Yoo et al. [52] | CWT, CNN | Gaussian process regression
Xia et al. [43,44] | SDA | Shallow ANN
Senanayaka et al. [45] | CWT, sparse autoencoder | LSTM
Guo et al. [46] | Related-similarity feature approach | RNN
Gugulothu et al. [47,48] | RNN-ED | Masking vector, Delta vector
Hasani et al. [49] | Auto-encoder | Moving average filter
Chen et al. [50] | CNN, bidirectional GRU | GRU
Table 4. RBM methods.
Reference | Field | Structure
Ma et al. [61] | Bearings, turbine engine | Stacked RBMs followed by a discriminative fine-tuning layer
Wang et al. [62] | Material removal rate | Stacked RBMs followed by a three-layer feed-forward perceptron network
Shao et al. [63] | Rolling bearing | CDBN
Liao et al. [59] | Bearings | Enhanced RBM with SOM method for feature fusion
Table 5. RNN methods.
Reference | Field | Structure
Tse et al. [69] | Industrial machines | The output node is linked by a feedback loop to extra input nodes.
Malhi et al. [70] | Rolling bearing | The output node is linked by a feedback loop to extra input nodes.
Tian et al. [71] | Gearbox | Extended RNN
Liu et al. [72] | Lithium-ion battery | Adaptive RNN
Heimes et al. [27] | PHM 08 Challenge dataset | RNN
Peng et al. [73] | Turbofan engine | Modified ESN
Morando et al. [74,75] | Proton Exchange Membrane Fuel Cell | ESN
Zheng et al. [76] | C-MAPSS dataset, PHM 08 Challenge dataset, Milling dataset | LSTM
Liu et al. [77] | PEMFC | LSTM
Zhang et al. [78], Chemali et al. [79], Li et al. [80], Park et al. [81], Zhou et al. [82], Zhang et al. [83] | Lithium-ion battery | LSTM
Dong et al. [84] | Jet engines | LSTM
Yuan et al. [85] | Aero engines | LSTM
Wang et al. [86] | Rolling bearing | LSTM
Xiang et al. [87] | Aero engines | MCLSTM
Chen et al. [88] | C-MAPSS dataset | LSTM
Hsu et al. [89] | C-MAPSS dataset | Two stacked LSTM layers
Lima et al. [90] | Hard disk drives | Two stacked LSTM layers and one fully connected layer
Zhang et al. [91] | Gas turbine engine | Deep LSTM
Zheng et al. [92] | Equipment system | Deep LSTM
Zhao et al. [93] | Tool wear | Multiple stacked LSTM layers
Zhang et al. [94] | C-MAPSS dataset | Bidirectional LSTM
Zhou et al. [95] | Bearing | CMGRU
Ren et al. [96] | Bearing | MDGRU
Chen et al. [97] | C-MAPSS dataset | GRU
She et al. [98] | Bearing | BiGRU
Zhao et al. [99] | Tool wear | LFGRU
Wu et al. [100] | C-MAPSS dataset | Deep LSTM
Zhang et al. [101] | Turbofan engine | BiGRU
Table 6. CNN methods.
Reference | Field | Structure
Babu et al. [108] | C-MAPSS dataset, PHM 08 Challenge dataset | CNN
Zhu et al. [109] | Bearing | MSCNN
Li et al. [110] | C-MAPSS dataset | DCNN
Li et al. [29] | PHM 2012 Challenge dataset | MSCNN
Ren et al. [111] | Bearing | CNN
Jiang et al. [112] | C-MAPSS dataset | ECNN
Table 7. Hybrid methods.
Reference | Field | Structure
Gao et al. [128] | IMA | SDAE, SVM
Song et al. [129] | Turbofan engine | Autoencoder–BLSTM
Zhao et al. [130] | CNC machine | CNN, BLSTM
An et al. [131] | Milling tool | CNN, LSTM
Liu et al. [132] | Turbofan engine | CNN, BGRU
Ren et al. [133] | Lithium-ion batteries | CNN, LSTM
Hinchi et al. [134] | Rolling element bearing | CNN, LSTM
Ren et al. [135] | Bearings | Deep autoencoder, DNN
Deutsch et al. [136] | Gears, bearings | DBN, FNN
Daroogheh et al. [137] | Gas turbine engine | PFs, MLP, RNNs, WNN
Li et al. [138] | Lithium-ion batteries | LSTM, Elman neural networks
Table 8. Ensemble learning methods.
Reference | Field | Models | Feature Fusion
Peel et al. [139] | PHM 2008 dataset | RBF, MLP | Kalman filter
Lim et al. [140,141] | PHM 2008 dataset | MLPs | Switching Kalman filter
Hu et al. [142] | PHM 2008 dataset, power transformer, electric cooling fan | RVM, SVM, exponential fitting, quadratic fitting, RNN | Accuracy-based weighting, diversity-based weighting, optimization-based weighting
Baraldi et al. [143] | Crack propagation | ANNs | Particle filter
Zhang et al. [75] | Rolling element bearing | ANNs | Dynamic weight updating
Rigamonti et al. [144] | C-MAPSS dataset, cutting knives | ESNs | Dynamic local aggregation
Li et al. [145] | Aeroengine bearings, aircraft engines | RS, ES, SS, QB, RNN | Degradation-dependent weighting
Xia et al. [146] | C-MAPSS dataset | CNN-BLSTM | Weighted average method
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, F.; Wu, Q.; Tan, Y.; Xu, X. Remaining Useful Life Prediction Based on Deep Learning: A Survey. Sensors 2024, 24, 3454. https://doi.org/10.3390/s24113454
