A Survey on Deep Learning for Data-driven Soft Sensors (Early Access)
2 authors, including: Qingqiang Sun, UNSW Sydney
1551-3203 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Zhejiang University. Downloaded on January 22,2021 at 04:44:16 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2021.3053128, IEEE
Transactions on Industrial Informatics
TII-20-4482.R3
existing work, such as [7, 16]. Although those methods already have many applications, they may suffer from drawbacks such as the heavy workload brought by handcrafted feature engineering or inefficiency when dealing with large amounts of data. To demonstrate the significance of DL for soft sensor modeling, the distinct merits of DL and the trends or characteristics of industrial processes should be discussed.

A. Merits of deep learning techniques

To begin with, the structure of a simple network with a single hidden layer is shown in Fig. 1. There are three layers, namely an input layer, a hidden layer, and an output layer. The input layer contains the variables x_1, \ldots, x_m and a constant node "1". The hidden layer has many nodes, and each node has an activation function \varphi. The feature in each node is extracted from the original input layer through an affine transformation followed by the activation function:

H_i = M_i(x_1, \ldots, x_m) = \varphi\Big(\sum_{k=1}^{m} w_{ik}^{0} x_k + b_i^{0}\Big)    (1)

The final output is then a combination of these composite functions:

y(x) = \sum_{k=1}^{n} w_{1k}^{1} H_k(x)    (2)

The weight and bias parameters (w_{ik}^{0}, b_i^{0}) are learned by minimizing a loss function, which is defined according to the specific task and target. This process is called "training" or "learning".

Fig. 1. The structure of a network with a single hidden layer.

According to the Universal Approximation Theorem, if there are enough nodes in the hidden layer, the function represented by the network shown in Fig. 1 can approximate any continuous function [17-19]. Furthermore, using multiple layers of neurons to represent some functions is much simpler.

Since Hinton et al. proposed a faster learning algorithm for the Deep Belief Network (DBN), the maximum depth of networks has reached tens of layers [13]. Later on, He et al. proposed the Deep Residual Network, which solved the performance degradation problem caused by increasing network depth; from then on, the depth of neural networks can reach hundreds of layers [20]. However, "deep" in deep learning theory is not absolutely defined. In the speech recognition domain, a four-layer network can be considered "deep", while in image recognition, networks with more than 20 layers are common.

Deep learning has its own advantages compared with conventional soft sensor modeling methods. Here we classify the conventional methods into three categories at a greater granularity: rule-based systems, classical machine learning, and shallow representation learning. The differences between them are shown in Fig. 2, in which the green blocks indicate components that are able to learn information from data [21].

A rule-based system, also known as a production system or expert system, is the simplest form of artificial intelligence. Rules are coded into programs as the representation of knowledge, telling the system what to do or what to conclude in different situations [22-24]. Consequently, the performance of a rule-based system depends almost entirely on expert knowledge, which is hard to obtain and hard to update, especially in complicated cases. A rule-based system can be considered as having "fixed" intelligence; in contrast, a machine learning system is more adaptive and closer to human intelligence. Instead of outputting a result directly from a fixed set of rules written by humans, classical machine learning first extracts features from raw input data and then maps from those features to the final output. However, the forms of the features are still handcrafted based on knowledge and experience, a process called feature engineering [25, 26]. In order to extract features that better represent the underlying problem, feature engineering is usually complicated, including feature selection, feature construction, and feature extraction. Because the upper bound of the performance of conventional machine learning is mainly determined by the data and features, the effectiveness of those approaches relies heavily on the engineer's ability to extract good features. Therefore, representation learning approaches were proposed to automatically learn useful implicit representations or features from raw data [27]. In this way, the data representation is often trained in conjunction with the subsequent predictive task. Representation learning does not rely on expert experience, but it requires a large training data set. Compared with shallow representation learning, deep learning is a kind of deep representation learning, which tries to learn more hierarchical and more abstract representations using deep networks. As an end-to-end approach, what deep learning needs is sufficient high-quality data rather than complicated feature engineering.
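Returning to the network of Fig. 1, the forward pass defined by Eqs. (1) and (2) can be sketched in a few lines of NumPy; the layer sizes and the tanh activation below are illustrative choices, not taken from the survey:

```python
import numpy as np

# Minimal sketch of the single-hidden-layer network of Fig. 1;
# the sizes and the tanh activation are illustrative choices.
m, n = 4, 8                       # number of inputs / hidden nodes
rng = np.random.default_rng(0)
W0 = rng.normal(size=(n, m))      # weights w0_ik of the affine transformation
b0 = rng.normal(size=n)           # biases b0_i (fed by the constant node "1")
w1 = rng.normal(size=n)           # output weights w1_1k

def phi(a):
    """Activation function applied in each hidden node."""
    return np.tanh(a)

def forward(x):
    H = phi(W0 @ x + b0)          # Eq. (1): H_i = phi(sum_k w0_ik x_k + b0_i)
    return float(w1 @ H)          # Eq. (2): y(x) = sum_k w1_1k H_k(x)

y = forward(rng.normal(size=m))
```

Training then amounts to minimizing a task-specific loss over (W0, b0, w1), for example by gradient descent.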
is because of these three features that CNN is particularly suited to process grid-like data [56].

Generally, after the convolution there is a pooling operation to further adjust the output. The pooling function uses the overall statistical characteristics of the adjacent outputs at a certain location to replace the network output at that location, and no parameters need to be learned. For instance, the max pooling operation uses the maximum output to represent the corresponding rectangular region [57]. Other common pooling functions, such as the average of a rectangular neighborhood, the L2 norm of a rectangular neighborhood, or a weighted average based on the distance from the central pixel, are also widely used to compress the parameter space. CNN also has many variants, such as LeNet, AlexNet, VGGNet, and so on [58-60].

Fig. 6. A 2D convolution case.

D. Recurrent Neural Network

RNN is developed for processing sequential data. The basic architecture and loss computation graph of RNN are shown in Fig. 7. The left network can be unfolded over the time sequence to get the right form. Every time step has an input, a hidden unit, and an output. Besides, recurrent connections exist between hidden units.

Given a specific initial state h^{(0)}, the RNN can propagate forward. Suppose the activation of the hidden layer is tanh and the output layer is fed into a softmax function to generate normalized probabilities \hat{y}; then the corresponding layers from t-1 to t can be updated according to the following formulas:

a^{(t)} = b + W h^{(t-1)} + U x^{(t)},    (6)
h^{(t)} = \tanh(a^{(t)}),    (7)
o^{(t)} = c + V h^{(t)},    (8)
\hat{y}^{(t)} = \mathrm{softmax}(o^{(t)}),    (9)

where b and c denote the bias vectors.

The total loss is just the sum of the losses over all the time steps. For example, if L^{(t)} is computed as the negative log-likelihood of y^{(t)} given x^{(1)}, \ldots, x^{(t)}, then

L(\{x^{(1)}, \ldots, x^{(\tau)}\}, \{y^{(1)}, \ldots, y^{(\tau)}\}) = \sum_t L^{(t)} = -\sum_t \log p_{\mathrm{model}}(y^{(t)} \mid x^{(1)}, \ldots, x^{(t)}).    (10)

Fig. 7. The basic architecture and loss computation graph of an RNN, folded (left) and unfolded over time (right).

Table 1 (fragment). Comparison of mainstream models:
CNN | Supervised; grid-like data; local feature extractor | Sparse interactions; parameter sharing; equivariant representations | Main challenge: the contradiction between the dependence on network depth and the slow parameter updating of deeper networks | Typical use: local dynamic modeling, frequency-domain processing, etc.
RNN | Supervised; sequence data; parameters updated by BPTT | Learns the relationship between different time steps | Main challenge: long-term dependence | Typical use: dynamic modeling, etc.
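The update rules of Eqs. (6)-(9) and the summed negative log-likelihood of Eq. (10) can be sketched as follows; all sizes and the random sequence are toy choices for illustration:

```python
import numpy as np

# Sketch of the RNN forward pass of Eqs. (6)-(9) and the summed
# negative log-likelihood of Eq. (10); all sizes and data are toy choices.
n_in, n_h, n_out = 3, 5, 4
rng = np.random.default_rng(1)
U = rng.normal(size=(n_h, n_in))    # input-to-hidden weights
W = rng.normal(size=(n_h, n_h))     # hidden-to-hidden (recurrent) weights
V = rng.normal(size=(n_out, n_h))   # hidden-to-output weights
b = rng.normal(size=n_h)            # hidden bias vector
c = rng.normal(size=n_out)          # output bias vector

def softmax(o):
    e = np.exp(o - o.max())         # shift for numerical stability
    return e / e.sum()

def step(h_prev, x):
    a = b + W @ h_prev + U @ x      # Eq. (6)
    h = np.tanh(a)                  # Eq. (7)
    o = c + V @ h                   # Eq. (8)
    return h, softmax(o)            # Eq. (9): normalized probabilities y_hat

xs = rng.normal(size=(6, n_in))       # toy input sequence x(1)..x(6)
ys = rng.integers(0, n_out, size=6)   # toy target classes y(1)..y(6)
h, L = np.zeros(n_h), 0.0
for x, t in zip(xs, ys):
    h, y_hat = step(h, x)
    L -= np.log(y_hat[t])           # Eq. (10): L = -sum_t log p(y(t)|x(1..t))
```

Back-propagation through time (BPTT) differentiates this summed loss through the repeated `step` calls to update U, W, V, b, and c.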
applications based on deep learning techniques are reviewed here. The existing work is introduced and discussed, with factors such as motivation, strategy, and effectiveness highlighted. The following contents are organized according to the mainstream model to which each work belongs.

A. Autoencoder based applications

AE and its variants are widely used to construct soft sensors for semi-supervised learning and for dealing with missing data in industrial processes. Also, excellent performance can be achieved by combining them with traditional machine learning algorithms.

Since AE is an unsupervised-learning model, it is often modified into a semi-supervised or supervised form so as to complete predictive tasks. For example, a semi-supervised probabilistic latent variable regression model was developed using the Variational Autoencoder (VAE) in [87]. A common way is to introduce supervision from label variables into the procedures of encoding and decoding. In [88], a Variable-wise Weighted Stacked AE (VW-SAE) was proposed, which introduces the linear Pearson coefficient between the inputs of each hidden layer and the quality labels during pre-training so as to extract features in a semi-supervised way. Furthermore, techniques based on nonlinear relationships, like mutual information [89], were adopted to better extract feature representations. However, both linear and nonlinear relationships are artificially specified and may be inadequate or unsuitable. Thus, a relatively more intelligent and automatic way is to add the predictive loss of quality labels into the pre-training cost [90]. Besides, other strategies can also be adopted to build the connections between hidden layers and label values. Sun et al. used gated units to measure the contribution of the features in different hidden layers and better control the information flows between the hidden layers and the output layer [91]. Moreover, focusing on semi-supervised scenarios where there are only a small number of labeled samples and an excess of unlabeled samples, a double ensemble learning approach was proposed which takes both data diversity and structural diversity into account [92].

Missing data is one of the most commonly encountered problems in designing industrial soft sensors. As a variant of the autoencoder, VAE performs well in learning data distributions and dealing with the missing data problem. For example, a generative model named VA-WGAN was proposed based on VAE and the Wasserstein GAN, and it can generate the same distributions as the real data from industrial processes, which is hard to achieve with conventional regression models [93]. In [94], VAE was employed to extract the distribution of each variable for a just-in-time modeling approach, and its effectiveness was verified through a numerical example and an industrial process. Moreover, the authors enriched the theory by proposing an output-relevant VAE for just-in-time soft sensor applications, which aims to deal with missing data [95]. Different from the former, two kinds of VAEs were used in a new soft sensor framework which also focuses on missing data [96]. The first one, named the Supervised Deep VAE, was designed to obtain the distribution of latent features, which was used as a prior of the second one, known as the Modified Unsupervised Deep VAE. The framework was then constructed by combining the encoder of the first one with the decoder of the second one, and it works well in the missing data situation.

In some cases, AEs can work better when combined with other methods or when their learning strategy is improved. For example, Yao et al. implemented a deep network of autoencoders for unsupervised feature extraction and then utilized the extreme learning machine for the regression task [97]. Wang et al. adopted the Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm to optimize the weight parameters learned by SAE, and the extracted features were then fed into a support vector regression (SVR) model for estimating the rotor deformation of air preheaters [98]. Instead of using purely data-driven models, Wang et al. combined a knowledge-based model (KDM) named the Lab model with a data-driven model (DDM), namely the Stacked Autoencoder, and the experimental results verified that the hybrid method is superior to using only the KDM or DDM [99]. Using an improved gradient descent algorithm, Yan et al. proposed a DAE-based method which was demonstrated to be effective compared with conventional approaches like shallow learning methods [100]. Besides, to adaptively model time-varying processes, a just-in-time fine-tuning framework was proposed for SAE-based soft sensor construction [101].

B. Restricted Boltzmann Machine based applications

Nonlinearity is a widely existing characteristic of industrial processes. Aiming at this, RBM and its variants, especially DBN, are generally used as unsupervised nonlinear feature extractors in industrial process modeling.

Predictors can take advantage of features learned by RBM or DBN, and SVR and BPNN are two common kinds of predictors. For example, to address the problem of high nonlinearity and strong correlation among the multiple variables in the coal-fired boiler process, a novel deep structure using continuous RBM (CRBM) and SVR algorithms was proposed [102]. A related work was proposed by Lian et al., which uses DBN and SVR with improved particle swarm optimization to complete the task of rotor thermal deformation prediction [103]. In [104], a soft sensor model based on DBN and BPNN was proposed to predict the 4-carboxy-benzaldehyde concentration in the purified terephthalic acid industrial production process. Faced with the complexity of nonlinear system modeling, an improved BPNN based on RBM was proposed in [105]; in this work, the structure of the BPNN is optimized by utilizing sensitivity analysis and mutual information theories, and the initialization of the parameters is done by RBM. In [106], DBN was used to learn hierarchical features for a BPNN, which was constructed for modeling the relationships between the extracted features and the mill level in a ball mill production process. In addition to SVR and BPNN, the Extreme Learning Machine (ELM) can also work as a predictor based on the features extracted by DBN; this idea was realized in the measurement of nutrient solution composition for soilless culture [107].

To overcome the data-rich-but-information-poor problem, RBMs can be utilized for ensemble learning. For instance, Zheng et al. proposed a framework which integrates the ensemble strategy, DBN, and correntropy kernel regression into a unified soft sensing framework [108]. Similarly, an ensemble deep kernel learning model was
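A recurring pattern in the AE- and RBM-based work above is unsupervised feature extraction followed by a separate regressor (e.g., AE or DBN features fed into ELM or SVR). The following sketch illustrates that two-stage idea with a minimal autoencoder and least-squares regression on toy data; it is not the exact method of any cited paper:

```python
import numpy as np

# Illustrative two-stage sketch (toy data, not any cited paper's exact method):
# 1) train a small autoencoder unsupervised, 2) regress quality labels on its codes.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))                             # toy process variables
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=200)   # toy quality variable

d, k = X.shape[1], 3                  # input dim, code (feature) dim
We = 0.1 * rng.normal(size=(k, d))    # encoder weights
Wd = 0.1 * rng.normal(size=(d, k))    # decoder weights (linear decoder)

def encode(X):
    return np.tanh(X @ We.T)

mse0 = np.mean((encode(X) @ Wd.T - X) ** 2)   # reconstruction error before training

# Unsupervised pre-training: plain gradient descent on the reconstruction error.
for _ in range(500):
    H = encode(X)                     # codes
    err = H @ Wd.T - X                # reconstruction residual
    gWd = err.T @ H / len(X)
    gWe = ((err @ Wd) * (1.0 - H ** 2)).T @ X / len(X)
    Wd -= 0.1 * gWd
    We -= 0.1 * gWe

mse1 = np.mean((encode(X) @ Wd.T - X) ** 2)   # reconstruction error after training

# Supervised stage: least-squares regression from learned codes to the labels.
Hb = np.hstack([encode(X), np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Hb, y, rcond=None)
pred = Hb @ w
```

Deeper stacks (SAE, DBN) and stronger regressors (SVR, ELM) follow the same template; the key point is that the unsupervised stage never sees the quality labels.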
proposed for the industrial polymerization process, which adopts DBN for unsupervised information extraction [109]. In another case, the lack of labeled samples also leads to poor information, which can be settled by semi-supervised learning using DBN, like the work proposed in [110]. In [111], focusing on labeled data scarcity, computational complexity reduction, and unsupervised feature exploitation, a DBN-based soft sensor is designed.

RBMs have some other interesting applications as well. Graziani et al. designed a soft sensor based on DBN for a plant process to estimate an unknown measurement delay rather than quality variables [112]. Another DBN-based model was applied to process flame images, rather than common structured data, in industrial combustion processes for oxygen content prediction [113]. And Zhu et al. investigated the selection of the DBN structure for the soft sensor application in an industrial polymerization process; by comparison with feedforward neural networks, the DBN-based method can give more accurate predictions of the polymer melt index [114].

C. Convolutional Neural Network based applications

CNNs are mainly utilized for processing grid-like data, especially image data. Besides, they can also be developed to capture local dynamic characteristics of industrial process data or process signals in the frequency domain.

By processing image data, CNN can be used to construct soft sensors. For example, Horn et al. used CNN to extract features in froth flotation sensing, which shows a good feature extraction speed and predictive performance [115]. However, images are still seldom utilized for soft sensor construction compared to common data forms.

As for dynamic problems, Yuan et al. proposed a multichannel CNN (MCNN) for soft sensing applications in the industrial debutanizer column and hydrocracking process, which can learn dynamics and various local correlations of different variable combinations [116]. Besides, Wang et al. used two CNN-based soft sensor models to deal with abundant process data for the purpose of keeping complexity low while embracing the process dynamics at the same time [117]. In [118], a soft sensor was proposed using a convolutional neural network, which predicts the measurements at the next time step by extracting time-dependent correlations from a moving window.

In the frequency domain, CNNs can acquire high invariance to signal translation, scaling, and distortion. In [119], a pair of convolution and max-pooling layers was utilized at the lowest part of the network to extract a high-level abstraction from the vibration spectral features of the mill bearing, and then an ELM learns a mapping from the extracted features to the mill level. In the field of aerospace engineering, a virtual sensor model with partial vibration measurements using a CNN was proposed for estimating the structural response, which is important for structural health monitoring and damage detection when physical sensors are limited in the corresponding operational conditions [120].

D. Recurrent Neural Network based applications

RNNs are widely used for dynamic modeling, and various variants like LSTM are also applied in real cases.

RNN-based soft sensors were developed to estimate variables with strong dynamic characteristics, such as the curing of epoxy/graphite fiber composites [121], the contact area that the tires of a car make with the ground [122], the indoor air quality (IAQ) in the subway [123], the melt-flow-length in the injection molding process [124], the biomass concentrations [125], and the product concentration of reactive distillation columns [126].

Apart from methods based on the ordinary RNN, LSTM is also a popular model in soft sensing applications, which can be deeper and more powerful since the long-term dependence problem is weakened. For example, an LSTM-based soft sensor model was proposed to cope with the strong nonlinearity and dynamics of the process in [127]. Besides, Yuan et al. proposed a supervised LSTM network, which used both the input and quality variables to learn dynamic hidden states, and the method was proved to be effective on a penicillin fermentation process and an industrial debutanizer column [128]. An LSTM network was also used to predict the content of nitrogen-derived components in wastewater treatment plants [129].

There are other variants that are designed for specific industrial applications. As an example, a two-stream network structure was designed, which adopts batch normalization and dropout tricks, to learn diverse features of the various process data [130]. In [131], another type of RNN called the Time Delayed Neural Network (TDNN) was implemented for inferential state estimation for an ideal reactive distillation column. Besides, the Echo State Network (ESN), as a kind of RNN, was also used for soft sensing applications in the high-density polyethylene (HDPE) production process and the purified terephthalic acid (PTA) production process [132]; by taking advantage of singular value decomposition (SVD), the collinearity and over-fitting problems were solved. Recently, an ensemble semi-supervised model combining SAE with the Bidirectional LSTM (BLSTM) was proposed in [133]. The new method can not only extract and utilize the temporal behavior in labeled and unlabeled data but also take into consideration the time dependency hidden in the quality metric itself. Also, a GRU-based method was proposed for automatic deep extraction of robust dynamic features in [134], and it achieves good performance in a debutanizer distillation process.

E. Other Deep Learning based applications

In addition to applications based on the above mainstream models, some other deep models are also used to solve soft sensing problems. Some typical applications are discussed in the following, and the others will not be analyzed in detail here.

Semi-supervised modeling
In [135], a semi-supervised framework was constructed by integrating manifold embedding into a deep neural network (DNN), in which manifold embedding exploited the local neighbor relationship among industrial data and improved the utilization efficiency of unlabeled data in the deep neural network. Besides, a just-in-time semi-supervised soft sensor based on the extreme learning machine was proposed to online estimate the Mooney viscosity with multiple recipes in [136].

Dynamic modeling
Apart from CNNs and RNNs, some other neural networks are used for dynamic modeling. Graziani et al.
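Several of the dynamic methods above (for instance, the moving-window CNN soft sensor of [118]) build each training sample from a window of past measurements. A minimal sketch of that windowing step, with illustrative shapes and toy data:

```python
import numpy as np

# Sketch of the moving-window idea used by dynamic CNN/RNN soft sensors:
# each training sample stacks the last `window` time steps of process variables,
# and the target is the measurement at the next time step.
def make_windows(series, window):
    """series: (T, n_vars) array -> inputs of shape (T - window, window, n_vars)
    and next-step targets of shape (T - window, n_vars)."""
    X = np.stack([series[t:t + window] for t in range(len(series) - window)])
    y = series[np.arange(window, len(series))]
    return X, y

T, n_vars, window = 100, 4, 10
series = np.cumsum(np.random.default_rng(3).normal(size=(T, n_vars)), axis=0)
X, y = make_windows(series, window)
# X[i] holds time steps i .. i+window-1; y[i] is the measurement at step i+window.
```

A CNN treats each window as a small 2D grid (time by variables), while an RNN consumes the same window step by step.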
proposed a dynamic DNN-based soft sensor to estimate the research octane number for a reformer unit in a refinery, and nonlinear finite impulse response models were investigated [137]. Wang et al. proposed a dynamic network called NARX-DNN, which can interpret the quality prediction error of validation data from different aspects and automatically determine the most appropriate delay of historical data [138]. Besides, a dynamic strategy is adopted to improve the dynamic capture performance of the extreme learning machine, which is combined with PLS in [139].

Data generation
Due to the harsh environment of industrial processes, directly collecting data may be difficult. Therefore, a Generative Adversarial Network based method was proposed for data generation in [140].

Elimination of redundancy
In [141], a double least absolute shrinkage and selection operator (dLASSO) algorithm was integrated into a multilayer perceptron (MLP) network to solve two redundancy problems: input variable redundancy and model structure redundancy.

Inference and approximation
Due to their strong learning ability, deep neural networks can be used for intelligent control purposes. For example, a soft sensor based on Levenberg-Marquardt and an adaptive linear network was designed and applied in the inferential control of a multicomponent distillation process [142]. In addition, the adaptive fuzzy means algorithm was utilized to evolve a radial basis function (RBF) neural network, which aimed at the approximation of an unknown system [143].

F. Summary of the existing applications

The purposes of developing DL-based novel soft sensors include feature extraction, solving missing-value issues, capturing dynamic characteristics, semi-supervised modeling, and so on (as shown in Table 1). It is worth noting that only existing applications in the soft sensor field are discussed in detail, which does not mean that what has not yet appeared in the field of soft sensors is not possible. For example, although VAE is the mainstream method for dealing with missing-value problems in soft sensor applications using DL, methods based on RBM and GAN are also feasible in other fields [144, 145]. To design feasible models, different strategies were adopted, such as optimizing the network structure, improving the training algorithm, and integrating different algorithms.

From the applications discussed in the above subsections, some points can be further summarized. Firstly, the statistics on soft sensor applications using DL methods can be seen in Fig. 9, which is based on a total of 57 references discussed and cited in Section IV. From diagram (a), the trend is clear that there have been more and more algorithms based on DL theory during recent years, which reflects the increasing demand for DL models in real industrial process modeling. Moreover, compared with the three other main theories, CNN-based methods are applied less. This is because grid-like data such as images are more often used for classification rather than regression tasks. Besides, although AE looks simpler than the other main models, it is easier to develop and extend, so it is also of great potential.

As shown in diagram (b) of Fig. 9, soft sensors based on DL theory were constructed in many scenarios, including the chemical industry, power industry, machinery manufacturing, aerospace engineering, and so on. Among them, chemical industry applications account for the largest proportion, at about 66.7%.

The effectiveness of most of the work reviewed in this survey is verified by numerical simulation experiments (e.g. [95], [116]), by using publicly available benchmark datasets (e.g. [139]), or by modeling datasets from real-world processes (e.g. [93], [94], [95], [110], [116], [123]). The most common case is the third type, which can reflect the characteristics of real processes as much as possible. For example, in the chemical industry, actual run data are collected from processes like the debutanizer process [96], polymerization processes [109], and the hydrocracking process [116], to name a few. However, more detailed and specific factors need to be considered when applying those soft sensors to real scenarios.

Fig. 9. Statistics on existing relevant work: (a) publications in different years (earlier than 2016 through 2020, counted for AE-based, RBM-based, CNN-based, RNN-based, and other models-based methods); (b) applications in different fields (chemical industry, about 66.7%, plus power industry, machinery manufacturing, environmental monitoring, agriculture production, aerospace engineering, transportation industry, and bioprocess industry).

V. DISCUSSIONS AND OUTLOOK

Although deep learning has made great progress in many fields, there is still a lot of work to do to better apply the advanced methods in the soft sensor domain, especially to meet the demands of practical industrial processes. Data and structure are the two most important issues that need to be considered all the time. Around these two topics, some hot research directions should receive more attention in the future.

Lack of labeled samples
Although data are easy to obtain under the trend of big data, the annotation cost is still very expensive. Therefore, we always hope that using fewer labeled samples can train a model with good generalization ability. The traditional solution to this problem is semi-supervised learning, while the more and
more serious imbalance between unlabeled and labeled data makes it less satisfactory. Self-supervised learning (SSL) is another feasible solution; it is a kind of unsupervised strategy [146]. Different from transfer learning [32, 33], useful feature representations are learned from a pretext task designed on the unlabeled input data itself (not from other, similar datasets). The contrastive approach is one of the most popular types of SSL and has achieved notable results in the speech, image, text, and reinforcement learning fields [147]. However, much investigation and exploration remains to be done for its soft-sensing application.
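As a concrete illustration of the contrastive idea, the sketch below computes a minimal InfoNCE-style loss in plain Python: an anchor representation should score higher against a "positive" view (e.g., an augmented copy of the same unlabeled sample) than against negatives. This is a toy sketch of the general recipe in [147], not code from any surveyed soft sensor work; the vectors and temperature are illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss: the anchor should be more similar
    to its positive view than to any of the negative samples."""
    scores = [dot(anchor, positive) / tau] + [dot(anchor, n) / tau for n in negatives]
    m = max(scores)  # log-sum-exp trick for numerical stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return -(scores[0] - log_denom)  # -log softmax probability of the positive

anchor = [1.0, 0.0]
good_pos = [0.9, 0.1]   # similar view of the same sample -> low loss
bad_pos = [0.0, 1.0]    # dissimilar "positive" -> high loss
negs = [[0.0, 1.0], [-1.0, 0.0]]
print(info_nce(anchor, good_pos, negs) < info_nce(anchor, bad_pos, negs))  # True
```

Minimizing such a loss over many pretext pairs yields representations that can then be fine-tuned with the few labeled samples available.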
Hyperparameter optimization
For a long time, optimizing the hyperparameters and structures of networks has been a difficult issue for researchers and engineers [106, 114, 141], and most such work requires manual trial and error. To avoid heavy workloads and high randomness, meta-learning, also called "learning to learn" [148], has been proposed and investigated. The motivation is to endow machines with a human-like learning ability. Instead of learning a single function for a specific task, meta-learning learns a function that outputs functions for several subtasks. Accordingly, meta-learning requires many subtasks, each with its own training set and test set. After effective training, the machine can optimize hyperparameters, including selecting network structures, by itself. This is attractive for multimodal and time-varying processes.
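The "function that outputs functions" idea can be sketched with a toy first-order MAML-style loop in the spirit of [148], applied to scalar regression subtasks: everything here (the linear model, learning rates, and task distribution) is an illustrative assumption, not a recipe taken from the surveyed literature.

```python
import random

random.seed(0)

def task_grad(w, data):
    """Gradient of the mean squared error of y = w*x on one data set."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def maml_step(w0, tasks, inner_lr=0.05, outer_lr=0.1):
    """One meta-update: adapt w0 on each subtask's training set with a
    single gradient step, then nudge w0 so that the adapted weights do
    well on the subtask's test set (first-order approximation)."""
    meta_grad = 0.0
    for train, test in tasks:
        w_adapted = w0 - inner_lr * task_grad(w0, train)
        meta_grad += task_grad(w_adapted, test)
    return w0 - outer_lr * meta_grad / len(tasks)

def make_task(true_w, n=10):
    """A subtask: fit y = w*x with its own slope and own train/test split."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    data = [(x, true_w * x) for x in xs]
    return data[: n // 2], data[n // 2 :]

tasks = [make_task(w) for w in (1.8, 2.0, 2.2)]
w0 = 0.0
for _ in range(300):
    w0 = maml_step(w0, tasks)
# w0 drifts toward an initialization that adapts quickly to the whole
# task family (slopes near 2.0), rather than fitting any single task.
```

The same outer-loop idea is what allows a meta-learned controller to select hyperparameters or structures for a new, related process after only a few adaptation steps.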
Model reliability
Deep learning methods learn features in an end-to-end way, which makes it difficult for engineers or designers to understand what has been learned and how. Besides, the dependence of the learning process on data means that poor data quality directly degrades accuracy. Both factors pose a threat to the reliability of DL models. It is therefore important to improve model reliability, and model visualization [149, 150] and the combination with experience or knowledge [151] are two feasible ways. Model visualization helps researchers understand what has been learned, while introducing experience or knowledge reduces the inaccuracy caused by relying on data alone. Nevertheless, both directions need further investigation before practical industrial application.
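One simple, model-agnostic complement to the visualization methods cited above is a finite-difference sensitivity probe, which indicates how strongly each input variable drives an end-to-end model's prediction. The "trained soft sensor" below is a hypothetical stand-in used only to demonstrate the probe.

```python
def sensitivity(model, x, eps=1e-4):
    """Finite-difference sensitivity of a black-box model's prediction
    to each input: a crude interpretability aid for end-to-end models."""
    base = model(x)
    grads = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        grads.append((model(xp) - base) / eps)
    return grads

# hypothetical trained soft sensor: strong in x[0], weak in x[1]
model = lambda x: 3.0 * x[0] + 0.1 * x[1] ** 2
s = sensitivity(model, [1.0, 1.0])
print([round(g, 2) for g in s])  # → [3.0, 0.2]
```

A probe like this can flag a model that leans on an implausible input, which is one practical way to build trust before deployment.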
Distributed parallel modeling
With the trend of industrial big data discussed in Section II, how to efficiently model a process from large amounts of data is an important and urgent issue. A feasible solution is to transform deep learning models into distributed and parallel forms. By splitting a large data set into several small distributed blocks, data processing can be carried out simultaneously, which is conducive to large-scale data modeling [152, 153]. So far, however, there is still a long way to go.
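The split-and-merge idea can be sketched for a linear soft sensor: each block computes its sufficient statistics simultaneously, and the blocks are then merged into a single least-squares fit. This is a minimal single-machine sketch (threads standing in for distributed nodes), not the frameworks of [152, 153].

```python
from concurrent.futures import ThreadPoolExecutor

def block_stats(block):
    """Sufficient statistics of one data block for least-squares fitting."""
    n = len(block)
    sx = sum(x for x, _ in block)
    sy = sum(y for _, y in block)
    sxx = sum(x * x for x, _ in block)
    sxy = sum(x * y for x, y in block)
    return n, sx, sy, sxx, sxy

def parallel_fit(data, n_blocks=4):
    """Fit y = a*x + b by processing distributed blocks simultaneously
    and merging their sufficient statistics afterwards."""
    blocks = [data[i::n_blocks] for i in range(n_blocks)]
    with ThreadPoolExecutor(max_workers=n_blocks) as ex:
        stats = list(ex.map(block_stats, blocks))
    n = sum(s[0] for s in stats)
    sx = sum(s[1] for s in stats)
    sy = sum(s[2] for s in stats)
    sxx = sum(s[3] for s in stats)
    sxy = sum(s[4] for s in stats)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# synthetic data from y = 2x + 1; the merged fit recovers the model
data = [(x / 100.0, 2.0 * (x / 100.0) + 1.0) for x in range(1000)]
a, b = parallel_fit(data)
print(round(a, 3), round(b, 3))  # → 2.0 1.0
```

Because only small per-block summaries are merged, the same pattern extends naturally to genuinely distributed nodes; deep models require more care, since their updates do not reduce to simple sufficient statistics.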
VI. CONCLUSIONS
Deep learning techniques have shown great potential in many fields, including soft sensing. In order to summarize the past, analyze the present, and look into the future, this work made the following contributions to the application of deep learning theory in the field of soft sensors: (i) the merits of deep learning compared with traditional algorithms and the trends of industrial processes were discussed in detail to demonstrate the necessity and significance of deep learning algorithms for soft sensor modeling; (ii) the main DL models, tricks, and frameworks/toolkits were discussed and summarized to help readers better develop DL-based soft sensors; (iii) practical application scenarios were analyzed by reviewing and discussing existing publications; (iv) possible hot research topics for future work were briefly investigated.
It is our hope that this paper serves as a taxonomy and tutorial of the advances elucidated in a multitude of works on deep learning based soft sensors, and provides the community with a picture of the roadmap and matters for future endeavors.

REFERENCES
[1] B. Huang and R. Kadali, Dynamic Modeling, Predictive Control and Performance Monitoring, Springer London, 2008.
[2] X. Wang, B. Huang, and T. Chen, "Multirate minimum variance control design and control performance assessment: A data-driven subspace approach," IEEE Trans. Control Syst. Technol., vol. 15, no. 1, pp. 65-74, 2006.
[3] Z. Chen, S. X. Ding, T. Peng, C. Yang, and W. Gui, "Fault detection for non-Gaussian processes using generalized canonical correlation analysis and randomized algorithms," IEEE Trans. Ind. Electron., vol. 65, no. 2, pp. 1559-1567, 2018.
[4] Y. Jiang, S. Yin, J. Dong, O. Kaynak, "A review on soft sensors for monitoring, control and optimization of industrial processes," IEEE Sensors Journal, 2020, doi: 10.1109/JSEN.2020.3033153.
[5] V. Venkatasubramanian, R. Rengaswamy, S. N. Kavuri, "A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies," Computers & Chemical Engineering, vol. 27, no. 3, pp. 313-326, 2003.
[6] P. Kadlec, B. Gabrys, S. Strandt, "Data-driven soft sensors in the process industry," Comput. Chem. Eng., vol. 33, pp. 795-814, 2009.
[7] M. Kano, M. Ogawa, "The state of the art in chemical process control in Japan: Good practice and questionnaire survey," J. Process Control, vol. 20, pp. 969-982, 2010.
[8] K. Pearson, "LIII. On lines and planes of closest fit to systems of points in space," Philosophical Magazine, vol. 2, no. 11, pp. 559-572, 1901.
[9] H. Wold, "Estimation of principal components and related models by iterative least squares," Multivar. Anal., vol. 1, pp. 391-420, 1966.
[10] Q. Jiang, X. Yan, H. Yi, and F. Gao, "Data-driven batch-end quality modeling and monitoring based on optimized sparse partial least squares," IEEE Transactions on Industrial Electronics, vol. 67, no. 5, pp. 4098-4107, May 2020, doi: 10.1109/TIE.2019.2922941.
[11] W. Yan, H. Shao, X. Wang, "Soft sensing modeling based on support vector machine and Bayesian model selection," Comput. Chem. Eng., vol. 28, pp. 1489-1498, 2004.
[12] K. Desai, Y. Badhe, S. S. Tambe, B. D. Kulkarni, "Soft-sensor development for fed-batch bioreactors using support vector regression," Biochem. Eng. J., vol. 27, pp. 225-239, 2006.
[13] G. Hinton, S. Osindero, Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
[14] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," J. Mach. Learn. Res., vol. 9, pp. 249-256, 2010.
[15] Y. LeCun, Y. Bengio, G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[16] F. A. A. Souza, R. Araújo, and J. Mendes, "Review of soft sensor methods for regression applications," Chemometrics and Intelligent Laboratory Systems, vol. 152, pp. 69-79, 2016.
[17] K. Hornik, et al., "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[18] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Math. Control Signals Systems, vol. 2, pp. 303-314, 1989.
[19] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, pp. 251-257, 1991.
[20] K. He, X. Zhang, S. Ren, J. Sun, "Deep residual learning for image recognition," arXiv:1512.03385v1, 2015.
[21] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, vol. 1, Cambridge, MA, USA: The MIT Press, 2016.
[22] C. Grosan, A. Abraham, "Rule-Based Expert Systems," Intelligent Systems, vol. 17, pp. 149-185, 2011.
[23] A. Ligęza, Logical Foundations for Rule-Based Systems, 2nd ed., Springer, Heidelberg, 2006.
[24] J. Durkin, Expert Systems: Design and Development, Prentice Hall, New York, 1994.
[25] C. R. Turner, A. Fuggetta, L. Lavazza, A. L. Wolf, "A conceptual basis for feature engineering," Journal of Systems and Software, vol. 49, no. 1, pp. 3-15, 1999.
[26] F. Nargesian, H. Samulowitz, U. Khurana, E. B. Khalil, D. Turaga, "Learning feature engineering for classification," in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Aug. 2017, doi: 10.24963/ijcai.2017/352.
[27] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.
[28] A. Ng, "Scale drives machine learning progress," in Machine Learning Yearning, pp. 10-12. [Online]. Available: https://www.deeplearning.ai/machine-learning-yearning/
[29] S. J. Pan, Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, Oct. 2010.
[30] Y. Bengio, "Deep learning of representations for unsupervised and transfer learning," in Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, pp. 17-36, 2012.
[31] W. Shao, Z. Song, and L. Yao, "Soft sensor development for multimode processes based on semisupervised Gaussian mixture models," IFAC-PapersOnLine, vol. 51, no. 18, pp. 614-619, 2018.
[32] F. A. A. Souza and R. Araújo, "Mixture of partial least squares experts and application in prediction settings with multiple operating modes," Chemometrics Intell. Lab. Syst., vol. 130, no. 15, pp. 192-202, 2014.
[33] H. Jin, X. Chen, L. Wang, K. Yang, and L. Wu, "Dual learning-based online ensemble regression approach for adaptive soft sensor modeling of non-linear time-varying processes," Chemometrics Intell. Lab. Syst., vol. 151, pp. 228-244, 2016.
[34] M. Kano and K. Fujiwara, "Virtual sensing technology in process industries: Trends and challenges revealed by recent industrial applications," Journal of Chemical Engineering of Japan, 2012, doi: 10.1252/jcej.12we167.
[35] L. X. Yu, "Pharmaceutical quality by design: Product and process development, understanding, and control," Pharm. Res., vol. 25, pp. 781-791, 2008, doi: 10.1007/s11095-007-9511-1.
[36] S. J. Qin, "Process data analytics in the era of big data," AIChE Journal, vol. 60, no. 9, pp. 3092-3100, 2014.
[37] N. Stojanovic, M. Dinic, L. Stojanovic, "Big data process analytics for continuous process improvement in manufacturing," 2015 IEEE International Conference on Big Data, 2015, doi: 10.1109/BigData.2015.7363900.
[38] L. Yao, Z. Ge, "Big data quality prediction in the process industry: A distributed parallel modeling framework," J. Process Contr., vol. 68, pp. 1-13, 2018.
[39] M. S. Reis and G. Gins, "Industrial process monitoring in the big data/Industry 4.0 era: From detection, to diagnosis, to prognosis," Processes, vol. 5, no. 3, 35, 2017, doi: 10.3390/pr5030035.
[40] S. W. Roberts, "Control chart tests based on geometric moving averages," Technometrics, vol. 1, pp. 239-250, 1959.
[41] C. A. Lowry, W. H. Woodall, C. W. Champ, C. E. Rigdon, "A multivariate exponentially weighted moving average control chart," Technometrics, vol. 34, pp. 46-53, 1992.
[42] T. Kourti, J. F. MacGregor, "Multivariate SPC methods for process and product monitoring," J. Qual. Technol., vol. 28, pp. 409-428, 1996.
[43] M. S. Reis, P. M. Saraiva, "Prediction of profiles in the process industries," Ind. Eng. Chem. Res., vol. 51, pp. 4254-4266, 2012.
[44] C. Duchesne, J. J. Liu, J. F. MacGregor, "Multivariate image analysis in the process industries: A review," Chemom. Intell. Lab. Syst., vol. 117, pp. 116-128, 2012.
[45] D. C. Montgomery, C. M. Mastrangelo, "Some statistical process control methods for autocorrelated data," J. Qual. Technol., vol. 23, pp. 179-193, 1991.
[46] T. J. Rato, M. S. Reis, "Advantage of using decorrelated residuals in dynamic principal component analysis for monitoring large-scale systems," Ind. Eng. Chem. Res., vol. 52, pp. 13685-13698, 2013.
[47] G. E. Hinton and J. L. McClelland, "Learning representations by recirculation," in NIPS 1987, pp. 358-366, 1988.
[48] D. E. Rumelhart, G. E. Hinton, R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986.
[49] H. Larochelle, I. Lajoie, Y. Bengio, P. A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," J. Mach. Learn. Res., vol. 11, no. 12, pp. 3371-3408, 2010.
[50] B. Schölkopf, J. Platt, T. Hofmann, "Efficient learning of sparse representations with an energy-based model," in Proceedings of Advances in Neural Information Processing Systems, pp. 1137-1144, 2006.
[51] M. A. Ranzato, Y. L. Boureau, Y. LeCun, "Sparse feature learning for deep belief networks," in Proceedings of the International Conference on Neural Information Processing Systems, vol. 20, pp. 1185-1192, 2007.
[52] A. Hassanzadeh, A. Kaarna, T. Kauranne, "Unsupervised multi-manifold classification of hyperspectral remote sensing images with contractive autoencoder," Neurocomputing, vol. 257, pp. 67-78.
[53] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1-127, 2009.
[54] G. E. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, Springer, Berlin, Heidelberg, pp. 599-619, 2012.
[55] G. E. Hinton, R. R. Salakhutdinov, "Deep Boltzmann machines," J. Mach. Learn. Res., vol. 5, no. 2, pp. 1967-2006, 2009.
[56] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012.
[57] Y. Zhou and R. Chellappa, "Computation of optical flow using a neural network," IEEE 1988 International Conference on Neural Networks, 1988, doi: 10.1109/ICNN.1988.23914.
[58] Y. LeCun, L. Bottou, Y. Bengio, et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[59] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[60] K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[61] P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proc. IEEE, vol. 78, no. 10, pp. 1550-1560, Oct. 1990.
[62] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[63] R. Pascanu, T. Mikolov, Y. Bengio, "On the difficulty of training recurrent neural networks," in Proceedings of the International Conference on Machine Learning, pp. 1310-1318, 2013.
[64] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451-2471, 2000.
[65] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, "How to construct deep recurrent neural networks," arXiv preprint arXiv:1312.6026, 2013.
[66] K. Cho, B. V. Merriënboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proceedings of Empirical Methods in Natural Language Processing 2014, 2014.
[67] G. Chrupala, A. Kadar, and A. Alishahi, "Learning language through pictures," arXiv:1506.03694, 2015.
[68] F. Girosi, M. Jones, and T. Poggio, "Regularization theory and neural networks architectures," Neural Computation, vol. 7, no. 2, pp. 219-269, 1995.
[69] D. M. Montserrat, Q. Lin, J. Allebach, E. J. Delp, "Training object detection and recognition CNN models using data augmentation," Electronic Imaging, vol. 2017, no. 10, pp. 27-36, 2017.
[70] N. Jaitly and G. E. Hinton, "Vocal tract length perturbation (VTLP) improves speech recognition," in Proc. ICML Workshop on Deep Learning for Audio, Speech and Language, vol. 117, 2013.
[71] P. Vincent, H. Larochelle, Y. Bengio, et al., "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning, pp. 1096-1103, 2008.
[72] B. Poole, J. Sohl-Dickstein, and S. Ganguli, "Analyzing noise in autoencoders and deep networks," arXiv preprint arXiv:1406.1831, 2014.
[73] R. Caruana, S. Lawrence, and C. L. Giles, "Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping," Advances in Neural Information Processing Systems, 2001.
[74] Z. Zhang, Y. Xu, J. Yang, X. Li, D. Zhang, "A survey of sparse representation: Algorithms and applications," IEEE Access, vol. 3, pp.
[118] … pyrolysis reactor for compositions predictions of gas phase components," Computer Aided Chemical Engineering, Elsevier, vol. 44, pp. 2245-2250, 2018.
[119] J. Wei, L. Guo, X. Xu, and G. Yan, "Soft sensor modeling of mill level based on convolutional neural network," The 27th Chinese Control and Decision Conference (2015 CCDC), Qingdao, pp. 4738-4743, 2015, doi: 10.1109/CCDC.2015.7162762.
[120] S. Sun, Y. He, S. Zhou, et al., "A data-driven response virtual sensor technique with partial vibration measurements using convolutional neural network," Sensors, vol. 17, no. 12, 2017, doi: 10.3390/s17122888.
[121] H. B. Su, L. T. Fan, J. R. Schlup, "Monitoring the process of curing of epoxy/graphite fiber composites with a recurrent neural network as a soft sensor," Engineering Applications of Artificial Intelligence, vol. 11, no. 2, pp. 293-306, 1998.
[122] C. A. Duchanoy, M. A. Moreno-Armendáriz, L. Urbina, et al., "A novel recurrent neural network soft sensor via a differential evolution training algorithm for the tire contact patch," Neurocomputing, vol. 235, pp. 71-82, 2017.
[123] J. Loy-Benitez, S. K. Heo, C. K. Yoo, "Soft sensor validation for monitoring and resilient control of sequential subway indoor air quality through memory-gated recurrent neural networks-based autoencoders," Control Engineering Practice, vol. 97: 104330, 2020.
[124] X. Chen, F. Gao, G. Chen, "A soft-sensor development for melt-flow-length measurement during injection mold filling," Materials Science and Engineering: A, vol. 384, no. 1-2, pp. 245-254, 2004.
[125] L. Z. Chen, S. K. Nguang, X. M. Li, et al., "Soft sensors for on-line biomass measurements," Bioprocess and Biosystems Engineering, vol. 26, no. 3, pp. 191-195, 2004.
[126] G. Kataria, K. Singh, "Recurrent neural network based soft sensor for monitoring and controlling a reactive distillation column," Chemical Product and Process Modeling, vol. 13, no. 3, 2017, doi: 10.1515/cppm-2017-0044.
[127] W. Ke, D. Huang, F. Yang, and Y. Jiang, "Soft sensor development and applications based on LSTM in deep neural networks," 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, pp. 1-6, 2017, doi: 10.1109/SSCI.2017.8280954.
[128] X. Yuan, L. Li, and Y. Wang, "Nonlinear dynamic soft sensor modeling with supervised long short-term memory network," IEEE Transactions on Industrial Informatics, vol. 16, no. 5, pp. 3168-3176, May 2020, doi: 10.1109/TII.2019.2902129.
[129] I. Pisa, I. Santín, J. L. Vicario, et al., "ANN-based soft sensor to predict effluent violations in wastewater treatment plants," Sensors, vol. 19, no. 6, 2019: 1280.
[130] R. Xie, K. Hao, B. Huang, L. Chen, and X. Cai, "Data-driven modeling based on two-stream λ gated recurrent unit network with soft sensor application," IEEE Transactions on Industrial Electronics, vol. 67, no. 8, pp. 7034-7043, Aug. 2020, doi: 10.1109/TIE.2019.2927197.
[131] S. R. V. Raghavan, T. K. Radhakrishnan, K. Srinivasan, "Soft sensor based composition estimation and controller design for an ideal reactive distillation column," ISA Transactions, vol. 50, no. 1, pp. 61-70, 2011.
[132] Y. L. He, Y. Tian, Y. Xu, et al., "Novel soft sensor development using echo state network integrated with singular value decomposition: Application to complex chemical processes," Chemometrics and Intelligent Laboratory Systems, vol. 200, 2020: 103981, doi: 10.1016/j.chemolab.2020.103981.
[133] X. Yin, Z. Niu, Z. He, et al., "Ensemble deep learning based semi-supervised soft sensor modeling method and its application on quality prediction for coal preparation process," Advanced Engineering Informatics, vol. 46, 2020: 101136.
[134] X. Zhang and Z. Ge, "Automatic deep extraction of robust dynamic features for industrial big data modeling and soft sensor application," IEEE Transactions on Industrial Informatics, vol. 16, no. 7, pp. 4456-4467, July 2020, doi: 10.1109/TII.2019.2945411.
[135] W. Yan, R. Xu, K. Wang, et al., "Soft sensor modeling method based on semisupervised deep learning and its application to wastewater treatment plant," Industrial & Engineering Chemistry Research, vol. 59, no. 10, pp. 4589-4601, 2020.
[136] W. Zheng, Y. Liu, Z. Gao, et al., "Just-in-time semi-supervised soft sensor for quality prediction in industrial rubber mixers," Chemometrics and Intelligent Laboratory Systems, vol. 180, pp. 36-41, 2018.
[137] S. Graziani, M. G. Xibilia, "Deep structures for a reformer unit soft sensor," 2018 IEEE 16th International Conference on Industrial Informatics (INDIN), pp. 927-932, 2018.
[138] K. Wang, C. Shang, F. Yang, Y. Jiang, and D. Huang, "Automatic hyper-parameter tuning for soft sensor modeling based on dynamic deep neural network," 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, pp. 989-994, 2017, doi: 10.1109/SMC.2017.8122739.
[139] Y. He, Y. Xu, and Q. Zhu, "Soft-sensing model development using PLSR-based dynamic extreme learning machine with an enhanced hidden layer," Chemometrics and Intelligent Laboratory Systems, vol. 154, pp. 101-111, 2016.
[140] X. Wang, "Data preprocessing for soft sensor using generative adversarial networks," 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, pp. 1355-1360, 2018, doi: 10.1109/ICARCV.2018.8581249.
[141] Y. Fan, B. Tao, Y. Zheng, and S. Jang, "A data-driven soft sensor based on multilayer perceptron neural network with a double LASSO approach," IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 7, pp. 3972-3979, July 2020, doi: 10.1109/TIM.2019.2947126.
[142] A. Rani, V. Singh, J. R. P. Gupta, "Development of soft sensor for neural network based control of distillation column," ISA Transactions, vol. 52, no. 3, pp. 438-449, 2013.
[143] A. Alexandridis, "Evolving RBF neural networks for adaptive soft-sensor design," International Journal of Neural Systems, vol. 23, no. 6, 2013: 1350029.
[144] M. D. Zeiler, et al., "Modeling pigeon behavior using a Conditional Restricted Boltzmann Machine," ESANN, 2009.
[145] Y. Luo, et al., "Multivariate time series imputation with generative adversarial networks," Advances in Neural Information Processing Systems, 2018.
[146] L. Jing and Y. Tian, "Self-supervised visual feature learning with deep neural networks: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, doi: 10.1109/TPAMI.2020.2992393.
[147] A. Oord, Y. Li, O. Vinyals, "Representation learning with contrastive predictive coding," arXiv preprint arXiv:1807.03748, 2018.
[148] C. Finn, P. Abbeel, S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," arXiv preprint arXiv:1703.03400, 2017.
[149] L. Maaten, G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579-2605, Nov. 2008.
[150] M. D. Zeiler, R. Fergus, "Visualizing and understanding convolutional networks," European Conference on Computer Vision, Springer, Cham, pp. 818-833, 2014.
[151] S. Kabir, R. U. Islam, M. S. Hossain, et al., "An integrated approach of belief rule base and deep learning to predict air pollution," Sensors, vol. 20, no. 7: 1956, 2020.
[152] Q. Jiang, S. Yan, H. Cheng, and X. Yan, "Local-global modeling and distributed computing framework for nonlinear plant-wide process monitoring with industrial big data," IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2020.2985223.
[153] Z. Yang, Z. Ge, "Monitoring and prediction of big process data with deep latent variable models and parallel computing," Journal of Process Control, vol. 92, pp. 19-34, 2020.

Qingqiang Sun received the B.Eng. degree in Electrical Engineering and Automation from Xiamen University, Xiamen, China, in 2017, and the M.Eng. degree from the Department of Control Science and Engineering, Zhejiang University, Hangzhou, China, in 2020.
His research interests include data-based modeling, deep learning for process data, and soft sensing.