A Deep Learning Model for Smart Manufacturing Using Convolutional LSTM Neural Network Autoencoders
are adept at image recognition, CNNs mainly operate in a vector space, thereby increasing the difficulty in learning the high-dimensional features of input time series. For this reason, applying CNN architectures alone to the time-series forecasting problem is sub-optimal. However, long short-term memory (LSTM) networks are skillful in sequential learning by passing signal information across time steps. Leveraging the complementary strengths of CNN and LSTM neural networks, the convolutional LSTM (convLSTM) model both preserves spatial information and performs well in sequential learning [17].

Motivated by this modeling approach, this article proposes 2-DConvLSTMAE, a deep ConvLSTM stacked autoencoder for univariate, multistep machine speed forecasting in a manufacturing process. The end-to-end model has three distinct components as follows:
1) convLSTM encoding layers;
2) bidirectional stacked LSTM decoding layers;
3) time-distributed supervised learning [fully connected (FC)] layer.

The input time-series signal is reconstructed into a sequential supervised learning format (i.e., input sequences of fixed window size and output sequences) using a sliding-window strategy. Each input sequence is fed into the encoding layer. The output from the first encoding layer serves as the input to the second encoding layer. The resulting sequences are passed onto the flatten and repeat vector layers, respectively, where they are reconstructed into a one-dimensional (1-D) tensor. This is then passed to the decoding layer, where a stack of bidirectional LSTM layers reconstructs the original input time series from the resultant tensor. The last layer is a time-distributed FC regression layer for multistep machine speed forecasting.

In this article, the problem is formulated as a sequential (i.e., sequence-to-sequence) forecasting problem. The input data comprises the internal speed (measured as the number of strokes per minute) of a metal can bodymaker machine, which operates at high speed and produces up to 300 aluminum cans per minute. In can manufacturing production planning processes, predicting the speed of the bodymaker machine, which is related to the number of cans produced by the machine, can be used to optimize production schedules by allowing real-time adjustment of the individual operating speeds of other upstream or downstream machines. For details about metal can manufacturing, the reader can refer to [18].

We test the performance of 2-DConvLSTMAE using historical, real-world machine speed data obtained via machine-embedded sensors in an aluminum can manufacturing machine from a metal packaging plant in the United Kingdom. The results of rigorous empirical analyses in this article substantiate the value of the proposed approach when compared to state-of-the-art DL models.

The contributions of this article are summarized as follows.
1) An end-to-end multistep (i.e., sequence-to-sequence) time-series forecasting model comprising a convLSTM encoder–decoder architecture for multistep machine speed prediction.
2) A time-distributed encoder–decoder model, which is capable of short- and long-term representation learning for machine speed prediction in smart manufacturing.
3) A robust, scalable DL predictive model that has been evaluated on real-world, real-time machine speed signals in a manufacturing plant.

The remainder of this article is organized as follows. Section II presents the technical preliminaries of key concepts used in this article. Section III presents the proposed 2-DConvLSTMAE model and methodological approach. In Section IV, the experiments and results are discussed, while Section V concludes this article.
II. TECHNICAL PRELIMINARIES

This section formulates the sequence-to-sequence time-series forecasting problem and provides a technical background about key concepts in the domain of DL, which are used within this article.

A. Problem Formulation

The goal of multistep (i.e., sequence-to-sequence) time-series forecasting is the use of previously observed (i.e., lagged) input sequences to forecast a fixed-length sequence of future time-series values. In ML, this is typically regarded as a sequential time-series forecasting problem or sequence-to-sequence forecasting [19]. To achieve this, the sliding-window method [20] is adopted, which converts the sequential input data to a supervised learning problem (i.e., inputs and outputs). In this method, a portion of the input time-series sequence (a window of lagged values) is reconstructed to serve as input features. The number of previous time steps is referred to as the window width/size.

Given a univariate time series x(t) = {x_1, x_2, x_3, ..., x_t}, the sequence-to-sequence forecasting problem is to predict the future k values of the sequence, ŷ = (ŷ_1, ŷ_2, ..., ŷ_k) ≅ (x_{t+1}, x_{t+2}, ..., x_{t+k}), using the values of previous observations in a sliding window of fixed size w, such that

ŷ = (ŷ_1, ŷ_2, ..., ŷ_k) = f(x_{t−w}, x_{t−w+1}, x_{t−w+2}, ..., x_t).   (1)

In this article, the input observations refer to the machine speed obtained over a regular period (i.e., one minute). It is important to note that sequence-to-sequence forecasting differs from single-step time-series forecasting because the predicted output is a sequence of predicted machine speed values rather than a single value of the predicted variable. The above data transformations for a time series of length N result in a sequence-to-sequence forecasting problem having an input matrix X ∈ R^{n×w} and an output matrix Y ∈ R^{n×k}, where n = (N − w − k + 1) is the number of training samples (Fig. 5 shows details about the adopted sliding-window approach).
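As an illustration of the sliding-window transformation described above, the following is a minimal NumPy sketch (not the authors' code); the function and variable names are assumptions for illustration only.

```python
import numpy as np

def sliding_window(series, w, k):
    """Reshape a univariate series into supervised learning pairs:
    X has shape (n, w) and Y has shape (n, k), with n = N - w - k + 1."""
    series = np.asarray(series, dtype=float)
    n = len(series) - w - k + 1
    X = np.stack([series[i:i + w] for i in range(n)])
    Y = np.stack([series[i + w:i + w + k] for i in range(n)])
    return X, Y

# Example: 60 lagged speed readings are used to forecast the next 10 steps.
speeds = np.random.uniform(250, 300, size=1000)   # synthetic stand-in data
X, Y = sliding_window(speeds, w=60, k=10)
print(X.shape, Y.shape)   # (931, 60) (931, 10), since n = 1000 - 60 - 10 + 1
```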
B. Structure of the CNN

The CNN is a feedforward neural network that is mostly adopted in image and video recognition [14]. Fig. 1 shows the structure of a typical CNN model. The input layer takes the input vector, and a feature graph corresponding to the convolution kernel is developed; the kernel uses a set of weights to produce a feature graph, which is passed onto the next layer. The link between the input and convolution layer is established by a
E. Autoencoder
The autoencoder is a feedforward neural network in which the
input is the same as the output. In other words, autoencoders are
(unsupervised) learning algorithms that extract features from
input data without the need for labeled target datasets. The
autoencoder consists of three basic components: the encoder,
the code, and the decoder. These function according to their
literal meanings. The encoder compresses the input to a ‘code,’
which is subsequently decoded by the decoder. For this reason,
the autoencoder can be used as a dimensionality reduction
strategy in time-series forecasting as it can compress the input
to a mapped hidden layer [12]. The stacked autoencoder is a hierarchically layered stack of autoencoders and, just like the autoencoder, it learns in an unsupervised manner. The model training process involves greedy layer-wise training to minimize the error between the input and output vectors. Each subsequent autoencoder takes the hidden layer of the previous one as its input, with each layer trained by a gradient descent algorithm using an optimization function.
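As a concrete, hypothetical illustration of the encoder, code, and decoder components, a single dense autoencoder in Keras could be sketched as below; the layer sizes are arbitrary assumptions, and a stacked autoencoder repeats this pattern layer by layer with greedy pretraining.

```python
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

window = 60        # input dimension, e.g., one sliding window of speeds
code_size = 8      # size of the compressed "code" (assumed for illustration)

inputs = Input(shape=(window,))
encoded = Dense(32, activation='relu')(inputs)          # encoder
code = Dense(code_size, activation='relu')(encoded)     # code (bottleneck)
decoded = Dense(32, activation='relu')(code)            # decoder
outputs = Dense(window, activation='linear')(decoded)   # reconstruction

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
# The target equals the input, e.g., autoencoder.fit(X, X, epochs=10)
```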
Fig. 4. Model architecture of 2-DConvLSTMAE.

III. 2-DCONVLSTMAE MODEL

In this section, the proposed deep ConvLSTM autoencoder model for univariate time-series forecasting of machine speed in a smart factory is presented.

A. ConvLSTM Encoder

Although the ConvLSTM has been applied in time-series classification for anomaly detection using video sequences [22], its performance is known to deteriorate with an increase in sequence length. To overcome this limitation, the ConvLSTM applies an attention-based mechanism, which adaptively determines and retains the relevant hidden states across the time steps. Therefore, (4)–(8) are rewritten as

i^{t,l} = σ(W^l_{xz} x^{t,l} + W^l_{hz} h^{t−1,l} + W^l_{cz} ◦ c^{t−1,l} + b^l_z)   (9)

r^{t,l} = σ(W^l_{xr} x^{t,l} + W^l_{hr} h^{t−1,l} + W^l_{cr} ◦ c^{t−1,l} + b^l_r)   (10)

c^{t,l} = i^{t,l} ◦ tanh(W^l_{xc} x^{t,l} + W^l_{hc} h^{t−1,l} + b^l_c) + r^{t,l} ◦ c^{t−1,l}   (11)

o^{t,l} = σ(W^l_{xo} x^{t,l} + W^l_{ho} h^{t−1,l} + W^l_{co} ◦ c^{t,l} + b^l_o)   (12)

h^{t,l} = o^{t,l} ◦ tanh(c^{t,l})   (13)

where ◦ represents the Hadamard product, σ represents the sigmoid function, W^l_{xz}, W^l_{hz}, W^l_{cz}, W^l_{xr}, W^l_{hr}, W^l_{cr}, W^l_{xc}, W^l_{hc}, W^l_{xo}, W^l_{ho}, and W^l_{co} ∈ R^{n×T} represent the convolutional kernels within the model, while b^l_z, b^l_r, b^l_c, and b^l_o are the bias parameters in the lth layer of the ConvLSTM. Fig. 4 represents the summary architecture of the proposed 2-DConvLSTMAE model. In our model, the ConvLSTM layers have 128 and 64 filters, respectively, with kernel sizes of (1 × 3). The ConvLSTM layers are arranged in a layered structure to extract the temporal features hierarchically. In the ConvLSTM, the length of sequences is a hyperparameter that affects model performance, and hence must be optimized. The optimal length of sequences (i.e., the number of the previous sequence segments) was determined using a grid search framework (see Section III-C) as 20, with three of these lengths (i.e., three subsequences each of length 20) used in the training regime. The output consists of an FC network of ten units (corresponding to the ten multistep outputs or predictions).
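To make the subsequence arrangement concrete, the sketch below reshapes each 60-step input window into three subsequences of length 20, producing the five-dimensional tensor (samples, subsequences, rows, columns, channels) that a Keras ConvLSTM2D encoder expects. The variable names, and the assumption that the encoder maps onto ConvLSTM2D layers, are illustrative rather than taken from the authors' code.

```python
import numpy as np

n, w = 931, 60            # number of samples and window size (illustrative)
p, l = 3, 20              # p subsequences of length l, with w = p * l
X = np.random.rand(n, w)  # stand-in for the sliding-window input matrix

# ConvLSTM2D consumes 5-D input: (samples, time steps, rows, cols, channels)
X_enc = X.reshape((n, p, 1, l, 1))
print(X_enc.shape)        # (931, 3, 1, 20, 1)
```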
B. Bidirectional LSTM Decoder

The output of the encoder phase of our model is a series of feature map vectors of dimension (n × 1 × 8 × 64), where n represents the number of training samples used. To decode the feature maps obtained from the encoder layers, a repeat vector layer is applied. The main function of this layer is to 'repeat' the final output vector from the encoding layer in a shape that is a constant input to each time step of the decoder. In this way, the decoding layer is able to reconstruct the original input sequence. The output of this repeat vector layer is passed onto a layered bidirectional LSTM stacked network (see Fig. 4). Each LSTM layer is made of 200 LSTM units, with rectified linear unit activation applied. The output of the previous LSTM layer is fed into the next layer as input in a hierarchical manner. In this way, the decoder layer can incorporate the encoded output vector from the ConvLSTM encoder, which improves the performance of the predictive model by fostering representation learning at the individual layers [12].
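A minimal Keras sketch of an architecture consistent with this description is given below: two ConvLSTM2D encoder layers (128 and 64 filters, 1 × 3 kernels), a flatten and repeat-vector bridge, two bidirectional LSTM decoder layers of 200 units with ReLU activation, and a time-distributed FC regression layer over the ten forecast steps. Layer arguments such as padding, activations, optimizer, and loss are assumptions, not the authors' released implementation.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (ConvLSTM2D, Flatten, RepeatVector,
                                     Bidirectional, LSTM, TimeDistributed, Dense)

k = 10  # forecast horizon (ten multistep outputs)

model = Sequential([
    # Encoder: stacked ConvLSTM layers over 3 subsequences of length 20
    ConvLSTM2D(128, kernel_size=(1, 3), activation='relu',
               return_sequences=True, input_shape=(3, 1, 20, 1)),
    ConvLSTM2D(64, kernel_size=(1, 3), activation='relu'),
    Flatten(),
    # Bridge: repeat the encoded vector once per forecast time step
    RepeatVector(k),
    # Decoder: stacked bidirectional LSTM layers
    Bidirectional(LSTM(200, activation='relu', return_sequences=True)),
    Bidirectional(LSTM(200, activation='relu', return_sequences=True)),
    # Time-distributed FC regression layer (one speed value per step)
    TimeDistributed(Dense(1))
])
model.compile(optimizer='adam', loss='mse')
model.summary()
# Training would use the reshaped windows, e.g.:
# model.fit(X_enc, Y.reshape((-1, k, 1)), epochs=50, batch_size=64)
```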
C. Hyperparameter Optimization

The performance of DL models depends on predetermined hyperparameters, which are obtained using an optimization process. Unlike model parameters, which are learned using an optimization function to minimize an objective (or loss) function, hyperparameters are not learned during the model training
TABLE II: Performance Evaluation of Predictive Models Using Window Size of 30.

B. Baseline Models

In order to evaluate the performance of the 2-DConvLSTMAE model for multistep machine speed prediction, we compare its performance against naïve, statistical, and three state-of-the-art DL baseline models.
1) Persistence Model: The persistence model, a widely used benchmark model for time-series forecasting, operates on the assumption that the predicted value of the target variable remains unchanged from the previous time lag. In other words, the predicted value at time t is ŷ_t = y_{t−1} for all times (a minimal sketch of this baseline is given after this list). This naïve model proves to be highly accurate especially in short-term forecasting but exhibits vulnerabilities in multistep prediction [29].
2) Autoregressive Integrated Moving Average: ARIMA is a well-known time-series forecasting model. The main assumption of ARIMA is the stationarity of the mean and variance, and that there exists a linear relationship between the lags (i.e., past observations) and the future state, which constitutes a limitation.
3) Residual-Squeeze Net (RSNet): The RSNet proposed in [15] comprises 1-D CNNs using the RSNet architecture via the squeeze operation, which fuses the information using an optimal combination of channels learned during model training. We apply single-channel input data as described above to train the model.
4) Deep LSTM Encoder–Decoder: We use the model presented in [19], which is a stacked architecture of LSTM layers connected to a time-distributed dense layer.
5) CNN-LSTM Encoder–Decoder: A CNN-LSTM autoencoder model architecture presented in [30] is the third DL baseline model. The work presented a classifier, but the model was modified to include a regression layer.
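The sketch below shows the multistep persistence forecast in a few lines (repeating the last observed value across the forecast horizon); it illustrates the benchmark idea rather than reproducing the authors' implementation.

```python
import numpy as np

def persistence_forecast(history, k):
    """Naive multistep forecast: repeat the last observed value k times."""
    return np.repeat(history[-1], k)

speeds = [272.0, 275.0, 271.0, 269.0]        # illustrative speed history
print(persistence_forecast(speeds, k=10))    # ten copies of 269.0
```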
C. Model Performance Evaluation

In terms of model evaluation, we adopted a technique referred to as walk-forward validation or backtesting. Traditional prediction evaluation methods, such as k-fold cross validation or train-test splitting, do not work well when applied to time-series data because these evaluation methods assume that there is no relationship between the observations, which is not the case with time-series data, where the sequential dimension needs to be preserved.

For model evaluation, we applied three error evaluation metrics, namely, RMSE, mean absolute error (MAE), and symmetrical mean absolute percentage error (sMAPE), which are defined by (14)–(16), respectively, as

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)^2 )   (14)

MAE = (1/n) Σ_{i=1}^{n} |ŷ_i − y_i|   (15)

sMAPE = (200/n) Σ_{i=1}^{n} |ŷ_i − y_i| / (|ŷ_i| + |y_i|).   (16)
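The three metrics in (14)–(16) translate directly into a few lines of NumPy, as in the sketch below (a straightforward transcription of the formulas, with y_true and y_pred as illustrative array names).

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))

def smape(y_true, y_pred):
    # Symmetric MAPE as in (16); the factor of 200 expresses it in percent
    return (200.0 / len(y_true)) * np.sum(np.abs(y_pred - y_true)
                                          / (np.abs(y_pred) + np.abs(y_true)))

y_true = np.array([270.0, 268.0, 275.0])
y_pred = np.array([268.0, 270.0, 276.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), smape(y_true, y_pred))
```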
D. Implementation Environment

The experimental environment used for this article was a single machine with an Intel Xeon E-2146G CPU @ 3.50 GHz, 128-GB memory, and an NVIDIA Tesla V100-PCIE 16-GB GPU. The GPU is used for accelerated model training due to the large computation demand of DL models. The development was performed using Python 3.6.8 and TensorFlow 1.12.0.

E. Comparison With Baselines

To empirically evaluate the performance of the individual models, we test the models using window sizes of 60 and 30, respectively, and the evaluation metrics described in Section IV-C. Table II and Fig. 6 present the results of the empirical analysis of the predictive models trained using a window of size 30. As the results show, 2-DConvLSTMAE outperformed the baselines in all evaluation metrics. From Table II, it can be seen that our model, in addition to displaying superior predictive accuracy, took the least training time in comparison to the DL models. It must be mentioned that although the ARIMA and persistence models took significantly lower training times, they performed worse than the 2-DConvLSTMAE (see Table II). Generally, naïve models (such as the persistence model) are used to benchmark the performance of predictive models. Consequently, a model that outperforms a naïve model is considered 'skillful' in time-series forecasting [12].
Fig. 8. Predictive performance of models on validation partition trained using a window size of 30. (a) RsNet [15]. (b) CNN-LSTM-SAE [30].
(c) LSTM encoder–decoder [19]. (d) 2-DConvLSTMAE.
reduction technique and an unsupervised learning regime, fostering representation learning and, thereby, simultaneously reducing model training time. The combination of the approaches, stacking ConvLSTM and LSTM layers in an encoder–decoder architecture, toward the machine speed prediction resulted in better performance (reduction in training time and prediction error) for the univariate time-series prediction of machine speed.

F. Analysis of Sensitivity of 2-DConvLSTMAE With Respect to Sequence Length

As previously stated in Section II-D, the sequence length in the ConvLSTM is a hyperparameter that needs to be optimized externally. Consider the input to the 2-DConvLSTMAE model, X ∈ R^{n×w}, which is further segmented into p subsequences. These subsequences have a uniform length such that l = w/p, where l is the subsequence length and w denotes the sliding-window size (i.e., 60 for this current study; see Section IV-A). In order to obtain an optimal l, we performed a sensitivity analysis of this hyperparameter. For this experiment, we set the subsequence length l to 60, 30, 20, 12, and 10, respectively, to test the impact of these values on the predictive error and the model training time. The other model hyperparameters remained the same as before (see Table I). Fig. 10 shows the results of the performance of the 2-DConvLSTMAE with the different subsequence lengths. It can be seen that the optimal l is 20 (red dotted line), as this configuration resulted in the lowest RMSE and training time, respectively. It must be mentioned here that the optimal sequence length will depend on the particular application, which may be linked to individual production cycles and different data distributions.
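A sketch of this sensitivity experiment is shown below: each candidate subsequence length reshapes the same 60-step windows into a different encoder input shape, after which a fresh model would be trained and its validation RMSE and training time recorded (as summarized in Fig. 10). The loop body is illustrative; the training call itself is left as a comment.

```python
import numpy as np

w = 60                                    # sliding-window size (Section IV-A)
candidate_lengths = [60, 30, 20, 12, 10]  # subsequence lengths tested
X = np.random.rand(931, w)                # stand-in for the windowed inputs

for l in candidate_lengths:
    p = w // l                            # number of subsequences, l = w / p
    X_l = X.reshape((-1, p, 1, l, 1))     # encoder input for this configuration
    print(f"l={l}: p={p}, encoder input shape {X_l.shape}")
    # A fresh 2-DConvLSTMAE would be trained here and its validation RMSE
    # and training time recorded for comparison across candidate lengths.
```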
Fig. 9. Predictive performance of models on validation partition trained using window size of 60. (a) RsNet [15]. (b) CNN-LSTM-SAE [30].
(c) LSTM encoder–decoder [19]. (d) 2-DConvLSTMAE.
Fig. 10. Sensitivity analysis of the impact of subsequence length on accuracy and training time.

V. CONCLUSION

In this article, a novel deep ConvLSTM autoencoder architecture has been proposed for machine speed prediction in a smart manufacturing process. By restructuring the input sequence into a supervised learning format using a sliding-window approach, the predictive model, 2-DConvLSTMAE, was applied to the multistep time-series forecasting problem. 2-DConvLSTMAE leveraged the power of the CNN in automatic feature extraction and the LSTM in sequential and representation learning. In the proposed model, the encoder–decoder architecture, which doubles as a dimensionality reduction technique, promoted representation learning in the model training regime, resulting in reduced computational demand and training time. The results from empirical analyses showed that
1) 2-DConvLSTMAE outperformed the naïve and statistical benchmark models as well as three state-of-the-art DL time-series models, achieving improved predictive performance;
2) in addition to improved predictive performance, our model required a significantly lower model training time.
This makes 2-DConvLSTMAE not only better in machine speed prediction but also a more practical approach for adoption in real manufacturing processes.

The results obtained from this article can be directly applied to multistep time-series forecasting for smart manufacturing process operations, enabling the improvement of production scheduling and planning. For instance, predicting the machine speed in advance can be used to foster just-in-time production by providing an indication of future production output so that the operational requirements can be adjusted accordingly. Future extensions of this article include extending the proposed model to multivariate time series, including machine states and external sensor data.

REFERENCES

[1] H. S. Kang et al., "Smart manufacturing: Past research, present findings, and future directions," Int. J. Precis. Eng. Manuf.-Green Technol., vol. 3, no. 1, pp. 111–128, 2016.
[2] C. Giannetti, R. S. Ransing, M. R. Ransing, D. C. Bould, D. T. Gethin, and J. Sienz, "A novel variable selection approach based on co-linearity index to discover optimal process settings by analysing mixed data," Comput. Ind. Eng., vol. 72, no. 1, pp. 217–229, Jun. 2014.
[3] R. S. Ransing, C. Giannetti, M. R. Ransing, and M. W. James, "A coupled penalty matrix approach and principal component based co-linearity index technique to discover product specific foundry process knowledge from in-process data in order to reduce defects," Comput. Ind., vol. 64, no. 5, pp. 514–523, Jun. 2013.
[4] L. Wang, Z. Zhang, H. Long, J. Xu, and R. Liu, "Wind turbine gearbox failure identification with deep neural networks," IEEE Trans. Ind. Informat., vol. 13, no. 3, pp. 1360–1368, Jun. 2017.
[5] B. Cai, L. Huang, and M. Xie, "Bayesian networks in fault diagnosis," IEEE Trans. Ind. Informat., vol. 13, no. 5, pp. 2227–2240, Oct. 2017.
[6] C. Giannetti and R. S. Ransing, "Risk based uncertainty quantification to improve robustness of manufacturing operations," Comput. Ind. Eng., vol. 101, pp. 70–80, 2016.
[7] T. Wuest, D. Weimer, C. Irgens, and K.-D. Thoben, "Machine learning in manufacturing: Advantages, challenges, and applications," Prod. Manuf. Res., vol. 4, no. 1, pp. 23–45, 2016.
[8] M. Alipour, B. Mohammadi-Ivatloo, and K. Zare, "Stochastic scheduling of renewable and CHP-based microgrids," IEEE Trans. Ind. Informat., vol. 11, no. 5, pp. 1049–1058, Oct. 2015.
[9] C.-H. Wu, C.-C. Wei, D.-C. Su, M.-H. Chang, and J.-M. Ho, "Travel time prediction with support vector regression," in Proc. IEEE Int. Conf. Intell. Transp. Syst., 2003, pp. 1438–1442.
[10] A. Essien, I. Petrounias, P. Sampaio, and S. Sampaio, "The impact of rainfall and temperature on peak and off-peak urban traffic," in Proc. Int. Conf. Database Expert Syst. Appl., 2018, pp. 399–407.
[11] J. Wang, Y. Ma, L. Zhang, R. X. Gao, and D. Wu, "Deep learning for smart manufacturing: Methods and applications," J. Manuf. Syst., vol. 48, pp. 144–156, 2018.
[12] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[13] C. Kim, J. Lee, R. Kim, Y. Park, and J. Kang, "DeepNAP: Deep neural anomaly pre-detection in a semiconductor fab," Inf. Sci., vol. 457–458, pp. 1–11, 2018.
[14] C. Wu, P. Jiang, C. Ding, F. Feng, and T. Chen, "Intelligent fault diagnosis of rotating machinery based on one-dimensional convolutional neural network," Comput. Ind., vol. 108, pp. 53–61, 2019.
[15] L. Su, L. Ma, N. Qin, D. Huang, and A. H. Kemp, "Fault diagnosis of high-speed train bogie by residual-squeeze net," IEEE Trans. Ind. Informat., vol. 15, no. 7, pp. 3856–3863, Jul. 2019.
[16] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proc. IEEE Int. Symp. Innov. Intell. Syst. Appl., 2019, pp. 1–6.
[17] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-C. Woo, "Convolutional LSTM network: A machine learning approach for precipitation nowcasting," in Proc. 28th Int. Conf. Neural Inf. Process. Syst., 2015, pp. 802–810.
[18] E. Wootton, "TALAT lecture 3710 case study on can making," Eur. Aluminium Assoc., Alcan Deutschland GmbH, Göttingen, Germany, 1994.
[19] M. A. Zaytar and C. El Amrani, "Sequence to sequence weather forecasting with long short-term memory recurrent neural networks," Int. J. Comput. Appl., vol. 143, no. 11, pp. 975–8887, 2016.
[20] Y. Yu, Y. Zhu, S. Li, and D. Wan, "Time series outlier detection based on sliding window prediction," Math. Problems Eng., vol. 2014, 2014, Art. no. 879736.
[21] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: The difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Neural Networks. Hoboken, NJ, USA: Wiley, 2001.
[22] Z. Yuan, X. Zhou, and T. Yang, "Hetero-ConvLSTM: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2018, pp. 984–992.
[23] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient backprop," in Neural Networks: Tricks of the Trade. New York, NY, USA: Springer, 2012, pp. 9–48.
[24] J. Bergstra and Y. Bengio, "Random search for hyper-parameter optimization," J. Mach. Learn. Res., vol. 13, pp. 281–305, 2012.
[25] T. Tieleman and G. E. Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA, Neural Netw. Mach. Learn., vol. 4, no. 2, pp. 26–31, 2012.
[26] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., vol. 12, pp. 2121–2159, 2011.
[27] T. Schaul, S. Zhang, and Y. LeCun, "No more pesky learning rates," in Proc. 30th Int. Conf. Mach. Learn., 2013, pp. 343–351.
[28] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[29] M. Khodayar, O. Kaynak, and M. E. Khodayar, "Rough deep neural architecture for short-term wind speed forecasting," IEEE Trans. Ind. Informat., vol. 13, no. 6, pp. 2770–2779, Dec. 2017.
[30] C. Giannetti, A. Essien, and Y. O. Pang, "A novel deep learning approach for event detection in smart manufacturing," in Proc. 49th Int. Conf. Comput. Ind. Eng., Beijing, China, 2019.

Aniekan Essien (Member, IEEE) received the B.Eng. degree in computer engineering in 2011 and the M.Sc. degree in information systems from the University of Manchester, Manchester, U.K., where he is currently working toward the Ph.D. degree.
He is currently a Research Assistant with the Future Manufacturing Research Institute, College of Engineering, Swansea University Bay Campus, Swansea, U.K. His research interests include deep learning for time-series forecasting, artificial intelligence for smart manufacturing, and traffic predictive analytics.

Cinzia Giannetti received the Engineering Doctorate degree in manufacturing engineering from Swansea University, Swansea, U.K., in 2015.
She is an Associate Professor with the College of Engineering, Swansea University. She is the holder of an EPSRC Innovation Fellowship in digital manufacturing (2018–2021) and a coinvestigator with the EPSRC Centre for Doctoral Training in Enhancing Human Interactions and Collaborations with Data- and Intelligence-Driven Systems. Her expertise includes smart manufacturing and sensor technologies, with emphasis on knowledge and information management, data analytics, and machine learning techniques to support decision making through the use of sensors and large-scale databases.
Dr. Giannetti has significant experience in delivering and supporting applied industrial research and development projects, gained both in industry and academia. Before joining academia, she worked for a decade in industry in senior technical roles, delivering, planning, and coordinating software development projects for existing systems and new products.