Predictive Maintenance Using Machine Learning in Industrial IoT
Suhas Jangoan
Zendesk, USA
Abstract:- Predictive maintenance based on machine learning helps systems and machines lower the occurrence of certain types of machine failure by predicting them in advance. Predictive maintenance (PdM) is an essential tactic for improving the efficiency and reliability of industrial equipment and for optimizing maintenance operations. Machine learning-based predictive maintenance helps businesses reduce unscheduled downtime and maintenance expenses, and improve operational efficiency, by identifying and fixing potential equipment issues in advance.

I. INTRODUCTION

[...] communication and constantly flowing data streams, but taking a snapshot of the full data set and executing computations with uncertain response times is at odds with this. To handle such demands, self-adaptive algorithms are crucial. These algorithms constantly learn and improve their models. Furthermore, they need to exhibit real-time behavior and provide excellent performance, whether they are operating on robust cloud systems, fog and edge systems, or Internet of Things devices (Mocanu, Nguyen, Gibescu, & Kling, 2016).
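As an illustration of the self-adaptive idea in the introduction, where a model keeps learning from a continuous stream instead of from a snapshot of the full data set, the following minimal sketch updates a running mean and variance one reading at a time (Welford's online algorithm) and flags readings that deviate strongly from what has been seen so far. It is not taken from any of the reviewed papers; the class name, the threshold of three standard deviations, the warm-up length, and the sample readings are all assumptions chosen for illustration.

```python
import math


class StreamingAnomalyDetector:
    """Hypothetical self-adaptive detector: constant memory, one pass over the stream."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # assumed cut-off, in standard deviations
        self.warmup = warmup        # readings to observe before flagging anything
        self.n = 0                  # number of readings seen so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics so far, then fold x into them."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_anomaly = True
        # Welford's update: no past readings need to be stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Made-up temperature readings with one obvious outlier.
detector = StreamingAnomalyDetector(threshold=3.0, warmup=5)
readings = [20.0, 20.1, 19.9, 20.2, 19.8, 20.0, 35.0, 20.1]
flags = [detector.update(r) for r in readings]
print(flags)
```

Each update costs a fixed amount of time and memory, which is the property that matters on edge or IoT devices. Note one design choice in this sketch: flagged readings are still folded into the statistics, so a single outlier widens the tolerance for later readings; a production system might exclude or down-weight them.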
[...] predictions in IoT using DL. A great number of articles discuss DL in the (I)IoT. To the best of our knowledge, no work in the existing body of literature addresses PdM in connection with DL and the (I)IoT.

This overview categorizes several distinct deep learning techniques that have been discussed for use in industry and the Internet of Things. It also discusses real-time processing and data streams in relation to those deep learning methodologies. The techniques examined and classified are those designed to increase the real-time and stream-processing capabilities of the methods discussed in the reviewed papers. Particular emphasis is placed on the capability of the presented methodologies to offer forecasts. The last section of the study presents a summary and a view on future developments (Pani, Pattnaik, & Pattanayak, 2024).

II. MACHINE LEARNING APPROACHES IN INDUSTRIAL INTERNET OF THINGS (IOT)

This section begins with an introductory discussion of ANN and DL. A taxonomy of the DL approaches that have been described for use in industry and the Internet of Things follows. With regard to the requirements of PdM in IoT contexts, the categorization is based on the theoretical methods, application areas, and strengths and weaknesses of the respective approaches. The articles that were assessed covered a variety of issues, including real-time and data stream processing, as well as deep learning approaches in Cyber-Physical Systems (CPS), the Internet of Things (IoT), and Industry 4.0 (I4.0).

Machine Learning (ML) is a subfield of Artificial Intelligence (AI), and Deep Learning (DL) may in turn be seen as a subclass of ML. Deep learning is commonly described as a class of efficient artificial neural networks (ANNs) that consist of multiple (hidden) layers. In addition to supporting qualities such as the capacity for unsupervised learning or automated feature extraction, the large number of layers and neurons enables the abstraction of more complicated problems. Deep Neural Networks (DNN), Deep Belief Networks (DBN), and Recurrent Neural Networks (RNN) are some examples of such neural network technologies (Sane, 2020).

The artificial neural network (ANN) is designed to mimic the biological neural network found in the brains of mammals. An ANN is made up of neurons, which are referred to as nodes, and the connections between those nodes. The nodes are structured in layers, and the data supplied to the system is used to generate nonlinear output data. Through the connections between the nodes, the output of one node may be transferred to the input of another node. The significance of the transmitted signal is determined by the weight allocated to each connection, in the same way that a threshold function governs the output signal of a neuron in biological neural networks. All the weights in an ANN need to be initialized to a value, which is often simply a rough approximation. During training, those weights are updated step by step, following a predetermined learning rate, to ensure that the network is both valid and balanced. This phenomenon is often described as "connections developing over time with training." ANNs have been around for more than half a century, and throughout that time several methods have been created.

Auto-encoder (AE), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Long Short-term Memory (LSTM), Convolutional Neural Network (CNN), Variational Auto-encoder (VAE), Generative Adversarial Network (GAN), and Ladder Net are examples of deep learning models that may be used for Internet of Things (IoT) applications. Deep learning models are divided into three primary categories: generative approaches (AE, RBM, DBN, VAE), discriminative approaches (RNN, LSTM, CNN), and hybrid approaches (GAN, Ladder Net), which mix the two. This categorization mostly refers to the learning technique being used, with generative methods essentially following the concept of unsupervised learning and discriminative approaches following supervised learning. The underlying learning technique is a critical aspect in the selection of a deep learning strategy, in addition to the required number of layers, which is expressed as complexity. The classification of techniques as either generative or discriminative can be found in a great number of subsequent studies. Numerous DL models are also classified according to how well they work in Internet of Things applications. The relevant characteristics include the capability to work with (partially) unlabeled data (feature extraction, feature discovery), the size of the required training dataset, the ability to reduce the dimensionality of the data, the capability to deal with noisy data and time-series data, and the general performance. The integration of recurrent neural networks (RNN) with deep belief networks (DBN) and auto-encoders (AE) is recommended for reducing high-dimensional data and for dealing with unlabeled data. If the system is intended to produce predictions, as PdM systems are, DBNs and AEs are often used as an upfront layer that supplies categorized data to a subsequent RNN (Elkateb, Métwalli, Shendy, & Abu-Elanien, 2024).

Recurrent neural networks (RNNs) are suggested for spatial-temporal data, such as mobility data, since they provide strong results when the data evolves sequentially. Plain RNNs, on the other hand, are not a viable option if the data also contains long-term dependencies, because they retain only recent states and outcomes. The article describes a method that may be used to manage sequential data streams that originate from human mobility and transportation transition
models that have long-term dependencies (behaviors). The solution takes the form of a specific RNN architecture that combines LSTM with RNN. In addition to the capability of managing long-term dependencies, the LSTM also brings labeling and predictive capabilities to this combination. A great number of additional works combine recurrent neural networks (RNN) and long short-term memory (LSTM) to deal with data streams or time-series data that have long-term dependencies (such as particular behaviors or the wear and tear of machinery).

Choosing the appropriate ANN to generate predictions from data streams and time-series data is the topic of the study titled "IoT Data Analytics Using Deep Learning." It proposes a combination of LSTM and Naive Bayes models as a means of retrieving trends and forecasts and, in parallel, validating them through anomaly detection. The LSTM is responsible for producing predictions on data streams, while the Naive Bayes model is responsible for anomaly identification based on the outputs of the LSTM. The paper also notes that simple Feedforward Neural Networks (FNN) such as the Single-layer Perceptron (SLP) and Multi-layer Perceptron (MLP), trained with standard backpropagation (BP), are frequently not a good choice, because these networks do not perform well in complex situations or on data streams that have long-term dependencies. This is particularly important when the data streams in question are time-series data and the objective of the model is to forecast future occurrences or trends. Dependencies within data streams and time-series data often develop over time. Such dependencies are normal for data collected by the Internet of Things and provide valuable insights. In rudimentary ANNs, data is assumed to pass straight through the layers, with the premise that input data is independent of output data. Consequently, there is no way to recall past input and output states, which is a problem if the data from the past is connected to the data from the present. Compared with these methods, RNNs have the potential to provide superior outcomes on data streams and time-series data. The ability to recall prior states is made possible by the fact that the connections between nodes in an RNN form sequences or loops. To prevent exploding gradients, generally only a few recent states are remembered, so only short-term dependencies are taken into consideration. It is recommended that LSTM be used in complicated Internet of Things scenarios to identify long-term dependencies in the data source. LSTM is a kind of RNN that introduces memory units into the network. These memory units are able to retain significant former states and to forget less significant ones (Rippel, Lutjen, & Freitag, 2017).

Forecasting the behavior of energy systems such as smart grids requires more intelligent systems that provide reliable forecasts of future energy use. According to the paper titled "Deep Learning for estimating building energy consumption," ANN-based prediction methods are a promising approach, because they can handle massive and highly non-linear time-series data that originates from various heterogeneous data sources (for example, smart meters) and contains a great deal of uncertainty (unlabeled data). The authors benchmarked two variants of the RBM, namely the Conditional Restricted Boltzmann Machine (CRBM) and the Factored Conditional Restricted Boltzmann Machine (FCRBM), using a synthetic benchmark dataset. They concluded that the FCRBM outperforms RNN, Support Vector Machine (SVM), and CRBM because it incorporates a factored conditional history layer. RBMs are a kind of stochastic ANN with two layers: a visible layer and a hidden layer. The visible layer of an RBM comprises a node for every conceivable value present in the input data, while the hidden layer defines the categories of values. Because every visible-layer node in an RBM is linked to every hidden-layer node, an RBM is effective in feature categorization, feature extraction, and complexity reduction (by determining which features are the most significant). RBMs may be stacked for DL purposes. The CRBM extends the RBM with a conditional history layer, which enables it to identify long-term relationships in time-series data. In addition, the output of a single stacked CRBM layer is factored in order to decrease the total number of possible combinations.

The strong predictive capabilities of DL are also highlighted in another work from the realm of energy management, which describes the use of AE and LSTM to estimate the amount of electricity that solar systems will generate. The accuracy achieved by a combination of AE and LSTM (Auto-LSTM) is compared with that of other neural networks, namely MLP, and with a physical model. The benchmark data was collected from 21 actual solar power plants through an experimental setup. The following metrics are used for benchmarking: average root-mean-square deviation (RMSD), average mean absolute error (MAE), average absolute deviation (Abs. Dev.), average BIAS, and average correlation. The measured results show that all ANN- and DL-based models provide outcomes that are much superior to those of the physical model. Among the ANN and DL models, Auto-LSTM is the most suitable option for this scenario and data set. One of the most important factors discussed for producing predictions is the ability to extract features from unlabeled data.
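To make the benchmark metrics named for the solar power comparison concrete, the sketch below computes RMSD, MAE, bias, and Pearson correlation for one forecast series against measured values. It is a minimal illustration under stated assumptions, not the evaluation code of the cited study: the function name and the sample values are hypothetical, bias is taken here as the mean of (predicted minus actual), and the absolute deviation metric is left out because the text does not define it. Averaging across plants, as the benchmark does, would simply mean calling the function per plant and averaging each result.

```python
import math


def forecast_metrics(actual, predicted):
    """Return RMSD, MAE, bias, and Pearson correlation for two equal-length series."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two series of equal length with at least two points")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    rmsd = math.sqrt(sum(e * e for e in errors) / n)  # penalizes large errors more
    mae = sum(abs(e) for e in errors) / n             # average size of the error
    bias = sum(errors) / n                            # positive means over-forecasting

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Raises ZeroDivisionError if either series is constant (correlation undefined).
    correlation = cov / math.sqrt(var_a * var_p)

    return {"rmsd": rmsd, "mae": mae, "bias": bias, "correlation": correlation}


# Made-up normalized power measurements and forecasts for a single plant.
actual = [0.0, 0.2, 0.5, 0.7, 0.4]
predicted = [0.1, 0.2, 0.4, 0.8, 0.4]
for name, value in forecast_metrics(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```

Reporting several metrics together is useful because they answer different questions: a model can have a low MAE and still carry a systematic bias, or correlate well with the measurements while being offset from them.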
The work titled "An enhancement deep feature fusion method for rotating machinery fault diagnosis" highlights the capabilities of AEs in feature extraction and feature learning. The study describes a deep feature fusion approach that further increases the capability of feature learning while reducing the effect of background noise. This is accomplished by stacking a denoising AE (noise reduction) and a contractive AE (improved feature recognition). Such feature learning is a critical component in the process of forecasting.

III. FAST PREDICTIONS USING MACHINE LEARNING

Several Internet of Things applications need real-time processing. In a PdM system, for instance, high latency might result in unintended reactive maintenance because too little lead time remains to schedule the maintenance operations. The required speed of real-time processing depends heavily on the application. In micro-manufacturing systems, where enormous quantities of micro components are produced at a rapid pace, "real-time" refers to times measured in microseconds. It has been demonstrated that the rejection rate of produced micro parts may be reduced by improving processing speed when defect detection and PdM systems are included. In other contexts, real-time may refer to seconds, minutes, or even hours. In PdM applications for offshore wind turbines, for instance, data is mostly available at intervals of minutes and hours (Xie, Wu, Liu, & Li, 2022).

The article titled "Metro Density Prediction with Recurrent Neural Network on Streaming CDR Data" describes the creation of a real-time crowd prediction system for public transportation. The system uses a weight-sharing recurrent neural network in conjunction with parallel streaming analytical programming. Real-time analysis of data such as admission records at metro stations combined with telecommunication data is necessary to react quickly to emergent events. A robust neural network model with a high learning capacity provides a broad variety of fresh insights; however, this conflicts with the need for a quick reaction time. The approach to reconciling the two is broken down into three stages: a) the adoption of a Recurrent Neural Network (RNN) model, to enhance the capacity to operate on data streams; b) the implementation of techniques for the parallelization of RNNs; and c) the use of parallel streaming analytical algorithms as part of a cloud-based stream processing platform. Each metro station is represented by an independent RNN. Shared layers are created to dynamically share weights among stations that are in comparable "situations" (for example, a downtown station during rush hour). This allows weights to be shared across many models. The ability to co-train in parallel is another benefit of weight-sharing (Gensler, 2021).

The use of RNNs and their many variants for efficiently analyzing data is also advocated. RNNs have the potential to outperform other models, particularly on common kinds of sensor data such as serial data, time-series data, and data streams. This kind of sensor data dominates the vast majority of PdM applications.

Real-time processing and real-time learning are vital for constructing and continually adjusting models on huge amounts of data that capture the behavior of individuals along with their geographical and temporal qualities and their transportation capacity. One study describes a deep multi-task LSTM learning architecture. The fundamental idea is not to use a combined feature vector but to employ several LSTM tasks separated by domain (for example, a separate task each for mobility prediction and transportation mode prediction). Learning is carried out in parallel with this architecture, and the results are pooled according to the insights that are desired.

Driver assistance systems in automobiles, such as traffic sign recognition, must provide precise results at low latency. One paper explains how to implement a DNN for this purpose. The model is supplied with fully unlabeled data (raw images) and is also continually updated via online learning. A CNN with nine layers is used for image recognition. To enhance overall performance, max-pooling layers alternate with convolutional layers. The convolutional layers perform the convolution operation on the 2D input pixel maps. The max-pooling layer functions as a pre-processor between two convolutional layers, translating the output of the preceding convolutional layer into the input of the succeeding one. It does this by downsampling the pixel maps over non-overlapping regions, which reduces the amount of processing required in the computationally expensive convolutional layers. The method is known as Multi-Column DNN (MCDNN).

Another study describes a real-time approach to the detection and identification of traffic signs. Its major focus is parallel processing, which is essential because a variety of traffic signs must be identified at the same time. This method also uses a CNN for image processing, combined with AdaBoost to enhance performance and with parallel GPU processing (Song, Kanasugi, & Shibasaki, 2016).

Data with long-term dependencies is a strong candidate for LSTM models because of their memory cells. If the data structure permits the separation of single entities with their behavior, as well as the formation of groups of entities, it is likely feasible to process each entity and each group with its own neural network.
This enables the separate neural networks to run in parallel, which opens new processing possibilities. An aggregation layer often receives the outputs of the individual, parallel-processed neural networks and combines them into an overall result. The article titled "A Hierarchical Deep Temporal Model for Group Activity Recognition" explains how to identify situations that arise during a volleyball match. Using long-term dependencies, one LSTM model per player predicts the player's behavior by remembering the player's prior actions throughout the match. Every situation that occurs during the match is then modeled as a group of players. The LSTMs are arranged hierarchically, with the LSTM models of all participating players subordinated to that of a particular situation. A CNN is used to extract information about the scenes and the behavior of the players from the images.

Changing the arrangement of layers and connections is mentioned in the study titled "Simulation of Maintenance Activities for Micro-Manufacturing Systems," owing to the demands of real-time processing. Fully connected networks, in which every node of one layer is connected to every node of the next layer, can solve difficult problems, but they also need a significant amount of computational power. Dropping connections that do not significantly affect the outcome is one way to lessen the complexity of a deep learning network, and therefore its computational demand, without meaningfully compromising accuracy. In addition to dropout, max-pooling layers, batch normalization, and transfer learning are discussed as further options for performance enhancement.

The study titled "An Analysis of Deep Neural Network Models for Practical Applications" argues that many of the deep learning models detailed in the literature are simply not appropriate for use in practice, for instance because their processing time is long or because they consume an excessive amount of power. The study argues that performance concerns should get more attention since they are important variables in practical applications of deep learning. It compares fourteen deep learning models, such as AlexNet and GoogLeNet, in terms of accuracy, memory footprint, parameters, operations count, inference time, and power consumption. The results show that a very small improvement in accuracy may require a significant increase in both processing power and computation time. It is strongly suggested that a maximum energy budget be established for each DL project and that the accuracy be adjusted to that limit.

IV. CONCLUSION

This work presented a narrative review of selected literature that applies deep learning methods in the industrial Internet of Things (IIoT) to produce rapid forecasts of maintenance needs. According to the papers, the use of DL in the Internet of Things and in PdM is an important subject in industry. A wide variety of applications are already used in practice, and they are continuously being developed and enhanced.

Combining several deep learning models to unite their respective benefits and capabilities in a single application is a commonly described practice. Additionally, the need for real-time processing of complicated data and data streams has been established in several application scenarios.

Prediction applications in this category include PdM in particular. Concepts of parallel deep learning networks that use a final aggregation layer or intermediate layers to simplify the system are widely used to enhance real-time capability. Although there is much activity in the field of real-time processing of deep learning models, there are also voices critical of an exclusive focus on accuracy. They advocate a greater focus on performance and on lighter applications suited to practical use. Most papers agree that a significant amount of research is still required in this field.
The publications that were evaluated are summarized in Table 1, which also includes a discussion of the DL methods. For each study, it gives an overview of the features (strengths and shortcomings) of the DL methods described in the publication, together with the suggested application areas (such as predictions). Table 1 does not include any quantitative conclusions or assertions about the validity of the data. The various DL models are categorized by qualitative means only. A quantitative classification would be possible only if specific measurable values were specified across all of the assessed publications; the other papers provide only qualitative assertions. How to quantify and evaluate the validity and quality of the findings obtained from various DL approaches remains an open question. To this day, very few methods have been created for measuring, assessing, and benchmarking DL approaches against each other. In addition, such methods are often not verifiable in the sense of universal validity. For classification tasks, for instance, accuracy estimation procedures such as the "holdout method" or "n-fold cross-validation" may be used to evaluate performance, predictive ability, and model correctness. These procedures partition a data set into regions for learning and validation in a variety of ways. For most models, no notion of measuring, assessing, or benchmarking has been specified so far. In general, the assessment is based on the views of specialists. The study titled "Data Stream Classification and big data analytics" points out the need for new techniques of measuring and benchmarking. To evaluate deep learning models, it is necessary to have measuring techniques that are effective in producing representative benchmarks.

REFERENCES

[1]. Elkateb, S., Métwalli, A., Shendy, A., & Abu-Elanien, A. E. B. (2024). Machine learning and IoT – Based predictive maintenance approach for industrial applications. Alexandria Engineering Journal, 88, 298–309. https://doi.org/10.1016/j.aej.2023.12.065
[2]. Gensler. (2021, February 28). Deep Learning for solar power forecasting - An approach using AutoEncoder and LSTM Neural Networks. IEEE Conference Publication, IEEE Xplore. Retrieved February 28, 2024, from https://ieeexplore.ieee.org/document/7844673
[3]. Mocanu, E., Nguyen, P. H., Gibescu, M., & Kling, W. L. (2016). Deep learning for estimating building energy consumption. Sustainable Energy, Grids and Networks, 6, 91–99. https://doi.org/10.1016/j.segan.2016.02.005
[4]. Mohammadi, M., Al-Fuqaha, A., Sorour, S., & Guizani, M. (2018). Deep Learning for IoT Big Data and Streaming Analytics: A Survey. IEEE Communications Surveys & Tutorials, 20(4), 2923–2960. https://doi.org/10.1109/comst.2018.2844341
[5]. Pani, S., Pattnaik, O., & Pattanayak, B. K. (2024). Predictive Maintenance in Industrial IoT Using Machine Learning Approach. International Journal of Intelligent Systems and Applications in Engineering, 12(14s), 521–534. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4689/3363