Received December 3, 2021, accepted December 30, 2021, date of publication January 6, 2022, date of current version January 20, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3140646

Prediction of Network Traffic in Wireless Mesh Networks Using Hybrid Deep Learning Model
SMITA MAHAJAN¹, HARIKRISHNAN R.², AND KETAN KOTECHA³
¹Department of Computer Science Engineering, Symbiosis Institute of Technology, Symbiosis International (Deemed) University, Pune, Maharashtra 412115, India
²Department of Electronics and Telecommunication Engineering, Symbiosis Institute of Technology, Symbiosis International (Deemed) University, Pune, Maharashtra 412115, India
³Department of Symbiosis Centre for Applied AI (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed) University, Pune, Maharashtra 412115, India
Corresponding author: Harikrishnan R. (dr.rhareish@gmail.com)

ABSTRACT Wireless mesh networks are increasingly being adopted in network communication. Their main benefits include adaptability, ease of configuration, and flexibility, along with efficiency in cost and transmission time. Traffic prediction refers to forecasting the traffic volumes in a network. The traffic volume includes incoming requests and outgoing data transmitted by the network nodes. Previous traffic logs of the network are used to extract patterns that support accurate predictions. In this paper, various existing traffic prediction methods are analyzed. Specifically, the analysis is carried out on a case study in which the performance of a High-Speed Diesel (HSD) pump is predicted by observing its output. A network of sensors forms a wireless mesh network; the sensors act as nodes that read the parameters, namely, three-phase current, voltage, temperature, and vibration. In this case study, the HSD pump's performance is predicted by forecasting the vibration parameter as the output parameter. The other parameters that affect the performance of the HSD pump by causing changes in the vibration value are identified. Various algorithms, including the statistical Auto-Regressive Integrated Moving Average (ARIMA) model, Poisson regression, and several machine learning and deep learning algorithms such as Decision Tree Regressor, Multi-Layer Perceptron, Linear Regression, and Long Short-Term Memory (LSTM), are implemented and evaluated for this purpose. Along with the comparison, this paper describes a novel architecture combining a Convolutional Neural Network (CNN) and LSTM. The results of the comparison clearly show that the proposed Convo-LSTM model performs better and helps to predict the performance of the HSD pump. The proposed system makes a strong case for network traffic prediction based on historical data collected over the wireless mesh network. By analogy, this model could also be applied to network monitoring tasks.

INDEX TERMS Deep learning, machine learning, multivariate time series analysis, prediction, wireless
mesh networks.

The associate editor coordinating the review of this manuscript and approving it for publication was Pasquale De Meo.

I. INTRODUCTION
Networks play an important role in this age of digital expansion. For a given network, the most critical issues are its security, load-balancing ability, maintainability, and speed. Various network topologies exist, including bus, ring, star, mesh, and hybrid. Of these, mesh networks have been one of the most popular choices owing to their stronger connectivity and fewer disadvantages in terms of lag and rigidity [1]. Wireless mesh networks are wireless networks based on the mesh topology, as shown in Figure 1. The main advantages of wireless mesh networks are their easy adaptability and configurability. Future changes can be accommodated easily, which lowers cost and maintenance effort. The main concepts related to a wireless mesh network are traffic prediction, traffic routing, and traffic control. Of these, traffic prediction is crucial because it is the fundamental building block on which the performance of routing and congestion control algorithms depends. Traffic prediction refers to accurately predicting the traffic in a network at a given instant based on previous network data. An accurate estimate of the network traffic can help the network administrator improve the availability and transmission speed of the network [2], [3].
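Predicting traffic "based on previous network data" is usually set up as supervised learning by turning a recorded series into pairs: a window of past observations as the input and the next observation as the target. The sketch below shows that framing in plain Python. The helper name `make_windows`, the window length, and the sample values are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a time series into (past window, next value) pairs.

    Hypothetical helper for illustration: input k holds observations
    k .. k+window-1 and target k is the observation at index k+window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(series) - window):
        inputs.append(list(series[start:start + window]))  # past observations
        targets.append(series[start + window])             # value to predict
    return inputs, targets


if __name__ == "__main__":
    traffic = [10, 12, 15, 11, 9]  # made-up traffic volumes, one per interval
    X, y = make_windows(traffic, window=3)
    print(X)  # [[10, 12, 15], [12, 15, 11]]
    print(y)  # [11, 9]
```

Any of the regressors compared in this paper could then be trained on `X` and `y`. For a multivariate series such as the pump data, each window would hold a vector of sensor readings per time step instead of a single number.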

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022 7003
S. Mahajan et al.: Prediction of Network Traffic in Wireless Mesh Networks Using Hybrid Deep Learning Model

FIGURE 1. A sample wireless mesh network.

Previous approaches for network traffic prediction have primarily focused on the host server logs along with the network parameter configuration [4]–[6]. This paper compares the performance of six algorithms, namely, Decision Tree Regressor, Linear Regression, Multilayer Perceptron, Poisson's Regression, Auto-Regressive Integrated Moving Average (ARIMA), and Long Short-Term Memory (LSTM). Of these, ARIMA is a well-known model [7] and Poisson's regression is a probabilistic model [8]; both have traditionally been used in various applications. Decision Tree Regressor [9] and Linear Regressor [10] are machine learning algorithms, whereas the Multi-Layer Perceptron [11] is a type of deep neural network and LSTM is an artificial recurrent neural network (RNN) architecture [12]. In this way, the paper compares the performance of various algorithms implemented on the same real-world data. This paper proposes a novel, hybrid technique for network traffic prediction in wireless mesh networks that focuses on the historical data collected over the network. Specifically, the case study of High-Speed Diesel (HSD) pumps is considered, where different sensors collect the data values and the sensor readings are used to evaluate the performance. The installation of sensors (for reading various parameters) forms a mesh network that is a good representative of a typical mesh network traffic scenario. The sensors' mesh network is used to collect data that includes readings of the input and output parameters. In wireless mesh networks, mesh nodes, like MANET nodes, can form spontaneous connections with other nodes owing to their intrinsic features, and can traverse the network, collecting data from sensors, RFID-enabled nodes, and other fixed wireless nodes [13]. Wireless Sensor Networks (WSNs) have been identified as a significant enabler of IoT models since their inception. In IoT, all sensor nodes can connect to the Internet to share and receive data [14]. A set of statistical and machine learning algorithms is used for the multivariate time series analysis of the collected data and the subsequent output prediction. Both statistical and non-statistical methods use the traditional, fundamental, technical analysis approach, with a significant focus on lag, first-order difference, and second-order difference. The stationarity and nature of the time series matter more in the fundamental analysis method [15]. A Convolutional Neural Network (CNN) is commonly employed in feature engineering because it focuses on the most evident elements in the line of sight. LSTM is extensively used for time series because it can adapt to, and learn from, the temporal order of the sequence [3].

In this paper, a novel Convo-LSTM architecture is proposed for traffic prediction in wireless mesh networks, using the HSD pump as a case study. A vibration forecasting model based on a one-dimensional (1-D) CNN-LSTM is built, considering the properties of CNN and LSTM. The data tuples were collected over a one-year period with constant monitoring of the network, so they are real-world data. The fundamental structure of the model is a hybrid of a one-dimensional (1-D) convolutional neural network and Long Short-Term Memory (LSTM). Its architecture consists of an input layer, a one-dimensional convolution layer, a pooling layer, an LSTM hidden layer, and a fully connected layer. The proposed model is evaluated in terms of Mean Square Error, Mean Absolute Error, and Root Mean Square Error.

The main contributions of this paper are as follows:
1) A hybrid model for network traffic prediction is proposed for wireless mesh networks formed by various sensors, with the HSD pump as a case study.
2) A set of statistical, non-statistical, deep learning, and machine learning algorithms is implemented, and their results are compared on the collected multivariate time series data.
3) After applying these time-series-based algorithms, an unbiased analysis of their performance is carried out, which can be helpful for researchers in the domain of wireless mesh networks.
4) An unbiased analysis is provided of the results that a researcher can expect when applying these multivariate time-series algorithms in the domain of wireless mesh networks.

The paper is organized as follows: Section II reviews previous work in this domain. Section III explains the data collection process and describes the data. Section IV elucidates the various analysis methods applied for traffic prediction. The obtained results are presented and analyzed in Section V, and the conclusions are given in Section VI.

II. BACKGROUND WORK
Network traffic prediction and routing have been topics of interest and research over the last few decades. Approaches in this domain have comprised various time series models, namely machine learning algorithms, deep learning (neural networks), and various traditional statistical methods. Time series models such as ARIMA have been preferred for forecasting, even in the case of regular networks. Zhou et al. [5] used a combination of ARIMA and GARCH models for prediction and modeling of network traffic. Wavelet-based transformation models have also been used for this purpose. Unlike in image or text data problems, neural networks made inroads into network analysis and prediction almost
a decade earlier. Khotanzad et al. [16] were among the first to use neural networks for high-speed network traffic prediction. Alarcon et al. [17] applied a multi-resolution neural network to the task. Chen et al. [18] deployed a flexible neural tree for resolving traffic in small-scale networks. There have also been extensive studies comparing the performance of traditional methods and deep learning networks [19]. Vinayakumar et al. [20] applied various sequence models, including Long Short-Term Memory (LSTM) networks, Recurrent Neural Networks (RNN), and Image Recognition Neural Networks (IRNN), to network traffic prediction. Similar traditional and advanced approaches have also been applied to traffic prediction in the specific case of wireless networks. In one of the earlier approaches, Gowrishankar and Satyanarayana [21] presented a neural network architecture for wireless network traffic prediction. Xiang et al. [22] proposed a hybrid ANN-based approach for this task. Nikravesh et al. [23] analyzed the use of multiple techniques, including SVM, MLP, and MLPWD. Ke Wang et al. [10] suggested a hybrid model that combines the abilities of CNN and LSTM for traffic flow prediction. Stefany Coxe et al. [24] worked on Poisson regression and its alternatives; they state that count data represent the number of times an activity occurred over a specific period. For example, suppose one wants to observe how many hyper-aggressive actions children express while playing in a playground on particular occasions; the result is count data. According to them, almost all Poisson regression models provide an easy way to analyze count data. Farhan Mohammad Khan and Rajiv Gupta used an Auto-Regressive Integrated Moving Average (ARIMA) model and compared the accuracy of the predicted model [7] with that of a nonlinear autoregressive (NAR) neural network; the model was developed and implemented recently for the daily prediction of COVID-19 cases over the next 50 days. Nie et al. [25], [26] have made multiple contributions in this domain in recent years. One of their initial approaches involved the use of deep belief networks for wireless mesh backbone networks. Recently, they further enhanced their work by applying reinforcement learning in an IoT setup [26]. Qiu et al. [27] deployed RNNs in a spatio-temporal sense for improved traffic prediction performance. Xu et al. [28] applied a multi-layer Gaussian framework to this task. Recently, researchers have also seen the rise of attention mechanisms [29] and of deep learning applied in interdisciplinary ways [30]. This trend is slowly being reflected in traffic prediction. He et al. [31] applied a meta-learning scheme for faster traffic prediction in smaller networks. Li et al. [32] combined wavelet analysis with backpropagation neural networks for traffic flow analysis in wireless networks. Zhang et al. [33] considered a spatiotemporal network with a modified sequence model. Kim proposed an INGARCH model, an enhancement over the previous GARCH models [34]. Researchers have also used some statistical and evolutionary algorithms. Recently, most notably, Li et al. [35] applied a Gaussian regressor with a Prophet model for user traffic prediction in networks. A quantum-PSO approach has also been tried in this domain [36].

Costa et al. give a thorough overview of predictive maintenance projects in Industry 4.0, identifying and classifying techniques, standards, and applications. Their survey's key contributions include a discussion of the existing issues and limitations in predictive maintenance, as well as a proposed new taxonomy to define this research topic in light of Industry 4.0 requirements [37]. According to Costa et al., industry has entered a new era due to the necessity to adapt to and adopt new technologies. The Internet of Things (IoT) is a recent era of communication in which all kinds of everyday objects, such as smartphones, sensors, or devices, are linked to network-enabled objects (such as RFID) to communicate with each other and become part of the Internet. Industry 4.0 is characterized by connectivity, data volume, tech gadgets, inventory reduction, customization, and controlled production [37]. Tan et al. have analyzed recent advancements in smart monitoring and data analytics that have enabled infrastructure predictive maintenance (PdM). According to them, industry is currently hesitant to adopt new smart monitoring sensors, information technologies, and data analytics to achieve PdM. PdM is data-driven, relying on smart monitoring and data analytics insights to prevent downtime through maintenance, protection, and repairs. PdM is a relatively new trend that has taken a leap in the industrial world since the 1990s; yet their recent analysis of the industry revealed that its applicability in infrastructure maintenance is quite limited [38]. Chuang et al., while emphasizing the importance of predictive maintenance, have stated that business in all industries can be redefined with the emergence of AI and IoT [39]. The information gathered is used not only to draw inferences about the past but also to forecast the future. Artificial neural networks and evolutionary algorithms are two of the most common AI techniques for machine diagnosis. According to them, predictive maintenance cuts down accidental device downtime, lowers maintenance costs, and extends the equipment life cycle, among other benefits. The fundamental infrastructure of an IoT framework consists of sensors, actuators, computation servers, and the communication network [40]. Sensor data is transmitted using the TCP/IP (Transmission Control Protocol/Internet Protocol) communication protocol. The environmental sensors are programmed into the programmable interface controller, and the data is saved as historical records [37], [39]. Pallavi et al. mention that because IoT devices are typically located in geographically separated places, they communicate primarily over wireless media. They also state that wireless channels are known for significant distortion levels and instability. Communication techniques are essential for the analysis of IoT devices. In this scenario, reliably transferring data without too many retransmissions is a significant concern. The fundamental infrastructure of an IoT framework consists of sensors, actuators, computation
servers, and the communication network [40]. MANET is a vital element of the IoT network, serving as its backbone. MANET nodes with a mesh architecture can form spontaneous connections with other nodes owing to their intrinsic features, requiring minimal infrastructure. MANET nodes can traverse the IoT network, collecting data from sensors, RFID-enabled nodes, and other fixed wireless nodes. The MANET nodes take the most effective route to connect to whichever Internet gateway is available. MANET nodes, like sensor nodes, can be employed as an essential technology in a variety of IoT applications. MANET nodes and sensor nodes (including RFID-enabled devices) forming a mesh network can be deployed in huge numbers owing to their self-configuring nature. For the past decade, researchers have been working on the Internet of Things (IoT) using a variety of mature technologies such as Radio Frequency Identification (RFID), Wireless Sensor Networks (WSN), Mobile Ad hoc Networks (MANET), and so on [41]. Nagarajan et al. have proposed remote health monitoring and data analysis by combining IoT and deep learning techniques. They suggested a new IoT-based, fog-assisted cloud network architecture that collects real-time health care data from patients via numerous medical IoT sensor networks and analyzes it using a deep learning algorithm installed at a fog-based healthcare platform. Furthermore, they proposed a methodology to analyze the process in real time for smart cities. According to them, for timely, accurate, and secure data analysis, the new IoT-based fog-assisted cloud network architecture can be applied to various domains such as traffic analysis and management, agriculture and smart farming, and weather forecasting [14]. In the view of Manuel et al., WSNs have been identified as a significant enabler of IoT models since their inception. In IoT, all sensor nodes can connect to the Internet to share and receive data; in WSNs, however, the nodes do not have a direct Internet connection. To connect to the Internet, all nodes in a WSN need a mediator [42]. In their work, Krishnasamy et al. mention that a wireless sensor network (WSN) comprises a large number of sensor nodes that can both sense and communicate. The sensor nodes work together to gather data and send it to the sink node, also known as the coordinator node. The primary goal of sensor nodes is to monitor the environment before processing and transferring data to an analysis centre. Sensors are installed in locations that are frequently uneven in design. Sensors are also installed randomly at specific sites that are irregular in shape, depending on the transmission range. As a result, an algorithm that can adapt to each geographic region with different deployment structures is required [43]. Deverajan Ganesh Gopal et al. have researched the DANET, or dynamic ad hoc network, which is a network of multiple dynamic nodes that does not require any infrastructure. The movable nodes build a temporary network as and when needed. This form of network has no centralized entity, so nodes must rely on other nodes to send packets. Multiple copies of data are created to enhance data availability. The access frequency and node level are taken into account while allocating copies [44]. In the research by Devarajan Ganesh Gopal et al., the Internet of Things (IoT) is a computational concept that envisions widespread Internet connectivity, transforming everyday objects into connected devices. The fundamental methodology in an IoT-based model is the transmission of huge volumes (billions or perhaps trillions of items) of sensitive data from devices capable of detecting the surrounding situation, communicating and transferring precise information, and then providing feedback to the environment. Remote connections are frequently used to meet the adaptability and versatility required by IoT interchanges. While cellular technologies such as 3G/4G/5G provide long-distance connections for large devices, they require infrastructure support and a licensed frequency band. The authors have explained the concepts of IoT and IIoT, as well as the current trend of automation and data exchange in manufacturing known as Industry 4.0 [45]. According to Farrukh et al., with the creation of highly accurate algorithms, research aims to focus on building more rigorous and practical methodologies [46].

It should be noted that work on traffic prediction has rarely focused on historical data, which can eventually be used for traffic prediction. As a result, having models that are both resilient and appropriate is critical. Further, a standard paper highlighting all contemporary machine learning and deep learning methods together could benefit young researchers in this domain. Also, there is hardly any notable work in which real-life data is collected over a given period, analyzed to identify patterns, and then used to predict the performance of the network. All these points highlight the scope for improvement and the need for our research work.

III. DATA COLLECTION AND PREPROCESSING
Before the algorithms used for modeling the data are described, the data collection and preparation process is explained below.

A. DATA COLLECTION
The data was collected for over a year using a set of HSD pumps. A total of eight sensors were placed for collecting data, forming a wireless mesh network. The collected data consists of seven input variables and one output variable. Each of these parameters is described in Table 1. The data consists of 8960 tuples in total, each containing one identifier (date/time), seven input variables (sensor readings for three-phase current, three-phase voltage, and temperature), and one output variable, the reading from the vibration sensor. The data is divided into training and test sets in the ratio 4:1, i.e., an 80%/20% split. The data is processed before being converted into a more algorithm-friendly format.

B. DATA PREPROCESSING
The following checks were performed on the data as part of preprocessing:
1) Check the time series stationarity, that is, whether the time series is stationary or not.
2) Check whether a consistent mean and standard deviation exist across the collected data range. This is verified by plotting the mean and standard deviation values over a rolling window across the entire data range.
3) The ADFuller test for stationarity: The Augmented Dickey-Fuller (ADF) test checks whether the time series has a unit root, i.e., whether shocks and trends persist over the series. This is achieved by stating a null hypothesis and an alternative hypothesis. The results of the test are shown in Figure 2. Our null hypothesis is that the time series has a unit root and is non-stationary. The alternative hypothesis is that the series is stationary. The p-value for the null hypothesis is calculated, and the threshold for the p-value is set to 0.05. As the observed p-value is less than the threshold, the null hypothesis is rejected, and it can be concluded that the series is indeed stationary.
4) Application of seasonal decomposition: The seasonal decompose method is applied to obtain the triad of values used for setting up the stationary time series that is used for forecasting. Residuals, seasonality, and trend are the three components obtained from this method (Figure 3).

TABLE 1. Description of the collected data parameters.

FIGURE 2. ADfuller Test Results.

FIGURE 3. The seasonal decompose.

IV. ALGORITHMS USED FOR TRAFFIC PREDICTION
In general, there are two types of traffic modelling for short-term traffic prediction: parametric and non-parametric methods. To employ parametric methods, first, a well-structured but flexible family of models is created, after which the model parameters must be estimated using training data. Forecasts can then be made using the model. This is a well-known method in traffic modelling, as seen in Figure 4. The Auto-Regressive Integrated Moving Average (ARIMA) model is a prominent modelling tool for traffic forecasting [47]. Artificial intelligence-based regression models can provide the necessary capabilities. Regression is a solution for creating models capable of predicting the value of an output variable from a set of input variables, and it is widely used in many disciplines. AI-based algorithms are frequently used for complex regression models. This type of regression approach automatically detects complex correlations among input variables and interactions between input and output variables [48]. Forecasting Internet traffic is critical for network planning, resource allocation, and detection of network anomalies caused by attacks, because improved TCP/IP (Transmission Control Protocol/Internet Protocol) traffic forecasting can assist network providers in optimizing their resources. Better traffic predictions can help avoid congestion and resource waste in bandwidth allocation schemes. Network traffic prediction falls into two categories: short-term prediction and long-term prediction. Short-term prediction forecasts traffic conditions in the near future based on historical and present traffic data; its forecast horizon is only a few minutes long. Long-term prediction, on the other hand, provides traffic estimates for longer time periods, such as years. Traditional forecasting models such as the Poisson regression model (PRM) are used to model a count variable and are usually fitted using the maximum likelihood estimation (MLE) method [49]. Autoregressive (AR) and Autoregressive Integrated Moving Average (ARIMA) models can capture the linear, short-range dependencies (SRD) between terms, but not the long-range dependencies (LRD), resulting in poor performance when used for Internet traffic forecasting. Nonetheless, they are widely used [50].

In this paper, both statistical and non-statistical algorithms are implemented for the task of output prediction. Each of them is described below.

A. DECISION TREE REGRESSOR
The Decision Tree Regressor (DTR) takes all the input features together and iteratively generates multiple trees, trying out possible combinations of root, internal, and leaf nodes among all the features [9]. The tree with the closest
FIGURE 4. Block diagram for Prediction model.

predicted output to the actual traffic is considered the best tree through which the input features are passed to derive the
for subsequent inference. output value. Each node in the hidden layer assigns an impor-
For each tree, for each node, two metrics can be calculated: tance weight to each input that it receives from its preceding
Gini index and entropy. Gini index is a probabilistic measure layer along with a bias value for error normalization. Gener-
that indicates the probability that the particular feature being ally, a fully connected setup is used wherein each node in a
at the given node would lead to the prediction error crossing layer is connected to all nodes in its next layer [11].
a set threshold. At a particular node index n, the Gini index Mathematically, the value at a particular node i in a given
for a feature fi is calculated as follows: layer l is derived as follows:
n
Gini Indexn = 1 − 6i=1
n
p2 (fi ) (1) X
H (x)li = wj H (x)l−1
j +b (5)
where p(fi ) indicates the probability of fi being present at node j=1
n. Ideally, the feature that produces the lowest Gini index for A rectified linear unit activation function is used at the
the particular node is assigned to that node. Overall, creating output layer to get a continuous prediction value. After com-
the decision tree would be to reduce the entropy, i.e. degree paring the predicted value with the actual value, the network
of randomness of possible traffic at each position. Entropy is back-propagates i.e. updates the initially randomized weights
defined as follows:

$$ E(S) = \sum_{i=1}^{n} -p_i \log_2 p_i \tag{2} $$

The lower the entropy, the more confident and accurate the predictions. The Decision Tree Regressor finds the best tree structure by iteratively trying out multiple combinations of positions of the root and internal nodes in the tree.

B. LINEAR REGRESSION
The Linear Regression model uses a linear mapping of features to produce a continuous prediction output. It is one of the most basic algorithms and is represented mathematically as:

$$ H(x) = \sum_{i=1}^{n} w_i X_i + \epsilon \tag{3} $$

where $\epsilon$ is the error factor included to accommodate normalization [10]. To approximate the given data, regression and log-linear models can be employed. In (simple) linear regression, the data are modelled to fit a straight line. For example, a dependent (response) variable $y$ can be described as a linear function of another random (predictor) variable $x$ [10]:

$$ y = wx + b \tag{4} $$

C. MULTI-LAYER PERCEPTRON
The multi-layer perceptron model is a type of deep learning architecture where there is a combination of hidden layers …

… so that the predicted output matches the actual value as closely as possible:

$$ w \leftarrow w - \alpha \, dw \tag{6} $$

where $\alpha$ is the learning rate and $dw$ is the gradient of the loss with respect to the weights $w$.

D. POISSON REGRESSION
Poisson regression is a probabilistic model [8]. It uses a probability mass function (PMF) to give the probability of observing a particular count output $y_i$ for a given input $x_i$ containing a set of explanatory (independent) variables $X \in [X_1, X_2, \dots, X_n]$. This function is defined as follows:

$$ \mathrm{PMF}(y_i \mid x_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!} \tag{7} $$

where $\lambda_i$ is the mean rate, which is also the predicted value. The predicted regression output for a given input $x_i$ is defined as follows:

$$ \lambda_i = e^{x_i \beta} \tag{8} $$

Here, $\beta$ is the vector of regression coefficients (feature importances) assigned to the explanatory variables. The training objective of a Poisson regressor is to find this $\beta$. The best $\beta$ is the one that maximizes the PMF over the training data. The maximum occurs where the slope of the PMF curve is zero, which is more easily found by differentiating the logarithm of the PMF. Summed over the $n$ training samples, this log-likelihood is:

$$ \ln(\mathrm{PMF}) = \sum_{i=1}^{n} \left( y_i x_i \beta - e^{x_i \beta} - \ln y_i! \right) \tag{9} $$
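As a rough illustration of (7) to (10), the sketch below evaluates the log-likelihood (9) and its derivative with respect to $\beta$. It then climbs the likelihood by plain gradient ascent until that derivative is approximately zero. Several choices are assumptions made only for this example: the gradient-ascent solver, the learning rate, the step count, the helper names, and the toy count data. The paper does not state which optimizer it uses.

```python
import math


def linear_predictor(beta, x):
    """x_i * beta: dot product of one feature row with the coefficient vector."""
    return sum(b * v for b, v in zip(beta, x))


def log_likelihood(beta, X, y):
    """Eq. (9): sum over samples of y_i*(x_i beta) - exp(x_i beta) - ln(y_i!)."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = linear_predictor(beta, xi)
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)  # lgamma(y+1) = ln(y!)
    return total


def score(beta, X, y):
    """Derivative of eq. (9) w.r.t. beta: sum over samples of (y_i - lambda_i) * x_i."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        lam = math.exp(linear_predictor(beta, xi))  # eq. (8)
        for j, v in enumerate(xi):
            grad[j] += (yi - lam) * v
    return grad


def fit_poisson(X, y, lr=0.002, steps=20000):
    """Assumed solver: gradient ascent on eq. (9) until the derivative is ~0."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        g = score(beta, X, y)
        beta = [b + lr * gj for b, gj in zip(beta, g)]
    return beta


def predict(beta, xp):
    """Eq. (10): y_p = lambda_p = exp(x_p beta)."""
    return math.exp(linear_predictor(beta, xp))


# Hypothetical non-negative integer counts; column 0 is a constant intercept term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1, 2, 4, 8]
beta = fit_poisson(X, y)
print(beta)                      # close to [0, ln 2], since these counts double per step
print(predict(beta, [1.0, 4.0]))
```

Because the counts in this toy example double at each step, the fitted coefficients should approach $\beta = (0, \ln 2)$, where every $\lambda_i$ equals $y_i$ and the derivative of (9) vanishes. A production fit would more commonly use Newton or IRLS iterations.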


The derivative of (9) with respect to $\beta$ is set to zero to find the best $\beta$ value, which is then used to derive the prediction for a new test input $x_p$:

$$ y_p = \lambda_p = e^{x_p \beta} \tag{10} $$

Poisson regression is therefore trained to find the regression coefficients $\beta$ that make the vector of observed counts $y$ most likely. The steps are as follows:
1) Convert the data set into numeric values only.
2) Ensure that the data set contains only non-negative integer values that represent the frequency of an event during a set interval. For our problem statement, this is the network traffic crossing a host in a particular interval.
3) Find the regression variables that influence the observed counts so as to derive the maximum PMF value.

E. AUTO REGRESSIVE INTEGRATED MOVING AVERAGE
ARIMA, which stands for Auto-Regressive Integrated Moving Average, combines auto-regression and moving-average terms to predict future values from past time-series values [7]. Mathematically, it is represented as follows:

$$ h_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \epsilon_t \tag{11} $$

$$ y_t = h_t + \phi_1 \epsilon_{t-1} + \phi_2 \epsilon_{t-2} + \dots + \phi_q \epsilon_{t-q} \tag{12} $$

where $\epsilon_t$ is the error term at time $t$. Some terms are based on auto-regression and some on the moving average. If the time series is under-differenced, more AR terms are added; if it is over-differenced, more MA terms are added. The ARIMA(p, d, q) method applies differencing at the first or second level if the data are non-stationary. If the data are already stationary, ARMA(p, q) is an alternative. Here p is the auto-regressive (AR) order and q is the moving-average (MA) order, that is, the number of lagged forecast errors in the model. The most common way to make a sequence stationary is to subtract the previous value from the current value. Depending on whether the time series is univariate or multivariate, one or more lags are anticipated. The value of d is the smallest number of differencing operations required to make the series stationary, so if the series is already stationary without differencing, d = 0. Model identification begins by measuring the autocorrelation (ACF) and partial autocorrelation (PACF) of the series by plotting the correlogram [51]. Candidate models are then estimated from the ACF and PACF by setting the auto-regressive and moving-average orders. The (p, q) values are identified, and the best model is chosen, from the spikes and the shape of the curves in the ACF and PACF plots. Once the best model is selected, forecasting is done using its parameters (p, d, q). Diagnostic evaluation then assesses the efficacy of the fitted model using statistically relevant measures such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the mean square error [51].

FIGURE 5. Block diagram of LSTM architecture.

F. LONG SHORT TERM MEMORY
Long Short-Term Memory is a recurrent network-based architecture that keeps track of a cell state to remember certain trends in the series, as shown in Figure 5. At every step, the model decides whether to discard some information, update some pattern information, or output new information. LSTMs were specifically developed to avoid the long-term dependency problem. All recurrent neural networks consist of a chain of repeated neural network modules. In standard Recurrent Neural Networks (RNNs), shown in Figure 5, this repeating module has a simple structure, such as a single tanh layer [12]. An LSTM has three gates (forget, update, and output) that operate on the time-series input $x_t$ and the previous intermediate output $h_{t-1}$:

$$ f_t = \sigma(\alpha_f x_t + \beta_f h_{t-1}) \tag{13} $$

$$ o_t = \sigma(\alpha_o x_t + \beta_o h_{t-1}) \tag{14} $$

G. CONVO-LSTM
This paper proposes a novel combination of CNN and LSTM in which the feature-extraction ability of the CNN complements the sequence-mapping ability of the recurrent neural network (RNN). Figure 4 depicts the model's structural diagram, and Figure 6 depicts the proposed model's architecture. The main constituents of Convo-LSTM are:
• an input layer;
• a one-dimensional convolution layer;
• a pooling layer;
• an LSTM hidden layer;
• a fully connected layer.

LeCun et al. proposed the CNN model in 1998. The convolution operation extracts attributes from the input-layer vectors [52]. Here, only one-dimensional (1D) operations are performed, owing to the structure of the data. Pooling layers are used to reduce storage requirements and avoid large training costs. The pooling layer divides the convolutional layer's output into small blocks and produces a single output from each block. Pooling can be done in various ways, such as by taking the average or the maximum. Average pooling takes the average value of the block it is pooling, whereas max pooling takes the maximum of the block [53].

First, the CNN layer extracts features from the data, namely the current, voltage, temperature and vibration sensor readings collected over the previous year. The LSTM then forecasts the output, vibration, from the extracted features. In the experiments, Convo-LSTM (CNN-LSTM) achieved the highest prediction accuracy and provided credible forecasts of the output parameter (vibration).
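The convolution and pooling steps described above can be illustrated with a small, framework-free sketch in plain Python. Several assumptions apply:
• "valid" padding and stride 1;
• non-overlapping pooling windows;
• tanh as the activation, matching the convolution formula given for the CNN layer in this paper.

The helper names `conv1d_tanh` and `pool1d` and all numeric settings are hypothetical. They are neither the paper's implementation nor its Table 2 configuration.

```python
import math


def conv1d_tanh(x, kernels, biases):
    """1D convolution over time followed by tanh (one output per filter per step).

    x       -- T time steps, each a list of C input features
    kernels -- F filters, each a list of K taps, each tap a list of C weights
    biases  -- F bias values, one per filter
    Returns T-K+1 time steps ("valid" padding), each holding F feature values.
    As in most deep learning libraries, the kernel is slid without flipping
    (cross-correlation).
    """
    T, K = len(x), len(kernels[0])
    out = []
    for t in range(T - K + 1):
        row = []
        for kern, b in zip(kernels, biases):
            s = b
            for k in range(K):
                s += sum(w * v for w, v in zip(kern[k], x[t + k]))
            row.append(math.tanh(s))
        out.append(row)
    return out


def pool1d(x, size, mode="max"):
    """Non-overlapping pooling along time; a trailing partial block is dropped.

    mode="max" keeps the largest value of each block, mode="avg" its mean.
    """
    out = []
    for start in range(0, len(x) - size + 1, size):
        block = x[start:start + size]
        per_feature = list(zip(*block))  # one tuple of values per feature map
        if mode == "max":
            out.append([max(col) for col in per_feature])
        elif mode == "avg":
            out.append([sum(col) / size for col in per_feature])
        else:
            raise ValueError("mode must be 'max' or 'avg'")
    return out


# Hypothetical settings: 10 time steps x 7 features, 2 filters of width 3,
# pool size 2. Illustrative values only.
window = [[0.1 * (t + c) for c in range(7)] for t in range(10)]
filters = [[[0.05] * 7 for _ in range(3)], [[-0.05] * 7 for _ in range(3)]]
features = conv1d_tanh(window, filters, biases=[0.0, 0.1])
pooled = pool1d(features, 2, mode="max")
print(len(features), len(features[0]))  # 8 time steps, 2 feature maps
print(len(pooled), len(pooled[0]))      # 4 time steps, 2 feature maps
```

With a window of 10 time steps and 7 features, a width-3 filter in this sketch yields 8 time steps, and pooling of size 2 halves that to 4. Other padding, stride or pooling choices produce other lengths.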

1) TIME COMPLEXITY OF CONVO-LSTM
To determine the time complexity of both forward propagation and backpropagation, the number of operations at each 1D CNN layer is first determined. These counts are then aggregated to obtain the overall time complexity [54]. During forward propagation (FP), each CNN layer $l$ has $N^{l-1}N^{l}$ connections to the preceding layer. For each connection, an individual linear convolution, that is, a linear weighted sum, is evaluated. Let $S^{l-1}$ and $W^{l-1}$ denote the vector sizes of the preceding layer's output, $s_k^{l-1}$, and of the kernel (weight), respectively. Ignoring boundary conditions, a linear convolution from a single connection consists of $S^{l-1}(W^{l-1})^2$ multiplications and $S^{l-1}$ additions. If the bias is ignored, the total numbers of multiplications and additions in layer $l$ are:

$$ N(\mathrm{mul})^{l} = N^{l-1} N^{l} S^{l-1} \left(W^{l-1}\right)^{2} \tag{15} $$

$$ N(\mathrm{add})^{l} = N^{l-1} N^{l} S^{l-1} \tag{16} $$

Low computational complexity is attained throughout the 1D CNN. Summing over its layers, the total numbers of multiplications, $T_{FP}(\mathrm{mul})$, and additions, $T_{FP}(\mathrm{add})$, in forward propagation are:

$$ T_{FP}(\mathrm{mul}) = \sum_{l=0}^{L} N^{l-1} N^{l} S^{l-1} \left(W^{l-1}\right)^{2} \tag{17} $$

$$ T_{FP}(\mathrm{add}) = \sum_{l=0}^{L} N^{l-1} N^{l} S^{l-1} \tag{18} $$

Similarly, in a backpropagation (BP) iteration, the total numbers of multiplications and additions due to the convolutions are:

$$ T_{BP}(\mathrm{mul}) = \sum_{l=0}^{L} N^{l+1} N^{l} S^{l+1} \left(W^{l+1}\right)^{2} \tag{19} $$

$$ T_{BP}(\mathrm{add}) = \sum_{l=0}^{L} N^{l+1} N^{l} S^{l+1} \tag{20} $$

So at each BP iteration, the total numbers of multiplications and additions are, respectively:

$$ T(\mathrm{mul}) = T_{FP}(\mathrm{mul}) + T_{BP}(\mathrm{mul}), \qquad T(\mathrm{add}) = T_{FP}(\mathrm{add}) + T_{BP}(\mathrm{add}) \tag{21} $$

FIGURE 6. Proposed Convo-LSTM architecture.

2) STATISTICAL ANALYSIS
CNN stands for convolutional neural network, a type of feed-forward neural network that can be used to forecast time series with great success. Two inherent features of CNNs, local perception and weight sharing, can significantly reduce the number of parameters, which in turn improves model learning efficiency. The convolution layer and the pooling layer are the two fundamental constituents of a CNN. Each convolution layer has a number of convolution kernels, and its output is calculated with the following formula:

$$ l_t = \tanh(x_t \ast k_t + b_t) \tag{22} $$

where $l_t$ is the output value of the convolution, $x_t$ is the input vector, tanh is the activation function, $b_t$ is the bias, and $k_t$ is the weight of the convolution kernel. The data features are obtained once the convolution layer completes the convolution operation. Because the extracted feature dimensions are quite large, a pooling layer is added after the convolution layer to lower the feature dimension and the cost of training the network.

The forget gate receives the output value of the previous time step and the input value of the current time step, and uses them to calculate its output value [52], as indicated in the following formula:

$$ f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{23} $$

The previous output value and the current input value are both fed into the input gate, and the output value and candidate cell state of the input gate are calculated. The input-gate output is:

$$ i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{24} $$

The final output $o_t$ is calculated as follows:

$$ o_t = \sigma(\alpha_o x_t + \beta_o h_{t-1}) \tag{25} $$

$$ h_t = o_t \odot \tanh(C_t) \tag{26} $$

where $\odot$ denotes element-wise multiplication. $f_t$ takes values in (0, 1), and $W_f$ and $b_f$ are the weight and bias of the forget gate. Similarly, $W_i$ and $b_i$ are the weight and bias of the input gate, whose output $i_t$ also lies in (0, 1). $C_t$ is the state of the current cell [52].

The following is a summary of the proposed architecture. The convolution layer and the pooling layer are the two fundamental components of a CNN. Each convolution layer has several convolution kernels, and the data features are extracted by the convolution operation of the convolution layer. As the extracted feature dimensions are very large, a pooling layer is added after the convolution layer to reduce the feature dimension and the cost of training the network. The convolution operation extracts the features from


the input layer vectors. In this case, only 1D operations are performed, owing to the structure of the data. To avoid introducing large training costs into the system, pooling layers are also deployed to reduce storage requirements. The summary of the proposed architecture is shown in Table 2.

TABLE 2. Summary of proposed Convo-LSTM architecture.

To demonstrate the model's usefulness, this work compares the performance of ARIMA, Decision Tree Regressor, Linear Regressor, Multi-Layer Perceptron, and Long Short-Term Memory. All models use the same training and test sets (a sample is shown in Figure 7) under the same operating environment. All model development and performance evaluation are performed on an Intel(R) Core(TM) i5-7300HQ CPU @ 2.50 GHz with 8.00 GB (7.87 GB usable) RAM running Windows 10.

3) SAMPLE DATA SET DESCRIPTION
The proposed system targets network traffic prediction using historical (time-series) data collected over a wireless mesh network, in which the sensor nodes form connections in a mesh architecture. The data were collected over one year, from 1 June 2019 to 8 June 2020, from sensors measuring the temperature, three-phase voltage, three-phase current and vibration of the HSD pump. To implement the various algorithms, the data are divided into three sets:
• training set: the first 7178 readings;
• validation set: the following 1345 readings;
• test set: the last 450 readings.

The influence factors are temperature, three-phase current (IR, IY, IB) and three-phase voltage (VR, VY, VB). From them, the HSD pump's vibration is predicted on an hourly, daily, weekly, monthly and yearly basis.

As per standard industry practice, temperature is measured with the LM35 temperature sensor (Texas Instruments, Dallas, TX, USA), a precision IC temperature sensor whose output is proportional to the temperature in °C. A Hall Effect-based DC current sensor is the ideal method of measurement for monitoring the current of the motor. Allegro MicroSystems LLC's ACS712 current transducer (Worcester, MA, USA) is used in this circuit [41]. Vibration is measured with high-frequency accelerometers with a flat frequency response up to 28 kHz, of the kind used for monitoring multi-stage compressors and boiler feed pumps and for bearing wear detection. A snapshot of the data set is shown in Figure 7.

FIGURE 7. Data set snapshot.

4) MODEL DESCRIPTION
The various parameters of the Convo-LSTM are listed in Table 2, which shows the CNN-LSTM parameter settings used in this experiment. Based on these settings, the Convo-LSTM network is built as follows:
1) A three-dimensional data vector, (None, 10, 7), is used as the input training data, where 10 is the time-step size and 7 is the number of input attributes.
2) The data are first passed to a one-dimensional convolution layer, which extracts additional features and produces a three-dimensional output vector (None, 15, 64), where 64 is the number of convolution filters.
3) After the pooling layer, the vector is transformed into a three-dimensional output vector (None, 13, 32).
4) This output is processed by the LSTM layer and two dense layers.
5) The resulting output, (None, 64), where 64 is the number of hidden units in the LSTM layer, passes through another fully connected layer that produces the output value.

This CNN-LSTM model structure is shown in Figure 6.

V. RESULTS
The following models are trained on the processed training set:
• Decision Tree Regressor (DTR)
• Linear Regression (LR)
• Multi-layer Perceptron (MLP)
• Poisson Regression
• Auto-Regressive Integrated Moving Average (ARIMA)
• Long Short-Term Memory (LSTM)
• Convo-LSTM

Each fully trained model is then used to forecast the output for the test set, and its predictions are compared with the actual output values from the data set. Among these seven forecasting methods, Convo-LSTM shows the closest fit between the predicted and actual curves, which practically coincide. The MLP model shows the poorest fit.

TABLE 3. Results for hourly prediction model.

TABLE 4. Results for daily prediction model.

TABLE 5. Results for weekly prediction model.

TABLE 6. Results for monthly prediction model.

TABLE 7. Results for yearly prediction model.

FIGURE 8. Actual vs Predicted vibrations using Poisson regression algorithm.

FIGURE 9. Actual vs Predicted vibrations using ARIMA algorithm.

FIGURE 10. Actual vs Predicted vibrations using DTR algorithm.

The aforementioned models are evaluated across multiple time intervals:
• Hourly: The sensor readings are logged at an interval of every hour.
• Daily: The sensor readings are logged as the cumulative value for the entire day.
• Weekly: The sensor readings are logged at the weekly level, thereby smoothing out any day-to-day variations observed between specific days.


• Monthly: The sensor readings are logged on a month-over-month basis.
• Yearly: The sensor readings are logged on a yearly basis.

The results are evaluated across the following three metrics. Here $x_i$ is the actual value, $y_i$ is the predicted value, and $D$ is the number of evaluated samples.
• Mean Absolute Error (MAE): the mean absolute difference between the actual and predicted output values.

$$ \mathrm{MAE} = \frac{1}{D} \sum_{i=1}^{D} \left| x_i - y_i \right| \tag{27} $$

• Mean Square Error (MSE): the mean squared difference between the actual and predicted outputs.

$$ \mathrm{MSE} = \frac{1}{D} \sum_{i=1}^{D} \left( x_i - y_i \right)^2 \tag{28} $$

• Root Mean Squared Error (RMSE): the square root of the MSE value.

$$ \mathrm{RMSE} = \sqrt{\frac{1}{D} \sum_{i=1}^{D} \left( x_i - y_i \right)^2} \tag{29} $$

The hourly, daily, weekly, monthly and yearly prediction results are presented in Table 3 to Table 7. These tables compare the MAE, MSE and RMSE values of Convo-LSTM with those of the other models for each interval. The Convo-LSTM model achieves an MAE of 0.1, an MSE of 0.025 and an RMSE of 0.16, the best performance among the compared models.

For interpretability, graphs comparing the predicted and actual outputs of the implemented algorithms are shown in the following figures:
• Figure 8: Poisson regression
• Figure 9: ARIMA
• Figure 10: DTR
• Figure 11: Linear Regression
• Figure 12: Multi-Layer Perceptron
• Figure 13: LSTM
• Figure 14: Convo-LSTM

FIGURE 11. Actual vs Predicted vibrations using Linear Regression algorithm.

FIGURE 12. Actual vs Predicted vibrations using MLP algorithm.

FIGURE 13. Actual vs Predicted vibrations using LSTM algorithm.

FIGURE 14. Actual vs Predicted vibrations using Convo-LSTM algorithm.

VI. CONCLUSION
This paper proposes a hybrid model for traffic prediction in wireless mesh networks by applying regression methods to system configuration parameters. Specifically, six algorithms are applied: decision tree regressor, linear regression, multi-layer perceptron, Poisson regression, ARIMA and LSTM. They use three main feature types (three-phase current, three-phase voltage and temperature) to predict the output, the vibration of the HSD pump. This paper also proposes a new Convo-LSTM setup for this task, which achieves good results. The system was evaluated at five intervals (hourly, daily, weekly, monthly and yearly), and Convo-LSTM was found to be the best-performing algorithm. This multivariate time-series data set is similar to a wireless mesh network data set, so the results give researchers a direction for applying the proposed algorithm to predict the volume of network traffic. Future work in the domain includes the application of contemporary methods such as the attention mechanism and transformers for traffic


prediction. Multimodal networks that combine the physical configuration values and the network system log values can also be proposed. As network systems become larger and more distributed, smart algorithms that can automatically predict incoming traffic and allocate resources accordingly will become the need of the hour.

REFERENCES
[1] J. N. Al-Karaki and A. E. Kamal, ‘‘Routing techniques in wireless sensor networks: A survey,’’ IEEE Wireless Commun., vol. 11, no. 6, pp. 6–28, Dec. 2004.
[2] A. Cilfone, L. Davoli, L. Belli, and G. Ferrari, ‘‘Wireless mesh networking: An IoT-oriented perspective survey on relevant technologies,’’ Future Internet, vol. 11, no. 4, p. 99, Apr. 2019.
[3] C. Zhang, P. Patras, and H. Haddadi, ‘‘Deep learning in mobile and wireless networking: A survey,’’ IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2224–2287, 3rd Quart., 2019.
[4] H. Moayedi and M. Masnadi-Shirazi, ‘‘ARIMA model for network traffic prediction and anomaly detection,’’ in Proc. Int. Symp. Inf. Technol., vol. 4, Sep. 2008, pp. 1–6.
[5] B. Zhou, D. He, Z. Sun, and W. H. Ng, ‘‘Network traffic modeling and prediction with ARIMA/GARCH,’’ in Proc. HET-NETs Conf., 2005, pp. 1–10.
[6] S. Han-Lin, J. Yue-Hui, C. Yi-Dong, and C. Shi-Duan, ‘‘Network traffic prediction by a wavelet-based combined model,’’ Chin. Phys. B, vol. 18, no. 11, p. 4760, 2009.
[7] F. M. Khan and R. Gupta, ‘‘ARIMA and NAR based prediction model for time series analysis of COVID-19 cases in India,’’ J. Saf. Sci. Resilience, vol. 1, no. 1, pp. 12–18, Sep. 2020.
[8] K. F. Sellers and B. Premeaux, ‘‘Conway–Maxwell–Poisson regression models for dispersed count data,’’ WIREs Comput. Statist., vol. 13, no. 6, p. e1533, Nov. 2021.
[9] S. Walker, W. Khan, K. Katic, W. Maassen, and W. Zeiler, ‘‘Accuracy of different machine learning algorithms and added-value of predicting aggregated-level energy performance of commercial buildings,’’ Energy Buildings, vol. 209, Jan. 2020, Art. no. 109705.
[10] K. Wang, C. Ma, Y. Qiao, X. Lu, W. Hao, and S. Dong, ‘‘A hybrid deep learning model with 1DCNN-LSTM-attention networks for short-term traffic flow prediction,’’ Phys. A, Stat. Mech. Appl., vol. 583, Jan. 2021, Art. no. 126293.
[11] Y. Liu, S. Liu, Y. Wang, F. Lombardi, and J. Han, ‘‘A stochastic computational multi-layer perceptron with backward propagation,’’ IEEE Trans. Comput., vol. 67, no. 9, pp. 1273–1286, Sep. 2018.
[12] Y. Yu, X. Si, C. Hu, and Z. Jianxun, ‘‘A review of recurrent neural networks: LSTM cells and network architectures,’’ Neural Comput., vol. 31, no. 7, pp. 1235–1270, Jul. 2019.
[13] L. Nie, X. Wang, L. Wan, S. Yu, H. Song, and D. Jiang, ‘‘Network traffic prediction based on deep belief network and spatiotemporal compressive sensing in wireless mesh backbone networks,’’ Wireless Commun. Mobile Comput., vol. 2018, May 2018, Art. no. 1260860.
[14] S. M. Nagarajan, G. G. Deverajan, P. Chatterjee, W. Alnumay, and U. Ghosh, ‘‘Effective task scheduling algorithm with deep learning for Internet of Health Things (IoHT) in sustainable smart cities,’’ Sustain. Cities Soc., vol. 71, Jan. 2021, Art. no. 102945.
[15] B. Lindemann, T. Müller, H. Vietz, N. Jazdi, and M. Weyrich, ‘‘A survey on long short-term memory networks for time series prediction,’’ Proc. CIRP, vol. 99, pp. 650–655, Jan. 2021.
[16] A. Khotanzad and N. Sadek, ‘‘Multi-scale high-speed network traffic prediction using combination of neural networks,’’ in Proc. Int. Joint Conf. Neural Netw., vol. 2, Jul. 2003, pp. 1071–1075.
[17] V. Alarcon-Aquino and J. A. Barria, ‘‘Multiresolution FIR neural-network-based learning algorithm applied to network traffic prediction,’’ IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 2, pp. 208–220, Mar. 2006.
[18] Y. Chen, B. Yang, and Q. Meng, ‘‘Small-time scale network traffic prediction based on flexible neural tree,’’ Appl. Soft Comput., vol. 12, no. 1, pp. 274–279, Jan. 2012.
[19] T. P. Oliveira, J. S. Barbar, and A. S. Soares, ‘‘Computer network traffic prediction: A comparison between traditional and deep learning neural networks,’’ Int. J. Big Data Intell., vol. 3, no. 1, pp. 28–37, 2016.
[20] R. Vinayakumar, K. P. Soman, and P. Poornachandran, ‘‘Applying deep learning approaches for network traffic prediction,’’ in Proc. Int. Conf. Adv. Comput., Commun. Informat. (ICACCI), Sep. 2017, pp. 2353–2358.
[21] Gowrishankar and P. S. Satyanarayana, ‘‘Neural network based traffic prediction for wireless data networks,’’ Int. J. Comput. Intell. Syst., vol. 1, no. 4, pp. 379–389, Dec. 2008.
[22] L. Xiang, X.-H. Ge, C. Liu, L. Shu, and C.-X. Wang, ‘‘A new hybrid network traffic prediction method,’’ in Proc. IEEE Global Telecommun. Conf., Dec. 2010, pp. 1–5.
[23] A. Y. Nikravesh, S. A. Ajila, C.-H. Lung, and W. Ding, ‘‘Mobile network traffic prediction using MLP, MLPWD, and SVM,’’ in Proc. IEEE Int. Congr. Big Data, May 2016, pp. 402–409.
[24] S. Coxe, S. G. West, and L. S. Aiken, ‘‘The analysis of count data: A gentle introduction to Poisson regression and its alternatives,’’ J. Personality Assessment, vol. 91, no. 2, pp. 121–136, Feb. 2009.
[25] L. Nie, D. Jiang, S. Yu, and H. Song, ‘‘Network traffic prediction based on deep belief network in wireless mesh backbone networks,’’ in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Mar. 2017, pp. 1–5.
[26] L. Nie, Z. Ning, M. S. Obaidat, B. Sadoun, H. Wang, S. Li, L. Guo, and G. Wang, ‘‘A reinforcement learning-based network traffic prediction mechanism in intelligent Internet of Things,’’ IEEE Trans. Ind. Informat., vol. 17, no. 3, pp. 2169–2180, Mar. 2021.
[27] C. Qiu, Y. Zhang, Z. Feng, P. Zhang, and S. Cui, ‘‘Spatio-temporal wireless traffic prediction with recurrent neural network,’’ IEEE Wireless Commun. Lett., vol. 7, no. 4, pp. 554–557, Aug. 2018.
[28] Y. Xu, F. Yin, W. Xu, J. Lin, and S. Cui, ‘‘Wireless traffic prediction with scalable Gaussian process: Framework, algorithms, and verification,’’ IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1291–1306, Jun. 2019.
[29] P. Ratadiya and D. Mishra, ‘‘An attention ensemble based approach for multilabel profanity detection,’’ in Proc. Int. Conf. Data Mining Workshops (ICDMW), Nov. 2019, pp. 544–550.
[30] P. Ratadiya, K. Asawa, and O. Nikhal, ‘‘A decentralized aggregation mechanism for training deep learning models using smart contract system for bank loan prediction,’’ 2020, arXiv:2011.10981.
[31] Q. He, A. Moayyedi, G. Dan, G. P. Koudouridis, and P. Tengkvist, ‘‘A meta-learning scheme for adaptive short-term network traffic prediction,’’ IEEE J. Sel. Areas Commun., vol. 38, no. 10, pp. 2271–2283, Oct. 2020.
[32] M. Li, Y. Wang, Z. Wang, and H. Zheng, ‘‘A deep learning method based on an attention mechanism for wireless network traffic prediction,’’ Ad Hoc Netw., vol. 107, Jun. 2020, Art. no. 102258.
[33] D. Zhang, L. Liu, C. Xie, B. Yang, and Q. Liu, ‘‘Citywide cellular traffic prediction based on a hybrid spatiotemporal network,’’ Algorithms, vol. 13, no. 1, p. 20, Jan. 2020.
[34] M. Kim, ‘‘Network traffic prediction based on INGARCH model,’’ Wireless Netw., vol. 26, no. 8, pp. 6189–6202, Nov. 2020.
[35] Y. Li, Z. Ma, Z. Pan, N. Liu, and X. You, ‘‘Prophet model and Gaussian process regression based user traffic prediction in wireless networks,’’ Sci. China Inf. Sci., vol. 63, no. 4, pp. 1–8, Apr. 2020.
[36] Y. Li, J. Huang, and H. Chen, ‘‘Time series prediction of wireless network traffic flow based on wavelet analysis and BP neural network,’’ J. Phys., Conf. Ser., vol. 1533, no. 3, Apr. 2020, Art. no. 032098.
[37] T. Zonta, C. A. D. Costa, R. D. R. Righi, M. J. D. Lima, E. S. D. Trindade, and G. P. Li, ‘‘Predictive maintenance in the Industry 4.0: A systematic literature review,’’ Comput. Ind. Eng., vol. 2020, Oct. 2020, Art. no. 106889.
[38] M. Tan, I. Ubhayaratne, Y. Huo, F. B. Varela, and Y. Xiang, ‘‘Predictive maintenance based on smart monitoring and data analytics,’’ in Proc. Australas. Corrosion Assoc. Corrosion Prevention Conf., 2019, pp. 1–10.
[39] S.-Y. Chuang, N. Sahoo, H.-W. Lin, and Y.-H. Chang, ‘‘Predictive maintenance with sensor data analytics on a Raspberry Pi-based experimental platform,’’ Sensors, vol. 19, no. 18, p. 3884, Sep. 2019.
[40] P. Sethi and S. R. Sarangi, ‘‘Internet of Things: Architectures, protocols, and applications,’’ J. Electr. Comput. Eng., vol. 2017, Jan. 2017, Art. no. 9324035.
[41] S. Mukherjee and G. P. Biswas, ‘‘Networking for IoT and applications using existing communication technology,’’ Egyptian Informat. J., vol. 19, no. 2, pp. 107–127, Jul. 2018.
[42] A. J. Manuel, G. G. Deverajan, R. Patan, and A. H. Gandomi, ‘‘Optimization of routing-based clustering approaches in wireless sensor network: Review and open research issues,’’ Electronics, vol. 9, no. 10, p. 1630, Oct. 2020.
[43] L. Krishnasamy, R. Dhanaraj, D. G. Gopal, T. R. Gadekallu, M. Aboudaif, and E. A. Nasr, ‘‘A heuristic angular clustering framework for secured statistical data aggregation in sensor networks,’’ Sensors, vol. 20, no. 17, p. 4937, Aug. 2020.


[44] D. G. Gopal and R. Saravanan, ‘‘Selfish node detection based on evidence by trust authority and selfish replica allocation in danet,’’ Int. J. Inf. Commun. Technol., vol. 9, no. 4, pp. 473–491, 2016.
[45] G. G. Deverajan, V. Muthukumaran, C.-H. Hsu, M. Karuppiah, Y.-C. Chung, and Y.-H. Chen, ‘‘Public key encryption with equality test for industrial Internet of Things system in cloud computing,’’ Trans. Emerg. Telecommun. Technol., vol. 4, Jan. 2021, Art. no. e4202.
[46] Y. Ali Farrukh, I. Khan, Z. Ahmad, and R. M. Elavarasan, ‘‘A sequential supervised machine learning approach for cyber attack detection in a smart grid system,’’ 2021, arXiv:2108.00476.
[47] W. Alajali, W. Zhou, S. Wen, and Y. Wang, ‘‘Intersection traffic prediction using decision tree models,’’ Symmetry, vol. 10, no. 9, p. 386, Sep. 2018.
[48] J. P. T. Higgins and S. G. Thompson, ‘‘Controlling the risk of spurious findings from meta-regression,’’ Statist. Med., vol. 23, no. 11, pp. 1663–1682, 2004.
[49] M. Amin, M. N. Akram, and M. Amanullah, ‘‘On the James-Stein estimator for the Poisson regression model,’’ in Proc. Commun. Statistics-Simulation Comput., pp. 1–13, 2020.
[50] C. Katris and S. Daskalaki, ‘‘Comparing forecasting approaches for internet traffic,’’ Expert Syst. Appl., vol. 42, no. 21, pp. 8172–8183, 2015.
[51] E. Elakkiya, M. Radha, and R. Sathy, ‘‘Application of ARIMA model for predicting cashew nut production in India—An analysis,’’ Int. J. Res. Bus. Manage., vol. 5, pp. 45–52, 2017.
[52] W. Lu, J. Li, Y. Li, A. Sun, and J. Wang, ‘‘A CNN-LSTM-based model to forecast stock prices,’’ Complexity, vol. 2020, May 2020, Art. no. 6622927.
[53] M. Alawad and M. Lin, ‘‘Stochastic-based deep convolutional networks with reconfigurable logic fabric,’’ IEEE Trans. Multi-Scale Comput. Syst., vol. 2, no. 4, pp. 242–256, Oct. 2016.
[54] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman, ‘‘1D convolutional neural networks and applications: A survey,’’ Mech. Syst. Signal Process., vol. 151, May 2021, Art. no. 107398.

SMITA MAHAJAN received the bachelor's degree from Shivaji University, Maharashtra, India, and the master's degree in information technology from Mumbai University. She is currently pursuing the Ph.D. degree in computer science engineering with Symbiosis International University, Pune, Maharashtra. She is currently working as an Assistant Professor with the Computer Science Engineering Department, Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune. Her main research interests include computer networking, wireless and broadband networks, deep learning, and machine learning.

HARIKRISHNAN R. received the bachelor's degree in electrical and electronics engineering from the University of Madras, the master's degree in energy system engineering from VIT University, Vellore, the master's degree in embedded system technologies from Anna University, Chennai, India, and the Ph.D. degree in electrical engineering from Sathyabama University, Chennai. He has 21 years of teaching, research, and industrial experience. He is currently working as an Associate Professor with the Electronics and Telecommunication Engineering Department, Symbiosis Institute of Technology, Symbiosis International Deemed University, Pune, India. His main research interests include smart grid, the Internet of Things, artificial intelligence, and wireless sensor networks.

KETAN KOTECHA received the M.Tech. and Ph.D. degrees from IIT Bombay. He is currently the Head of the Symbiosis Centre for Applied AI (SCAAI). He has expertise and experience in cutting-edge research and projects in AI and deep learning for more than 25 years. He has published more than 100 papers in a number of excellent peer-reviewed journals on various topics ranging from cutting-edge AI, education policies, teaching-learning practices, and AI for all. He has published three patents and delivered keynote speeches at various national and international forums, including at Machine Intelligence Laboratory, USA, at IIT Bombay under a World Bank Project, at the International Indian Science Festival organized by the Department of Science and Technology, Government of India, and many more. He was a recipient of two SPARC projects worth INR 166 lacs from MHRD, Government of India, in AI in collaboration with Arizona State University, USA, and The University of Queensland, Australia. He is also a recipient of numerous prestigious awards, including:
• the Erasmus+ Faculty Mobility Grant to Poland;
• the DUO-India Professors Fellowship for Research in Responsible AI, in collaboration with Brunel University, U.K.;
• the LEAP Grant at Cambridge University, U.K.;
• the UKIERI Grant with Aston University, U.K.;
• a Grant from the Royal Academy of Engineering, U.K., under the Newton Bhabha Fund.

He is also an Associate Editor of IEEE ACCESS journal.