2019 IEEE INFOCOM WKSHPS: NI 2019: Network Intelligence: Machine Learning for Networking

Network Traffic Prediction based on Diffusion Convolutional Recurrent Neural Networks

Davide Andreoletti¹˒², Sebastian Troia², Francesco Musumeci², Silvia Giordano¹, Guido Maier², and Massimo Tornatore²

¹ Networking Laboratory, University of Applied Sciences of Southern Switzerland, Manno, Switzerland, Email: {name.surname}@supsi.ch
² Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy, Email: {name.surname}@polimi.it

Abstract—By predicting the traffic load on network links, a network operator can effectively prepare resource-allocation strategies in advance, e.g., to address an incoming congestion event early. Traffic loads on different links of a telecom network are known to be strongly correlated, and this correlation, if properly represented, can be exploited to refine the prediction of future congestion events. Machine Learning (ML) nowadays represents the state-of-the-art methodology for discovering complex relations among data. However, ML has traditionally been applied to data represented in the Euclidean space (e.g., images), and it may not be straightforward to employ it effectively to model graph-structured data (e.g., the events that take place in telecom networks). Recently, several ML algorithms specifically designed to learn models of graph-structured data have appeared in the literature. The main novelty of these techniques lies in their ability to learn a representation of each node of the graph that considers both its properties (e.g., features) and the structure of the network (e.g., the topology). In this paper, we employ a recently proposed graph-based ML algorithm, the Diffusion Convolutional Recurrent Neural Network (DCRNN), to forecast the traffic load on the links of a real backbone network. We evaluate DCRNN's ability to forecast the volume of expected traffic and to predict congestion events, and we compare this approach to other existing approaches (such as LSTM and Fully-Connected Neural Networks). Results show that DCRNN outperforms the other methods both in terms of forecasting ability (e.g., MAPE is reduced from 210% to 43%) and in terms of the prediction of congestion events, and represents a promising starting point for the application of DCRNN to other network management problems.

Index Terms—traffic forecasting, graph-based machine learning, network congestion

978-1-7281-1878-9/19/$31.00 ©2019 IEEE

I. INTRODUCTION

As telecom networks become more and more complex (see, e.g., the enormous set of adjustable parameters to be managed in modern systems), it is also becoming increasingly important to limit human intervention and speed up network management procedures. Novel software solutions for network automation make it possible to automatically configure, provision, manage and test network devices. They can also be used to increase infrastructure efficiency and to reduce human error and operational expenditures.

In particular, network traffic prediction plays an important role in many areas of networking, such as network management, network design, short- and long-term resource allocation, traffic (re)routing and anomaly detection. Two categories of prediction methods, based on long- and short-term horizons, are typically considered. Long-term traffic prediction is used to estimate future capacity requirements, and therefore enables more effective planning decisions. Short-term traffic prediction (i.e., predictions within minutes, or even seconds) is usually linked to dynamic resource allocation. It can be used to improve Quality of Service (QoS) mechanisms, as well as for congestion control and optimal resource management. Several techniques, including time-series models, modern data mining techniques, soft computing approaches, and neural networks, have been used for network traffic analysis and prediction [1].

Within a telecom network, traffic is exchanged between nodes and crosses network links. These links are related to each other: because of their adjacency, their behaviour is correlated. For example, congestion is more likely to occur on links adjacent to a congested link than elsewhere. Owing to the large amount of data available today in telecom networks, algorithms from the area of Machine Learning (ML) have been investigated to enable network intelligence [2], thanks to the ability of ML to extract useful (and sometimes "hidden") information from data. However, despite the significant amount of research in this direction, these machine learning algorithms have not traditionally leveraged the topological relation among the links. To the best of our knowledge, no existing traffic-prediction solution is specifically designed to process graph-structured data.

In this paper, we employ a recently proposed machine learning algorithm, originally developed for road traffic forecasting [3], to predict the traffic load on the links of a telecom network. This algorithm is referred to as Diffusion Convolutional Recurrent Neural Network (DCRNN). Unlike traditional machine learning approaches, it can capture important topological properties of the network, which are expected to significantly influence the patterns followed by the traffic as it propagates through the network.

Our objective is to predict the next load on a link of a telecom network, given the sequence of past observations of link loads. The problem is modeled as a regression whose objective is to minimize the error between the predicted and the actual next load on the links. In the literature, this specific problem has already been addressed using ML methods. However, to the best of our knowledge, this is the first time that an ML algorithm able to capture the topological relations among the links of a telecom network is employed to perform this task. Specifically, we train a DCRNN using real data gathered from a backbone network (i.e., Abilene) and compare this approach
with several baseline traffic-prediction algorithms (e.g., the LSTM network [4]). We compare both the effectiveness of the regression (e.g., measured in terms of mean absolute error) and the ability to detect congestion events (which are defined following a threshold-based criterion). Results show a remarkable improvement of the DCRNN with respect to all the baseline methods in both aspects, and encourage its application to other network management tasks as well.

The rest of the paper is structured as follows. In Section II, we review works related to the use of machine learning as a tool for network traffic prediction, as well as several machine learning algorithms specifically designed to work on graph-structured data. Section III briefly reviews the concepts needed to understand the proposed methodology, such as recurrent neural networks and the diffusion convolutional operator. In Section IV, we present the problem statement and describe the employed methodology. Section V describes the simulation settings and presents the results. Finally, Section VI concludes the paper.

II. RELATED WORK

An accurate prediction of network traffic is of utmost importance for network operators, as it enables efficient resource management and load balancing. Given the importance of the topic, the related literature is abundant. We focus here on several related works evaluating ML-based methods for network traffic prediction.

The authors of [5] propose a framework for network Traffic Matrix (TM) prediction based on Recurrent Neural Networks equipped with Long Short-Term Memory units, i.e., RNN LSTM. TM prediction is defined as the problem of estimating the future network traffic matrix from the previous ones. Similar approaches can be found in [6], [7]. [6] proposes an end-to-end deep learning architecture consisting of a convolutional module and a recurrent module that, combined, can extract both spatial and temporal information from the traffic flows. [7] proposes a neural network model that combines LSTM with Deep Neural Networks (DNN), and adds an autocorrelation coefficient to the model to improve the accuracy of predictions. The main novelty of [7] is to include the autocorrelation of the time series in the input of the ML algorithm, which leads to superior performance with respect to existing methods. In [8], a simplified variant of the LSTM unit, the Gated Recurrent Unit (GRU), is combined with a Convolutional Neural Network (CNN) in the 2D domain (CNN-2D) for network traffic prediction in datacenters. The underlying idea of [8] is to treat traffic matrices as images and to use the CNN-2D to find the correlations among the traffic exchanged between different pairs of nodes. Note that, in the literature, predicting the traffic exchanged among network nodes is more common than predicting the load on network links. However, examples of ML applied to this specific task can be found, e.g., in [9], where Support Vector Machines are employed to perform the regression.

To our knowledge, none of the existing methods of traffic prediction explicitly considers the topological information of the network. Arguably, this is because ML solutions specifically designed to process data that do not belong to the Euclidean domain [10], and in particular data with a graph-based structure, have appeared in the literature only recently [10]. At a high level, these methods are based on filtering operations designed for graphs. These filters are used within machine learning algorithms, and their parameters are learned so that they capture hidden patterns in the relations among the nodes of the graph. For example, [11], [12] propose a generalization of the CNN that can process graphs, e.g., to classify their nodes. The authors of [3] propose the diffusion convolution operator and build a machine learning algorithm on top of it, which is then used to forecast road traffic in [3], [13]. Here, we use the same methodology to forecast the load on the links of a telecommunication network.

III. BACKGROUND

In this Section, we briefly review the background concepts needed to understand the DCRNN, as well as the benchmark algorithms that we compare with the proposed approach.

A. Convolutional Neural Networks

Convolution is widely employed in signal processing to perform filtering operations. The convolution between two signals $x$ and $w$ is defined as:

$$ (x \ast w)(t) = \sum_{\tau=0}^{T} x(t-\tau)\, w(\tau) \tag{1} $$

where $w$ is generally referred to as the kernel (or filter) and $T$ is its support. In classical signal processing, the kernel $w$ is hand-crafted by expert designers so that the convolution captures some desired properties of the signal. A Convolutional Neural Network (CNN), instead, is a machine learning module that is trained to learn the parameters of a number of filters (whose support, i.e., their length, is fixed).

Deeper CNNs can be formed by stacking multiple CNN layers, and these architectures generally place a Dropout layer on top of each CNN layer. Although CNNs are more commonly used in the 2D domain (e.g., for image recognition), they are also often employed in the 1D domain, e.g., for time-series forecasting. The support of the kernel determines the temporal dynamics that the filters can capture: filters with a long support can extract longer temporal dynamics than shorter ones.

B. Recurrent Neural Networks

Recurrent Neural Networks (RNNs) have been designed specifically to overcome the limitations of feedforward neural networks in modeling sequences. RNNs are composed of units (i.e., neurons) that are capable of keeping track of past observations. This allows RNNs to model the input data based on both current and previously
seen observations. Among the proposed RNN architectures, the Long Short-Term Memory (LSTM) has proved particularly effective in modeling long-range temporal dependencies of the input data.

More formally, given the input vector $x(t)$ at time $t$ (i.e., the current observation) and the previous hidden state $h(t-1)$, the LSTM unit recursively performs the following operations:

$$ i(t) = \sigma\left(W_i\,[x(t), h(t-1)] + b_i\right) \tag{2} $$

$$ \tilde{c}(t) = \tanh\left(W_c\,[x(t), h(t-1)] + b_c\right) \tag{3} $$

$$ f(t) = \sigma\left(W_f\,[x(t), h(t-1)] + b_f\right) \tag{4} $$

$$ c(t) = f(t) \odot c(t-1) + i(t) \odot \tilde{c}(t) \tag{5} $$

$$ o(t) = \sigma\left(W_o\,[x(t), h(t-1)] + b_o\right) \tag{6} $$

$$ h(t) = o(t) \odot \tanh\left(c(t)\right) \tag{7} $$

where $\odot$ is the element-wise multiplication and $\sigma$ is the sigmoid function. $W_i, W_c, W_f, W_o$ and $b_i, b_c, b_f, b_o$ are learnable kernels and biases, respectively. $i$, $\tilde{c}$, $f$, $c$ and $o$ are referred to as the input, input modulation, forget, cell and output gates. Together, they let the LSTM select which information from the input data to remember and which to forget. Finally, the hidden state $h$ encodes what the LSTM unit retains about past observations. Together with the next input $x$, it is fed back to the unit at the following time step.

Many variants of LSTM units have been proposed in the literature, and the above formulation only refers to the most common implementation. For example, the Gated Recurrent Unit (GRU) is a widely used, simplified version of the LSTM. It is used, e.g., in the recently proposed Diffusion Convolutional Recurrent Neural Network (DCRNN).

C. Diffusion Convolutional Recurrent Neural Network

Machine Learning algorithms were originally conceived to learn models of data defined on Euclidean domains, and their application to other types of data, such as graphs, is not straightforward [10]. Specifically, a graph is defined as the pair $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. If the graph is characterized by attributes (e.g., properties of nodes and edges), $G$ can alternatively be described as $(X \in \mathbb{R}^{N \times P}, W \in \mathbb{R}^{N \times N})$, where $N$ is the number of nodes and $P$ the number of their attributes (i.e., features). $X$ is the feature matrix and $W$ is a weight matrix that encodes the relations among the nodes, e.g., the adjacency matrix of the graph.

Traditional ML algorithms (e.g., LSTM or CNN) can easily process $X$, but fall short in including the information encoded in $W$. Recent approaches proposed in the literature aim to enrich the feature matrix with this relational information. In this Section, we briefly review one of the most prominent solutions of this kind, which is based on the idea that the relation between two nodes can be represented as a diffusion process. Specifically, the probability that a random walk of $K$ steps starting at the first node ends at the second one can be computed from the state transition matrix $D_O^{-1} W$ (with $D_O$ being the out-degree diagonal matrix of the graph), as the corresponding entry of $(D_O^{-1} W)^K$. Intuitively, the diffusion process gives important clues on the influence that each node exercises on all the others. This contextual knowledge can be used to improve the representation of the nodes within the feature space (i.e., $X$) by filtering with appropriate convolutional operations.

The $K$-step diffusion convolution between a graph signal $X \in \mathbb{R}^{N \times P}$ and a filter $f_\theta$ is denoted by $\ast_G$ and defined as:

$$ X \ast_G f_\theta = \sum_{k=0}^{K-1} \left( \theta_{k,1} \left(D_O^{-1} W\right)^k + \theta_{k,2} \left(D_O^{-1} W^\top\right)^k \right) X \tag{8} $$

where $\theta \in \mathbb{R}^{K \times 2}$ are the parameters of the filter and $D_O^{-1} W$ is the state transition matrix of the diffusion process. $D_O^{-1} W^\top$ plays the same role for the reverse diffusion process, i.e., along the reversed edges. In [3], this reverse term is normalized by the in-degree diagonal matrix $D_I$, and the two coincide when $W$ is symmetric.

The diffusion convolutional operator can be used as the building block of a Diffusion Convolutional Layer of a neural network, whose parameters are learnt using common training approaches (e.g., back-propagation). Specifically, this layer can be trained to map the feature matrix $X \in \mathbb{R}^{N \times P}$ to an output $H \in \mathbb{R}^{N \times Q}$ as follows:

$$ H_{:,q} = \sigma\left( \sum_{p=1}^{P} X_{:,p} \ast_G f_{\Theta_{q,p,:,:}} \right), \quad \forall q \in \{1, \ldots, Q\} \tag{9} $$

where $\Theta \in \mathbb{R}^{Q \times P \times K \times 2}$ is the tensor of the trainable parameters. Replacing the matrix multiplications described in Section III-B with the diffusion convolutional operation turns the RNN unit into the Diffusion Convolutional Gated Recurrent Unit (DCGRU) [3]. More precisely, the authors of [3] present a modified version of the GRU mentioned in Section III-B, formally described by the following equations:

$$ r(t) = \sigma\left(\Theta_r \ast_G [X(t), H(t-1)] + b_r\right) \tag{10} $$

$$ C(t) = \tanh\left(\Theta_C \ast_G [X(t), r(t) \odot H(t-1)] + b_C\right) \tag{11} $$

$$ u(t) = \sigma\left(\Theta_u \ast_G [X(t), H(t-1)] + b_u\right) \tag{12} $$

$$ H(t) = u(t) \odot H(t-1) + \left(1 - u(t)\right) \odot C(t) \tag{13} $$

where $\odot$ is the element-wise tensor multiplication. $r$, $u$ and $C$ are referred to as the reset, update and cell gates, respectively, and perform operations similar to those of the gates described in Section III-B. $\Theta_r$, $\Theta_u$ and $\Theta_C$ are the parameters of the kernels learnt during training, along with their respective biases $b_r$, $b_u$ and $b_C$. $X(t)$ (resp., $H(t)$) is the input (resp., the output) of the model at time $t$.

248
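The LSTM update of Eqs. (2)-(7) in Section III-B can be sketched in plain Python as follows. This is a hypothetical, dependency-free illustration: the names `lstm_step` and `affine` and the `params` layout are assumptions. Each gate applies its own weight matrix to the concatenation [x(t), h(t-1)]. A practical implementation would use a tensor library with learned weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W v + b for a weight matrix W (list of rows) and a bias vector b."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM update following Eqs. (2)-(7).

    params maps each gate name ('i', 'c', 'f', 'o') to a pair (W, b); every W has
    one row per hidden unit and len(x) + len(h_prev) columns, because it acts on
    the concatenation [x(t), h(t-1)].
    """
    v = list(x) + list(h_prev)                                   # [x(t), h(t-1)]
    i = [sigmoid(z) for z in affine(*params["i"], v)]            # Eq. (2): input gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], v)]    # Eq. (3): input modulation
    f = [sigmoid(z) for z in affine(*params["f"], v)]            # Eq. (4): forget gate
    c = [f_k * c_k + i_k * g_k                                   # Eq. (5): new cell state
         for f_k, c_k, i_k, g_k in zip(f, c_prev, i, c_tilde)]
    o = [sigmoid(z) for z in affine(*params["o"], v)]            # Eq. (6): output gate
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]         # Eq. (7): new hidden state
    return h, c


# With all weights and biases at zero, every sigmoid gate equals 0.5 and
# c_tilde = tanh(0) = 0, so the cell state is simply halved.
zero_gate = ([[0.0, 0.0]], [0.0])                # 1 input, 1 hidden unit
params = {g: zero_gate for g in "icfo"}
h, c = lstm_step([3.0], [0.0], [2.0], params)
print(c)  # [1.0]
```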

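The diffusion convolution of Eq. (8) and the DCGRU update of Eqs. (10)-(13) in Section III-C can be sketched as below, with an Eq. (9)-style layer serving as each gate. This is a minimal pure-Python illustration under stated assumptions, not the reference implementation of [3]. Graph signals are stored as lists of columns, all function names are hypothetical, and transition matrices are recomputed on every call for simplicity. The reverse diffusion term uses the row-normalized transpose of W, i.e., normalization by in-degree as in [3]; this coincides with the D_O^{-1} W^T written in Eq. (8) whenever W is symmetric.

```python
import math


def row_normalize(W):
    """Return D^{-1} W, with D the diagonal matrix of the row sums of W (zero rows stay zero)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out


def matvec(A, x):
    """Matrix-vector product A x with plain lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def diffusion_conv(x, W, theta):
    """Eq. (8) for one graph signal x (a column of X); theta holds K pairs (theta_k1, theta_k2)."""
    forward = row_normalize(W)                              # D_O^{-1} W
    backward = row_normalize([list(c) for c in zip(*W)])    # D_I^{-1} W^T, as in [3]
    out = [0.0] * len(x)
    x_fwd, x_bwd = list(x), list(x)                         # k = 0: identity
    for theta_k1, theta_k2 in theta:                        # k = 0, ..., K-1
        out = [o + theta_k1 * a + theta_k2 * b for o, a, b in zip(out, x_fwd, x_bwd)]
        x_fwd, x_bwd = matvec(forward, x_fwd), matvec(backward, x_bwd)
    return out


def graph_layer(cols, W, Theta, b, act):
    """Eq. (9)-style layer: output column q = act(sum_p diffusion_conv(cols[p], W, Theta[q][p]) + b[q])."""
    outputs = []
    for q, theta_q in enumerate(Theta):
        acc = [b[q]] * len(cols[0])
        for col, theta_qp in zip(cols, theta_q):
            acc = [a + d for a, d in zip(acc, diffusion_conv(col, W, theta_qp))]
        outputs.append([act(a) for a in acc])
    return outputs


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dcgru_step(X_cols, H_cols, W, params):
    """One DCGRU update, Eqs. (10)-(13).

    X_cols: the P columns of X(t); H_cols: the Q columns of H(t-1); each column has N entries.
    params maps 'r', 'u', 'C' to (Theta, b), Theta of shape Q x (P + Q) x K x 2, b of length Q.
    """
    XH = X_cols + H_cols                                         # [X(t), H(t-1)]
    r = graph_layer(XH, W, *params["r"], sigmoid)                # Eq. (10): reset gate
    u = graph_layer(XH, W, *params["u"], sigmoid)                # Eq. (12): update gate
    rH = [[r_i * h_i for r_i, h_i in zip(r_q, h_q)] for r_q, h_q in zip(r, H_cols)]
    C = graph_layer(X_cols + rH, W, *params["C"], math.tanh)     # Eq. (11): cell gate
    return [[u_i * h_i + (1.0 - u_i) * c_i for u_i, h_i, c_i in zip(u_q, h_q, c_q)]
            for u_q, h_q, c_q in zip(u, H_cols, C)]              # Eq. (13)


# Toy link graph with three nodes on a path: node 1 is adjacent to nodes 0 and 2.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# K = 2 filter keeping only the one-step forward term: each node gets its neighbours' mean.
print(diffusion_conv([3.0, 6.0, 9.0], W, [(0.0, 0.0), (1.0, 0.0)]))  # [6.0, 6.0, 6.0]
# All-zero parameters (P = Q = 1, K = 2): r = u = 0.5 and C = tanh(0) = 0, so H(t) = 0.5 H(t-1).
zero = ([[[(0.0, 0.0)] * 2] * 2], [0.0])
print(dcgru_step([[3.0, 6.0, 9.0]], [[2.0, -4.0, 0.0]], W, {g: zero for g in ("r", "u", "C")}))
# [[1.0, -2.0, 0.0]]
```

On a directed graph, the reverse term lets a node collect the signal of its upstream neighbours, which the forward term alone would miss.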
IV. THE PROPOSED FORECASTING APPROACH

In this work, we employ a deep learning approach to perform the network traffic forecasting task. Specifically, our objective is to predict the load on network links given historical records of traffic loads.

A. Problem Statement

We consider a telecom backbone network composed of a set of nodes and a set of links connecting them. The traffic exchanged among the nodes is assumed to be routed along the shortest path. The resulting traffic load measured on the network at time $t$ can be represented as a matrix $X(t) \in \mathbb{R}_{\geq 0}^{M \times 1}$, where $M$ is the number of links of the network. Given the sequence of traffic loads measured during the previous $T$ time slots, it is possible to forecast the load at time $t+1$ on every link $l \in \{1, \ldots, M\}$, i.e., $X(t+1)$, using well-known machine learning techniques (e.g., an LSTM network). However, although the topological properties of the network play a significant role in the diffusion of the traffic (e.g., because they constrain the traffic to flow only on existing paths), they are not easy to take into account with the common approaches.

Instead, we employ an existing deep-learning architecture, described in Section IV-B and based on the DCGRU described in Section III-C, that is specifically designed to take advantage of the topological properties of graphs. To exploit this additional information, we represent the traffic crossing the network as a directed graph $G$. $G$ is described by the matrix $X(t) \in \mathbb{R}_{\geq 0}^{M \times 1}$, which encodes the attributes of its $M$ nodes, i.e., the load on each link of the telecommunication network. It is also described by its adjacency matrix $W$, which encodes the relation between the nodes: $w_{ij} = 1$ iff $l_i$ and $l_j$ are connected, and 0 otherwise. The forecasting problem is formulated as follows:

$$ X(t+1) = F\left(W, X(t-T), \ldots, X(t)\right) \tag{14} $$

where $F$ is the estimator that we learn by employing the architecture described in the following.

B. DCRNN for Network Traffic Prediction

The deep-learning architecture proposed in [3] belongs to the family of Sequence-to-Sequence deep-learning architectures [14], which are characterized by an encoder and a decoder. The former learns a map between the input (which can be a sequence of unknown length) and a fixed-size encoding vector. The latter learns how to map the encoding vector to the output sequence. Encoder and decoder perform symmetrical operations and are composed of the same (arbitrary) number of layers. In the architecture described in [3], which we employ for link load forecasting, each layer is composed of $U$ DCGRU units, described in Section III-C.

V. EXPERIMENTS

A. Experiment Setup

The objective of this work is to evaluate the ability of different deep learning architectures to forecast the traffic load on the links of a backbone network. In the following Sections, we provide details about the considered network and about the baseline methods that we use for comparison.

1) Dataset Preprocessing: We consider the Abilene backbone network, for which several data are publicly available (http://sndlib.zib.de/home.action). These include its topology (12 nodes and 30 unidirectional links connecting them) and some statistics of a trace of real traffic crossing it. Specifically, we know the volume of traffic, aggregated over slots of 5 minutes, exchanged between each pair of nodes from March 1st 2004 to September 10th 2004.

Assuming that traffic is routed along the shortest path between two nodes, we can compute the traffic load on the links at each time slot. From these data, we have derived another dataset that gives the traffic load on each network link aggregated over slots of 1 hour, which results in 4000 vectors with 30 components. These data are arranged in chronological order and grouped into sequences of 10 vectors, which are used as input of the ML algorithm. The output (i.e., the next value of the sequence to predict) is the single vector that immediately follows each input sequence, i.e., the target is obtained by shifting the sequence by one time slot. Then, starting from the topology of the Abilene network, we have obtained a 30×30 adjacency matrix representing a graph in which each node is univocally associated with a network link. Its edges encode the relations among the links: an edge exists iff the corresponding links are connected in Abilene.

2) Deep Learning Architectures and Training Methodology: We consider a DCRNN architecture composed of two layers with 4 DCGRU units each. The first layer acts as the encoder and the second as the decoder. We compare the DCRNN with the following baselines: an LSTM-based network, a CNN-based network, a CNN-LSTM-based network and a Fully-Connected Neural Network. The analysis of the hyperparameters, which we omit in this paper, led us to select architectures with the following characteristics:

• The LSTM-based network is composed of 5 recurrent layers with 20 LSTM units each
• The CNN-based network is composed of 1 layer that implements the convolution using 32 kernels of size 2
• The CNN-LSTM-based network is composed of 1 recurrent layer of 20 LSTM units stacked on top of a CNN layer (with 16 kernels of size 2)
• The Fully-Connected Neural Network is composed of 3 layers of 30, 20 and 10 units that apply a sigmoid operation to their input

The sequences described in the previous Section are taken in chronological order and divided such that 70% is used for training, 20% for validation, and the remaining 10% for testing.
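The link loads described in Section V-A1 follow from routing every node-to-node demand along its shortest path and summing the traffic on each directed link. The sketch below illustrates this step on a hypothetical three-node topology. It assumes minimum-hop routing computed by breadth-first search, because the paper states neither the link metric nor how ties are broken. The names `shortest_path` and `link_loads` are illustrative.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Minimum-hop path from src to dst (list of nodes) found by breadth-first search, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def link_loads(adj, demands):
    """Route every (src, dst) demand on its shortest path and sum the traffic on each directed link."""
    loads = {(a, b): 0.0 for a in adj for b in adj[a]}
    for (src, dst), volume in demands.items():
        path = shortest_path(adj, src, dst)
        if path is None:          # unreachable destination: nothing to route
            continue
        for a, b in zip(path, path[1:]):
            loads[(a, b)] += volume
    return loads


# Toy 3-node line topology A - B - C with one directed link in each direction.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
demands = {("A", "C"): 5.0, ("A", "B"): 1.0, ("C", "A"): 2.0}
print(link_loads(adj, demands))
# {('A', 'B'): 6.0, ('B', 'A'): 2.0, ('B', 'C'): 5.0, ('C', 'B'): 2.0}
```

Repeating this computation for every 5-minute traffic matrix gives one link-load vector per time slot, with one entry per directed link.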

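The 30×30 adjacency matrix of Section V-A1 has one node per Abilene link. The paper only states that an edge exists when the corresponding links are connected. The following sketch therefore assumes that two links are connected when they share an endpoint, which yields a symmetric W; connecting only head-to-tail pairs would give a directed W instead. The function `link_adjacency` and the toy topology are hypothetical.

```python
def link_adjacency(links):
    """Adjacency matrix W of the link graph: w_ij = 1 iff links i and j (i != j) share an endpoint."""
    M = len(links)
    W = [[0] * M for _ in range(M)]
    for i, (a, b) in enumerate(links):
        for j, (c, d) in enumerate(links):
            if i != j and {a, b} & {c, d}:
                W[i][j] = 1
    return W


# Path topology A - B - C - D with one directed link per hop.
links = [("A", "B"), ("B", "C"), ("C", "D")]
for row in link_adjacency(links):
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 1, 0]
```

Applied to the 30 unidirectional Abilene links, the same function would return a 30×30 matrix.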

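The preprocessing chain of Section V-A might look as follows: 5-minute samples are aggregated into hourly vectors, each input sequence holds 10 vectors with the next vector as target, and the resulting pairs are split chronologically into 70/20/10 portions. Averaging within each hour is an assumption made here so that values stay in Mbit/s; summing would suit raw volumes. All function names and the toy series are illustrative.

```python
def aggregate(series, factor):
    """Average consecutive groups of `factor` load vectors (e.g. 12 five-minute slots -> 1 hour).

    An incomplete group at the end of the series is dropped.
    """
    return [[sum(col) / factor for col in zip(*series[i:i + factor])]
            for i in range(0, len(series) - factor + 1, factor)]


def make_windows(series, window=10):
    """Pairs (input, target): `window` consecutive vectors and the vector that follows them."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def chronological_split(samples, train_pct=70, val_pct=20):
    """Split in time order, without shuffling: train, then validation, then test (the remainder)."""
    n_train = len(samples) * train_pct // 100
    n_val = len(samples) * val_pct // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


# 20 hours of 5-minute samples for two hypothetical links.
five_min = [[float(t), 2.0 * t] for t in range(240)]
hourly = aggregate(five_min, 12)
pairs = make_windows(hourly, window=10)
train, val, test = chronological_split(pairs)
print(len(hourly), hourly[0], len(pairs), len(train), len(val), len(test))
# 20 [5.5, 11.0] 10 7 2 1
```

Splitting without shuffling keeps every validation and test target later in time than all training data, which matches the chronological split described in the text.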
The training of the ML architectures minimizes the Mean Absolute Error (MAE) between predictions and ground truth, and stops when no improvement on the validation set is observed for at least 50 training epochs. Training is performed with the Adam optimizer [15], with the initial learning rate set to 0.01. In the following Section, we describe the results obtained by averaging over 100 simulations.

B. Experiment Results

The first set of experiments evaluates the employed method in terms of the Mean Absolute Percentage Error (MAPE), the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE), as well as metrics related to the speed of convergence, i.e., the number of epochs and the time after which training is interrupted by an early-stopping event.

TABLE I
COMPARISON OF THE DEEP LEARNING ARCHITECTURES CONSIDERING THEIR ABILITY TO FORECAST THE NEXT TRAFFIC LOAD

| Architecture | MAPE | MAE (Mbit/s) | RMSE (Mbit/s) | Convergence epoch | Convergence time (s) |
|---|---|---|---|---|---|
| DCRNN | 43.2% | 92.5 | 497.1 | 225 | 525.1 |
| LSTM | 210.34% | 142.43 | 525.21 | 87 | 19.83 |
| CNN | 234.75% | 121.32 | 506.55 | 252 | 9.82 |
| CNN-LSTM | 248.16% | 127.18 | 512.91 | 240 | 5.76 |
| Fully-Connected | 220.75% | 138.24 | 522.65 | 201 | 3.14 |

The results of the evaluation are summarized in Table I, which shows that the DCRNN significantly outperforms the baselines in MAPE, MAE and RMSE. In particular, the MAPE drops from ∼210% with the LSTM-based architecture to ∼43% with the DCRNN. We also notice an improvement over the best MAE and RMSE among the baselines (both obtained with the CNN-based architecture), which decrease from ∼121 to ∼92 Mbit/s and from ∼506 to ∼497 Mbit/s, respectively.

TABLE II
PERCENTILES OF THE LOAD MEASURED ON LINKS FOR THE TEST SET

| Percentile | 25% | 50% | 75% | 100% |
|---|---|---|---|---|
| Traffic on links (Mbit/s) | 59.67 | 180.33 | 389.41 | 5929.52 |

The MAE improvement of ∼30 Mbit/s over the best baseline is significant, considering an average traffic on links of ∼301 Mbit/s and that 50% of the measured loads are below 180 Mbit/s (see Table II, which reports several percentiles of the load on links). The improvement of the RMSE is the least impressive. This can be explained by observing that the DCRNN generally predicts the next link loads better (as indicated by the remarkable decrease of the MAPE), but it hardly predicts sudden high peaks (i.e., burst events), which weigh heavily on the RMSE because errors are squared. We do not consider this a limitation of the model, since predicting this type of event is essentially not possible. As for the convergence speed, the time needed to train the DCRNN (∼525 s) is more than one order of magnitude higher than that of the LSTM-based architecture, which has the most time-consuming training process among the baselines (∼19 s). We underline that the forecasting process introduces a negligible delay for all the considered models.

A straightforward application of a reliable traffic-load estimator is the early detection of congestion events. In the second set of experiments, we assess the ability of our approach to perform this task and compare it with the baselines. We assume that a congestion occurs on a link if the traffic load is above a threshold that is directly proportional (by a factor α) to the average amount of traffic observed on that link. In this way, we perform a fair comparison that takes into account the different link-load patterns that Abilene presents. The evaluation considers the following metrics: percentage of false positives, false negatives, true positives and true negatives, from which we derive precision, accuracy, recall and F-score.

In Table III, we show the results obtained with α = 3 (i.e., a link is congested when the volume of traffic is above 3 times its average load). The DCRNN outperforms the baselines for all the considered metrics. In particular, the precision (i.e., the percentage of congestion predictions that are actually congestion events) increases by about 25 percentage points with respect to the best baseline (i.e., the LSTM-based architecture).

TABLE III
COMPARISON OF THE DEEP LEARNING ARCHITECTURES CONSIDERING THEIR ABILITY TO DETECT A CONGESTION EVENT WITH THRESHOLD FACTOR α = 3 (ALL VALUES IN %)

| Architecture | TP | TN | FP | FN | Accuracy | Precision | Recall | F-score |
|---|---|---|---|---|---|---|---|---|
| DCRNN | 1.97 | 94.70 | 0.93 | 2.40 | 96.67 | 67.93 | 45.01 | 54.14 |
| LSTM | 1.14 | 93.64 | 1.92 | 3.03 | 95.05 | 42.37 | 31.80 | 36.33 |
| CNN | 1.58 | 93.15 | 2.40 | 2.85 | 94.74 | 41.86 | 35.67 | 37.85 |
| CNN-LSTM | 1.36 | 93.57 | 1.98 | 3.08 | 94.93 | 40.71 | 30.70 | 34.93 |
| Fully-Connected | 1.15 | 93.44 | 2.11 | 0.029 | 94.91 | 41.31 | 32.94 | 36.45 |

For the congestion prediction task, the recall is the percentage of congestion events that are correctly predicted. The accuracy is the percentage of correct predictions, whether they refer to congestion or to normal loads. Hence, both give essential indications to a network operator who takes decisions based on the likelihood that congestion will (or will not) occur.

We depict the accuracy and the recall in Fig. 1(a) and Fig. 1(b), respectively, as a function of the congestion threshold factor α ∈ {1.5, 2.0, …, 5.0}. The DCRNN outperforms the baselines in this task as well, for every value of α. Increasing α restricts congestion events to traffic volumes that are significantly higher than the average link load, which has opposite effects on the accuracy and on the recall. The accuracy increases, because the larger share of non-congestion events raises the number of correct classifications. Conversely, the recall generally decreases with increasing α. This is because the models hardly predict very high and sudden peaks, as already discussed in relation to the RMSE.

[Figure: two panels plotting, for CNN, LSTM, Fully-Connected, CNN-LSTM and DCRNN, (a) Accuracy (%) and (b) Recall (%) against the threshold of congestion α from 1.5 to 5.0.]
Fig. 1. Comparison of all the methods considering the ability to detect a congestion event in terms of (a) Accuracy and (b) Recall.

α = 5 represents the hardest condition for detecting a congestion event. In this scenario, the DCRNN reaches a recall of ∼34%, which means that ∼66% of the actual congestion events are not detected. Notice, however, that this result is still 32% higher than the recall obtained by the best baseline (i.e., the CNN-based architecture).
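The three forecasting metrics of Table I can be computed as in the sketch below, assuming their usual definitions. The paper does not say how zero loads are handled in the MAPE, so skipping them is an assumption, and `forecast_errors` is a hypothetical name. Because MAPE divides each error by the true load, small absolute errors on lightly loaded links can push it well beyond 100%, as the usage line shows.

```python
import math


def forecast_errors(y_true, y_pred):
    """Return (MAPE in %, MAE, RMSE) over paired true/predicted loads.

    MAPE is undefined for a zero true load, so such samples are skipped in the MAPE only.
    """
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    ratios = [abs(e) / abs(t) for t, e in zip(y_true, errors) if t != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")
    return mape, mae, rmse


# A single 2 Mbit/s error on a lightly loaded link dominates the MAPE.
print(forecast_errors([1.0, 200.0], [3.0, 200.0]))  # (100.0, 1.0, 1.4142135623730951)
```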

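The threshold-based congestion criterion and the scores reported in Table III can be sketched as follows. A link counts as congested when its load exceeds α times its average load, and each (time step, link) pair is classified as TP, TN, FP or FN. This is a minimal illustration, not the authors' code. In particular, computing each link's average from the actual loads being evaluated is an assumption, because the paper does not say over which period the average is taken. The function name and toy data are hypothetical.

```python
def congestion_metrics(actual, predicted, alpha=3.0):
    """Threshold-based congestion detection scores, in percent.

    actual, predicted: lists of load vectors (one vector per time step, one entry per link).
    A link is congested when its load exceeds alpha times its average load; here the
    average is taken over `actual`, which is an assumption of this sketch.
    """
    n_links = len(actual[0])
    thresholds = [alpha * sum(row[l] for row in actual) / len(actual) for l in range(n_links)]
    tp = tn = fp = fn = 0
    for a_row, p_row in zip(actual, predicted):
        for l, thr in enumerate(thresholds):
            is_congested, flagged = a_row[l] > thr, p_row[l] > thr
            if is_congested and flagged:
                tp += 1
            elif flagged:
                fp += 1
            elif is_congested:
                fn += 1
            else:
                tn += 1
    total = tp + tn + fp + fn
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": 100.0 * tp / total, "TN": 100.0 * tn / total,
            "FP": 100.0 * fp / total, "FN": 100.0 * fn / total,
            "Accuracy": 100.0 * (tp + tn) / total,
            "Precision": precision, "Recall": recall, "F-score": f_score}


# One hypothetical link over 8 time steps: average load 2.0, so alpha = 3 gives a threshold of 6.0.
actual = [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [2.0], [8.0]]
predicted = [[1.0], [1.0], [1.0], [1.0], [1.0], [7.0], [2.0], [9.0]]
print(congestion_metrics(actual, predicted, alpha=3.0))
# TP 12.5, TN 75.0, FP 12.5, FN 0.0 (% of link-time pairs); precision 50.0, recall 100.0
```

Every (time step, link) pair falls into exactly one class, so TP, TN, FP and FN sum to 100% and the accuracy equals TP + TN.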
VI. CONCLUSIONS

In this work, we employ an existing graph-based machine learning algorithm (i.e., the DCRNN) to forecast the next traffic load on the links of the Abilene backbone telecom network. The main novelty of this approach is its ability to learn a representation of the telecom network that considers both the features (i.e., the load on the links) and the topological relations among them (i.e., whether the links are connected or not). The DCRNN is compared to the baselines (e.g., LSTM and CNN) in terms of forecasting effectiveness and ability to detect congestion events. For example, a reduction of the MAPE from 210% to 43% is observed. These promising results suggest that forecasting events within a telecom network may significantly benefit from ML approaches explicitly designed to capture the structure of the network along with the properties of the events themselves.

VII. ACKNOWLEDGEMENTS

The work leading to these results has been supported by the European Community under grant agreement no. 761727 Metro-Haul project and by the EU FP7 ERANET program under grant CHIST-ERA-2016 UPRISE-IOT.

REFERENCES

[1] M. Joshi and T. H. Hadi, "A review of network traffic analysis and prediction techniques," CoRR, vol. abs/1507.05722, 2015. [Online]. Available: http://arxiv.org/abs/1507.05722
[2] R. Alvizu, S. Troia, G. Maier, and A. Pattavina, "Matheuristic with machine-learning-based prediction for software-defined mobile metro-core networks," Journal of Optical Communications and Networking, vol. 9, no. 9, pp. D19–D30, 2017.
[3] Y. Li, R. Yu, C. Shahabi, and Y. Liu, "Diffusion convolutional recurrent neural network: Data-driven traffic forecasting," 2018.
[4] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[5] A. Azzouni et al., "NeuTM: A neural network-based framework for traffic matrix prediction in SDN," CoRR, vol. abs/1710.06799, 2017.
[6] Y. Liu et al., "Short-term traffic flow prediction with Conv-LSTM," in Wireless Communications and Signal Processing (WCSP), 2017. IEEE, 2017, pp. 1–6.
[7] Q. Zhuo et al., "Long short-term memory neural network for network traffic prediction," in ISKE. IEEE, 2017, pp. 1–6.
[8] X. Cao et al., "Interactive temporal recurrent convolution network for traffic prediction in data centers," IEEE Access, vol. 6, pp. 5276–5289, 2018.
[9] P. Bermolen and D. Rossi, "Support vector regression for link load prediction," Computer Networks, vol. 53, no. 2, pp. 191–201, 2009.
[10] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, "Geometric deep learning: going beyond Euclidean data," IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
[11] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
[12] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.
[13] X. Wang, C. Chen, Y. Min, J. He, B. Yang, and Y. Zhang, "Efficient metropolitan traffic prediction based on graph recurrent neural network," arXiv preprint arXiv:1811.00740, 2018.
[14] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.
[15] D. P. Kingma et al., "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.