Available online at www.sciencedirect.com

Future Computing and Informatics Journal 2 (2017) 39-47
http://www.journals.elsevier.com/future-computing-and-informatics-journal/

Forecasting of nonlinear time series using ANN

Ahmed Tealab a,*, Hesham Hefny a, Amr Badr b

a Computer Science Department, Institute of Statistical Studies and Research, Cairo University, Giza, Egypt
b Computer Science Department, Faculty of Computers and Information, Cairo University, Giza, Egypt

Received 19 March 2017; accepted 14 May 2017
Available online 3 July 2017

* Corresponding author. E-mail addresses: a.tech.gouda@gmail.com (A. Tealab), hehefny@ieee.org (H. Hefny), a.badr.fci@gmail.com (A. Badr).
http://dx.doi.org/10.1016/j.fcij.2017.05.001

Abstract

When forecasting time series, it is important to classify them according to their linearity behavior. Although linear time series remain at the forefront of academic and applied research, it has often been found that simple linear models leave certain aspects of economic and financial data unexplained. The dynamic behavior of most real-life time series, with their autoregressive and inherent moving average terms, poses the challenge of forecasting nonlinear time series that contain inherent moving average terms using computational intelligence methodologies such as neural networks. Studies that concentrate on forecasting nonlinear time series containing moving average terms are rare. In this study, we demonstrate that common neural networks are not efficient at recognizing the behavior of nonlinear or dynamic time series with moving average terms and hence have low forecasting capability. This points to the importance of formulating new neural network models, such as Deep Learning neural networks, with or without hybrid methodologies such as Fuzzy Logic.
© 2017 Faculty of Computers and Information Technology, Future University in Egypt. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Forecasting; Nonlinear time series; Neural networks; Moving averages

1. Introduction

Although the forecasting of time series has generally been carried out under the assumption of linearity, which has promoted the study and use of linear models such as the autoregressive (AR), moving average (MA), autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models [1,2], it has been found that real systems often have an unknown nonlinear structure [3]. To address this type of problem, several nonlinear models have been proposed, such as bilinear models, autoregressive conditional heteroskedasticity (ARCH) and its extensions, smooth transition autoregressive (STAR) models, nonlinear autoregressive (NAR) models, wavelet networks and artificial neural networks (ANN) [1-7].

With regard to the ANN, its theory is very broad and it has been applied to modeling and forecasting data from different knowledge areas [1-3,8-14]. However, a large part of the ANN models proposed in the literature are based exclusively on a nonlinear autoregressive structure, and only a few of them consider a generating process of the nonlinear time series that has, in addition to the autoregressive component, a moving average component. To address this case, some authors suggest using the NARMA neural network or the high-order autoregressive neural network ARNN; Refs. [15,16] present such specific cases.

However, a review of the relevant literature finds that:

- The theory of the NARMA(p,q) model considers that the data generation process corresponds to a nonlinear structure with both autoregressive and moving average components; by ignoring the autoregressive component (making p = 0), a nonlinear moving average (NLMA) model is obtained. Nevertheless, there are no studies that examine the forecasting capability of NARMA(0,q) when it is applied to a nonlinear time series that presents an inherent MA component.
- There is no reported evidence that a nonlinear MA model can be approximated by a nonlinear infinite-order AR model, as happens in the case of linear models when they meet certain invertibility conditions.
The objective of this research is to answer the research questions presented below in order to clarify the above gaps:

1. Can a nonlinear high-order AR model, represented by an ARNN network, provide a good approximation to a nonlinear low-order MA model?
2. Can a NARMA network that assumes there is no autoregressive process adequately predict a nonlinear time series containing inherent moving average components?

These questions will be resolved on the basis of the invertibility of nonlinear MA models and the use of simulated experimental data.

The importance and originality of this work lie in the fact that, to date, there is no evidence in the reviewed literature of studies that analyze and identify the problems that arise when modeling and forecasting time series with inherent MA components using neural networks. The article is organized as follows: sections 2 and 3 present the nonlinear MA model and the NARMA and NAR neural networks, respectively. Section 4 then describes the methodology used to assess the capacity of these networks to predict nonlinear time series with an MA component. Section 5 presents the results obtained, while section 6 provides answers to the research questions raised. Finally, section 7 concludes.
2. Nonlinear moving average model

In the nonlinear moving average model of order q, denoted NLMA(q), the current value of the time series, y_t, is a nonlinear function h(·) of the q past innovations {ε_{t-1}, ..., ε_{t-q}} and the current innovation ε_t. That is:

y_t = ε_t + h(ε_{t-1}, ..., ε_{t-q}; θ_q),   t = 1, 2, 3, ...   (1)

where θ_q represents the parameter vector of the function h(·) and {ε_t} is a sequence of independent and identically distributed random variables, centered at zero and with constant variance.
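To make the data-generating mechanism in (1) concrete, a minimal simulation sketch is given below. It is not code from the paper: the function name, the use of NumPy and the illustrative quadratic choice of h(·) are our own, and any nonlinear h with q arguments could be substituted.

```python
import numpy as np

def simulate_nlma(n, h, q, sigma=1.0, seed=0):
    """Simulate an NLMA(q) series y_t = eps_t + h(eps_{t-1}, ..., eps_{t-q}), as in Eq. (1)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n + q)  # the first q draws act as initial innovations
    y = np.empty(n)
    for t in range(n):
        past = eps[t:t + q][::-1]             # eps_{t-1}, ..., eps_{t-q}, most recent first
        y[t] = eps[t + q] + h(past)
    return y

# Arbitrary illustrative h with a quadratic cross-term (q = 2); not one of the paper's models
h_example = lambda e: 0.5 * e[0] + 0.25 * e[0] * e[1]
series = simulate_nlma(n=360, h=h_example, q=2)
```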
Depending on the form of the function h(·), the following NLMA models have been proposed:

- Polynomial moving averages, proposed by Robinson [17].
- Asymmetric moving averages, proposed by Brannas and Ohlsson [18].
- Nonlinear response moving averages with long scope, proposed by Robinson and Zaffaroni [19].
- Nonlinear integrated moving averages, proposed by Engle and Smith [20].

In contrast to the nonlinear autoregressive (NAR) model, the NLMA model has been little explored, both empirically and theoretically. This is due, in part, to the difficulty of establishing the invertibility property of the model [21]; that property refers to the possibility of rebuilding the innovations ε_t from the observations y_t, assuming that the true model is known. However, Chen and Wang [22] showed that the NLMA model can be locally invertible; this is achieved by setting initial conditions that allow the innovations to be reconstructed asymptotically from the observations.

The Mean Absolute Percentage Error (MAPE) is calculated by taking the absolute error in each period, dividing it by the observed value for that period, and then averaging these absolute percentages. This approach is useful when the magnitude of the predicted variable is important in evaluating the accuracy of a prediction; MAPE indicates how large the prediction error is compared with the real value.
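Although the text does not give a formula, the MAPE just described can be written in one line of NumPy. This is only a sketch (the function name is ours, and it assumes none of the observed values is zero):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error: average of |error| / |observed value|, as a percentage."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```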
3. Neural network models associated with moving average components

Mathematically, a neuron is a nonlinear, bounded and parameterized function of the form [23]:

o = f(x_1, x_2, ..., x_n; w_1, w_2, ..., w_p) = f(x; w)

where:

- x = (x_1, x_2, ..., x_n) is the vector of input variables of the neuron;
- w = (w_1, w_2, ..., w_p) is the weight (parameter) vector associated with the inputs of the neuron;
- f(·) is a nonlinear activation function.

In turn, an artificial neural network is defined as a composition of nonlinear functions of the form:

y = g_1 ∘ g_2 ∘ ... ∘ g_N (f_1(x; w), f_2(x; w), ..., f_p(x; w))

where:

- y is the response variable, or output, of the artificial neural network;
- g_i, for i = 1, ..., N, are nonlinear functions;
- f_j(x; w), for j = 1, ..., p, are neuron functions of the form defined above;
- N represents the number of hidden layers in the network;
- p denotes the number of neurons in the hidden layers;
- the symbol ∘ between functions indicates composition.

According to their architecture and the interconnections between neurons, neural networks can be classified into two classes: feed-forward networks and feed-back (recurrent) networks.
The feed-forward network, also known as static, constitutes a nonlinear function of its inputs and is represented as a set of interconnected neurons in which information flows only in the forward direction, from inputs to outputs. Specifically, in Ref. [24] a feed-forward network model with a single output neuron and q hidden neurons is defined as follows:

o_t = β_0 + Σ_{i=1}^{q} β_i Ψ(a_i + Σ_{j=1}^{n} ω_{ij} x_{j,t}) = f(x_t; θ)   (2)

where:

- o_t is the estimator of the target variable y_t;
- x_t = (x_{1,t}, ..., x_{n,t}) are the input variables measured at time t;
- Φ(·) and Ψ(·) are the activation functions of the neural network;
- θ = (β_0, β_1, ..., β_q, a_1, ..., a_q, ω_{11}, ..., ω_{qn}) represents the parameter vector of the neural network, which is estimated by minimizing the sum of squared differences Σ_{t=1}^{n} (y_t − ô_t)².

It is noteworthy that this kind of neural network is the most studied and applied in the literature, mainly because such networks are universal function approximators [25-27] and, moreover, in practice they are simpler to implement and simulate. The feed-back network, also known as dynamic or recurrent, has an architecture characterized by cycles: the outputs of the neurons in one layer can be inputs to the same neuron or inputs to neurons of previous layers. For more information on this type of network, see [23] and [28]. Special cases of these two types of network are described below: the autoregressive neural network ARNN, which is of the feed-forward type, and the recurrent neural network NARMA.
3.1. Autoregressive neural network (ARNN)

The nonlinear autoregressive model of order p, NAR(p), defined as

y_t = h(y_{t-1}, ..., y_{t-p}) + ε_t   (3)

is a direct generalization of the linear AR model, where h(·) is a known nonlinear function. It is assumed that {ε_t} is a sequence of independent and identically distributed random variables with zero mean and finite variance σ².

The autoregressive neural network (ARNN) is a feed-forward neural network that constitutes a nonlinear approximation of h(·), and it is defined as:

ŷ_t = ĥ(y_{t-1}, ..., y_{t-p}) = β_0 + Σ_{i=1}^{I} β_i φ(a_i + Σ_{j=1}^{p} ω_{ij} y_{t-j})   (4)

where φ(·) is the activation function and θ = (β_0, β_1, ..., β_I, a_1, ..., a_I, ω_{11}, ..., ω_{Ip}) is the parameter vector.
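To fix ideas, the forward pass in Eq. (4) for a single one-step forecast can be written directly in NumPy. This is only a sketch of the mapping, not the training code used in the paper; the argument names mirror Eq. (4), and the logistic activation matches the choice reported in the experiments below.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def arnn_forecast(y_lags, beta0, beta, a, omega):
    """One-step ARNN forecast, Eq. (4).

    y_lags : (p,) array with the most recent observations (y_{t-1}, ..., y_{t-p})
    beta0  : scalar output bias
    beta   : (I,) output weights, one per hidden neuron
    a      : (I,) hidden-layer biases
    omega  : (I, p) hidden-layer weights
    """
    hidden = logistic(a + omega @ y_lags)   # phi applied neuron-wise
    return beta0 + beta @ hidden

# Illustrative call with I = 2 hidden neurons and p = 3 lags (arbitrary parameter values)
y_hat = arnn_forecast(np.array([0.3, -0.1, 0.7]), beta0=0.1,
                      beta=np.array([0.5, -0.4]), a=np.array([0.0, 0.2]),
                      omega=np.arange(6, dtype=float).reshape(2, 3) / 10.0)
```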
3.2. Recurrent neural network NARMA

The generalization of the linear ARMA model to the nonlinear case is given by

y_t = h(y_{t-1}, ..., y_{t-p}; ε_{t-1}, ..., ε_{t-q}) + ε_t

where h(·) is a known nonlinear function and {ε_t} is defined as in (3). This model is called NARMA(p,q). Since the sequence ε_{t-1}, ..., ε_{t-q} is not directly observable, ŷ_t must be obtained with a recursive estimation algorithm that performs the following calculations:

ŷ_t = h(y_{t-1}, ..., y_{t-p}; ε̂_{t-1}, ..., ε̂_{t-q})   (5)

ε̂_j = y_j − ŷ_j,   j = t−1, ..., t−q   (6)

under appropriate initial conditions [16]. By using the approximations in (5) and (6), the neural network model NARMA(p,q) can be expressed through the recurrent network:

ŷ_t = a_0 + Σ_{j=1}^{h} a_j g(b_{0j} + Σ_{i=1}^{p} b_{ij} y_{t-i} + Σ_{i=p+1}^{p+q} b_{ij} ε̂_{t+p-i})   (7)

where ε̂_{t+p-i} = y_{t+p-i} − ŷ_{t+p-i}.

From the mathematical formulation of model (7), it follows that an alternative for modeling a nonlinear time series with an inherent moving average component is to use a NARMA(0,q) model. This observation will be discussed in the following section.
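Because the innovations are not observed, producing forecasts with a NARMA(0,q) network requires the recursion in (5)-(6). The sketch below shows how one-step forecasts and estimated innovations could be generated for an already trained network; the zero initial innovations stand in for the "appropriate initial conditions" mentioned above, and the function and argument names are our own.

```python
import numpy as np

def narma0q_forecasts(y, q, net):
    """Recursive one-step forecasts for a NARMA(0, q) network, following Eqs. (5)-(6).

    y   : observed series
    q   : moving-average order
    net : callable mapping the q most recent estimated innovations
          (eps_hat_{t-1}, ..., eps_hat_{t-q}) to y_hat_t
    """
    n = len(y)
    y_hat = np.zeros(n)
    eps_hat = np.zeros(n + q)              # first q entries: zero initial innovations
    for t in range(n):
        past = eps_hat[t:t + q][::-1]      # eps_hat_{t-1}, ..., eps_hat_{t-q}
        y_hat[t] = net(past)               # Eq. (5) with p = 0
        eps_hat[t + q] = y[t] - y_hat[t]   # Eq. (6)
    return y_hat, eps_hat[q:]
```

In training, the network parameters would be adjusted so that the squared residuals produced by this recursion are minimized, for instance with the error-feedback scheme used by Burges and Refenes [15] or the recursive approach discussed by Connor and Martin [16].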
4. Used methodology

The evaluation of the forecasting ability of the NARMA(p,q) and ARNN(p) neural network models was performed using two sets of experimental data generated from the models described in Table 1. Model 1, in which {ε_t} is defined as in (3), corresponds to the NLMA(2) model reviewed by Zhang et al. [29]. Model 2, on the other hand, was considered by Burges and Refenes [15] to illustrate the use of neural networks with error feedback under an expectation-maximization (EM) variant of the training algorithm.

Table 1
Data generation models.

Model   Model structure
1       y_t = ε_t + 0.3 ε_{t-1} + 0.2 ε_{t-2} + 0.4 ε_{t-1} ε_{t-2}
2       y_t = ε_t + 0.5 ε_{t-1} + 0.6 ε_{t-1} ε_{t-2}

Note that the two models do not contain autoregressive terms (they do not involve past values of y_t) and correspond to different levels of complexity of the function h(·) defined in (1). One hundred time series were generated from each model.
In each generated series, the first observations were used to estimate the model parameters and the remaining ones were used as the validation set. Fig. 1 plots one of the series generated from Model 1 with n = 360 observations. In the data generation process, different initial values sampled from a N(0; 1.5) distribution are used for the error term of Model 1, while for Model 2 it is assumed that ε_1 = ε_2 = 0 and that the initial value of y is rand(), where rand() returns a standard uniform random number.

Fig. 1. Example of a time series generated by Model 1.

The experiments focused on two aspects: (i) analysis of the ability to capture the whole nonlinear moving average process using a recurrent NARMA(0,q) neural network or an ARNN(p) with a sufficiently large p (for which Model 1 was used); and (ii) comparison of the results obtained with the networks considered in this work against those found in the literature for NLMA processes. For this, Model 2 was used and the results were compared with those obtained by Burges and Refenes [15] with an ARNN(p) network. The methodology used for each model has some distinctive aspects:

Model 1:

- Different sample sizes, n ∈ {100; 200; 360}, and percentages of data used for network training (50, 65 and 80) were considered, in order to examine the effect of their choice on the predicted values.
- For the ARNN model, large lag values p ∈ {10; 15; 25; 50; 100} were examined, with the purpose of answering the first research question.
- The network structure was chosen based on the results of Zhang et al. [29], who showed via simulation that the best network structure corresponds to a hidden layer with a maximum of two neurons. The objective function was the minimization of the mean square error (MSE).
- In the case of the NARMA model, in addition to the previous network structure, the following orders for the moving average process were considered: q ∈ {1; 2; 3; 4; 5; 6; 7; 8; 9; 10}.
- A set of 150 additional observations was generated and used as test data.

Model 2: the same experimental conditions employed by Burges and Refenes [15] were considered, in order to be able to compare the results:

- The size of the series was 400 observations, of which the initial 70% was used to train the network and the remaining 30% for validation.
- The objective function was the minimization of the normalized mean square error (NMSE).
- All networks used had one hidden layer with four neurons.
- The following lag values were considered: p ∈ {10; 25; 50}.
- 100 additional observations were generated and taken as test data.
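For reference, the experimental grid just described can be summarized as a small configuration structure. This is only our own compact representation of the settings listed above, not an artifact from the paper:

```python
# Experimental settings for the two simulation studies, as described in the text
experiments = {
    "model_1": {
        "sample_sizes": [100, 200, 360],
        "train_percentages": [50, 65, 80],
        "arnn_lags": [10, 15, 25, 50, 100],
        "narma_ma_orders": list(range(1, 11)),
        "max_hidden_neurons": 2,        # per Zhang et al. [29]
        "objective": "MSE",
        "extra_test_observations": 150,
    },
    "model_2": {
        "series_length": 400,
        "train_fraction": 0.70,         # remaining 30% used for validation
        "lags": [10, 25, 50],
        "hidden_neurons": 4,
        "objective": "NMSE",
        "extra_test_observations": 100,
    },
}
```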

In both models the activation function used was the logistic; for each training run, the initial weights and biases of the network were generated from a continuous uniform distribution on the range (−5; 5). In addition, the choice of the best model was performed by taking into account the 100 series and the different configurations of the network, under the cross-validation procedure suggested by Zemouri et al. [30], namely:

1. Repeat, for i = 1 to M = 1000, from different starting points:
   - Train the network using the training data.
   - Validate the trained network using the n.val validation data, and calculate the forecasting mean error E(i) and standard deviation std(i) on the validation set:

   E(i) = (1/n.val) Σ_{j=1}^{n.val} (y_j − ŷ_j)   (9)

   std(i) = sqrt( (1/n.val) Σ_{j=1}^{n.val} (ŷ_j − y_j)² )   (10)

2. Calculate the following measures to evaluate the forecasting performance of the network:
   - M1 = Ē = (1/M) Σ_{i=1}^{M} E(i). This is an estimate of the average of the overall forecasting mean errors and evaluates the proximity between the predicted and actual values. If M1 ≈ 0, the probability that the forecast is centered on the actual data is very high.
   - M2 = s̄td = (1/M) Σ_{i=1}^{M} std(i). This measures the forecast accuracy in terms of variability. The ideal value is M2 ≈ 0, because it indicates that the predicted values are not scattered (i.e., they have low variability).
   - M3 = [ sqrt( (1/M) Σ_{i=1}^{M} (E(i) − Ē)² ) + sqrt( (1/M) Σ_{i=1}^{M} (std(i) − s̄td)² ) ] / 2. This indicates whether the training process of the network is repeatable (in which case M3 ≈ 0), so that the same neural network structure is obtained in each run of the training process, regardless of the initial values.
   - M4 = 1/(M1 + M2 + M3). This examines the accuracy of the forecast. If the outputs of the network are very close to the actual values, then M1, M2 and M3 are close to zero, and in that case M4 takes very large values, so that M4 >> 0 is the ideal situation for having confidence in the forecasts.
3. Perform the verification using the test data: select the best candidate network as the one having the highest M4 value and the lowest M1, M2 and M3 values on the validation set; this avoids over-fitting and under-fitting problems. Finally, among the M runs performed for that network, the model with the lowest E(i) is selected.
4. Perform the verification on the test data: calculate E(i) and std(i) for each selected configuration (one for each series in question), and choose the model that provides the lowest E(i).

The above measures were used to validate the accuracy of the results obtained from the network under study.
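Assuming the per-run values E(i) and std(i) have been collected into arrays, the four measures can be computed in a few lines of NumPy. The sketch below simply restates the formulas above (using population standard deviations for M3); the function name is ours:

```python
import numpy as np

def zemouri_measures(E, std):
    """Aggregate performance measures M1-M4 from per-run validation errors (Eqs. (9)-(10))."""
    E, std = np.asarray(E, float), np.asarray(std, float)
    M1 = E.mean()                        # closeness of forecasts to the actual values
    M2 = std.mean()                      # variability (scatter) of the forecasts
    M3 = (E.std() + std.std()) / 2.0     # repeatability of the training process
    M4 = 1.0 / (M1 + M2 + M3)            # overall accuracy; larger is better
    return M1, M2, M3, M4
```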

5. Results

The results obtained are presented below for each of the considered models.

5.1. Model 1

Figs. 2-4 show the values obtained for the measures M1-M4 on the validation set of the ARNN network, for each sample size, under the different numbers of lags and training percentages considered. In turn, Fig. 5 contains the values of the performance measures E(i) and std(i) obtained on the validation set. Table 2 shows the results found for the ARNN network on the test data under the nine scenarios considered and the large lag values p ∈ {10; 15; 25; 50; 100}. The first column contains the sample size, the second the number of lags p, and the last three columns show the E(i) and std(i) values found for each training percentage.

Fig. 2. Performance measures for the ARNN model with n = 100, p ∈ {10, 15, 25} and training percentages (50, 65, 80).
Fig. 3. Performance measures for the ARNN model with n = 200, p ∈ {10, 15, 25, 50} and training percentages (50, 65, 80).
Fig. 4. Performance measures for the ARNN model with n = 360, p ∈ {10, 15, 25, 50, 100} and training percentages (50, 65, 80).
Fig. 5. E(i) and std(i) measures on the validation set for the ARNN model, according to sample size, lags and training percentage.
Table 2
Performance measures for the ARNN model with the test data.

n     p     Measure   Percentage of training
                      50        65        80
100   10    E(i)      2.334     0.1958    0.2934
            std(i)    1.5836    1.7273    1.7324
      15    E(i)      0.7113    0.3311    0.3016
            std(i)    1.8923    2.2368    1.5623
      25    E(i)      0.3702    0.316     *
            std(i)    1.5233    1.5139    *
200   10    E(i)      0.6903    0.339     0.2041
            std(i)    1.7291    1.5689    1.6601
      15    E(i)      0.3939    0.3065    0.3379
            std(i)    1.6468    1.6451    1.7154
      25    E(i)      0.3945    0.1167    0.129
            std(i)    1.5177    1.5716    1.5575
      50    E(i)      0.5299    0.2362    *
            std(i)    1.756     1.4851    *
360   10    E(i)      0.3284    0.299     0.346
            std(i)    1.6458    1.7647    1.5909
      15    E(i)      0.3678    0.371     0.2601
            std(i)    1.559     1.5777    1.5391
      25    E(i)      0.1201    0.123     0.2232
            std(i)    1.5785    1.5136    1.6092
      50    E(i)      0.1744    0.1713    0.1388
            std(i)    1.2746    1.3208    1.2823
      100   E(i)      0.2222    0.05965   *
            std(i)    1.06824   1.01172   *
In this table, the * symbol indicates that the lag value p is greater than the number of observations available for the validation set, so the forecasting ability cannot be examined for that group of data.

From Figs. 2-5 it appears that, whatever the value of the lag, there is a direct relationship between the training percentage and the forecast accuracy. Regarding the reproducibility of the model, it is observed that the adjusted networks generally always satisfy this condition. Finally, the greatest forecast accuracy is obtained by combining the maximum allowed lag with the maximum training percentage and sample size. Note, in addition, that the quality of the forecast, in terms of decreasing E(i) and std(i) values, improves as p grows; this, in turn, makes the overall mean and the forecast accuracy converge to their ideal values.

In addition, from Table 2 and Figs. 2-4 it follows that the number of lags selected in the final ARNN model depends on the size of the series and on the percentage of data used for network training: for the network to be able to predict adequately, it is necessary to choose the maximum number of lags allowed and the largest training set. This leads one to expect that the use of ARNN networks to forecast series with an inherent MA component tends to suffer from over-parameterization problems. This is confirmed by examining the behavior of the MSE according to the number of lags and nodes of the network: it was observed that, as the order of the nonlinear AR model increases, the MSE tends to decrease regardless of the number of nodes considered; however, smaller MSE values are obtained when considering the network with two nodes in the hidden layer (see Fig. 6).

Fig. 6. Number of lags of the nonlinear model versus the MSE of the ARNN network with one (ARNN1) and two (ARNN2) nodes in the hidden layer.

The best result found for the ARNN network (in terms of the best measure values on the test data) was obtained when considering 360 observations, of which 65% were used to train the network, with the maximum number of lags (100) and two nodes in the hidden layer. Even so, it is not able to capture the whole nonlinear moving average process (see graph (a) in Fig. 7).

Moreover, the results found on the predictive ability of the recurrent neural network NARMA in the presence of moving averages are shown in Table 3 and in graph (b) of Fig. 7.

Table 3
Measures of performance for the NARMA model.

In Table 3, the first column shows the sample size and the remaining columns show, for each training percentage, the selected configuration (number of lags q and number of nodes in the hidden layer), the values obtained for the measures M1-M4 on the validation set, and the E(i) and std(i) values for the test set.

From this table it is concluded that the NARMA network requires large sample sizes in order to fit models capable of reducing the heterogeneity of the forecasts on the test set. Likewise, as with the ARNN networks, the percentage of data used for training the network has a direct relationship with the forecast accuracy, for any sample size. It was found that the best outcome for the NARMA network (in terms of the measures on the test data) was obtained with two nodes in the hidden layer, q = 2 lags and 360 observations, of which 80 percent were used to train the network. It is noted that, although the NARMA network is also not capable of capturing all the moving average behavior of the data (see Fig. 7, graph (b)), by using a smaller number of parameters to be estimated it achieves a better performance than the ARNN network.
Fig. 7. Comparison between the test data and the forecasts found with the best networks: (a) ARNN (100) and (b) NARMA (q = 2, k = 2).

5.2. Model 2

Table 4 contains the values of the normalized mean square error (NMSE) found by Burges and Refenes [15] for NARMA models with 1, 2 and 3 lags (first three rows), together with those obtained in this work by using the ARNN network and the NARMA network with 1, 2 and 3 lags. The information for the ARNN and NARMA models considered in this table is complemented by Table 5, which contains, for each model, the performance measures suggested by Zemouri et al. [30]. The actual test values versus the best forecasts of the ARNN and NARMA networks are shown in Fig. 8.

Table 4
Comparison of results for the data simulated from Model 2.

Model            Data of training   Data of validation   Data of test
NARMA (1) [15]   0.813              0.846                NA
NARMA (2) [15]   0.692              0.755                NA
NARMA (3) [15]   0.689              0.789                NA
ARNN (10)        0.714              0.858                0.0858
ARNN (25)        0.636              0.864                0.0198
ARNN (50)        0.623              0.767                0.139
NARMA (1)        0.743              0.783                0.909
NARMA (2)        0.773              0.714                0.876
NARMA (3)        0.757              0.787                0.855

Table 5
Performance measures of the NARMA and ARNN models.

Model        M1       M2      M3      M4      E(i)      std(i)
ARNN (10)    0.115    1.999   0.134   0.445   0.0394    1.0708
ARNN (25)    0.0904   1.852   0.15    0.478   0.00544   1.101
ARNN (50)
NARMA (1)
NARMA (2)    0.218    1.912   0.202   0.527   0.0672    1.865
NARMA (3)    0.254    1.211   0.248   0.584   0.0249    1.852
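The comparison in Table 4 relies on the NMSE, which the text does not define explicitly. A common convention, assumed in the following sketch, is to divide the mean square error of the forecasts by the variance of the observed series:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean square error: MSE divided by the variance of the observations
    (one common convention; the paper does not state which normalization it uses)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)
```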
Table 4 shows that the NARMA networks adjusted in this work have, for each lag, a lower NMSE on the validation set than the corresponding values found by Burges and Refenes [15]; in the case of the ARNN networks, none of them produced (on the validation set) a lower NMSE than the best value found by those authors.

In this second experiment it was again evident that the over-parameterization problem of the ARNN networks leads to the inconsistency observed between the NMSE values found for the three data sets (see Table 4). Following the approach proposed by Zemouri et al. [30], the best models are ARNN (25) and NARMA (3). Note that there is consistency in selecting the best model using either the NMSE or the E(i) measure (obtained for the test data).

However, there is evidence that these models do not have a good predictive capability, given that in Fig. 8 the clouds of points are far from the 45° line.

6. Discussion

In this section we answer the research questions raised.

1. Can a nonlinear high-order AR model, represented by an ARNN network, provide a good approximation to a nonlinear low-order MA model?

In examining whether an ARNN network with a high order for the lag p is capable of correctly approximating an NLMA model, it was found that, although increasing the number of lags p makes the training MSE tend to decrease (as shown in Fig. 6) and the measures E(i) and std(i) show better results, this is not reflected in the forecasting capacity of the model (graph (a) of Fig. 7).

It is noteworthy that the forecasting ability does not depend only on the lag value, but also on the sample size and on the percentage of data used to train the network. The best results are obtained for ARNN networks with larger lag values accompanied by large sample sizes, of which a large percentage is used for training. However, it should be kept in mind that this prevents fitting parsimonious or short-term models and leads to over-parameterization problems.

If, in addition to this, it is considered that the NLMA model is not globally invertible, then the answer to the question is that a nonlinear autoregressive model of high order (in this case approximated by an ARNN network) is not capable of representing a nonlinear moving average (NLMA) model of low order.

Fig. 8. Comparison between the test data and their forecasts found with the network (a) ARNN (25) and (b) NARMA (3).
2. Can a NARMA network that assumes there is no autoregressive process adequately predict a nonlinear time series containing inherent moving average components?

From Figs. 7 and 8 and Tables 2 and 5, it is observed that, although the selected NARMA model has a better performance than the other tested networks (in terms of the performance measures proposed by Zemouri et al. [30] and of closeness to the 45° line), the values predicted by this model are far from the actual values of the nonlinear time series with an MA component (see graphs (b) of Figs. 7 and 8). Considering this fact, the answer is that a recurrent NARMA(0,q) network cannot adequately predict nonlinear time series containing inherent moving average components.

However, in the tests it was noted that, as suggested by the mathematical expressions, in practice the NARMA network approximates the NLMA model better (from the point of view of the forecasting capability measures) than the ARNN network. This indicates that this network can be a good candidate for modeling nonlinear data containing moving average components, but it needs to be studied in detail, and so a new research question arises: from the theoretical point of view, what considerations must the recurrent NARMA(0,q) network satisfy so that it can properly predict nonlinear time series containing inherent moving average components?

7. Conclusion

It is shown that both the recurrent neural network NARMA model and the autoregressive neural network ARNN model are unable to fully capture the behavior of a nonlinear time series containing an inherent moving average (MA) component. This raises the need to develop an artificial neural network model, or a hybrid model with Fuzzy Logic, to adequately predict nonlinear time series with an inherent MA component, which can have NARMA as a starting point.

References

[1] De Gooijer JG, Hyndman RJ. 25 years of time series forecasting. Int J Forecast 2006;22(3):443-73.
[2] Iacus SM. Statistical data analysis of financial time series and option pricing in R. Chicago: R/Finance, USA; 2011.
[3] Terasvirta T. Forecasting economic variables with nonlinear models. SSE/EFI working paper in economics and finance, no. 598. Stockholm: Department of Economic Statistics; 2005.
[4] Engle RF. Risk and volatility: econometric models and financial practice. New York: New York University, Department of Finance (Salomon Centre); December 2003.
[5] van Dijk D, Medeiros MC, Terasvirta T. Linear models, smooth transition autoregressions, and neural networks for forecasting macroeconomic time series: a re-examination. Rio de Janeiro: Department of Economics, Pontifical Catholic University of Rio de Janeiro (PUC-Rio); 2004.
[6] Jorda O, Escribano A. Improved testing and specification of smooth transition regression models. Universidad Carlos III de Madrid; 1997.
[7] Robinson PM. Modelling memory of economic and financial time series. London School of Economics and Political Science; 2005.
[8] La Rocca M, Perna C. Model selection for neural network models: a statistical perspective. In: Emmert-Streib F, Dehmer M, Pickl S, editors. Computational network theory: theoretical foundations and applications. 1st ed. Wiley-VCH Verlag GmbH & Co. KGaA; 2015.
[9] Pretorius P, Sibanda W. Artificial neural networks - a review of applications of neural networks in the modeling of HIV epidemic. Int J Comput Appl April 2012;44(16).
[10] Qi M, Zhang GP. An investigation of model selection criteria for neural network time series forecasting. Eur J Oper Res 2001;132(1):666-80.
[11] Lin CF, Granger CWJ, Terasvirta T. Power of the neural network linearity test. J Time Ser Anal 1993;14(2):209-23.
[12] Bijari M, Hejazi SR, Khashei M. Combining seasonal ARIMA models with computational intelligence techniques for time series forecasting. Soft Comput June 2012;16(6):1091-105.
[13] Qi M, Zhang GP. Neural network forecasting for seasonal and trend time series. Eur J Oper Res February 2005;160(2):501-14.
[14] Benhra J, El Hassani H, Benkachcha S. Seasonal time series forecasting models based on artificial neural network. Int J Comput Appl April 2015;116(20).
[15] Burges AN, Refenes A-PN. Modelling non-linear moving average processes using neural networks with error feedback: an application to implied volatility forecasting. Signal Process 1999;74(1):89-99.
[16] Connor JT, Martin RD. Recurrent neural networks and robust time series prediction. IEEE Trans Neural Netw 1994;5(2):240-53.
[17] Robinson PM. The estimation of a nonlinear moving average model. Stoch Process Their Appl 1977;5(1):81-90.
[18] Brannas K, Ohlsson H. Asymmetric time series and temporal aggregation. Rev Econ Stat May 1999;81(2):341-4.
[19] Robinson PM, Zaffaroni P. Modelling nonlinearity and long memory in time series. Fields Inst Commun 1997;11:161-70.
[20] Engle RF, Smith A. Stochastic permanent breaks. Rev Econ Stat 1999;81(4):553-74.
[21] Turkman KF, Scotto MG, de Zea Bermudez P. Nonlinear time series: extreme events and integer value problems. 2014. p. 23-90.
[22] Chen D, Wang H. The stationarity and invertibility of a class of nonlinear ARMA models. Sci China Math March 2011;54(3):469-78.
[23] Haykin SO. Neural networks and learning machines. 3rd ed. 2006.
[24] Gencay R, Liu T. Nonlinear modelling and prediction with feedforward and recurrent networks. Phys D 1997;108(1-2):119-34.
[25] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw 1989;2(2):359-66.
[26] Hornik K, Stinchcombe M, White H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw 1990;3(5):551-60.
[27] Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Netw 1991;4(2):251.
[28] Rosa JLG. Artificial neural networks - models and applications. 2016.
[29] Zhang GP, Patuwo BE, Hu MY. A simulation study of artificial neural networks for nonlinear time series forecasting. Comput Oper Res 2001;28(4):381-96.
[30] Zemouri R, Gouriveau R, Zerhouni N. Defining and applying prediction performance metrics on a recurrent NARX time series model. Neurocomputing 2010;73(13-15):2506-21.
