Time Series Forecasting Using Backpropagation Neural Networks

Neurocomputing 2 (1990/91) 147-159 147
Elsevier
Time series forecasting using

backpropagation neural networks
F .S . Wong
National University of Singapore, Institute of Systems Science, Kent Ridge . Republic of Singapore 0511
Abstract
Wong, F .S ., Time series forecasting using hackpropagation neural networks, Neurocomputing 2 (1990/91) 147-159 .
This paper describes a neural network approach for time series forecasting . This approach has several significant
advantages over other conventional forecasting methods such as regression and Box-Jcnkins : besides simplicity, another
major advantage is that it does not require any assumption to be made about the underlying function or model to he used .
All it needs are the historical data of the target and those relevant input factors for training the network . In some cases,
even the historical targets alone are sufficient to train the network for forecasting . Once the network is well trained and the
error between the target and the network forecasts has converged to an acceptable level, it is ready for use . The proposed
network has a three-dimensional structure which is proposed for capturing the temporal information contained in the input
time series . Several real applications, including forecasting of electricity load, stock market and interbank interest rate
forecastings were tested with the proposed network and the findings were very encouraging .
Keywords . Backpropagation ; forecasting; prediction ; time series
1 . Introduction values of common stocks, electricity usage, pro-

perty prices, profitability of new ventures, etc .,
A problem that occurs repeatedly among based on various historical and current composite
the business, government and scientific communi- indices .
ties is that of prediction of future sample values Very little research work has been reported in
of a time series . The predicted tracks of air- the literature on using neural networks for time
crafts are often needed for a number of applica- series prediction [7], a large amount of work on
tions like prevention of mid-air collision . Prediction neural network applications is focused on pattern
of the coordinates of aircraft position at some recognition and classification . However, Alan
time in the future on the basis of measured Lapedes and Robert Farber [1] attempted to do
positions provided by a radar sensor can be prediction of time series using a multilayer per-
viewed as a problem in prediction of time series . eeptron, and they found that the neural network
Prediction of sunspot activity can also be gave superior predictions of certain types of
viewed in this light . And, of course, there is nonlinear, dynamical systems than several con-
always a great deal of interest in predicting the ventional methods based on adaptive filtering
tourist arriving pattern, migration and immigra- and polynomial curve-fitting . It is also reported
tion trends, number of passenger registration, in the 1988 DARPA Neural Network Study that
national GNP, manpower and employment, neural network signal processors for time series
bank interest and foreign exchange rates, future prediction applications will likely outperform
0925-2312/91/$03 .50 © 1991-Elsevier Science Publishers B .V

1 48 F.S . Wong
those using conventional techniques only, and floating-point computations to such an extent
costs are also expected to be lower . that supercomputing or parallel processing is
needed to deliver the speed . This is especially so
during the design phase where a large number of
2. The Integrated Neural Network (INN) tests and training of the networks are required .
approach To make the situation worse, integration with
other techniques would give rise to even more
The objective of our work is to use neural substantial increase in the computation load, and
networks as the underlying technique for predic- supercomputing or parallel processing will natur-
tive analysis, and then to integrate them with the ally become a must . On the other hand, the
conventional predictive techniques such as adap- neural network approach is able to add more
tive filtering, spectral analysis, moving averages, flexibility to the system capabilities and leads in
parabolic interpolation, trigonometric curve fit- certain circumstances to much better perform-
ting, multiple regression, autoregession, Box- ance, as had been proven by Lapedes and Farber
Jenkins method [2], etc . Other areas of artificial [1] . As for the speed requirement, a low-cost
intelligence (AI) such as Expert Systems and alternative is to implement the neural network
fuzzy theory are also being investigated for such on a hardware accelerator such as the INMOS
integration . A proposed stock selection system transputer using an appropriate parallel comput-
[8] which integrates neural network and fuzzy ing model (e .g . [6]) .
logic for assessing country risks and rating of In general, a time series can be broken down
stocks is shown in Fig . 1 . into the following four components : secular
The use of neural networks introduces non- trend, cyclical variation, seasonal fluctuation and
linear processing which requires massive, irregular fluctuation [2] . Secular trends are long-
state
Market
Forecast
------------
Country
List
Stock
List
Knowledge Ccnpartg es to Country

R les Rules Rules
Acquisition ~`
ig . 1 . The proposed Intelligent Stock Selection System . 'the FuzzyNet makes use of fuzzy logic and neural network techniques
for processing of fuzzy rules and data .

Time series forecasting 149
term, slow moving, low frequency components ; Sings Pnaiasd Oulpul

cyclical variations usually take a few years to
complete a cycle, and seasonal fluctuations with- Output Layer
in,a year [3] ; irregular fluctuations are random in
nature and usually difficult to predict .
Hidden Layer #2
Using spectral analysis, one can determine the
relative strength of these components for a par-
ticular time series, and extract these components
to extrapolate the future values of that time I 8 I I e Hidden Layer #1
series in the time domain . The Fast Fourier
Transform (FFT) or the Fast Hartley Transform
(FHT) techniques could be used for this part of
the analysis .
As for the irregular, random components, they
n f
1
}}}}}}}}4}
10 I Input Layer
can be further divided into two subcomponents : Constant -r Single Record of Input Data
the deterministic chaotic behavior and the sto- Fig . 2 . The 2-D neural forecast model . The numbers in the
chastic noise . Conventional signal processing layers indicate the number of cells .
techniques, such as correlation function analysis,

cannot distinguish between these two subcompo-
There is no theoretical limit on the number of
nents . In this proposal, a feedforward, back-
hidden layers but typically one or two are suffici-
propagation neural network is proposed to un-
ent to handle most problems ; a network with
cover the underlying, deterministic algorithm of
three hidden layers can solve some very compli-
the chaotic subcomponent . The stochastic sub-
cated pattern classification problems .
component is random and also indeterministic ; In a BP network, each layer is fully connected
however, one can always extract the statistical
to the succeeding layer . The arrows indicate flow
information of a time series from its past values .
of information during computation . There are
The scope of this paper includes the descrip-
two phases in using the network : learning and
tion of the proposed neural network and how it recall . The network can be either hetero-associa-
can he applied to time series forecasting .
tive or auto-associative (to be discussed later) .
The standard BP processing element (also re-
ferred to as cell or node) is illustrated in Fig. 3,
where :
3. Back Propagation Neural Network (BPNN)
3 .1 . Historical perspective and overview
Back propagation (BP) is a technique for solv- outputs

ing the credit assignment problem posed by Min-
sky and Papert in Perceptrons . Rumelhart et al . Transfer Function
[4] are usually associated with the invention of
BP. Parker introduced a similar algorithm about Summation
the same time [5] and others have studied similar

kinds of networks .
weights (Vaneb , set, axaq maeirsme)
Figure 2 shows the schematic diagram of a / '
typical BP network . It always has an input layer, inputs
an output layer, and at least one hidden layer . Fig . 3 . A typical processing cell .

li0 F .S . Wong
x,[s] : current output state of the Ith neuron in differentiable function of all the connection
layer s . weights in the network . The actual error function
is unimportant to understand the mechanism of
w„[s] : weight of connection joining the ith BP. The critical value that is propagated back
neuron in layer (s - 1) to the jth neuron in layer through the network layers is defined by :
S.
e,[s] = nEldll[s] .
II[s] : weighted summation of inputs to the jth
neuron in layer s . It will be shown that this can be considered a
measure of the local error at cell j in layer s .
A BP cell therefore transfers its inputs as Using the chain rule twice in succession gives
follows : rise to a relationship between the local error at a
particular cell at level s and all the local errors at
X, [s ]=f(£i(w/iIs]*xi[S I]))=f(I[s]), the layers s + 1 :
where f is commonly the sigmoid function but el[s] = f (Jl[S]) *!k(ek[S + 1] * Wk)[S + 1]) -
can be any function, preferably differentiable .
The sigmoid function is defined as : If f is the sigmoid function as defined previously,
then its derivative can be expressed as a simple
f(z) _ (1 .0 + e - ') - i . function of itself as follows :
It is shown in Fig . 4 together with three other f'(z) = f(z) * ( L0 - f(z))

commonly used functions .
Therefore the error function can be rewritten as :
3.2 . Backpropagating the local error
e, [s] = e,[s] * (1 .0
Supposing now that the network has a global
error function E associated with it which is a -xj[s])*Zk(ek[s+1]*Wki[s+1]),
Sigmold function :
provided the transfer function is a sigmoid . The
t
same procedure can be used to derive the error
Y(s) =
t+e •s
$ function when the transfer function is a hy-
Jo perbolic tangent or sine .
(also sine function : y(s) = sin (s)
Tanh function : y(s) = tanh (s))
In short, the main mechanism in a BP network
Heaviside function :
is to forward the input through the hidden to the
output layers, determine the error between the
0
actual output and the desired output (target),
V
and then backpropagate the error from the out-
Signum function ;
put to the input through the hidden layer(s) .
3 .3 . Minimizing the global error

Hard-limiting function :
During the learning phase, the objective is to
minimize the global error E by modifying the
0
weights associated with the links .
Fig . 4 . Some common transfer functions . Given the current set of weights w„[s], one
needs to know how to increase or decrease them ek (O) = -dEldlk (0)

in order to reduce the global error . This can be _ -d Eldo k
done using a gradient descent rule as follows : - dk - O k
d w„ [s] = -b * (d Eld w, ;[s]) , which is indeed the local error .

E defines the global error of the network for a
where b is the learning coefficient . The relation- particular (i, d) . An overall global error function
ship between b and E can be easily obtained can be defined as the sum of all the pattern
from a set of simple experiments and the recom- specific error functions . Then each time a par-
mended starting value for b is around 0 .9 . As a ticular (i, d) is presented, the BP algorithm mod-
rule of thumb, the larger the value of b, the ifies the weights to reduce that particular compo-
faster the global error will subside . On the other nent of the overall error function .
hand, a large value of b is likely to give rise to an
unstable network . The above gradient descent 3 .5 . The momentum rate
rule can be used to change each weight according One of the problems of a gradient descent
to its size and direction of negative gradient on algorithm is how to set the learning rate . Chang-
the error surface. ing the weights as a linear function of the partial
The partial derivatives in w,,[s] can be calcu- derivative as defined above makes the assump-
lated directly from the local error values dis- tion that the error surface is locally linear . At
cussed in the previous subsection using the chain points of high curvature this linearity assumption
rule : does not hold and a divergent behavior might
occur near such points . It is therefore important
dEldw;,[s] = (dEldl,[s]) * (at,[s] law, ;[s]) to keep the learning coefficient low in order to
=-ei[s]*x,[s-1] avoid such behavior .
On the other hand a small learning coefficient
combining 8Eldw,,[s] into dw~,[s] : can lead to very slow learning . The concept of
momentum term was introduced to resolve this
dw1;[s]=b*e[s]*x,[s-1] . dichotomy . The delta weight equation is mod-
ified so that a portion of the previous delta
3 .4 . The global error function weight is fed through to the current delta weight :
The above discussion has assumed the exist-
dw,;[s]=R*et[s]*x ;[s-1]+a*dw,;[s] .
ence of a global error function without actually
specifying it . This function is needed to define This acts as a low-pass filter on the delta weight
the local errors at the output layer so that they terms since general trends are reinforced, where-
can be propagated back through the network .
as oscillatory behavior cancel itself out . This
Suppose a vector i is presented to the network, d allows a low learning coefficient but faster learn-
is the desired output (target), and o is the actual ing rate . a is referred to as the momentum rate .
output, then a measure of the error is given by :
The recommended starting value of a is around
0 .7 .
E=0 .5*F'k(dk-Ok)',
3 .6. Creating a BP network

where k indexes the components of d and o .
From the previous definition of the local error at A typical BP network consists of at least three
each output cell, layers . The input layer acts as a buffer for ac-

152 F. S. Wong
cepting input data, the hidden layer(s) and out- a training set could be presented cyclically until
put layer are associated with modifiable weights weights stabilize .
which will be adjusted during the learning pro-
cess to approximate the system function . A BP Step 3: calculate actual outputs .
network can be either auto-associative (unsuper- Use the sigmoid nonlinearity from above to
vised learning) or hetero-associative (supervised calculate outputs y,,, y„ y 2 , . . . , y M , .
learning) depending on the objective of the net-
work . In our study, the network is hetero- Step 4 : adjust weights .
associative, i .e . it has to be trained with histori- Use a recursive algorithm starting at the out-
cal data patterns. Although one hidden layer is put nodes and work back to the first hidden
sufficient for most cases, there is no guarantee layer . Adjust weights using :
that the network with one hidden layer will
attain the performance of those with more than w,J (t + 1) = w,t (t) + P5'x ; .
one . The number of input cells must always
match the number of fields in the input record In this equation, w,,(t) is the weight from hidden
set . As a rule of thumb in this application, the node i or from an input to node j at time t, x' is
number of cells in the first hidden layer is usually either the output of node i or is an input, P is the
twice the number of input cells . The number of learning coefficient, and S, is an error term for
output cell(s) must match that of the target(s) to node j . If node j is an output node, then :
be forecasted, and is usually one .
Si = y ;(1 - y,)(d1 - y) ,
3 .7 . The BP training algorithm in summary
where d1 is the desired output of node j and y, is
The BP training algorithm is an iterative gradi- the actual output .
ent descent algorithm designed to minimize the If node j is an internal hidden node, then
mean square error between the actual output of
a multilayer feed-forward neural network and 3i = x'(1-xj)I k d k w, k ,
the desired output . It requires continuous dif-
ferentiable nonlinearities such as the sigmoid where k is overall index for nodes in the layer
logistic function f(z) defined below : after node j . Internal node thresholds are adjus-
ted in a similar manner by assuming they are
f(z) = (1 + e -2 ) _' . connection weights on links from auxiliary con-
stant-valued inputs . Convergence is sometimes
Step 1 : initialize weights and offsets . faster if a momentum term is added and weight
Set all weights and node offsets to small ran- changes are smoothed by :
dom values .
w,i (t + 1) = W;r (t) + /38,x ;
Step 2 : present input and desired outputs .
Present a continuous valued input vector + a(w ii (t) - w 1i (t - 1)) ,
x 0 , x„ x„ . . . , x,v _, and specify the desired out-
puts d o , d l , . . . , d M , . If the net is used as a where 0 < a < 1 and a is the momentum rate
classifier then all desired outputs are typically set coefficient .
to zero except for that corresponding to the class
the input is from . That desired output is 1 . The Step S: repeat by going to Step 2 until the error
input could be new on each trial or samples from term is reduced to an acceptable level .

Time series forecasting 1 53
3 .8. Discussions one which is deterministic but 'random'- rather

The generally good performance found for the similar to deterministic, pseudo-random number
generators . A time series, by definition, is an
BP algorithm is somewhat surprising considering
that it is a gradient search technique that may ordered succession of numbers representing the
values of a particular variable over a given
find a local minimum in the error term instead of
period of time (e .g . monthly sales figures for
the desired global minimum . Suggestions to
1970 through now) .
avoid this problem include :
(1) train the BP network many times, each Clearly, if one can uncover the underlying,
with different initial random weights (this can be deterministic algorithm from a chaotic time
achieved by varying the seed value used by the series, using a neural network or a non-neural
network approach, one is likely to forecast its
random generator) ;
future values extrapolation quite accurately .
(2) lower the learning coefficient ;
(3) add extra hidden layers .
Another problem noted with the BP algorithm 4 .2 . The forecasting power of BPNN
is the rather large amount of training required
BP applications involve using binary (0, 1) or
for convergence (e .g . 100 to 400 passes through
all the training data) . Although a number of analog data (any floating-point numbers) as in-
puts and outputs . Let us consider a natural sys-
training algorithms have been proposed to speed
tem described by nonlinear differential equations
up the convergence, it seems unlikely that a
which have an infinite dimensional phase space
significant reduction in the amount of training
(i .e . an infinite number of values are necessary
can be achieved .
to describe the initial conditions) . The Glass-
Mackey equation is one such a delay, nonlinear
4 . Using a BPNN for time series forecasting differential equation :
x'(t) = a `x(t - T) /(1 + x "'(t - r)) - b"x(t - T) .

4 .1 . Overview
A great deal of interest has been shown in The initial condition is specified by an initial
using neural networks to perform qualitative function defined over a strip of width T (hence
reasoning, and relatively little work has been the infinite dimensional phase space, i .e . initial
done in exploring their ability in processing float- functions, not initial constants, are required) .
ing point numbers in a massively parallel fash- By choosing this function to be a constant
ion . This section will examine the forecasting of function, and by integrating x'(t) with respect to
`chaotic' time series using a backpropagation time t, one is able to obtain a time series x(t)
(BP) network . The results presented here can be which is function of a, b and T . For example, let
applied to many engineering, business and indus- a = 0 . 2, b = 0 . 1, and T = 17, a time series x(t)
trial management problems which require accur- which is chaotic with a fractal attractor of dimen-
ate forecast . sion of 2.1 is obtained . By varying the value of T,
Many conventional signal processing tests, e .g . one is able to obtain other chaotic systems com-
correlation function, cannot distinguish deter- monly found in nature .
ministic chaotic behavior from stochastic noise . The time series can now be used to determine
Particularly difficult systems to predict are those the forecasting accuracy of BP networks . The
nonlinear and chaotic ones . Chaos has a techni- procedure is as follows : take a set of n values
cal definition based on nonlinear, dynamical sys- x(t- n + 1) . . . . x(t -1), x(t) as training inputs,
tems theory, but intuitively a chaotic system is and x(t + P) as the target output to train the BP

154 F.S . Wong
network, such that (t + P) -_ T _~ Tcurrentt where Besides, there are other nonlinear system
P is the number of forecasting steps ahead of modeling examples to show the ability of neural
time t. Repeat the training process until the networks in inferring the correct mappings used
Mean Square Error between the network output to transform the input data sets . This somewhat
and the target has dropped to a certain accept- mysterious ability to infer mappings is actually
able level. Now feed the BP network with test nothing more than real valued function approxi-
inputs x(t - n + 1) . . . . x(t-1), x(t) such that mation when viewed in the context of signal
min(T, Tcu rrent - P) ~_ t g Tcurrenr~ compare the processing .
output (i.e. the forecast) with the actual values
of x(t) and compute the accuracy measurements
for P. Repeat the training and testing procedures 5. The neural forecaster models
for a range of values for P and obtain the
corresponding measurements . 5.1 . The 2-D model
The fundamental nature of chaos dictates that
forecasting accuracy will decrease as P Based on the BP network described before, a
increases - this applies to all existing forecasting two-dimensional neural network model has been
built and is shown in Fig . 5 . The cells in the
methods including the BP approach . The ques-
input layers use linear transfer functions, and
tions now are "How rapidly does the degra-
half of the number of cells of the second layer
dation occur? Does the BP approach fare better
use the sigmoid function, and the other half use
than others?" The work of Lapedes and Farber
the sine function . All the cells in the third layer
showed that the BP approach can be more accur-
use the sigmoid function, and the output cells
ate than conventional methods, especially at
large values of P. use a linear function . The input data file struc-
ture used by the model is presented in Figs . 6
and 7 . It only accepts a single record at a time,
4 .3 . Discussions
and generates a single predicted output . This
The nonlinear nature of the activation (trans- 2-D model is found to be useful for classification
fer) function used by the hidden neurons allows problems such as credit rating, in which the input
chaotic time series to be forecasted . Chaotic time
series are emitted by deterministic nonlinear sys- SinglePredcedOUn s
tems and are sufficiently complicated that they
appear to be `random' time series . However,
because there is an underlying deterministic map
that generates the series, there is a closer ana-
logy to pseudo-random number generators than
to stochastic randomness . BP networks are able
to forecast well because they extract, and accu-
rately approximate these underlying maps . 1 8 Sigmoid I
Deterministic chaos has been implicated in a
large number of physical situations including the
onset of turbulence in fluids, chemical reactions,
lasers and plasma physics, to name but a few . In
addition to these engineering applications, there
are also several independent studies on using BP constant ='1'
Single Record o1 Input Data
for financial applications with very promising Fig . 5 . The transfer functions of the various layers in a 2-D
findings . neural forecast network .

Dale Target (Inl) In2 In3 In4 Ins In6 In7 InS
( Jan89 - .05305 1 .8 17 .9 0 .0602 0 .0632 -0 .01593 0.058441 5.4 ) A Single Input Record
Feb89 0 .034148 1.9 17.5 0.0613 0.0634 -0.05305 0.061688 7.3
Mar89 0.022623 .4 .9 6 .46 0 .0622 0.0613 0.034148 0.054838 9
Apr89 0 .000456 3 1056 0 .059 0.0606 0.022623 0.054662 10.2 1
May89 -0 .06634 3 10.56 0 .0632 0 .0629 0.000456 0 .050955 9.5
Jun89 -0.05656 3.3 9.6 0.0619 0 .0623 -0 .06634 0 .050793 9.4 Multiple Input Records
b .: 1 I.
Aug89 -0 .02083 3.3 9 .6 0.0628 0.0724 0.047492 0.066878 7 .5
s Sep89 0.056189 3.5 9.05 0.0653 0.0708 -0.02083 0 .050156 9.7
0 Oct89 -0 .03313 3 .3 9.6 0 .0653 0 .072 0. 056189 0 .059190 8.8
d Nov89 -0.01534 3.4 9.31 0.0676 0.0774 -0.03313 0 .052631 4.8
z
DecS9 -0.0709 3 .5 9 .05 0.0692 0.0828 -0.01534 0.049079 3.4
No. of Fields 21
Fig . 6 . A typical input file structure used to train and test the Neural Forecaster . When the number of fields equals 1, the
training/testing is done using the target time series itself .
Dale Target (Inl) In2 In3 In4 Ins In6 TO Toll
Ian89 -005305 1 .8 179 00607 00632

-0 01591 0.058441 5 4
Current Feb89 0.034148 .9
1 175 0.0613 0.0634
-0.05305 0.061688 7 .3
Record Mar89 0.022623 4 .9 6 .46 0.0622 0.0613
0.034148 0.054838 9
Apr89 .0000456 3 10 .56 0.059 0.022623
0.0606 0.054662 10 .2
May89 A 06634 3 10.56 0.0632 0.0629
0 .000456 0.050955 95
Jun89 -0 .05656 ) 3 .3 9.6 0.0619 0.0623
-0.06634 0.050793 9 .4
p42704g9}2 3 .4
Jul89y 0.,0 9.31 0.0612 0.0728
-0.05656 0 .053968 8
Multiple 0.047492 0.066878 7.5
Step Ahead
Targets
Setr84 0.056189 ) 3.5
Oc189 -0.03313 ) 3 .3
9.05 0.0653 0.07
9 .6 0 .0653 0 .072
8
-0.02083
0.056189
0.050156
0
.059190 8.8
9.7
Nov89 .0.01534 3.4 9.31 0.0676 0.0774 -0.03313 0 .052631 4.8

Dec89 -0.0709 3 .5 9 .05 0.0692 0.0828 -0.01534 0 .049079 3.4
Fig . 7 . The current record(s) for training! testing and the targets .
to the networks are attributes of the problems

4}t / Multiple Predicted Outputs
which need not be a time series .
Linear
5.2. The 3-D model
The three-dimensional model is an extension

of the 2-D model (see Fig. 8) . Instead of taking
one set of attributes at a time, it is designed to
accept several sets together, thus capturing the
temporal information contained in the sets . This
model is also used to generate multiple forecasts
as depicted in Figs . 7 and 8 . Because it is de-
signed to capture the temporal information of
the input record sets, it is therefore only suitable
for processing time series . For non-time series
applications, the number of records per set
should be reduced to one, in which case the 3-D Constant ='1'
model is the same as the 2-D . Fig . 8 . The architecture of a 3-D neural forecast network .
1 56 F .S . Wong
5.3. Discussions on the ST and KLSE indices . Singapore (Fig .

10) and US stock market returns, interest rate,
The 2-D model is easier to implement than the and the Singapore electricity usage patterns (Fig .
3-D model, and has more applications than the 11) .
latter . On the other hand, the 3-D model is able
to capture the temporal information contained in
each set of the records, giving rise to more
8. Other potential applications
accurate results when processing time series . The
penalties one has to pay for using the 3-D model
Forecasts are one of the most important
are much higher computer storage requirement
methods managers develop to support the deci-
and computation time .
sion-making process . Virtually every important
operating decision depends to some extent on a
forecast . Inventory accumulation is related to the
6. Implementation forecast of expected demand ; the production
department has to schedule employment needs
Both the 2-D and 3-D Neural Forecasters have and raw materials orders for the next month or
been implemented using the C language and run two ; the finance department must arrange short-
on the Apple Macintosh Il (compiled with term financing for the next quarter ; the person-
Think's C), IBM PC/AT and PS/2 (compiled nel department must determine hiring and layoff
with Turbo C), and will be implemented on requirements . The list of forecasting applications
workstations soon . Currently they are developed is quite lengthy .
as individual packages, but they can be integ- Executives are always keenly aware of the
rated with other applications . importance of forecasting . Indeed, a great deal
of time is spent studying trends in economic and
political affairs and how events might affect de-
7. Application examples mand for products and/or services . One issue of
interest here is the importance executives place
To illustrate the forecasting power of the neuron quantitative forecasting versus their own
al network, we applied it to forecast the S & opinions . But one problem with quantitative
P500 weekly closing price using the 3-D network . forecasting methods is that they depend on his-
The 3-D network was chosen because the train- torical data . For this reason they are less effec-
ing and testing data were in the form of a time tive in calling the turn that often results in
series . The training file contained the closing sharply higher or lower demand . Through tech-
prices for 140 consecutive weeks, whereas the niques such as neural networks, the computer
test file contained the same with an addition of can aid managers in automating sensitivity to
the following 12 weeks . With reference to the sharp changes in demand .
terms used in Figs . 6 and 7, the number of fields The description so far has centred around
in this case is 1 (only the target itself), the forecasting of time series . In fact, the BP net-
number of records per set is 10, the forecasting work can also be used for classification applica-
step is 1 . Therefore the total number of sets tions which do not involve time series . One
available for training is (140-1-10) = 129, and promising use is for processing credit card or
(140 + 12-10) = 142 in the case of testing . The bank loan applications, based on facts about the
neural network forecasts are shown in Fig . 9 applicants . These facts usually include such
together with the 10-week moving average for things as salary, number of checking accounts
comparison . Similar tests were also carried out and previous credit history . Large banks and
lithe series forecasting 157
L M R L M R L M R L M R
0.1200 0.1330 0 .1179 0.2620 0.2739 0 .2093 0 .3760 0 .4651 0.4388 0 .7240 0 .6722 0.6819
0.1160 0.1305 0 .1199 0 .2700 0.2783 0 .2202 0 .3960 0 .4396 0.4329 0 .7500 0 .7028 0.6825
0.1220 0.1284 0 .1204 0 .2830 0 .2837 0 .2313 0 .3840 0 .4304 0.4320 0 .7640 0 .7408 0.6850
0.1310 0.1299 0 .1202 0 .2600 0 .2926 0 .2425 0 .3980 0 .4250 0.4289 0 .7550 0.7614 0.6964
0.1160 0.1352 0 .1203 0.2510 0.2918 0.2503 0.4120 0.4289 0 .4290 0 .7580 0.7720 0 .7038
0.1350 0.1330 0 .1199 0 .2730 0.2874 0 .2545 0.4110 0.4368 0 .4302 0 .7890 0.7784 0 .7139
0.1510 0.1404 0 .1204 0 .2870 0.2924 0 .2597 0.4340 0.4400 0 .4263 0 .7730 0.8045 0,7235
0.1620 0.1511 0 .1231 0 .3130 0.3045 0 .2658 0.4470 0 .4431 0 .4231 0 .8090 0.8076 0 .7321
0.1610 0.1604 0 .1281 0 .3370 0,3239 0 .2736 0.4500 0.4432 0.4197 0 .8310 0.8216 0 .7482
0.1690 0 .1659 0 .1331 0.3490 0 .3409 0.2810 0 .4440 0.4403 0 .4163 0.8890 0.8352 0 .7644
0.1610 0 .1725 0 .1383 0,3420 0.3533 0 .2885 0 .4610 0.4375 0 .4152 0.8960 0.8645 0 .7842
0.1620 0 .1722 0 .1424 0 .3970 0.3604 0 .2965 0 .4850 0.4446 0 .4237 0.9000 0.8752 0 .8014
0.1730 0.1713 0 .1470 0 .3970 0.3711 0 .3092 0 .4700 0.4569 0 .4326 0.8650 0.8776 0 .8164
0.1760 0.1743 0 .1521 0 .4090 0.3782 0 .3206 0 .4650 0.4650 0 .4412 0 .8260 0.8682 0 .8265
0 .1800 0 .1760 0 .1566 0 .4090 0 .3871 0.3355 0 .4580 0 .4688 0 .4479 0.8310 0 .8443 0 .8336
0 .1920 0 .1761 0 .1630 0 .3970 0 .3953 0.3513 0 .4470 0 .4688 0 .4525 0.8220 0 .8250 0 .8409
0.1850 0 .1829 0 .1687 0 .4300 0 .4000 0.3637 0 .5080 0.4668 0 .4561 0.8560 0 .7999 0 .8442
0.1740 0 .1840 0 .1721 0 .4380 0 .4083 0.3780 0 .5460 0.4858 0 .4635 0.8560 0 .7727 0 .8525
0.1670 0 .1799 0 .1733 0 .4300 0 .4165 0.3905 0 .5850 0.5196 0 .4734 0.7880 0 .7348 0 .8572
0 .1520 0 .1731 0 .1739 0 .4040 0 .4225 0.3998 0 .5920 0,5647 01.4869 05070 0 .6514 0 .8529
0 .1590 0 .1612 0 .1722 0 .4020 0 .4238 0.4053 0 .6210 0 .5944 0 .5017 0 .4740 0 .4491 0 .8147
0 .1580 0 .1564 0 .1720 0 .4210 0 .4254 0.4113 0 .6140 0 .6212 0 .5177 0A930 0 .4300 0.7725
0 .1550 0 .1532 0 .1716 0 .4540 0 .4265 0.4137 0 .6430 0 .6331 0 .5306 0 .4570 0 .4428 0.7318
0 .1550 0 .1502 0 .1698 0 .4430 0 .4347 0.4194 0,6360 0 .6529 0 .5479 0 .4480 0 .4385 0 .6910
0 .1300 0 .1492 0 .1677 0 .4430 0 .4404 .4228
0 0,6690 0 .6618 0 .5650 0 .4460 0 .4328 0.6532
0 .1350 0 .1394 0 .1627 0 .4520 0 .4454 0 .4262 0,6710 0 .6813 0 .5861 0 .3820 0 .4328 0.6147
0 .1390 0 .1353 0 .1570 0 .4630 0 .4514 0 .4317 0 .7060 0 .6905 0 .6085 0 .4090 0 .4407 0.5707
0 .1350 0 .1355 0 .1524 0 .4780 0 .4552 0.4350 0 .7240 0 .7107 0 .6283 0 .4600 0 .4696 0.5260
0 .1530 0 .1342 0 .1485 0 .4350 0 .4616 0.4390 0 .7180 0 .7306 0 .6461 0 .4800 0 .4810 0.4864
0 .1590 0 .1408 0 .1471 0 .4050 0 .4543 0 .4395 0 .7250 0 .7391 0.6594
0 .1710 0 .1472 0 .1478 0 .4150 0 .4453 0 .4396 0 .6500 0 .7456 0 .6727
0 .1820 0 .1575 0 .1490 0 .3970 0 .4428 0 .4409 0 .6810 0 .7069 0.6756
0 .2090 0 .1715 0 .1514 0 .4000 0,4384 0 .4385 0 .6570 0 .6988 0.6823
0 .2210 0 .1939 0 .1568 0 .4500 0 .4352 0 .4331 0 .6930 0 .6808 0.6837
0.2260 0.2156 0 .1634 0.4660 0.4428 0 .4338 0.6870 06825 0.6894
0.2350 0.2287 0 .1730 0.4810 0.4552 0 .4361 0.6480 06788 0.6912
0.2630 0.2399 0 .1830 0.4840 0.4690 0 .4390 0.6690 0.6489 0.6889
0 .2740 0.2588 0 .1954 0.4550 0.4757 0 .4411 0.6910 06516 0.6852
Fig. 9 . The S & P500 weekly closing price (the targets) from 1985 to 1987 on the left (L) side of the columns, the neural network
predictions in the middle (M) . and the ten-week average is on the right (R) . All the data were normalized to the range [0, 1] .
lenders lose millions each year from bad debts . marketing company which sends out advertise-
Even a small increase in the ability to predict the ment brochures enclosed in monthly credit card
credibility of the applicants can result in hun- bills on a regular basis . What the company would
dreds of thousands of dollars saved each year . like to do is to send out only a small percentage
Neural network techniques have also been ap- of these brochures and keep information on
plied to the field of marketing . For years, adver- those who respond . Once the company has these
tising agencies and marketing companies have data, it can build a predictive model using neural
been trying to identify and sell to target, or networks to select only those who are likely to
specific markets . For example, consider a direct respond, thus cutting down the expenses .

1 58 F.S. Wong
Fig . 10 . The actual stock market returns, outputs from neural forecaster and regression . The neural network was trained on data
prior to January 1987 .
1 .0 iii .hiii IIIII11111117111 1 111111 , n i iii u

TIME
Fig . 11 . The Singapore power demand pattern (daily from 2 pm to 12 noon next day) and the 3-D neural forecaster output (thick
line) .
Time series forecasting 1 59
Another possible application is in evaluating [2] J . Hanke and A . Reitsch, Business Forecasting, 3rd ed .
and forecasting property prices . The network can (Allyn & Bacon, Newton, MA, 1989) .
[3] E . Helfert, Techniques of Financial Analysis, 6th ed .
be trained with attributes such as the location,
(Irwin . Homewood, IL, 1987) .
land size, built-in area, number of rooms, to- [4] D . Rumelhart, G . Hinton and R . Williams, Parallel
gether with other economic input data, to sug- Distributed Processing (MIT Press, Cambridge, MA,
gest the current and future selling prices . 1986) .
[5] D . Parker, Learning-Logic, Report I'R-47.. MIT Center
for Computational Research in Economics and Manage-
ment Science, 1985 .
9 . Conclusions [6] F . Wong, P. Tan and K . Tan, Parallel implementation of
the ncocognitron neural network on an array of transpu-
ters, Technical Report, Institute of Systems Science,
We have outlined an integrated neural net-
National University of Singapore, Jan 1990 .
work (INN) approach for business and industrial [7] F . Wong, Time series predictive analysis using back
forecasting, using the backpropagation neural propagation neural network, Technical Report, Institute
network (BPNN) as the underlying technique . of Systems Science, National University of Singapore,
We have also presented the construction, train- May 1990 .
[8] F. Wong, P .Z . Wang and H .H . Heng, A stock selection
ing, testing and performance of the BPNN for
strategy using fuzzy neural networks, to appear in : L .F .
use in forecasting, and the 2-D and 3-D Neural Pau, ed . . Comput . Sci . in Economics and Management .
Forecasters . Although the work presented here (Kluwer, Dordrecht, The Netherlands),
is mainly centred around time series forecasting,
we expect that many other business and industri-
Dr . Francis Wong is Research Staff
al application areas may be fruitfully investigated Member of the Institute of Systems
with the approach, especially for areas where no Science, National University of Sing-
apore . Prior to joining ISS, he has
a priori theoretical model or underlying mathe- worked as assistant professor and
matical function is available nor can be easily technical consultant in US and
Canada . He has B . Eng . (Hons),
determined . MASc and PhD degrees in Electrical
and Computer Engineering . His cur-
rent research interests include : paral-
lel processing, transputer program-
References ming, pattern recognition, neural
network and fuzzy engineering for
various financial and industrial applications with emphasis on
[1] A . Lapedcs and R . Farber, How neural nets work, in : forecasting selection and monitoring problems . He has im-
plemented several prototypes and commercial systems for the
Proc . Conf. on Neural Information Processing Systems
above-mentioned applications and published over 25 techni-
(AIP, 1987) . cal papers in these areas .

Time Series Forecasting Using Backpropagation Neural Networks

Uploaded by

Copyright:

Available Formats

Time Series Forecasting Using Backpropagation Neural Networks

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Time Series Forecasting Using Backpropagation Neural Networks

Uploaded by

Copyright:

Available Formats

Neurocomputing 2 (1990/91) 147-159 147

Time series forecasting using

Keywords . Backpropagation ; forecasting; prediction ; time series

1 . Introduction values of common stocks, electricity usage, pro-

0925-2312/91/$03 .50 © 1991-Elsevier Science Publishers B .V

Knowledge Ccnpartg es to Country

Time series forecasting 149

term, slow moving, low frequency components ; Sings Pnaiasd Oulpul

techniques, such as correlation function analysis,

3 .1 . Historical perspective and overview

Back propagation (BP) is a technique for solv- outputs

the same time [5] and others have studied similar

It is shown in Fig . 4 together with three other f'(z) = f(z) * ( L0 - f(z))

3 .3 . Minimizing the global error

needs to know how to increase or decrease them ek (O) = -dEldlk (0)

d w„ [s] = -b * (d Eld w, ;[s]) , which is indeed the local error .

3 .6. Creating a BP network

Time series forecasting 1 53

3 .8. Discussions one which is deterministic but 'random'- rather

x'(t) = a `x(t - T) /(1 + x "'(t - r)) - b"x(t - T) .

154 F.S . Wong

Time series forecasting 155

Dale Target (Inl) In2 In3 In4 Ins In6 TO Toll

Ian89 -005305 1 .8 179 00607 00632

Nov89 .0.01534 3.4 9.31 0.0676 0.0774 -0.03313 0 .052631 4.8

to the networks are attributes of the problems

The three-dimensional model is an extension

5.3. Discussions on the ST and KLSE indices . Singapore (Fig .

1 .0 iii .hiii IIIII11111117111 1 111111 , n i iii u

You might also like