Time Series Forecasting Using Backpropagation Neural Networks
Abstract
Wong, F.S., Time series forecasting using backpropagation neural networks, Neurocomputing 2 (1990/91) 147-159.
This paper describes a neural network approach to time series forecasting. The approach has several significant advantages over conventional forecasting methods such as regression and Box-Jenkins: besides simplicity, another major advantage is that it does not require any assumption to be made about the underlying function or model to be used. All it needs are the historical data of the target and of the relevant input factors for training the network. In some cases, even the historical targets alone are sufficient to train the network for forecasting. Once the network is well trained and the error between the target and the network forecasts has converged to an acceptable level, it is ready for use. The proposed network has a three-dimensional structure designed to capture the temporal information contained in the input time series. Several real applications, including forecasting of electricity load, the stock market and interbank interest rates, were tested with the proposed network, and the findings were very encouraging.
those using conventional techniques only, and costs are also expected to be lower.

2. The Integrated Neural Network (INN) approach

The objective of our work is to use neural networks as the underlying technique for predictive analysis, and then to integrate them with conventional predictive techniques such as adaptive filtering, spectral analysis, moving averages, parabolic interpolation, trigonometric curve fitting, multiple regression, autoregression, the Box-Jenkins method [2], etc. Other areas of artificial intelligence (AI), such as expert systems and fuzzy theory, are also being investigated for such integration. A proposed stock selection system [8] which integrates neural networks and fuzzy logic for assessing country risks and rating stocks is shown in Fig. 1.

The use of neural networks introduces nonlinear processing which requires massive floating-point computation, to such an extent that supercomputing or parallel processing is needed to deliver the speed. This is especially so during the design phase, where a large number of tests and training runs of the networks are required. To make the situation worse, integration with other techniques would give rise to an even more substantial increase in the computation load, and supercomputing or parallel processing will naturally become a must. On the other hand, the neural network approach is able to add more flexibility to the system capabilities and leads in certain circumstances to much better performance, as has been proven by Lapedes and Farber [1]. As for the speed requirement, a low-cost alternative is to implement the neural network on a hardware accelerator such as the INMOS transputer, using an appropriate parallel computing model (e.g. [6]).

In general, a time series can be broken down into the following four components: secular trend, cyclical variation, seasonal fluctuation and irregular fluctuation [2]. Secular trends are long-
Fig. 1. The proposed Intelligent Stock Selection System. The FuzzyNet makes use of fuzzy logic and neural network techniques for processing of fuzzy rules and data.
can be further divided into two subcomponents: the deterministic chaotic behavior and the stochastic noise. Conventional signal processing

Fig. 2. The 2-D neural forecast model. The numbers in the layers indicate the number of cells.

an output layer, and at least one hidden layer.

Fig. 3. A typical processing cell.
x_j[s]: current output state of the jth neuron in layer s.

w_ji[s]: weight of the connection joining the ith neuron in layer (s - 1) to the jth neuron in layer s.

I_j[s]: weighted summation of the inputs to the jth neuron in layer s.

A BP cell therefore transforms its inputs as follows:

x_j[s] = f( Σ_i w_ji[s] · x_i[s-1] ) = f( I_j[s] ),

where f is commonly the sigmoid function but can be any function, preferably a differentiable one. The sigmoid function is defined as:

f(z) = (1.0 + e^(-z))^(-1).

differentiable function of all the connection weights in the network. The actual error function is unimportant for understanding the mechanism of BP. The critical value that is propagated back through the network layers is defined by:

e_j[s] = ∂E/∂I_j[s].

It will be shown that this can be considered a measure of the local error at cell j in layer s. Using the chain rule twice in succession gives rise to a relationship between the local error at a particular cell in layer s and all the local errors in layer s + 1:

e_j[s] = f'( I_j[s] ) · Σ_k ( e_k[s+1] · w_kj[s+1] ).

If f is the sigmoid function as defined previously, then its derivative can be expressed as a simple function of itself as follows:
f'(z) = f(z) · (1 - f(z)),

provided the transfer function is a sigmoid. The same procedure can be used to derive the error function when the transfer function is a hyperbolic tangent or a sine.

Figure: the transfer functions referred to in the text — sigmoid y(s) = 1/(1 + e^(-s)); sine y(s) = sin(s); tanh y(s) = tanh(s); Heaviside and signum functions.

In short, the main mechanism in a BP network is to forward the input through the hidden layer(s) to the output layer, determine the error between the actual output and the desired output (target), and then backpropagate the error from the output to the input through the hidden layer(s).
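To make the forward transfer and the local-error recursion concrete, the following is a minimal NumPy sketch (not code from the paper; the layer sizes, weight ranges and variable names are illustrative assumptions). It computes x_j[s] = f(I_j[s]) for one layer and then the local error e_j[s] = f'(I_j[s]) · Σ_k e_k[s+1] · w_kj[s+1], using the sigmoid's self-referential derivative.

```python
import numpy as np

def sigmoid(z):
    # f(z) = (1.0 + e^(-z))^(-1)
    return 1.0 / (1.0 + np.exp(-z))

def forward_layer(x_prev, W):
    """One BP layer: I_j[s] = sum_i w_ji[s] * x_i[s-1]; x_j[s] = f(I_j[s])."""
    return sigmoid(W @ x_prev)

def local_error(x, W_next, e_next):
    """e_j[s] = f'(I_j[s]) * sum_k e_k[s+1] * w_kj[s+1]; for the sigmoid, f' = x * (1 - x)."""
    return x * (1.0 - x) * (W_next.T @ e_next)

# Toy forward/backward pass: 3 inputs -> 4 hidden cells -> 1 output cell (sizes are arbitrary).
rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, (4, 3))             # weights w_ji[1]
W2 = rng.uniform(-0.5, 0.5, (1, 4))             # weights w_kj[2]
x0 = np.array([0.2, -0.1, 0.4])                 # input record
x1 = forward_layer(x0, W1)                      # hidden states x_j[1]
x2 = forward_layer(x1, W2)                      # network output x_k[2]
e2 = x2 * (1.0 - x2) * (np.array([0.7]) - x2)   # output-layer local error for a target of 0.7
e1 = local_error(x1, W2, e2)                    # hidden-layer local error, backpropagated
```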
cepting input data, the hidden layer(s) and output layer are associated with modifiable weights which will be adjusted during the learning process to approximate the system function. A BP network can be either auto-associative (unsupervised learning) or hetero-associative (supervised learning), depending on the objective of the network. In our study, the network is hetero-associative, i.e. it has to be trained with historical data patterns. Although one hidden layer is sufficient for most cases, there is no guarantee that a network with one hidden layer will attain the performance of those with more than one. The number of input cells must always match the number of fields in the input record set. As a rule of thumb in this application, the number of cells in the first hidden layer is usually twice the number of input cells. The number of output cell(s) must match that of the target(s) to be forecasted, and is usually one.

3.7. The BP training algorithm in summary

The BP training algorithm is an iterative gradient-descent algorithm designed to minimize the mean square error between the actual output of a multilayer feed-forward neural network and the desired output. It requires continuous, differentiable nonlinearities such as the sigmoid logistic function f(z) defined below:

f(z) = (1 + e^(-z))^(-1).

Step 1: initialize weights and offsets.
Set all weights and node offsets to small random values.

Step 2: present input and desired outputs.
Present a continuous-valued input vector x_0, x_1, x_2, ..., x_{N-1} and specify the desired outputs d_0, d_1, ..., d_{M-1}. If the net is used as a classifier, then all desired outputs are typically set to zero except for the one corresponding to the class the input is from; that desired output is 1. The input could be new on each trial, or samples from a training set could be presented cyclically until the weights stabilize.

Step 3: calculate actual outputs.
Use the sigmoid nonlinearity from above to calculate the outputs y_0, y_1, y_2, ..., y_{M-1}.

Step 4: adjust weights.
Use a recursive algorithm starting at the output nodes and work back to the first hidden layer. Adjust the weights using:

w_ij(t + 1) = w_ij(t) + β δ_j x'_i.

In this equation, w_ij(t) is the weight from hidden node i or from an input to node j at time t, x'_i is either the output of node i or is an input, β is the learning coefficient, and δ_j is an error term for node j. If node j is an output node, then

δ_j = y_j (1 - y_j)(d_j - y_j),

where d_j is the desired output of node j and y_j is the actual output. If node j is an internal hidden node, then

δ_j = x'_j (1 - x'_j) Σ_k δ_k w_jk,

where k is an index over all nodes in the layer after node j. Internal node thresholds are adjusted in a similar manner by assuming they are connection weights on links from auxiliary constant-valued inputs. Convergence is sometimes faster if a momentum term is added and the weight changes are smoothed by:

w_ij(t + 1) = w_ij(t) + β δ_j x'_i + α (w_ij(t) - w_ij(t - 1)),

where 0 < α < 1 is the momentum coefficient.

Step 5: repeat by going to Step 2 until the error term is reduced to an acceptable level.
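As a compact sketch of Steps 1 to 5 for a single hidden layer (a plausible NumPy rendering, not the author's original code; the learning coefficient β, momentum α, hidden-layer size and stopping threshold below are illustrative values, not values from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, D, n_hidden=8, beta=0.3, alpha=0.7, max_epochs=5000, tol=1e-3, seed=0):
    """Steps 1-5 of the BP algorithm for one hidden layer, with momentum.
    X: (n_samples, n_inputs) inputs; D: (n_samples, n_outputs) desired outputs in (0, 1)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], D.shape[1]
    # Step 1: small random weights; thresholds are treated as weights from a constant input of 1.
    W1 = rng.uniform(-0.1, 0.1, (n_hidden, n_in + 1))
    W2 = rng.uniform(-0.1, 0.1, (n_out, n_hidden + 1))
    dW1_prev = np.zeros_like(W1)
    dW2_prev = np.zeros_like(W2)
    for epoch in range(max_epochs):
        sq_err = 0.0
        for x, d in zip(X, D):                       # Step 2: present inputs cyclically
            x1 = np.append(x, 1.0)                   # auxiliary constant-valued input
            h = sigmoid(W1 @ x1)                     # Step 3: hidden outputs
            h1 = np.append(h, 1.0)
            y = sigmoid(W2 @ h1)                     # Step 3: actual outputs
            # Step 4: error terms, then adjust weights from the output layer back to the hidden layer
            delta_out = y * (1.0 - y) * (d - y)
            delta_hid = h * (1.0 - h) * (W2[:, :-1].T @ delta_out)
            dW2 = beta * np.outer(delta_out, h1) + alpha * dW2_prev
            dW1 = beta * np.outer(delta_hid, x1) + alpha * dW1_prev
            W2 += dW2
            W1 += dW1
            dW2_prev, dW1_prev = dW2, dW1
            sq_err += float(np.sum((d - y) ** 2))
        if sq_err / len(X) < tol:                    # Step 5: stop once the error is acceptable
            break
    return W1, W2
```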
network, such that (t + P) ≤ T ≤ T_current, where P is the number of forecasting steps ahead of time t. Repeat the training process until the Mean Square Error between the network output and the target has dropped to a certain acceptable level. Now feed the BP network with test inputs x(t - n + 1), ..., x(t - 1), x(t) such that min(T, T_current - P) ≤ t ≤ T_current, compare the output (i.e. the forecast) with the actual values of x(t) and compute the accuracy measurements for P. Repeat the training and testing procedures for a range of values of P and obtain the corresponding measurements.
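The procedure just described can be outlined roughly as follows (an illustrative NumPy sketch, not code from the paper; the 80/20 time-ordered split, the window length n, and the helper names train_fn/predict_fn — which would stand for the BP training routine and forward pass above — are assumptions):

```python
import numpy as np

def windows(series, n, P):
    """Build (input window, P-step-ahead target) pairs from a univariate series.
    The input at time t is x(t-n+1), ..., x(t); the target is x(t+P)."""
    X, d = [], []
    for t in range(n - 1, len(series) - P):
        X.append(series[t - n + 1: t + 1])
        d.append(series[t + P])
    return np.array(X), np.array(d)

def evaluate_horizons(series, n, horizons, train_fn, predict_fn):
    """Train and test the forecaster for each forecasting horizon P and collect
    the mean square error of the out-of-sample forecasts for comparison."""
    results = {}
    for P in horizons:
        X, d = windows(series, n, P)
        split = int(0.8 * len(X))                  # illustrative 80/20 split in time order
        model = train_fn(X[:split], d[:split])     # train until the error is acceptable
        forecasts = predict_fn(model, X[split:])   # feed the network with the test inputs
        results[P] = float(np.mean((forecasts - d[split:]) ** 2))
    return results
```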
The fundamental nature of chaos dictates that forecasting accuracy will decrease as P increases; this applies to all existing forecasting methods, including the BP approach. The questions now are: "How rapidly does the degradation occur? Does the BP approach fare better than others?" The work of Lapedes and Farber showed that the BP approach can be more accurate than conventional methods, especially at large values of P.

4.3. Discussions

The nonlinear nature of the activation (transfer) function used by the hidden neurons allows chaotic time series to be forecasted. Chaotic time series are emitted by deterministic nonlinear systems and are sufficiently complicated that they appear to be 'random' time series. However, because there is an underlying deterministic map that generates the series, there is a closer analogy to pseudo-random number generators than to stochastic randomness. BP networks are able to forecast well because they extract, and accurately approximate, these underlying maps. Deterministic chaos has been implicated in a large number of physical situations, including the onset of turbulence in fluids, chemical reactions, lasers and plasma physics, to name but a few. In addition to these engineering applications, there are also several independent studies on using BP for financial applications with very promising findings.

Besides, there are other nonlinear system modeling examples that show the ability of neural networks to infer the correct mappings used to transform the input data sets. This somewhat mysterious ability to infer mappings is actually nothing more than real-valued function approximation when viewed in the context of signal processing.

5. The neural forecaster models

5.1. The 2-D model

Based on the BP network described before, a two-dimensional neural network model has been built; it is shown in Fig. 5. The cells in the input layer use linear transfer functions, half of the cells of the second layer use the sigmoid function and the other half use the sine function, all the cells in the third layer use the sigmoid function, and the output cells use a linear function. The input data file structure used by the model is presented in Figs. 6 and 7. The model only accepts a single record at a time, and generates a single predicted output. This 2-D model is found to be useful for classification problems such as credit rating, in which the input

Fig. 5. The transfer functions of the various layers in a 2-D neural forecast network.
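A rough sketch of that layer arrangement, assuming NumPy (the cell counts, weight initialization and input size are illustrative; the paper's Fig. 5 fixes the actual numbers of cells):

```python
import numpy as np

def forecast_2d(x, W1, W2, W3):
    """Forward pass of the 2-D forecaster described above: linear input cells,
    a second layer whose cells are half sigmoid and half sine, a third (sigmoid)
    layer, and a linear output cell."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    I1 = W1 @ x                                    # second layer: 2 * n_inputs cells (rule of thumb)
    half = len(I1) // 2
    h1 = np.concatenate([sigmoid(I1[:half]), np.sin(I1[half:])])
    h2 = sigmoid(W2 @ h1)                          # third layer: all sigmoid cells
    return W3 @ h2                                 # linear output cell(s)

# Illustrative shapes for an 8-field input record and a single predicted output.
n_in = 8
rng = np.random.default_rng(1)
W1 = rng.uniform(-0.1, 0.1, (2 * n_in, n_in))
W2 = rng.uniform(-0.1, 0.1, (n_in, 2 * n_in))
W3 = rng.uniform(-0.1, 0.1, (1, n_in))
y_hat = forecast_2d(rng.uniform(0, 1, n_in), W1, W2, W3)
```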
(Fig. 6 shows a sample input file: one record per month from Jan 89 to Dec 89, each containing a date, the target (In1) and further input fields In2 to In8.)

Fig. 6. A typical input file structure used to train and test the Neural Forecaster. When the number of fields equals 1, the training/testing is done using the target time series itself.

Fig. 7. The current record(s) for training/testing and the targets.
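As an illustration of how records shaped like those of Fig. 6 might be read and paired with their targets, a hypothetical loader is sketched below (the CSV layout, the file path argument and the choice of six lags when only the target field is present are assumptions, not details specified by the paper):

```python
import csv
import numpy as np

def load_records(path, n_lags=6):
    """Read records shaped like Fig. 6 (date, target, In2, ..., In8) from a CSV file
    and build (input record, target) pairs. If only the target field is present,
    the inputs are the n_lags previous target values, i.e. the series forecasts itself."""
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    dates = [r[0] for r in rows]
    values = np.array([[float(v) for v in r[1:]] for r in rows])
    targets = values[:, 0]
    if values.shape[1] == 1:                        # the number of fields equals 1
        X = np.array([targets[i - n_lags:i] for i in range(n_lags, len(targets))])
        d = targets[n_lags:]
    else:                                           # use the other fields of the same record
        X = values[:, 1:]
        d = targets
    return dates, X, d
```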
model is the same as the 2-D.

Fig. 8. The architecture of a 3-D neural forecast network.
Fig. 9. The S&P 500 weekly closing prices (the targets) from 1985 to 1987 in the left (L) columns, the neural network predictions in the middle (M), and the ten-week average on the right (R). All the data were normalized to the range [0, 1].
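For reference, the normalization to [0, 1] mentioned in the caption and a ten-week average can be computed along these lines (a sketch; whether the figure's average is trailing or centred is not stated, so a trailing window is assumed):

```python
import numpy as np

def normalize_01(series):
    """Rescale a series to the range [0, 1], as done for the data in Fig. 9."""
    lo, hi = np.min(series), np.max(series)
    return (series - lo) / (hi - lo)

def trailing_average(series, window=10):
    """Ten-week trailing average, one plausible reading of the (R) column in Fig. 9."""
    out = np.full(len(series), np.nan)
    for i in range(window - 1, len(series)):
        out[i] = np.mean(series[i - window + 1: i + 1])
    return out
```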
lenders lose millions each year from bad debts. Even a small increase in the ability to predict the creditworthiness of applicants can result in hundreds of thousands of dollars saved each year.

Neural network techniques have also been applied to the field of marketing. For years, advertising agencies and marketing companies have been trying to identify and sell to target, or specific, markets. For example, consider a direct marketing company which sends out advertisement brochures enclosed in monthly credit card bills on a regular basis. What the company would like to do is to send out only a small percentage of these brochures and keep information on those who respond. Once the company has these data, it can build a predictive model using neural networks to select only those who are likely to respond, thus cutting down the expenses.
Fig. 10. The actual stock market returns, the outputs from the neural forecaster and from regression. The neural network was trained on data prior to January 1987.

Fig. 11. The Singapore power demand pattern (daily, from 2 pm to 12 noon the next day) and the 3-D neural forecaster output (thick line).
Another possible application is in evaluating and forecasting property prices. The network can be trained with attributes such as the location, land size, built-in area and number of rooms, together with other economic input data, to suggest the current and future selling prices.

9. Conclusions

We have outlined an integrated neural network (INN) approach for business and industrial forecasting, using the backpropagation neural network (BPNN) as the underlying technique. We have also presented the construction, training, testing and performance of the BPNN for use in forecasting, and the 2-D and 3-D Neural Forecasters. Although the work presented here is mainly centred around time series forecasting, we expect that many other business and industrial application areas may be fruitfully investigated with the approach, especially in areas where no a priori theoretical model or underlying mathematical function is available or can easily be determined.

References

[1] A. Lapedes and R. Farber, How neural nets work, in: Proc. Conf. on Neural Information Processing Systems (AIP, 1987).
[2] J. Hanke and A. Reitsch, Business Forecasting, 3rd ed. (Allyn & Bacon, Newton, MA, 1989).
[3] E. Helfert, Techniques of Financial Analysis, 6th ed. (Irwin, Homewood, IL, 1987).
[4] D. Rumelhart, G. Hinton and R. Williams, Parallel Distributed Processing (MIT Press, Cambridge, MA, 1986).
[5] D. Parker, Learning-Logic, Report TR-47, MIT Center for Computational Research in Economics and Management Science, 1985.
[6] F. Wong, P. Tan and K. Tan, Parallel implementation of the neocognitron neural network on an array of transputers, Technical Report, Institute of Systems Science, National University of Singapore, Jan 1990.
[7] F. Wong, Time series predictive analysis using backpropagation neural network, Technical Report, Institute of Systems Science, National University of Singapore, May 1990.
[8] F. Wong, P.Z. Wang and H.H. Heng, A stock selection strategy using fuzzy neural networks, to appear in: L.F. Pau, ed., Comput. Sci. in Economics and Management (Kluwer, Dordrecht, The Netherlands).

Dr. Francis Wong is a Research Staff Member of the Institute of Systems Science, National University of Singapore. Prior to joining ISS, he worked as an assistant professor and technical consultant in the US and Canada. He has B.Eng. (Hons), MASc and PhD degrees in Electrical and Computer Engineering. His current research interests include parallel processing, transputer programming, pattern recognition, neural networks and fuzzy engineering for various financial and industrial applications, with emphasis on forecasting, selection and monitoring problems. He has implemented several prototypes and commercial systems for the above-mentioned applications and published over 25 technical papers in these areas.