Forecasting Stock Indices With Back Propagation Neural Network
Short communication
Abstract

Stock prices as time series are non-stationary and highly noisy, because stock markets are affected by a variety of factors. Predicting a stock price or index directly from the noisy data is therefore usually subject to large errors. In this paper we propose a new approach to forecasting stock prices via the Wavelet De-noising-based Back Propagation (WDBP) neural network, and develop an effective algorithm for predicting stock prices. The monthly closing price data of the Shanghai Composite Index from January 1993 to December 2009 are used to illustrate the application of the WDBP neural network based algorithm to predicting the stock index. To show the advantage of this new approach to stock index forecasting, the WDBP neural network is compared with the single Back Propagation (BP) neural network on the real data set.

Keywords: Wavelet de-noising; BP neural network; WDBP neural network; Stock prices
(I) Hidden layer stage: The outputs of all neurons in the hidden layer are calculated by the following steps:

$$\mathrm{net}_j = \sum_{i=0}^{n} v_{ij} x_i, \quad j = 1, 2, \ldots, m, \tag{1}$$

$$y_j = f_H(\mathrm{net}_j), \quad j = 1, 2, \ldots, m. \tag{2}$$

Here $\mathrm{net}_j$ is the activation value of the $j$th node, $y_j$ is the output of the hidden layer, and $f_H$ is called the activation function of a node, usually the sigmoid function

$$f_H(x) = \frac{1}{1 + \exp(-x)}. \tag{3}$$

(II) Output stage: The outputs of all neurons in the output layer are given by

$$O_k = f_o\!\left( \sum_{j=0}^{m} x_{jk} y_j \right). \tag{4}$$

2.2. Wavelet transform

The wavelet transform is used in analyzing non-stationary time series to generate information in both the time and frequency domains. It may be regarded as a special type of Fourier transform at multiple scales, decomposing a signal into shifted and scaled versions of a "mother" wavelet. The continuous wavelet transform (CWT) is defined as the convolution of a time series $x(t)$ with a wavelet function $\psi(t)$ (Goswami & Chan, 1999):

$$\mathrm{CWT}_x^{\psi}(b, a) = \varphi_x^{\psi}(b, a) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left( \frac{t - b}{a} \right) dt, \tag{5}$$

where $a$ is the scale parameter, $b$ is the translation parameter, and $\psi^{*}(t)$ is the complex conjugate of $\psi(t)$. Let $a = 1/2^{s}$ and $b = k/2^{s}$, where $s$ and $k$ belong to the integer set $\mathbb{Z}$. The CWT of $x(t)$ is then a number at $(k/2^{s}, 1/2^{s})$ on the time-scale plane, representing the correlation between $x(t)$ and $\psi(t)$ at that time-scale point. A discrete version of Eq. (5) is thus obtained as

$$\mathrm{DWT}_x^{\psi}(k, s) = \varphi_x^{\psi}\!\left( \frac{k}{2^{s}}, \frac{1}{2^{s}} \right) = \frac{1}{\sqrt{1/2^{s}}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left( \frac{t - k/2^{s}}{1/2^{s}} \right) dt. \tag{6}$$
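To make the forward pass of Eqs. (1)–(4) concrete, here is a minimal NumPy sketch (our illustration, not the authors' code): the matrices V and W stand for the weights $v_{ij}$ of Eq. (1) and the output-layer weights of Eq. (4), the bias terms implied by the $i = 0$ and $j = 0$ summation indices are omitted for brevity, and the 3–10–3 layer sizes anticipate the architecture used in Section 3.3.

```python
import numpy as np

def sigmoid(z):
    # Eq. (3): f_H(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-z))

def bp_forward(x, V, W, f_o=sigmoid):
    """Forward pass of the three-layer BP network of Eqs. (1)-(4).

    x : (n,) input vector
    V : (m, n) input-to-hidden weights (v_ij in Eq. (1))
    W : (l, m) hidden-to-output weights (Eq. (4))
    """
    net = V @ x        # Eq. (1): net_j = sum_i v_ij * x_i
    y = sigmoid(net)   # Eq. (2): hidden-layer outputs y_j
    return f_o(W @ y)  # Eq. (4): output-layer values O_k

# Toy usage with a 3-10-3 architecture and random weights (illustration only).
rng = np.random.default_rng(0)
x = rng.random(3)             # one normalized input sample
V = rng.normal(size=(10, 3))
W = rng.normal(size=(3, 10))
print(bp_forward(x, V, W))
```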
2.3. The WDBP neural network model algorithm

The WDBP neural network algorithm first decomposes the original data into several layers via the wavelet transform, and then establishes a BP neural network model using the low-frequency signal of every layer for prediction. The algorithm is described as follows, and the flowchart is shown in Fig. 3.

1. Wavelet decomposition: This step decomposes the signal into a low-frequency component B (from the low-pass filter) and a high-frequency component D (from the high-pass filter) by the wavelet transform. The component B reflects the main features of the signal, while the component D represents random factors, often called the noise. The main purpose of the wavelet decomposition is to separate the basic characteristics of the signal from the noise.
2. Main wave extraction: This step extracts the low-frequency components Bi (i = 1, 2, . . . , n) and removes the high-frequency components Di (i = 1, 2, . . . , n) from the data. Here n is the number of layers of the wavelet decomposition.
3. Normalization: The low-frequency data Bi are normalized into Ai.
4. Training and testing data set determination: Divide the low-frequency data Ai into two subsets, the training data Ai(1) and the testing data Ai(2). A validation must be performed by using Ai(2) to test how well the network generalizes to unknown data. To cover wide ranges of outcomes, it is necessary to achieve a balance between the training and validation data set sizes.
5. Relation estimation: Estimate the relation between input(s) and output(s) by training the BP network on the training set Ai(1). To find the appropriate number of hidden nodes, repeat these steps using different training parameters for networks with 1 to q nodes in their hidden layer. Training continues until the estimation error falls below a threshold.
6. Validation: Validate the network using Ai(2) and make forecasts.
7. De-normalization: De-normalize the predicted values.
8. Evaluation: Check whether $P_{\mathrm{MAPE}}^{(i)}$, the mean absolute percentage error (MAPE), is no more than a threshold. If $P_{\mathrm{MAPE}}^{(i)} \le c$, the training is stopped and the algorithm ends; otherwise, set i = i + 1 and go to Step 4.

The forecasting accuracy is evaluated by criteria including the root mean square error (RMSE) and the mean absolute percentage error (MAPE):

$$\mathrm{RMSE} = \left[ T^{-1} \sum_{n=1}^{T} \left( C_n^{MP} - C_n \right)^{2} \right]^{1/2}, \tag{8}$$

$$\mathrm{MAPE} = T^{-1} \sum_{n=1}^{T} \left| \left( C_n^{MP} - C_n \right) / C_n \right|, \tag{9}$$

where $C_n^{MP}$ and $C_n$ are the actual value and the predicted value, respectively, and $T$ is the sample size. Smaller values of these measures indicate more accurate forecast results; if the results are not consistent among the three criteria, we choose the relatively more stable MAPE, as suggested by Makridakis (1993), as the benchmark. In this paper, we use all three measures to evaluate the forecasting performance.

3. Experimentation design

3.1. Data preparation

The data for our experiments are the SCI closing prices, collected from the Shanghai Stock Exchange (SSE). The total number of values for the SCI closing prices is 204 trading months, from January 1993 to December 2009. Fig. 4 shows the original data series. The data set is partitioned into a training set (80%) and a testing set (20%) for validation.

Fig. 4. The original series of the SCI monthly closing prices.

As the network's output value is between 0 and 1 (a characteristic of the transfer function), we need to normalize the original data series. A stock closing price $p_i$ is normalized to $p_i'$ by

$$p_i' = \frac{p_i - p_{\min}}{p_{\max} - p_{\min}}, \quad i = 1, 2, 3, \ldots, 204, \tag{10}$$

where $p_{\max}$ and $p_{\min}$ are the maximum and minimum values of the original series, respectively.
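As a quick illustration, the normalization of Eq. (10), its inverse (used in the de-normalization step of the algorithm), and the error measures of Eqs. (8) and (9) can be sketched as follows (a minimal NumPy sketch; the function names are ours, not the paper's):

```python
import numpy as np

def normalize(p):
    # Eq. (10): map prices into [0, 1]
    return (p - p.min()) / (p.max() - p.min())

def denormalize(p_norm, p_min, p_max):
    # Step 7 of the algorithm: invert Eq. (10) to recover price levels
    return p_norm * (p_max - p_min) + p_min

def rmse(actual, predicted):
    # Eq. (8)
    return np.sqrt(np.mean((predicted - actual) ** 2))

def mape(actual, predicted):
    # Eq. (9)
    return np.mean(np.abs((predicted - actual) / actual))

# Example on a small made-up price series
p = np.array([1200.0, 1350.0, 1280.0, 1500.0])
p_norm = normalize(p)
p_back = denormalize(p_norm, p.min(), p.max())
assert np.allclose(p, p_back)  # round trip recovers the original prices
```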
Fig. 6. The six-level db3 wavelet decomposition of the SCI series: approximation components (left) and detail components D1–D6 (right).
Fig. 7. Signal reconstruction of the SCI series: the original signals, the reconstructed signals, and the error signals (on the order of $10^{-8}$).
Fig. 8. Comparison of the actual values and the BP1 fitted values.
Fig. 9. Comparison of the actual values and the BP2 fitted values.
3.2. Wavelet de-noising

From Fig. 4 we can see that the observed data are contaminated by a lot of noise. The noise can be removed from the observed data by the DWT.

A variety of wavelets have been proposed in the literature for performing the DWT. Each has its own application domain, with its own resolution capability, efficiency, computational cost, etc. In this study, the computationally efficient Daubechies db3 wavelet is used. The db3 wavelet is a commonly used wavelet, and its decomposition process is shown in Fig. 5. In Fig. 5, Ai and Di (i = 1, 2, . . . , 6) are the approximation and detail components, respectively: the Ai's represent the high-scale, low-frequency components of the time series, and the Di's the low-scale, high-frequency components. These approximation and detail records are reconstructed from the wavelet coefficients. The first high-pass filter provides the detail D1; the first low-pass filter provides the approximation A1, which is obtained by superimposing the detail D2 on the approximation A2. Furthermore, A2, A3, A4 and A5 are obtained by similar iterations.

Fig. 5. The six levels of the db3 wavelet decomposition.

The appropriate number of levels of wavelet decomposition can be determined by the nature of the time series, according to its dominant frequency components (Mörchen, 2003), an entropy criterion (Coifman & Wickerhauser, 1992), or the application's characteristics (Li & Shue, 2004). Generally speaking, the number of levels of decomposition depends on the length of the time series; a six-level decomposition is used in this study.

The noise in the SCI closing price sample data has been reduced by the db3 wavelet decomposition. The decomposition and reconstruction processes are shown in Figs. 6 and 7, respectively.
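As one plausible implementation of this de-noising step (the paper does not name its software), the following sketch uses the PyWavelets package to perform a six-level db3 decomposition and reconstruct the series from the approximation alone:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_denoise(series, wavelet="db3", level=6):
    """De-noise by keeping the top-level approximation and dropping all details."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    # coeffs = [A_level, D_level, ..., D1]; zero out every detail band
    coeffs = [coeffs[0]] + [np.zeros_like(d) for d in coeffs[1:]]
    smooth = pywt.waverec(coeffs, wavelet)
    return smooth[: len(series)]  # waverec may pad the output by one sample

# Toy usage on a noisy synthetic series of 204 "monthly" values.
# PyWavelets may warn that 6 levels is high for 204 points,
# mirroring the paper's choice of a six-level decomposition.
t = np.arange(204)
prices = 2000.0 + 10.0 * t + 300.0 * np.random.default_rng(1).normal(size=204)
denoised = wavelet_denoise(prices)
```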
Fig. 10. Comparison of the actual values and the BP3 fitted values.
Fig. 11. Comparison of the actual values and the BP4 fitted values.
3.3. WDBP neural network

Although many different neural network models have been applied in finance, we have chosen the BP neural network in this paper because of its popular use in short-term forecasting. Unlike other BP models constructed directly from the original data, we first decompose the original data into several layers by the wavelet transform; every layer has a low-frequency signal and a high-frequency signal. We then establish a BP neural network model from the low-frequency signal of each layer and use it to predict future values. In all BP models, we choose the three values of every quarter as one input sample, so the total observations (the stock closing prices) are divided into 68 groups. The first 55 groups of every layer's low-frequency data are used for training and the remaining groups for testing.

In this paper we chose a three-layer BP neural network with a 3-neuron input layer, a 10-neuron hidden layer, and a 3-neuron output layer. Decomposing the original data into six layers, we establish the six models shown in Figs. 8–13, where BPi (i = 1, 2, . . . , 6) denotes the model based on the ith layer of data. These figures show how the BPi fitted values track the actual values; the errors are also shown.
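For concreteness, the quarterly sample construction described above can be sketched as follows; pairing each quarter with the following quarter as the prediction target is our reading of the 3-input/3-output design, not a detail stated explicitly by the paper:

```python
import numpy as np

def quarterly_samples(series, train_groups=55):
    """Split a monthly series into groups of 3 (quarters) and pair each
    quarter with the next quarter as the prediction target."""
    quarters = series.reshape(-1, 3)        # 204 months -> 68 groups of 3
    X, Y = quarters[:-1], quarters[1:]      # quarter t -> quarter t + 1
    return (X[:train_groups], Y[:train_groups],   # training pairs
            X[train_groups:], Y[train_groups:])   # testing pairs

series = np.arange(204, dtype=float)        # placeholder for a de-noised layer A_i
X_tr, Y_tr, X_te, Y_te = quarterly_samples(series)
print(X_tr.shape, X_te.shape)               # (55, 3) (12, 3)
```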
3.4. Results and discussion

In order to verify the proposed WDBP method, we have conducted a forecasting experiment with the SCI closing prices. In this experiment, we compare our models with the BP network established on the original data without de-noising. The fitted values of this BP network versus the actual values are shown graphically in Fig. 14. In addition, the comparisons between the basic BP network and our six models are presented in Table 1 and shown graphically in Fig. 15. Table 2 shows the accuracies of these models based on the MAE, RMSE and MAPE, and indicates that the BP4 model has the smallest errors among these models. Hence, BP4 should be chosen as our WDBP model. The WDBP model selected in this process significantly outperforms the conventional BP model.
Fig. 12. Comparison of the actual values and the BP5 fitted values.
Fig. 13. Comparison of the actual values and the BP6 fitted values.
Fig. 14. Comparison of the actual values and the fitted values of the BP network (without de-noising).
Table 1. Comparison of the predicted values of the seven models (the BPi (i = 1, 2, . . . , 6) networks and the BP network).
Fig. 15. Comparison of the actual values and the predicted values of the seven models (BP and BP1–BP6).
Table 2. Comparison of the three criteria of the seven models (the BPi (i = 1, 2, . . . , 6) networks and the BP network).
4. Concluding remarks

This paper proposes an improved way of forecasting the stock closing price based on the neural network. In previous studies, neural networks have been established with the original data for prediction. However, stock market data are highly random and non-stationary, and thus contain much noise, so the prediction accuracy of traditional neural networks without a de-noising process is not satisfactory. The lack of a good forecasting model motivates us to find an improved method of making forecasts, the WDBP model. In this method, we first decompose the original data into multiple layers by the wavelet transform; every layer has a low-frequency signal and a high-frequency signal. Then we establish a BP neural network model based on the low-frequency signal of each layer, which is used to predict future values. Real data are used to illustrate the application of the WDBP neural network and to show the improved accuracy of the WDBP model.

The wavelet transform is an effective data pre-processing tool which may be combined with other forecasting methods, such as statistical and other AI models. Investigating these combined forecasting approaches could be a future research topic.

Acknowledgments

The research was supported by the Ministry of Education overseas cooperation "Chunhui Projects" under Grant Z2007-1-62012 and the Fundamental Research Fund for Physics and Mathematics of Lanzhou University under Grant LZULL200910. The third and corresponding author wishes to thank the support of NSERC Grant RGPIN197319 of Canada.

References

Armano, G., Marchesi, M., & Murru, A. (2005). A hybrid genetic-neural architecture for stock indexes forecasting. Information Sciences, 170(1), 3–33.
Azadeh, A., Ghaderi, S. F., & Sohrabkhani, S. (2008). Annual electricity consumption forecasting by neural network in high energy consuming industrial sectors. Energy Conversion and Management, 49, 2272–2278.
Chen, A. S., Leung, M. T., & Daouk, H. (2003). Application of neural networks to an emerging financial market: Forecasting and trading the Taiwan stock index. Computers and Operations Research, 30, 901–923.
Chun, S. H., & Kim, S. H. (2004). Data mining for financial prediction and trading: Application to single and multiple markets. Expert Systems with Applications, 26, 131–139.
Coifman, R. R., & Wickerhauser, M. V. (1992). Entropy-based algorithms for best-basis selection. IEEE Transactions on Information Theory, 38, 713–718.
Enke, D., & Thawornwong, S. (2005). The use of data mining and neural networks for forecasting stock market returns. Expert Systems with Applications, 29, 927–940.
Franses, P. H., & Ghijsels, H. (1999). Additive outliers, GARCH and forecasting volatility. International Journal of Forecasting, 15(1), 1–9.
Goswami, J. C., & Chan, A. K. (1999). Fundamentals of wavelets: Theory, algorithms, and applications. Wiley Publishers, pp. 149–152.
Hansen, J. V., & Nelson, R. D. (2002). Data mining of time series using stacked generalizers. Neurocomputing, 43(1–4), 173–184.
Kim, K. J., & Han, I. (2000). Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Systems with Applications, 19(2), 125–132.
Li, S. T., & Kuo, S. C. (2008). Knowledge discovery in financial investment for forecasting and trading strategy through wavelet-based SOM networks. Expert Systems with Applications, 34, 935–951.
Li, S. T., & Shue, L. Y. (2004). Data mining to aid policy making in air pollution management. Expert Systems with Applications, 27(3), 331–340.
Makridakis, S. (1993). Accuracy measures: Theoretical and practical concerns. International Journal of Forecasting, 9(4), 527–529.
Mörchen, F. (2003). Time series feature extraction for data mining using DWT and DFT. Technical report 33, Department of Mathematics and Computer Science, Philipps-University Marburg.
Oh, K. J., & Kim, K.-J. (2002). Analyzing stock market tick data using piecewise nonlinear model. Expert Systems with Applications, 22(3), 249–255.
Sarantis, N. (2001). Nonlinearities, cyclical behavior and predictability in stock markets: International evidence. International Journal of Forecasting, 17(3), 459–482.
Shen, L., & Loh, H. T. (2004). Applying rough sets to market timing decisions. Decision Support Systems, 37, 583–597.
Thawornwong, S., & Enke, D. (2004). The adaptive selection of financial and economic variables for use with artificial neural networks. Neurocomputing, 56, 205–232.
Ture, M., & Kurt, I. (2006). Comparison of four different time series methods to forecast hepatitis A virus infection. Expert Systems with Applications, 31, 41–46.
Vellido, A., Lisboa, P. J. G., & Meehan, K. (1999). Segmentation of the on-line shopping market using neural networks. Expert Systems with Applications, 17, 303–314.
Wang, Y. (2002a). Predicting stock price using fuzzy grey prediction system. Expert Systems with Applications, 22, 33–39.
Wang, Y. (2003). Mining stock prices using fuzzy rough set system. Expert Systems with Applications, 24(1), 13–23.
Wang, Y. F. (2002b). Predicting stock price using fuzzy grey prediction system. Expert Systems with Applications, 22(1), 33–39.
Wang, Y. H. (2009). Nonlinear neural network forecasting model for stock index option price: Hybrid GJR-GARCH approach. Expert Systems with Applications, 36, 564–570.
Zhang, G. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159–175.
Zhang, Y. D., & Wu, L. N. (2009). Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network. Expert Systems with Applications, 36, 8849–8854.