
Received: 22 February 2019 Revised: 20 September 2019 Accepted: 20 September 2019

DOI: 10.1002/isaf.1459

RESEARCH ARTICLE

Stock price prediction using DEEP learning algorithm and its comparison with machine learning algorithms

Mahla Nikou¹ | Gholamreza Mansourfar¹ | Jamshid Bagherzadeh²

¹ Faculty of Economics and Management, Urmia University, Urmia, Iran
² Faculty of Electrical and Computer Engineering, Urmia University, Urmia, Iran

Correspondence
Gholamreza Mansourfar, Faculty of Economics and Management, Urmia University, Urmia, Iran.
Email: g.mansourfar@urmia.ac.ir

Summary
Security indices are the main tools for evaluating the status of financial markets. Moreover, investment in stock markets constitutes a major part of any country's economy. Investors could therefore maximize their return on investment if it became possible to predict the future trend of the stock market with appropriate methods. The nonlinearity and nonstationarity of financial series make their prediction complicated. This study seeks to evaluate the prediction power of machine-learning models in a stock market. The data used in this study include the daily close price of the iShares MSCI United Kingdom exchange-traded fund from January 2015 to June 2018. The prediction process is carried out with four machine-learning algorithms. The results indicate that the deep learning method predicts better than the other methods, and that the support vector regression method ranks next, with lower error than the neural network and random forest methods.

KEYWORDS
artificial neural network, deep learning, prediction, random forest, support vector regression

1 | INTRODUCTION

Stock price indices are very significant among the world financial markets, as the main criteria for evaluating the performance of securities and stock markets. They are obtained by accumulating the stock price movements of all companies, or of a certain class of companies, listed on an exchange (Wang, Wang, Zhang, & Guo, 2012). Therefore, as financial markets present a high degree of competition among participants (Parot, Michell, & Kristjanpoller, 2019), the study of stock price index prediction models is very necessary for investors who want to turn the securities market into a profitable place. It updates investors' knowledge for making proper decisions in selecting an appropriate portfolio, on the one hand, and provides international investment opportunities for investors, on the other.

Nevertheless, prediction of financial markets is a complicated task, in that financial time series are noisy, nonstationary, and irregular. Although there are many statistical and computational methods for prediction of these series (Atsalakis & Valavanis, 2009), it is still identified as a difficult problem for financial variables, and especially for price indices. Predictions are not precise, and their rate of error depends on the algorithm used. The important point is that, according to researchers, the behaviour of most variables in financial markets follows a nonlinear trend (Thomaidis, 2006); therefore, it is likely that linear prediction does not yield appropriate results when examining the future path of financial variables.

In the financial literature, price prediction methods are classified into four groups: technical analysis, fundamental analysis, time-series-based prediction, and machine learning. By discovering new patterns in historical data, machine learning tries to identify the underlying function from which the data are generated and to capture the linear and nonlinear models that exist in these data (Kalyvas, 2001). In past years, algorithms such as artificial neural networks and the support vector machine (SVM) were widely used for prediction of financial series, achieving high precision (Das & Padhy, 2012; Guo, Wang, Liu, & Yang, 2014; Lu, Lee, & Chiu, 2009). The results of studies indicate the considerable superiority of such methods over the traditional methods in stock price prediction (Aydin & Cavdar, 2015; Giovanis, 2009; Yim, 2002). For this reason, in many countries, such as the USA, Germany, France and the UK, researchers have succeeded in predicting stock market behaviour through various machine learning methods (Huang, Nakamori, & Wang, 2005; Tay & Cao, 2001; Wei, 2012). Machine learning methods are often considered to be a subfield of artificial intelligence. However, there are a large number of studies employing machine learning, as opposed to other artificial intelligence methodologies (Fisher, Garnsey, & Hughes, 2016).

Deep learning is one of the machine learning models that has recently been developed. It is a branch of machine learning and artificial intelligence and a set of algorithms that try to model high-level abstract concepts by learning at multiple levels and layers. Learning can be supervised, semi-supervised, or unsupervised (Bengio, Courville, & Vincent, 2013; LeCun, Bengio, & Hinton, 2015; Schmidhuber, 2015). The most important advantages of deep learning are automatic learning of features, multilayer learning of features, high precision of results, high generalization power, and identification of new data. Therefore, the aim of this study is to forecast the close price of iShares MSCI United Kingdom through this method and to compare its prediction power with other machine learning methods.

2 | THEORETICAL BACKGROUND

Although the initial concepts of machine learning emerged in the 1950s (Samuel, 1959), it was studied in the 1990s as an independent field (Michalski, Carbonell, & Mitchell, 2013). Machine learning, as a highly applicable branch of artificial intelligence, uses computers to simulate human learning and to build algorithms that can learn and predict from data (Anon., 1998; Portugal, Alencar, & Cowan, 2018). There are various types of machine learning algorithms, the most important of which include artificial neural networks, SVMs, random forests (RFs), and deep learning.

As stock markets are usually nonlinear and chaotic systems, nonlinear algorithms are preferred to linear algorithms in predicting stock behaviour. The most important nonlinear models that have been widely used in financial markets in recent years, and have achieved the desired results, are artificial neural network models. Whitcomb was among the first to ask whether neural networks are capable of identifying nonlinear laws in time series and unknown laws in asset price movements and stock price variations (Schwartz & Whitcomb, 1977). Artificial neural networks usually perform better than regression methods. For example, the findings of Hill, Marquez, O'Connor, and Remus (1994) on stock price prediction indicated the superiority of this method over other models. Moshiri and Cameron (2000) predicted inflation in Canada using various time series models (autoregressive–moving average (ARMA), vector autoregression (VAR), Bayesian VAR, and structural models) and an artificial neural network model based on a multilayer feedforward network. The results indicate that neural network models can match the competing models and in some cases are even more efficient. Using a neural network model, Yim (2002) predicted the return of the daily index of the Brazil stock market and compared the prediction results, using root-mean-squared error (RMSE), mean absolute error (MAE), and the Chung and Hendry model, with the results of generalized autoregressive conditional heteroskedasticity, ARMA, and structural prediction models, showing the superiority of the artificial neural network.

Kumar and Thenmozhi (2005) studied the efficiency of each of the ARMA, RF, SVM, and artificial neural network methods in predicting the S&P CNX NIFTY index rate and compared the performance of these three nonlinear models and one linear model with each other. The results indicated that the SVM yielded more precise results. Zhang, Shi, Zhang, and Shi (2006) predicted the stock exchange price trend of Shanghai using an SVM and concluded that the SVM has a high prediction capability and that a combination of an SVM with intelligent models gives even better results than the SVM model alone. Lee (2009) dealt with prediction of the NASDAQ index through a combined support vector regression (SVR) model and its comparison with artificial networks. In that study, SVR was combined with F-score and supported sequential forward search and applied to index change with 29 technical indices as a complete feature set. The results indicated the superiority of the combined SVR estimator model over the neural network. Giovanis (2009) investigated the stock price in the Athens stock market and then forecasted the stock price values. He concluded that the neural network model had a lower rate of error and benefitted from relative superiority. Kara, Boyacioglu, and Baykan (2011) predicted the stock price movement direction of Istanbul through fuzzy-neural network models and SVM. The prediction performance of the fuzzy-neural network was 74.5% and that of the SVM was 52.71%, so the fuzzy-neural network performed better than the SVM. Aydin and Cavdar (2015) studied the USD–TRY exchange rate, the gold price, and the Borsa Istanbul (BIST) 100 index using an artificial neural network and VAR and compared the results obtained from the two methods. The results showed that the neural network approach performs better than VAR in prediction.

Gao (2016) forecasted the stock market using a recurrent neural network (RNN) through long short-term memory (LSTM). This study intended to investigate the feasibility and efficiency of LSTM in stock market forecasting. Based on the results, the average precision and accuracy of the LSTM model in forecasting six shares was 54.83%, with the highest and lowest precision being 59.5% and 49.75% respectively. Bao, Yue, and Rao (2017) presented a new deep learning framework in which wavelet transforms, stacked autoencoders, and LSTM methods were combined for stock price prediction. To examine the performance of the proposed model, six market indices and their related futures indices were selected. The results indicated that the proposed model is better in terms of precision of prediction and profitability compared with other similar models. Hiransha, Gopalakrishnan, Menon, and Soman (2018) forecasted the National Stock Exchange (NSE) of India stock market using four deep learning models: multilayer perceptrons (MLPs), RNNs, LSTM, and convolutional neural networks (CNNs). The data used in that study were the closing prices of two different stock markets: the NSE and the New York Stock Exchange. The results indicated that deep learning models are better than ARIMA; moreover, CNNs performed better than the other deep learning models. Using real-life credit card data linked to 711,397 credit card holders from a large bank in Brazil, Sun and Vasarhelyi (2018) confirmed that, compared with the machine-learning algorithms of logistic regression, naive Bayes, traditional artificial neural networks, and decision trees, deep neural networks have a better overall predictive performance. The recent success of deep networks is partially attributable to their ability to learn abstract features from raw data. This motivated Galeshchuk and Mukherjee (2017) to investigate the ability of deep learning to predict the direction of change of foreign exchange rates by using CNNs over three different exchange indexes. They concluded that trained deep networks achieve satisfactory out-of-sample prediction accuracy.

3 | DATA AND METHOD

3.1 | Data

The data used in this study include the daily close price of iShares MSCI United Kingdom from January 2015 to June 2018. The iShares MSCI United Kingdom fund (NYSEARCA: EWU) is an exchange-traded fund that provides investors with access to a selected equities market by tracking the total performance return of the MSCI United Kingdom Index as its target benchmark. The MSCI United Kingdom Index is composed of 111 large- and mid-cap company stock holdings that together represent nearly 85% of the total free float-adjusted market capitalization of the UK. This index utilizes the standards and methodologies of the MSCI Global Investable Market Indexes, which attempt to cover a wide range of regional, market capitalization, sector and style segments of the international equity markets. EWU follows the same standards as its benchmark indices. As of August 2015, EWU had provided investors with an annualized total return of 5.9% since the fund's inception in 1996.

The research data have been collected from the Yahoo Finance site. For price index prediction using a deep learning algorithm and other machine learning methods, and for comparison of the results to select the best algorithm, coding has been done in Python 3.5.1. The error evaluation criteria include MAE, mean-square error (MSE), and RMSE.
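The paper does not list its evaluation code, but the three criteria are standard; the following is a minimal sketch, assuming NumPy, with purely illustrative price values.

```python
# A minimal sketch (not the authors' code) of the three error criteria used
# throughout this paper; the price and prediction values are illustrative.
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the prediction errors
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    # Mean-square error: average of squared errors, penalizing large misses
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    # Root-mean-squared error: MSE in the units of the original series
    return np.sqrt(mse(y_true, y_pred))

y_true = np.array([33.9, 34.1, 34.0, 33.8])   # hypothetical close prices
y_pred = np.array([33.7, 34.3, 33.9, 34.0])   # hypothetical predictions
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```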

4 | METHOD

4.1 | Artificial neural network

The concept of artificial neural networks was proposed for the first time by McCulloch and Pitts (1943) and was later widely used by researchers in experimental modelling of nonlinear processes. The perceptron is one type of neural network, having single-layer and multilayer forms. By presenting the back-propagation (BP) algorithm, McClelland, Rumelhart, and Hinton (1986) succeeded in the development of the multilayer feedforward neural network (MLP). Neural networks have wide applications in financial and investment areas, such as prediction of bankruptcy, decision-making, and financial planning.

The nodes in feedforward networks are located in consecutive layers with a one-directional relation. Upon the arrival of an input pattern at the network, the first layer calculates the output value and delivers it to the next layer. The next layer receives this number as its input and transfers its output values to the layer after it. The BP algorithm comprises calculations whereby the error due to the difference between the network output and the real value is propagated back through the network, and the network parameters are adjusted such that, given the next similar input pattern, the network produces a better output with a lower error value (Haykin, 1999).

Neural network learning is done by adjusting the weights, starting from initial weights, with the aim of minimizing an error function (Lawrence, 1997; Tan, 2009). The interneuron connections in most neural networks are such that the neurons of the middle layers receive their input values from all neurons of the underlying layer (usually the input neurons). In this way, the signals in a neural network move from the input layer to the upper layers and finally reach the output layer and the network output, which is why the computation is called feedforward.

4.2 | Support vector machine

The other nonlinear model widely used in financial markets in recent years, and which has achieved desirable results, is the SVM. The SVM has been used for prediction of time series under nonstationarity of variables, inapplicability of classic methods, or time-series complexity (Tian, 2015). SVM models are classified into SVM and SVR. SVRs are a certain type of SVM used for prediction of future prices (Das & Padhy, 2012). In the prediction process, the SVM has the capability to eliminate irrelevant and scattered data and improve the precision of prediction (Wei, 2012). The SVM is based on the structural risk minimization principle taken from statistical learning theory.

The SVM is applicable in financial data modelling as long as no strong assumptions are required. The basis of the SVM is the linear classification of data, where attempts are made to select the separating line with the highest reliability. Solving the equations of the optimal line for the data is done through quadratic programming methods, which are well-known methods for solving constrained problems. Before linear classification, the data are transferred by a phi function to a wider space so that the machine can classify highly complex data. Briefly speaking, this algorithm uses a nonlinear mapping to convert the original data to a higher dimension and seeks a linear optimal separating hyperplane there (Han, Pei, & Kamber, 2011).
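As an illustration of the mapping idea described above, here is a minimal sketch assuming scikit-learn (the paper states only that Python 3.5.1 was used); the toy data are illustrative, and the C, epsilon, and gamma values are borrowed from the optimal SVR settings reported later in Table 3.

```python
# A minimal sketch, assuming scikit-learn, of the idea described above:
# an RBF kernel implicitly maps the inputs to a higher-dimensional space,
# where a linear epsilon-insensitive fit is made.
import numpy as np
from sklearn.svm import SVR

# Toy regression problem: y is a nonlinear function of x plus noise
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(x).ravel() + 0.1 * rng.randn(80)

# kernel="rbf" performs the nonlinear mapping; C, epsilon, and gamma are
# the hyperparameters that Section 5.2 later tunes by grid search
model = SVR(kernel="rbf", C=4.0, epsilon=0.03125, gamma=0.125)
model.fit(x, y)
print(model.predict(x[:5]))
```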

4.3 | Random forest

This algorithm was introduced in 2001 by Leo Breiman and is based on a decision-tree model known as classification and regression trees (CART; Breiman, Friedman, Olshen, & Stone, 1984). Decision trees that are used to forecast categorical variables are known as classification trees, since they place the samples in classes or categories, and the decision trees used for prediction of continuous variables are called regression trees. The ensemble nature of the RF algorithm makes it adaptable to changes and eliminates instability.

Two main features in the construction of RFs are bagging (Breiman, 1996) and random feature selection in each node. The bagging method is an algorithm based on bootstrapping and combination concepts for improving machine learning. In machine learning, ensemble algorithms combine several weak learners to achieve a strong learner, which prevents overfitting of the data. Bagging produces good results when the base classifiers are unstable learning algorithms (such as a decision tree or a neural network), such that small changes in the training data lead to major variations in the model constructed by the algorithm.

The random-feature characteristic is that, in each node of each tree, a small group of input features is randomly selected, and for node splitting the best feature with the highest information gain is selected for tree growth, rather than searching all features. The number of these features is less than the number of original features. Each tree in the RF grows using the CART decision tree algorithm to its maximum size without any pruning. Breiman (2001) used ⌊log₂ M + 1⌋ features in each node, where M is the total number of input features. More details of RFs can be found in Booth, Gerding, and McGroarty (2014) and Breiman (2001).
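A minimal sketch of these two ingredients, assuming scikit-learn (not named in the paper); the data and target are stand-ins, and Breiman's ⌊log₂ M + 1⌋ rule is passed explicitly as the per-split feature count.

```python
# A minimal sketch, assuming scikit-learn, of the two ingredients described
# above: bootstrap sampling (bagging) and a random feature subset per split.
# The log2(M) + 1 rule from Breiman (2001) is passed explicitly.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.randn(500, 10)                      # M = 10 input features (e.g. 10 lags)
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.randn(500)

m = int(np.log2(X.shape[1]) + 1)            # Breiman's suggested subset size
forest = RandomForestRegressor(
    n_estimators=400,                       # number of unpruned CART trees
    max_features=m,                         # features examined at each split
    bootstrap=True,                         # bagging: sample rows with replacement
    oob_score=True,                         # estimate error on out-of-bag rows
    random_state=0,
)
forest.fit(X, y)
print(forest.oob_score_)                    # OOB R^2, a free validation estimate
```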

4.4 | Deep learning

Deep learning is generally considered a subset of machine learning; it was first introduced in 2005 and has been taken seriously since 2012. The basis of deep learning is learning representations of knowledge and features in the layers of the model (Bengio, 2009). Using new facilities and technologies, the deep learning idea, inspired by the structure of the human brain (Olshausen & Field, 1996), has succeeded considerably in most areas related to artificial intelligence and machine learning. In fact, deep learning means the investigation of new methods for artificial neural networks (Collobert, 2011; Gomes, 2014); it is a new attitude toward the idea of neural networks. Neural networks have an internal hidden layer, and a network with several internal hidden layers is called a deep neural network (Bengio, 2009; Schmidhuber, 2015).

In deep learning, iteration and the creation of greater depth are used to teach computers certain tasks, and a deep investigation of the subject matter is attempted. Deep neural network models include autoencoder neural networks, deep belief networks, CNNs, and RNNs. CNNs are specific to visual data and have wide application in image and video recognition and natural language processing (Collobert & Weston, 2008). RNNs, in turn, are appropriate for temporal data (Brax, 2000). Therefore, since the data used in this study are time series, RNNs have been used for prediction.

RNNs are neural networks with one or more recurrent loops. These networks are in fact created for processing sequential signals and have a type of memory that records the data observed so far. Theoretically, these networks could record and utilize data over a long sequence; in practice, however, this is not the case, and they are so limited in this regard that they only record data related to a few past steps. The main feature of an RNN is its hidden state, which stores the data of a sequence. Moreover, it is not necessary to have an output and an input at every time step.

In a standard RNN structure, the accessible context is in practice very limited. The problem is that the effect of a given input on the hidden layer, and consequently on the output of the network, decreases exponentially and vanishes, which is known as the vanishing gradient problem (Pascanu, Mikolov, & Bengio, 2013). RNNs are called recurrent because the output of each layer depends on the calculations of its previous layers; in other words, these networks have a memory that stores the information related to the observed data. These networks are in fact multiple copies of ordinary neural networks arranged beside each other, each transmitting a message to the next.

Training in neural networks includes four stages.

1. Preparation of training data: the more data, the better the training. In this stage, data preprocessing is done, the aim of which is the elimination of incomplete data and undesired variations. For example, normalization (e.g. mapping to a zero–one range) is one of the data preprocessing methods.

2. Selection of an appropriate network architecture. In this stage, the appropriate numbers of neurons and layers and the type of network are determined; the number of layers and neurons is determined by trial and error.

3. Network training. In this stage, the training cycle is iterated until the desired results are achieved. The error, or cost, is the difference between the network output and the desired output, calculated through a cost/loss function; examples of cost functions are mean squared error, cross-entropy, hinge, and Softmax. Optimization is then performed to modify and update the weights so as to achieve the least error. Some of the optimization algorithms used in deep neural networks are SGD, SGD + Momentum, RMSprop, Adagrad, Adadelta, and Adam. In these networks, the weight modification is done through BP, and the learning rate determines the size of the steps taken toward the least error.

4. Training improvement methods. These methods include batch normalization, dropout, and transfer learning.

The calculations in RNNs are as follows. First, the hidden-layer activation in an RNN changes according to

$$a_h^t = \sum_{i=1}^{I} \omega_{ih}\, x_i^t \;+\; \underbrace{\sum_{h'=1}^{H} \omega_{h'h}\, b_{h'}^{t-1}}_{\text{recurrent relations (in addition to MLP)}}$$

where:

• ω_ih is the weight between the ith input and the hth neuron of the hidden layer;
• x_i^t is the ith input of the network at moment t;
• ω_h'h is the weight between neurons of the hidden layer.

The BP training changes according to

$$\delta_h^t = \theta'\!\left(a_h^t\right)\left(\sum_{k=1}^{K} \delta_k^t\, \omega_{hk} \;+\; \underbrace{\sum_{h'=1}^{H} \delta_{h'}^{t+1}\, \omega_{hh'}}_{\text{recurrent relations (in addition to MLP)}}\right)$$
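The forward recurrence above can be written out directly. The following NumPy sketch is illustrative only: the dimensions, random weights, and tanh activation are assumptions, not the authors' settings.

```python
# A minimal NumPy sketch of the forward recurrence a_h^t shown above:
# the hidden activation at time t combines the current input (MLP part)
# with the previous hidden output (recurrent part).
import numpy as np

I, H, T = 10, 12, 100                  # inputs, hidden units, time steps
rng = np.random.RandomState(0)
W_in = rng.randn(I, H) * 0.1           # omega_ih: input-to-hidden weights
W_rec = rng.randn(H, H) * 0.1          # omega_h'h: hidden-to-hidden weights
x = rng.randn(T, I)                    # x_i^t: the input sequence

b_prev = np.zeros(H)                   # b_h^{t-1}: previous hidden output
for t in range(T):
    a = x[t] @ W_in + b_prev @ W_rec   # a_h^t = MLP term + recurrent term
    b_prev = np.tanh(a)                # theta(a): squashing activation
print(b_prev[:4])                      # final hidden state (first few units)
```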

4.5 | Long short-term memory

LSTM is a type of model, or structure, for sequential data that emerged from the development of RNNs and was improved by Gers, Schmidhuber, and Cummins (2000). Long-term memory refers to the learned weights and short-term memory refers to the internal states of the cells. LSTM was created to address the vanishing gradient problem in RNNs; its main change is the replacement of the RNN middle layer with a block called an LSTM block (Hochreiter & Schmidhuber, 1997). The main feature of LSTM is its ability to learn long-term dependencies, which was impossible with RNNs. To forecast the next time step, the weight values in the network must be updated, which requires the data of the initial time steps to be maintained. An RNN can only learn a limited number of short-term dependencies; long-term time series, such as those of 1,000 time steps, cannot be learned by RNNs. In contrast, LSTMs can properly learn these long-term dependencies (Schmidhuber, 2015). The LSTM structure includes a set of recurrent subnetworks, called memory blocks. Each block includes one or more self-recurrent memory cells and three multiplicative units (input, output, and forget gates), which provide the analogues of continuous writing, reading, and regulation of the cells' functions. Moreover, there are various types of LSTM blocks, including stacked LSTMs, encoder–decoder LSTMs, bidirectional LSTMs, CNN LSTMs, and generative LSTMs (Brownlee, 2017).

When a computational problem can be solved using a certain algorithm, the very next issue, after solvability, is the time and space complexity of the solution method. The interesting point is the behaviour of the algorithm when applied to larger problems (Rojas, 1996). If d is the length of the input vector and h is the number of neurons in the hidden layer, then the memory space needed for an LSTM cell is O(d × h). As the result of the next cell (t + 1) replaces the old values in the same memory, the total amount of memory needed by the LSTM remains O(d × h). Normally, space complexity is not considered a primary issue, since computational models are assumed to be provided with an infinitely large memory. What is most interesting for applications is time complexity; therefore, when speaking about the complexity of an algorithm, it is usually its time complexity that is meant (Rojas, 1996).

In an LSTM unit there are a total of eight matrix multiplications and three vector multiplications. As each matrix multiplication is O(d × h) and each vector multiplication is O(d) or O(h), the total time complexity of an LSTM unit is O(d × h).

In these systems, learning time is not critical, since learning is completed before the actual system is run and there is enough time available for it. What matters is the accuracy of learning and the reduction of errors.
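One way to make the complexity discussion concrete is to count the weights of a single LSTM layer. The sketch below is an illustration based on the text's count of eight matrix multiplications, read here as the four input-to-hidden and four hidden-to-hidden gate products; the dimensions echo later sections and are otherwise assumptions.

```python
# A minimal sketch making the O(d x h) claim above concrete. An LSTM block
# computes four gate pre-activations (input, forget, cell, output), each
# needing one input-to-hidden and one hidden-to-hidden matrix product:
# the eight matrix multiplications mentioned in the text.
d, h = 10, 300                       # input length and hidden size (cf. Section 5.4)

input_weights = 4 * d * h            # four d-by-h matrices, O(d*h) each
recurrent_weights = 4 * h * h        # four h-by-h matrices
biases = 4 * h
total_parameters = input_weights + recurrent_weights + biases
print(total_parameters)              # 373,200 weights for d=10, h=300
```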

5 | RESULTS

The closing price of iShares MSCI United Kingdom was analysed from January 2015 to June 2018 using four machine learning methods. Table 1 presents the descriptive statistics for the closing price of the index. Based on the Jarque–Bera test, these data do not have a normal distribution: given the probability value of the statistic (P = 0.00001), the null hypothesis of normality is rejected at the 95% confidence level. The index closing price diagram is shown in Figure 1.

TABLE 1 Descriptive statistics

Mean      Median    Maximum   Minimum   Std. Dev.  Skewness  Kurtosis  Jarque–Bera  Probability
33.85622  34.04000  39.80000  27.98000  2.533603   0.127012  2.142125  29.31752     0.000000

FIGURE 1 The closing price of iShares MSCI United Kingdom index

5.1 | Results of prediction with artificial neural network

In this study, an MLP neural network was used, with 80% of the data utilized for training and 20% for testing the results. Ten neurons representing lags of the closing price of the iShares MSCI United Kingdom index were used in the input layer, a rectified linear unit transfer function was used in the hidden layers, and a linear transfer function was used in the output layer. In order to find the best prediction model, the numbers of hidden layers and of neurons in the hidden layers are given to the program and the performance indices are calculated for each of them. For this purpose, the AdamOptimizer algorithm and the MSE cost function were used, and BP was used for weight modification. The best model is achieved by iterating the tests over various values of the parameters, with an epoch value equal to 100 and a batch size equal to 10. In determining the number of hidden layers, it was concluded that a network with one hidden layer has less error; rarely does a network with more than three hidden layers improve prediction (Kaastra & Boyd, 1996). In order to determine the appropriate number of processing units, the number of neurons in the first hidden layer was varied between 5 and 15, and in the second hidden layer, depending on the first hidden layer, several values were examined; to prevent error, the aforementioned process was iterated several times (see Table 2).

By iteration of the tests over different values of the parameters, the results indicate that for iShares MSCI United Kingdom the best model is a single-hidden-layer network with a 10–12–1 structure. This structure has the minimum mean error; 10, 12, and 1 refer to the numbers of neurons in the input layer, hidden layer, and output layer respectively.

TABLE 2 Determination of the number of hidden layers and the appropriate number of neurons in the hidden layer

First hidden layer:
Neurons  MAE       MSE       RMSE
5        0.51129   0.46978   0.676355
6        0.393792  0.300025  0.543319
7        0.450537  0.384993  0.612582
8        0.370184  0.277556  0.5073
9        0.444537  0.351343  0.585778
10       0.468681  0.405391  0.614578
11       0.405735  0.313126  0.543371
12       0.342613  0.207035  0.454131
13       0.400007  0.276403  0.520171
14       0.434626  0.306795  0.553386
15       0.347109  0.244483  0.489031

Second hidden layer:
Neurons  MAE       MSE       RMSE
5        0.388393  0.301373  0.540229
6        0.33738   0.216018  0.459981
7        0.381501  0.254027  0.491368
8        0.43648   0.355222  0.589392
9        0.323808  0.20477   0.44431
10       0.392794  0.311001  0.534902
11       0.352541  0.249768  0.492202
12       0.3717    0.26475   0.508939
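A minimal sketch of the winning 10–12–1 configuration, assuming Keras (the paper says only that coding was done in Python 3.5.1); the lag construction and the stand-in price series are illustrative.

```python
# A minimal sketch, assuming Keras, of the best 10-12-1 configuration
# reported above: 10 lagged closes in, one hidden ReLU layer of 12 units,
# one linear output, Adam optimizer, MSE loss, 100 epochs, batch size 10.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def make_lagged(series, n_lags=10):
    # Build (samples, 10) inputs from 10 lags and the next close as target
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

closes = np.cumsum(np.random.randn(869)) + 34.0   # stand-in for EWU closes
X, y = make_lagged(closes)
split = int(0.8 * len(X))                          # 80% train / 20% test

model = Sequential([
    Dense(12, activation="relu", input_dim=10),    # hidden layer: 12 neurons
    Dense(1, activation="linear"),                 # output layer: next close
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:split], y[:split], epochs=100, batch_size=10, verbose=0)
print(model.evaluate(X[split:], y[split:], verbose=0))  # test MSE
```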

5.2 | Results of prediction with support vector regression

After data examination and preprocessing, the data were divided into training and test sets, and the inputs of the model are the same as in the neural network method. It should be noted that 695 of the 869 observations are used for training and 174 for testing. In order to construct the SVR model, its parameters must be optimized. SVR utilizes various kernel functions for data processing, including radial basis, polynomial, annular, and linear functions. Rüping (2001) showed that the radial basis function kernel performs well across various types of time series and learning algorithms. Moreover, in this method, k-fold cross-validation is used to estimate the efficiency of the machine learning model; common values for k are 5 and 10 (Hastie, Tibshirani, & Friedman, 2009). It is also necessary to specify the optimum values of the C, epsilon, and gamma parameters. To this end, the following search grids were used:

C: 2^-5, 2^-4, …, 2^9, 2^10
epsilon: 2^-10, 2^-9, …, 2^-1, 2^0
gamma: 2^-20, 2^-19, …, 2^9, 2^10

The optimum values of the parameters C, epsilon, and gamma obtained are given in Table 3. Consequently, the most appropriate SVR model is the one with k-fold 10.

TABLE 3 Determination of optimum parameters of SVR

k-fold  MAE          MSE          RMSE         Optimal gamma  C    Epsilon
5       0.266410819  0.149248498  0.386326932  0.125          128  0.0625
10      0.240020103  0.116047328  0.3406572    0.125          4    0.03125
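A minimal sketch of this tuning procedure, assuming scikit-learn's GridSearchCV; the stand-in arrays take the place of the 695 training rows, and the full grid is shown for fidelity to the paper even though it is large and slow to search.

```python
# A minimal sketch, assuming scikit-learn, of the tuning described above:
# an RBF-kernel SVR whose C, epsilon, and gamma are chosen by grid search
# with 10-fold cross-validation. X and y stand in for the 10-lag inputs
# and next-day closes from the previous sections.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

X = np.random.randn(695, 10)            # stand-in for the 695 training rows
y = np.random.randn(695)                # stand-in for the training targets

param_grid = {
    "C": [2.0 ** e for e in range(-5, 11)],        # 2^-5 ... 2^10
    "epsilon": [2.0 ** e for e in range(-10, 1)],  # 2^-10 ... 2^0
    "gamma": [2.0 ** e for e in range(-20, 11)],   # 2^-20 ... 2^10
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      scoring="neg_mean_squared_error", cv=10)
search.fit(X, y)
print(search.best_params_)   # the paper reports C=4, epsilon=0.03125, gamma=0.125
```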

5.3 | Results of prediction with random forest

In this method, the data are divided into training and test groups as in the previous methods, with 80% of the data used for training and the rest for testing. The research data are a single-variable time series; therefore, the inputs of the model are selected on a 10-lag basis. First, the optimal number of trees in the forest is determined; in this study we examined 100–1,000 trees in steps of 100. Then the x-ratio (the fraction of variables considered by the decision trees) is determined, selected here from 0.1 to 1 in steps of 0.1; the RF regressor is run for each x-ratio, and the values of the target variable are estimated for the out-of-bag (OOB) samples. The optimal x-ratio, the value with the lower OOB error, is thereby determined. In the next step, the RF regressor model is created based on the number of trees and the optimal x-ratio. The results obtained are presented in Table 4. Based on the results, the number of trees and the x-ratio should give lower OOB and test errors; therefore, the appropriate number of trees and the optimal x-ratio are 400 and 0.8 respectively.

TABLE 4 Determination of RF optimal parameters

n-trees  MAE          MSE          RMSE         Optimal x-ratio
100      0.296496556  0.155479333  0.394308677  0.8
200      0.2973361    0.153507168  0.391782986  0.8
300      0.296117561  0.152496081  0.390493848  0.9
400      0.294691597  0.151731948  0.389482008  0.8
500      0.29645561   0.152896251  0.391011462  0.8
600      0.294768275  0.152013321  0.389887752  0.8
700      0.295439408  0.152220109  0.390144206  0.8
800      0.295599609  0.152755575  0.390828096  0.9
900      0.295587397  0.153428457  0.391697861  0.9
1000     0.295868256  0.153137577  0.391323467  0.9
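A minimal sketch of the sweep described above, again assuming scikit-learn; max_features accepts a fraction, which matches the x-ratio, and the OOB score stands in for the paper's OOB error.

```python
# A minimal sketch, assuming scikit-learn, of the sweep described above:
# 100-1000 trees in steps of 100, and an x-ratio (fraction of features
# per split) from 0.1 to 1.0, compared by out-of-bag (OOB) score.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.random.randn(695, 10)          # stand-in for the 10-lag training inputs
y = np.random.randn(695)              # stand-in for the training targets

best = None
for n_trees in range(100, 1001, 100):
    for x_ratio in [round(0.1 * k, 1) for k in range(1, 11)]:
        rf = RandomForestRegressor(n_estimators=n_trees,
                                   max_features=x_ratio,   # fraction of features
                                   oob_score=True, bootstrap=True,
                                   random_state=0)
        rf.fit(X, y)
        if best is None or rf.oob_score_ > best[0]:
            best = (rf.oob_score_, n_trees, x_ratio)
print(best)                            # the paper reports 400 trees, x-ratio 0.8
```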

5.4 | Results of prediction with deep learning algorithm

In this study, RNNs are used from among the deep neural network models, and the LSTM block is employed because of the vanishing gradient problem in these networks. Of the two states of LSTM, stateful LSTM is used, because it is applicable in cases where the results are interrelated. In this method, 80% of the data are used for training and 20% for testing. As in the previous methods, 10 neurons were used in the input layer. After data preprocessing, such as making the data stationary and scaling the data between −1 and +1, the RNN model was created with the LSTM block. To this end, the optimal number of neurons should first be determined. The batch size is set equal to 1 and the epoch value equal to 100. Moreover, in this method the AdamOptimizer algorithm has been used for weight modification, with the MSE cost function, and we used the dropout method to prevent overfitting in training the deep neural network. Dropout has a tuneable hyperparameter p (the probability of retaining a unit in the network); p = 1 implies no dropout, and low values of p mean more dropout. Typical values of p for hidden units are in the range of 0.5 to 0.8 (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014). Iterating the tests over different values of the parameters yields the best model with the least error; the results are presented in Tables 5 and 6. Based on the results, the optimum number of neurons is 300. We then tried different values of p between 0.5 and 0.8, and the results showed that the best accuracy is achieved with p = 0.8 (Table 7).

TABLE 5 The result of determining the optimal number of neurons

No. of neurons  MAE          MSE          RMSE
50              0.238623737  0.116605098  0.340227102
100             0.214368733  0.088005688  0.296063146
150             0.21382158   0.088199563  0.296829412
200             0.215998857  0.091932023  0.3028901
250             0.211789958  0.084693724  0.290793214
300             0.204745693  0.078053354  0.279368997
350             0.2122883    0.092136384  0.303410223
400             0.212813319  0.090775564  0.30122451

TABLE 6 Determination of optimal dropout rate p

Dropout rate p  MAE       MSE       RMSE
0.5             0.211620  0.093981  0.306561
0.6             0.212588  0.093990  0.306584
0.7             0.213330  0.094644  0.307618
0.8             0.210350  0.093969  0.306543

TABLE 7 Summary of results obtained for the close price of iShares MSCI United Kingdom

        ANN       SVR       RF        LSTM
MAE     0.342613  0.24002   0.294692  0.210350
MSE     0.207035  0.116047  0.151732  0.093969
RMSE    0.454131  0.340657  0.389482  0.306543

ANN: artificial neural network.

Figure 2 presents the graphs for each of the aforementioned methods on the predicted test data and the real test data; the red lines indicate the predicted test data and the blue lines the real test data. As can be seen, the error in forecasting the close price of iShares MSCI United Kingdom is lower with deep learning than with the other methods. After deep learning, SVR performs better than the remaining methods (Figure 3).

FIGURE 2 Prediction results with artificial neural network (ANN), RF, SVR, and LSTM

FIGURE 3 iShares MSCI United Kingdom index prediction with machine learning methods
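A minimal sketch of this setup, assuming Keras; the stateful flag, batch size of 1, dropout, and 100 epochs follow the text, while the stand-in data, the state-reset point between epochs, and the exact layer arrangement are assumptions.

```python
# A minimal sketch, assuming Keras, of the setup described above: a
# stateful LSTM over 10 lagged, scaled closes, batch size 1, 100 epochs,
# Adam with MSE loss, and dropout retaining units with probability 0.8.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

X = np.random.uniform(-1, 1, (695, 10, 1))   # 695 samples, 10 lags, 1 feature
y = np.random.uniform(-1, 1, 695)            # scaled next-step targets

model = Sequential([
    # stateful=True carries the cell state across batches, so the fixed
    # batch size (1) must be declared up front via batch_input_shape
    LSTM(300, batch_input_shape=(1, 10, 1), stateful=True),
    Dropout(0.2),                            # Keras takes the DROP rate: 1 - p
    Dense(1, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")

for epoch in range(100):
    # shuffle=False preserves the temporal order the state depends on
    model.fit(X, y, epochs=1, batch_size=1, shuffle=False, verbose=0)
    model.reset_states()                     # clear the state between passes
```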

6 | CONCLUSIONS

Stock market prediction is very difficult owing to its nonlinear, dynamic and complicated nature; yet it is very important. Successful prediction has some interesting benefits that usually affect the decision of a financial trader on the purchase or sale of a financial instrument. One of the main factors considered by investors in making investment and stock buy and sell decisions is the stock price index. This study has been carried out with the aim of predicting the close price of iShares MSCI United Kingdom. To this end, four data mining techniques (i.e. artificial neural network, SVR, RF, and LSTM) were implemented in Python and their performance was compared on the daily close price of iShares MSCI United Kingdom from January 2015 to June 2018. The analysis and summary of the results obtained are as follows.

The results of the study show that the recurrent network method with an LSTM block performs better in prediction of the close price of iShares MSCI United Kingdom than the other methods, and that the SVR method has higher precision than the neural network and RF. Hence, the following recommendations can be made:

• For predicting the close price of iShares MSCI United Kingdom, the deep learning method is better than the other methods tested; therefore, investors, managers, and market analysts are recommended to use this method. Furthermore, proper prediction of price indices can have a significant effect on profitability and on making proper portfolio decisions for investors and stakeholders. The results of this study could help investment companies and other users in evaluating the profitability of selling and buying stock.

• Researchers are recommended to use combined models, such as a support vector regression model combined with a genetic algorithm, and other combined models derived from machine learning algorithms. For further research, it is also recommended to investigate different types of LSTM models, such as stacked LSTMs, encoder–decoder LSTMs, bidirectional LSTMs, CNN LSTMs, and generative LSTMs, in the prediction of stock prices. Finally, because this study used univariate time series data, the role of other factors influencing stock price prediction was not investigated; therefore, researchers are recommended to consider the role of other factors in future studies and compare the results obtained with the results of this study.

ORCID

Gholamreza Mansourfar https://orcid.org/0000-0002-3076-0241

REFERENCES

Anon. (1998). Glossary of terms. Machine Learning, 30(2–3), 271–274.

Atsalakis, G. S., & Valavanis, K. P. (2009). Surveying stock market forecasting techniques—Part II: Soft computing methods. Expert Systems with Applications, 36(3), 5932–5941.

Aydin, A. D., & Cavdar, S. C. (2015). Comparison of prediction performances of artificial neural network (ANN) and vector autoregressive (VAR) models by using the macroeconomic variables of gold prices, Borsa Istanbul (BIST) 100 index and US dollar–Turkish lira (USD/TRY) exchange rates. Procedia Economics and Finance, 30, 3–14.

Bao, W., Yue, J., & Rao, Y. (2017). A deep learning framework for financial time series using stacked autoencoders and long–short term memory. PLoS ONE, 12(7), e0180944. https://doi.org/10.1371/journal.pone.0180944

Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127.

Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.

Booth, A., Gerding, E., & McGroarty, F. (2014). Automated trading with performance weighted random forests and seasonality. Expert Systems with Applications, 41(8), 3651–3661.

Brax, C. (2000). Recurrent neural networks for time-series prediction. MSc dissertation, University of Skövde, Skövde, Sweden.

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth.

Brownlee, J. (2017). Long Short-Term Memory Networks with Python: Develop Sequence Prediction Models with Deep Learning. Machine Learning Mastery.

Collobert, R. (2011). Deep learning for efficient discriminative parsing. In G. Gordon, D. Dunson, & M. Dudík (Eds.), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (Vol. 15, pp. 224–232). Fort Lauderdale, FL: JMLR W&CP.

Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning (pp. 160–167). New York, NY: ACM.

Das, S. P., & Padhy, S. (2012). Support vector machines for prediction of futures prices in Indian stock market. International Journal of Computer Applications, 41(3), 22–26.

Fisher, I. E., Garnsey, M. R., & Hughes, M. E. (2016). Natural language processing in accounting, auditing and finance: A synthesis of the literature with a roadmap for future research. Intelligent Systems in Accounting, Finance and Management, 23(3), 157–214.

Galeshchuk, S., & Mukherjee, S. (2017). Deep networks for predicting direction of change in foreign exchange rates. Intelligent Systems in Accounting, Finance and Management, 24(4), 100–110.

Gao, Q. (2016). Stock market forecasting using recurrent neural network. Doctoral dissertation, University of Missouri, Columbia, MO.

Gers, F. A., Schmidhuber, J., & Cummins, F. A. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12, 2451–2471.

Giovanis, E. (2009). Application of ARCH–GARCH models and feed-forward neural networks with Bayesian regularization in capital asset pricing model: The case of two stocks in Athens Exchange stock market. https://doi.org/10.2139/ssrn.1325842

Gomes, L. (2014). Machine-learning maestro Michael Jordan on the delusions of big data and other huge engineering efforts. IEEE Spectrum, October 20. https://spectrum.ieee.org/robotics/artificial-intelligence/machinelearning-maestro-michael-jordan-on-the-delusions-of-big-data-and-other-huge-engineering-efforts

Guo, Z., Wang, H., Liu, Q., & Yang, J. (2014). A feature fusion based forecasting model for financial time series. PLoS ONE, 9(6), e101113. https://doi.org/10.1371/journal.pone.0101113

Han, J., Pei, J., & Kamber, M. (2011). Data Mining: Concepts and Techniques. Amsterdam: Elsevier.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (pp. 485–585). New York, NY: Springer.

Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Englewood Cliffs, NJ: Prentice Hall International.

Hill, T., Marquez, L., O'Connor, M., & Remus, W. (1994). Artificial neural network models for forecasting and decision making. International Journal of Forecasting, 10(1), 5–15.

Hiransha, M., Gopalakrishnan, E. A., Menon, V. K., & Soman, K. P. (2018). NSE stock market prediction using deep-learning models. Procedia Computer Science, 132, 1351–1362.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

Huang, W., Nakamori, Y., & Wang, S. Y. (2005). Forecasting stock market movement direction with support vector machine. Computers and Operations Research, 32(10), 2513–2522.

Kaastra, I., & Boyd, M. (1996). Designing a neural network for forecasting financial and economic time series. Neurocomputing, 10(3), 215–236.

Kalyvas, E. (2001). Using neural networks and genetic algorithms to predict stock market returns. MSc thesis, University of Manchester, Manchester.

Kara, Y., Boyacioglu, M. A., & Baykan, Ö. K. (2011). Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul stock exchange. Expert Systems with Applications, 38(5), 5311–5319.

Kumar, M., & Thenmozhi, M. (2005). Forecasting stock index movement: A comparison of support vector machines and random forest. In Proceedings of the Ninth Indian Institute of Capital Markets Conference, Mumbai, India.

Lawrence, R. (1997). Using neural networks to forecast stock market prices. University of Manitoba. https://people.ok.ubc.ca/rlawrenc/research/Papers/nn.pdf

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

Lee, M. C. (2009). Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Systems with Applications, 36(8), 10896–10904.

Lu, C. J., Lee, T. S., & Chiu, C. C. (2009). Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems, 47(2), 115–125.

McClelland, J. L., Rumelhart, D. E., & Hinton, G. E. (1986). The appeal of parallel distributed processing. In D. E. Rumelhart, & J. L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations (pp. 3–44). Cambridge, MA: MIT Press.

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133.

Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (Eds.) (2013). Machine Learning: An Artificial Intelligence Approach. Berlin: Springer.

Moshiri, S., & Cameron, N. (2000). Neural network versus econometric models in forecasting inflation. Journal of Forecasting, 19(3), 201–217.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

Parot, A., Michell, K., & Kristjanpoller, W. D. (2019). Using artificial neural networks to forecast exchange rate, including VAR–VECM residual analysis and prediction linear combination. Intelligent Systems in Accounting, Finance and Management, 26(1), 3–15.

Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In S. Dasgupta, & D. McAllester (Eds.), International Conference on Machine Learning (Vol. 28, pp. III-1310–III-1318). PMLR.

Portugal, I., Alencar, P., & Cowan, D. (2018). The use of machine learning algorithms in recommender systems: A systematic review. Expert Systems with Applications, 97, 205–227.

Rojas, R. (1996). Neural Networks: A Systematic Introduction (pp. 149–182). Berlin: Springer.

Rüping, S. (2001). SVM kernels for time series analysis. Technical Report 2001,43, Sonderforschungsbereich 475 Komplexitätsreduktion in Multivariaten Datenstrukturen, Universität Dortmund, Dortmund.

Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229.

Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

Schwartz, R. A., & Whitcomb, D. K. (1977). Evidence on the presence and causes of serial correlation in market model residuals. Journal of Financial and Quantitative Analysis, 12(2), 291–313.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.

Sun, T., & Vasarhelyi, M. A. (2018). Predicting credit card delinquencies: An application of deep neural networks. Intelligent Systems in Accounting, Finance and Management, 25(4), 174–189.

Tan, C. (2009). Financial time series forecasting using improved wavelet neural network. MSc thesis, University of Copenhagen, Copenhagen.

Tay, F. E., & Cao, L. (2001). Application of support vector machines in financial time series forecasting. Omega, 29(4), 309–317.

Thomaidis, N. S. (2006). Efficient statistical analysis of financial time-series using neural networks and GARCH models. https://ssrn.com/abstract=957887

Tian, G. (2015). Research on stock price prediction based on optimal wavelet packet transformation and ARIMA–SVR mixed model. Journal of Guizhou University of Finance and Economics, 6(6), 57–69.

Wang, J. J., Wang, J. Z., Zhang, Z. G., & Guo, S. P. (2012). Stock index forecasting based on a hybrid model. Omega, 40(6), 758–766.

Wei, Z. (2012). A SVM approach in forecasting the moving direction of Chinese stock indices. MSc thesis, Lehigh University, Bethlehem, PA.

Yim, J. (2002). A comparison of neural networks with time series models for forecasting returns on a stock market index. In T. Hendtlass, & M. Ali (Eds.), Developments in Applied Artificial Intelligence: 15th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2002, Cairns, Australia, June 17–20, 2002, Proceedings (pp. 25–35). Berlin: Springer.

Zhang, Z. Y., Shi, C., Zhang, S. L., & Shi, Z. Z. (2006). Stock time series forecasting using support vector machines employing analyst recommendations. In J. Wang, Z. Yi, J. M. Zurada, B. L. Lu, & H. Yin (Eds.), Advances in Neural Networks—ISNN 2006, Lecture Notes in Computer Science (Vol. 3973, pp. 452–457). Berlin: Springer.

How to cite this article: Nikou M, Mansourfar G, Bagherzadeh J. Stock price prediction using DEEP learning algorithm and its comparison with machine learning algorithms. Intell Sys Acc Fin Mgmt. 2019;1–11. https://doi.org/10.1002/isaf.1459
