Comparing Bitcoin's Prediction Model Using GRU, RNN, and LSTM by Hyperparameter Optimization Grid Search and Random Search
Abstract— After its introduction in 2008, the rise in the price of Bitcoin and the popularity of other cryptocurrencies raised a growing debate about the energy used to produce them. Bitcoin has become the most expensive and most popular cryptocurrency, and businesses as well as parts of the research community have begun to study its development. Cryptocurrency is a blockchain-based technology that is often used as a digital currency, and Bitcoin is one type of cryptocurrency. However, in the absence of government regulation, the price of Bitcoin has become uncontrollable, resulting in frequent large fluctuations. GRU, RNN, and LSTM are methods frequently used for forecasting; all three make use of historical data over a certain period in the prediction process. They are considered among the best methods for obtaining a prediction, although the result also depends on the model the computer can support. In this paper, GRU, RNN, and LSTM are compared, both with and without the hyperparameter optimization techniques Grid Search and Random Search, to find out which combination performs better.

Keywords—GRU, RNN, LSTM, Random Search, Grid Search

I. INTRODUCTION

In recent years, digital currencies have developed rapidly because the public's need for transactions has increased just as quickly. This has led to many innovations in online transaction methods. One method that is currently popular in the digital world is transacting with cryptocurrency. Cryptocurrencies come in many kinds, each offering different advantages, and Bitcoin is one of the most popular types today. Bitcoin is the first cryptocurrency in the world and was introduced by Satoshi Nakamoto in 2008.

Bitcoin itself was first mentioned in August 2008, when two programmers, Satoshi Nakamoto and his friend Martti Malmi, registered a new domain called bitcoin.org. In October of the same year, Nakamoto published a paper titled "Bitcoin: A Peer-to-Peer Electronic Cash System". In the preceding months, Nakamoto and a number of researchers had already proposed other versions of the same concept in forums and email threads, but in 2008 everything came together.

Cryptocurrency has several advantages and disadvantages when used as a currency. For now, there is no clear and definite law regulating the circulation of digital currencies (cryptocurrencies) such as Bitcoin. If money in digital form is abused, for example through fraud, money laundering, or other criminal acts, no institution will be held responsible. In addition, money that is to be used as a means of payment must meet the requirements of a means of payment and be recognized by the government. For now, cryptocurrency does not meet those requirements and receives no recognition from the government as a means of payment, because Bitcoin is still a new phenomenon to some people in Indonesia.

Forecasting is very important for planning in an effective and efficient way. There are two types of forecasting: subjective and objective [1]. The subjective forecasting method uses a qualitative model, while the objective forecasting method has two models, called the causal model and the time series model. The qualitative model includes subjective factors in the forecast; it is very handy, but only when accurate quantitative data is hard to get.

A time series is data collected, recorded, or observed through time in sequence over certain periods: quarters, months, weeks, and in some cases days or hours. Time series data are analysed to find patterns in past events that can be used to estimate values for the forecast, because by observing time series data we can see four components that affect past and present data patterns, patterns that tend to repeat themselves in the future [2].

In a neural network, a model can be tuned by adjusting its hyperparameters to enhance model quality. Hyperparameter optimization can be viewed as an optimization problem whose objective is to discover the values that maximize performance and yield the desired model [3]. The hyperparameter optimization techniques used in this experiment are Grid Search and Random Search, because both are frequently used to tune models; the aim is to find which one fits better with the GRU, RNN, or LSTM model.

II. RELATED WORK

The following is a summary of several similar studies that have been conducted before by other authors:
1. Yiying, Wang. & Yeze, Zang. 2019 [4]. This research applies ANN and LSTM as algorithms to predict cryptocurrency prices, especially Bitcoin. The results indicate that both methods predict fairly well. However, in the memory analysis of the models, ANN relies more on long-term history while LSTM tends to rely on short-term history.
2. Phaladisailoed, Thearasak. & Numnonda, Thanisa. 2018 [5]. Using the RNN, Huber
Regression, Theil-Sen Regression, and LSTM methods to compare which model is best for predicting Bitcoin prices. Using predictive features such as Open, Close, High, and Low, the prediction results show that the GRU models all perform better than the regression methods. However, the resulting model also depends on the chosen parameters, because they greatly affect the prediction results.
3. Shewalkar, Apeksha., Nyavanandi, Deepika., & A. Ludwig, Simone. 2019 [6]. Implementing GRU, LSTM, and RNN in speech recognition applications. This was done because feedforward neural networks are no longer able to handle speech data properly. The study compares the performance of GRU, LSTM, and RNN on the reduced TED-LIUM speech dataset. The results indicate that GRU and LSTM perform similarly, with LSTM slightly higher, but the training time for LSTM at high accuracy is longer. The study therefore concludes that GRU is the better choice for speech recognition on the reduced TED-LIUM speech dataset.

III. PROBLEM FORMULATION

The research method uses a machine learning approach with TensorFlow Keras and directly selects the variable "Close" as the target of the forecast. It then follows the general stages of a data forecasting process: data collection, data preprocessing, splitting the data into training and validation sets, building the data models, and generating predictions from the models.

A. Data Acquisition
The data we use is time series data obtained from a website that provides historical data from crypto exchanges (providers of cryptocurrency buying and selling), namely https://www.cryptodatadownload.com. The downloaded data comes from the Gemini exchange and spans 17 August 2017 to 13 April 2021. In other words, we took the dataset from the original data in the field.
Because the forecasting application uses time series data, the authors use the entire available dataset without trimming it first, since the prediction model relies on historical data from the past.
The data we acquired can be said to be neither defective nor dirty: the time span and the contents of the data are complete, and there are no null values or extreme outliers in any of the data indexes.

B. Data Preprocessing
Raw data ordinarily comes with numerous flaws such as inconsistencies, missing values, noise, and/or redundancies. The performance of subsequent learning algorithms would be undermined if they were fed such low-quality data. By applying an appropriate preprocessing procedure, we can significantly improve the quality and reliability of the automatic discoveries and decisions that follow. Data preprocessing aims at turning raw input into high-quality input that properly matches the mining process to follow. It is considered an obligatory step, and it incorporates strategies such as integration, normalization, cleaning, and transformation [7]. Because the dataset the authors obtained can be categorized as quite good, with no dirty or defective data, the only preprocessing applied is sorting the data by date in ascending order. The data as received is stacked so that every new record is placed at the top (row 1); if left unchanged, the time series plot would run from the most recent record back to the first input. Therefore, the data is re-ordered ascending by date.

C. Splitting Data
The data is split by dividing the time span into two parts. First, the training data takes the first 1000 records; the validation data then takes the remaining data, about 356 records.
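These preprocessing and splitting steps can be summarized in a short script. The following is a minimal sketch assuming pandas and a daily Gemini CSV from cryptodatadownload.com; the file name and the "date"/"close" column names are assumptions, not the authors' exact code.

import pandas as pd

# Load the Gemini daily BTC/USD history (file name is an assumption).
df = pd.read_csv("Gemini_BTCUSD_d.csv")

# The raw file is stacked newest-first, so re-order ascending by date.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date").reset_index(drop=True)

# Chronological split: the first 1000 records for training,
# the remaining ~356 records for validation.
close = df["close"].values
train, val = close[:1000], close[1000:]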
dataset. These results indicate that the GRU and
LSTM methods have similar results, but slightly D. GRU
higher for LSTMs. However, the training time for Gated Recurrent Unit (GRU) is a one step forward
LSTMs is longer with high accuracy. Therefore improvisation demonstrate on Long Short-Term Memory
this journal concludes that using GRU is better for (LSTM). But the GRU's victory is fundamentally because of
speech recognition using the reduced TED-LIUM the gating arrange signaling that control that will display
speech dataset. input and past memory are utilized to overhaul the current
actuation and create the current state [8]. GRU’s cell create a
III. PROBLEM FORMULATION
number of changes over LSTM cells on combining the Input-
The research method uses a machine learning approach Gate and Forget-Gate as Update-Gate, and it comprises of
using Tensorflow Keras and directly chooses the variable two doors, Update-Gate and Reset-Gate.
“Close” to affect forecasting results. Then enter the general
stages in the data forecasting process such as data collection,
data preprocessing, splitting data into training data and
validation data, creating data models, and predicting models.
A. Data Acquisition
The data we use is time series data obtained from various
websites of historical data providers with Crypto Exchange
(providers of cryptocurrency buying or selling), namely from
https://www.cryptodatadownload.com. The data that was
downloaded came from Exchange Gemini with a span of 17
August 2017 to 13 April 2021. In other words, we took the
dataset from the original data in the field. Figure 1. Gated Recurrent Unit cell
Because the application of forecasting uses time series
data, the authors use all existing datasets without trimming The relationship between input and output can be
them first, because the results of the Prediction model rely on described by,
the historical data from the past. 𝑧𝑡 = 𝜎(𝑊𝑧 . [ℎ𝑡−1 , 𝑥𝑡 ])
The data that we acquire can be said to be non-defective 𝑟𝑡 = 𝜎(𝑊𝑟 . [ℎ𝑡−1 , 𝑥𝑡 ])
or dirty. because the value of the time span and the ̃ℎ𝑡 = 𝑡𝑎𝑛ℎ(𝑊. [𝑟𝑡 × ℎ𝑡−1 , 𝑥𝑡 ])
completeness of the data is complete. There are no null values ℎ𝑡 = (1 − 𝑧𝑡 ) × ℎ𝑡−1 + 𝑧1 × ℎ̃𝑡
or extreme outliers in each of the data indexes.
where t, z, and tr are the output of Update-Gate and Reset
B. Data Preprocessing Gate, Wr and Wz are the weights of the Reset-Gate and
Raw information ordinarily comes with numerous flaws Update-Gate; σ (.) and tanh (.) are Sigmoid and Hyperbolic
such as irregularities, lost values, commotion and/or Tangent functions. Reset-Gate do some “capture” on the
redundancies. Execution of ensuing learning calculations will short-term dependencies at the sequence data and Update-
in this way be undermined in the event that they were Gate assist to obtain long-term dependencies
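To make the four equations concrete, the following is a minimal NumPy sketch of a single GRU step under the equations above; the weight shapes are illustrative and the bias terms are omitted for brevity, so this is not the Keras implementation used in the experiments.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_prev, x, W_z, W_r, W):
    hx = np.concatenate([h_prev, x])         # [h_{t-1}, x_t]
    z = sigmoid(W_z @ hx)                    # Update-Gate output z_t
    r = sigmoid(W_r @ hx)                    # Reset-Gate output r_t
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x]))  # candidate state
    return (1 - z) * h_prev + z * h_tilde    # new hidden state h_t

# Example with 4 hidden units and 1 input feature.
rng = np.random.default_rng(0)
h = np.zeros(4)
x = rng.normal(size=1)
W_z, W_r, W = (rng.normal(size=(4, 5)) for _ in range(3))
h = gru_step(h, x, W_z, W_r, W)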
E. RNN
In a conventional neural network, the inputs are independent of one another, and each input has a single unit to process it. In sequence problems, however, the ordering of the inputs needs to be taken into account. A Recurrent Neural Network (RNN), by contrast, has a single unit for each input, with
the current computation depending on past computations [9]. RNNs are therefore considered to have a memory that captures almost all of the data computed so far.

Figure 2. Illustration of a Recurrent Neural Network

The figure above shows an RNN being unfolded into a full network; by unfolding we mean that we write out the network for the complete sequence.

F. LSTM
Unlike conventional RNNs, an LSTM (Long Short-Term Memory) network is better at memorizing past data or computations, because more gates are added to counter the vanishing or exploding gradients that occur in an RNN during backpropagation optimization [10].

Figure 3. Long Short-Term Memory cell

The models are evaluated with the Mean Absolute Error (MAE). For each instance we take the error value, where y_i is the actual value and λ(x_i) is the predicted value; then we make the value absolute so that the errors do not cancel each other out. We repeat the process up to the n-th instance, after which we divide by n [11]:

MAE = (1/n) Σ_{i=1}^{n} |y_i − λ(x_i)|
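As a small sanity check, the metric can be computed in one line; this sketch assumes NumPy arrays holding the actual and predicted closing prices.

import numpy as np

def mae(y_true, y_pred):
    # (1/n) * sum_i |y_i - lambda(x_i)|
    return np.mean(np.abs(y_true - y_pred))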
IV. PROBLEM SOLUTION

Now, to build a better prediction model, we apply hyperparameter optimization to our models. As previously stated, the two hyperparameter optimization techniques we use are Grid Search and Random Search. Each method we test is combined with one hyperparameter optimization technique at a time; in other words, for the GRU method we apply Grid Search, Random Search, and no hyperparameter optimization, respectively. That way we obtain 9 comparisons. But first, we must explain what Grid Search and Random Search are.

A. Grid Search
Grid search is a hyperparameter optimization technique that chooses values one by one from the hyperparameter space and tries each on the targeted algorithm [12]. Grid search is a tuning method that attempts to compute the ideal values of the hyperparameters. It is an exhaustive search performed on the specified parameter values of a model; the model is also known as an estimator.
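A minimal sketch of this exhaustive search is shown below. The search space and the build_and_score() helper are hypothetical stand-ins; in the real experiment the helper would build, train, and evaluate one of the GRU/RNN/LSTM models and return its validation MAE.

import itertools

def build_and_score(units, learning_rate, batch_size):
    # Hypothetical stub standing in for "train a model, return validation MAE".
    return abs(units - 64) * 1e-4 + learning_rate + batch_size * 1e-5

space = {
    "units": [32, 64, 128],
    "learning_rate": [1e-2, 1e-3],
    "batch_size": [16, 32],
}

best_score, best_params = float("inf"), None
# Try every combination in the grid exactly once.
for combo in itertools.product(*space.values()):
    params = dict(zip(space.keys(), combo))
    score = build_and_score(**params)
    if score < best_score:
        best_score, best_params = score, params
print(best_params, best_score)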
B. Random Search
Different from grid search, random search chooses random values in the hyperparameter space based on a probability distribution, for a fixed number of iterations [12]. Random search is excellent for discovering hyperparameter combinations that you would not have guessed intuitively, although it often requires more time to execute, since it defines the search space as a bounded domain of hyperparameter values and randomly samples points in that domain.
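A matching sketch of random search with a fixed iteration budget, reusing the same hypothetical search space and scoring stub as the grid search sketch above:

import random

def build_and_score(units, learning_rate, batch_size):
    # Hypothetical stub, as in the grid search sketch.
    return abs(units - 64) * 1e-4 + learning_rate + batch_size * 1e-5

random.seed(42)
N_ITER = 10  # fixed iteration budget

best_score, best_params = float("inf"), None
for _ in range(N_ITER):
    # Sample each hyperparameter from its own distribution.
    params = {
        "units": random.choice([32, 64, 128]),
        "learning_rate": 10 ** random.uniform(-4, -2),
        "batch_size": random.choice([16, 32]),
    }
    score = build_and_score(**params)
    if score < best_score:
        best_score, best_params = score, params
print(best_params, best_score)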
2. RNN/GRU/LSTM Layer
3. RNN/GRU/LSTM Layer
4. RNN/GRU/LSTM Layer
5. Dense Layer
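The partially preserved layer list above describes the stacked architecture of the models: three recurrent layers followed by a Dense output layer. The following is a minimal TensorFlow Keras sketch of such a stack; the unit counts, the look-back window, the choice of GRU for the recurrent layers, and the assumption that the list's missing first item is the input layer are all illustrative, not the authors' exact configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

WINDOW = 30  # assumed look-back window of past closing prices

model = models.Sequential([
    layers.Input(shape=(WINDOW, 1)),        # assumed first item: input layer
    layers.GRU(64, return_sequences=True),  # 2. RNN/GRU/LSTM layer
    layers.GRU(64, return_sequences=True),  # 3. RNN/GRU/LSTM layer
    layers.GRU(64),                         # 4. RNN/GRU/LSTM layer
    layers.Dense(1),                        # 5. Dense layer
])
model.compile(optimizer="adam", loss="mae")  # MAE, as reported in Table 1
model.summary()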
Figure 10. Training of RNN, GRU, and LSTM with Random Search
Random Search:
  RNN   0.004252993518323228   0.061953168649061204
  GRU   0.0045455186871690995  0.0839854264687884
  LSTM  0.004265444788501635   0.09666303902669338

Without HyperParam:
  RNN   0.07493549258039695    0.3936291193184623
  GRU   0.00395552953871398    0.06961754791752854
  LSTM  0.004269957587696327   0.08985289835946308

Table 1. MAE result on all models
REFERENCES

[4] W. Yiying and Z. Yeze, "Cryptocurrency Price Analysis with Artificial Intelligence," 2019 5th International Conference on Information Management (ICIM), 2019, pp. 97-101, doi: 10.1109/INFOMAN.2019.8714700.
[5] T. Phaladisailoed and T. Numnonda, "Machine Learning Models Comparison for Bitcoin Price Prediction," 2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE), 2018, pp. 506-511, doi: 10.1109/ICITEED.2018.8534911.
[6] A. Shewalkar, D. Nyavanandi and S. Ludwig, "Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU," Journal of Artificial Intelligence and Soft Computing Research, vol. 9, pp. 235-245, 2019, doi: 10.2478/jaiscr-2019-0006.
[7] S. Ramírez-Gallego, B. Krawczyk, S. García, M. Woźniak and F. Herrera, "A survey on data preprocessing for data stream mining: Current status and future directions," Neurocomputing, vol. 239, 2017, ISSN 0925-2312, doi: 10.1016/j.neucom.2017.01.078.
[8] R. Li, J. Hu and S. Yang, "Deep Gated Recurrent Unit Convolution Network for Radio Signal Recognition," 2019 IEEE 19th International Conference on Communication Technology (ICCT), 2019, pp. 159-163, doi: 10.1109/ICCT46805.2019.8947225.
[9] X. Zhang and T. Luo, "A RNN Decoder for Channel Decoding under Correlated Noise," 2019 IEEE/CIC International Conference on Communications Workshops in China (ICCC Workshops), 2019, pp. 30-35, doi: 10.1109/ICCChinaW.2019.8849949.
[10] J. Li and Y. Shen, "Image describing based on bidirectional LSTM and improved sequence sampling," 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), 2017, pp. 735-739, doi: 10.1109/ICBDA.2017.8078733.
[11] "Mean Absolute Error," in C. Sammut and G. I. Webb (eds.), Encyclopedia of Machine Learning, Springer, Boston, MA, 2011, doi: 10.1007/978-0-387-30164-8_525.
[12] S. Chan and P. Treleaven, "Chapter 5 - Continuous Model Selection for Large-Scale Recommender Systems," in Handbook of Statistics, vol. 33, Elsevier, 2015, ISSN 0169-7161, ISBN 9780444634924, doi: 10.1016/B978-0-444-63492-4.00005-8.
[13] B. Kerautret, M. Colom, D. Lopresti, P. Monasse and H. Talbot (eds.), Reproducible Research in Pattern Recognition, Lecture Notes in Computer Science, Springer, 2019, doi: 10.1007/978-3-030-23987-9.