Article

DERN: Deep Ensemble Learning Model for Short- and Long-Term Prediction of Baltic Dry Index

Imam Mustafa Kamal, Hyerim Bae, Sim Sunghyun and Heesung Yun

1 Department of Big Data, Pusan National University, Busan 46241, Korea
2 Department of Industrial Engineering, Pusan National University, Busan 46241, Korea
3 Korea Maritime Institute, Busan 49111, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(4), 1504; https://doi.org/10.3390/app10041504
Submission received: 8 January 2020 / Revised: 16 February 2020 / Accepted: 20 February 2020 / Published: 22 February 2020
(This article belongs to the Special Issue Advances in Deep Learning Ⅱ)

Abstract

The Baltic Dry Index (BDI) is a commonly utilized indicator of global shipping and trade activity. It influences stakeholders' and ship-owners' decisions regarding investments, chartering, operational plans, and export and import activities. Accurate prediction of the BDI is very challenging due to its volatility, non-stationarity, and complexity. To help stakeholders and ship-owners make sound short- and long-term maritime business decisions and avoid market risk, we performed short- and long-term prediction of the BDI using an ensemble deep-learning approach. In this study, we propose applying recurrent neural network models to BDI prediction. State-of-the-art sequential deep-learning models, namely RNN, LSTM, and GRU, are employed to predict one- and multi-step-ahead BDI values, and to increase accuracy we combine them into an ensemble. In experiments, we compared our results with those of traditional methods such as ARIMA and MLP. The results show that our proposed method outperforms ARIMA, MLP, RNN, LSTM, and GRU in both short- and long-term prediction of the BDI.

1. Introduction

The Baltic Dry Index (BDI) is a freight index created by the London-based Baltic Exchange. It indicates shipment costs for dry bulk cargoes consisting of commodities such as grain, coal, iron ore, and copper. The BDI is a composite of three sub-indices, namely Capesize, Panamax, and Supramax, which correspond to different bulk-carrier capacities of 180,000, 74,000, and 58,000 dwt, respectively. The BDI has been widely used as a world-trade economic indicator [1]. Many stakeholders make serious efforts to forecast it precisely so as to be able to make smart investment and trading decisions. However, the volatility, non-stationarity, and complexity of the BDI are known to make it more intractable than stock prices. Predicting BDI values is therefore a challenging task.
The BDI is regarded as a barometer not only of the shipping industry and international trade, but also of the global economy [2]. Investors, speculators, and researchers have long found it to be useful, theoretically challenging, and relevant when projecting future profits. However, because many managerial decisions are based on future prospects, forecasting accuracy is essential for organizations and companies in order to avoid market risk. Recent advances in both analytical and computational methods have resulted in a number of new ways of mining freight-index time-series data.
Ship-owners, stakeholders, and investors need to be concerned not only with short-term prediction of time-series data but also with long-term prediction, and predicting a long-term sequence of time-series data is more difficult than short-term prediction [3]. For example, when making decisions about a vessel, the owner has multiple options. If the BDI trend is increasing, the owner waits for the right time to sell the vessel to obtain the maximum profit; if the BDI tends to decline, the owner sells it immediately. Likewise, if the BDI has an upward trend, the owner will not charter the vessel out to the market but will operate it personally; conversely, if the BDI trend decreases, the owner will charter the vessel out. Therefore, in this study, we developed an analytic method for accurate short- and long-term prediction of the BDI to help vessel owners.
The rest of this paper is organized as follows. Section 2 provides the background of our research and discusses relevant BDI-related work. Section 3 introduces our proposed method. Section 4 analyzes the experimental results and compares them against econometric and machine-learning methods. Finally, Section 5 draws conclusions and outlines future work.

2. Background

In this section, we present the background of our BDI-related research and discuss some time-series prediction models available in the literature.

2.1. Related Works

Cullinane et al. [4] were the pioneers in conducting research on BDI prediction, using the ARIMA model. Over the past several years, further research on BDI prediction has been conducted. Chou and Lin used a fuzzy neural network model to analyze and forecast the BDI [5], and Kamal et al. [6] treated BDI forecasting as a high-dimensional multivariate regression problem solved with deep neural networks. Sahin et al. [7] predicted one-step-ahead BDI values with three proposed artificial neural networks, specifically a univariate model and two bivariate models, harnessing historical BDI data and the world price of crude oil. Zeng et al. [8] proposed a decomposition technique for BDI data and then used a neural network for prediction. Zhang et al. [9] compared econometric models such as ARIMA and GARCH with artificial neural network models such as BPNN, RBFNN, and ELM. The majority of the previous research has treated BDI prediction as a regression and short-term prediction task. In the present study, by contrast, we conducted research on both short- and long-term prediction of the BDI to facilitate ship-owners' short- and long-term decision-making. In addition, in terms of models, the majority of previous studies have harnessed artificial neural network models and statistical models. Note that for sequential learning on time-series data, the recurrent neural network is the state-of-the-art method; moreover, deep learning with a deep architecture is nowadays a promising approach for accurate prediction.
Over the course of the past few decades, there have been many outstanding approaches to the prediction of time-series data, such as ARIMA [10], the Support Vector Regressor (SVR) [11], fuzzy systems [12], and deep learning [13,14,15,16]. Nonetheless, those methods alone cannot achieve accurate prediction of real data, since real time-series data is commonly volatile and non-stationary. For enhanced prediction, some researchers have proposed data transformation [17], decomposition [18,19,20], and ensemble methods [21,22]. In the present study, we harnessed and combined deep-learning approaches, namely a deep recurrent neural network (Deep RNN), a long short-term memory network (LSTM), and a gated recurrent unit network (GRU), to obtain more accurate prediction results.

2.2. Sequential Model

A Recurrent Neural Network (RNN) is a type of neural network with loops that allow retention of information from the past. Specifically, the loops enable the RNN to use information from past time slices when producing output for the current time slice t. Thus, the decision made at time slice t−1 affects the decision to be made at t, and the response of the network to new data depends on the current input as well as on output from the recent past. The RNN output is computed by iterating the following two equations:
$$h_t = \mathcal{H}(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \qquad (1)$$
$$y_t = W_{hy} h_t + b_y \qquad (2)$$
In Equations (1) and (2), $x_t$ is the input sequence, $y_t$ is the output sequence, and $h_t$ represents the hidden vector at time slice t (t = 1, 2, …, T); $W$ and $b$ represent weight matrices and biases, respectively; and lastly, $\mathcal{H}$ is the activation function of the hidden layer. The back-propagation through time (BPTT) technique is usually used to train RNNs [23]. However, it is difficult to train traditional RNNs with BPTT due to the gradient-vanishing and -exploding problem [24]: errors from later time steps are difficult to propagate back to earlier time steps for proper updates of the network parameters. To address this problem, the long short-term memory (LSTM) unit was developed [25].
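For concreteness, the following is a minimal NumPy sketch of the forward recursion in Equations (1) and (2); the layer sizes, the random weights, and the choice of tanh for $\mathcal{H}$ are illustrative assumptions, not the trained configuration used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 1, 4, 1                    # assumed layer sizes

W_xh = rng.normal(size=(n_hid, n_in)) * 0.1     # input-to-hidden weights
W_hh = rng.normal(size=(n_hid, n_hid)) * 0.1    # hidden-to-hidden (recurrent) weights
W_hy = rng.normal(size=(n_out, n_hid)) * 0.1    # hidden-to-output weights
b_h, b_y = np.zeros(n_hid), np.zeros(n_out)

def rnn_forward(x_seq):
    """Iterate Eq. (1) over the input sequence, emitting Eq. (2) at each step."""
    h = np.zeros(n_hid)
    outputs = []
    for x_t in x_seq:                               # each x_t has shape (n_in,)
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)    # Eq. (1) with H = tanh
        outputs.append(W_hy @ h + b_y)              # Eq. (2)
    return np.array(outputs)

y = rnn_forward([np.array([0.1]), np.array([0.2]), np.array([0.3])])
```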
LSTM is a special type of recurrent neural network with memory cells. These memory cells are the essential component for handling long-term temporal dependencies in the data. The LSTM can add or delete information from the cell state through special structures called gates. The three types of gate are the input gate ($i_t$), forget gate ($f_t$), and output gate ($o_t$), shown in Equations (3) to (8). $\tilde{C}_t$ is a "candidate" state computed from the current input and the previous hidden state. $C_t$ is the internal memory of the unit: a combination of the previous memory, multiplied by the forget gate, and the newly computed candidate state, multiplied by the input gate. $h_t$ is the output hidden state, computed by multiplying the memory by the output gate [26].
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \qquad (3)$$
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \qquad (4)$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \qquad (5)$$
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \qquad (6)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \qquad (7)$$
$$h_t = o_t * \tanh(C_t) \qquad (8)$$
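The step below is a minimal NumPy sketch of Equations (3)–(8); the weight shapes and the concatenation $[h_{t-1}, x_t]$ follow the notation above, while the sizes and random initial values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 1, 4
sigma = lambda a: 1.0 / (1.0 + np.exp(-a))       # logistic sigmoid

# One weight matrix per gate, acting on the concatenation [h_{t-1}, x_t].
W_i, W_f, W_o, W_c = (rng.normal(size=(n_hid, n_hid + n_in)) * 0.1
                      for _ in range(4))
b_i = b_f = b_o = b_c = np.zeros(n_hid)

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])
    i = sigma(W_i @ z + b_i)                     # input gate, Eq. (3)
    f = sigma(W_f @ z + b_f)                     # forget gate, Eq. (4)
    o = sigma(W_o @ z + b_o)                     # output gate, Eq. (5)
    C_tilde = np.tanh(W_c @ z + b_c)             # candidate state, Eq. (6)
    C = f * C_prev + i * C_tilde                 # cell state, Eq. (7)
    h = o * np.tanh(C)                           # hidden state, Eq. (8)
    return h, C

h, C = lstm_step(np.array([0.5]), np.zeros(n_hid), np.zeros(n_hid))
```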
GRU is another type of RNN with memory cells [27]. It is similar to LSTM but has a simpler cell architecture. The GRU also has a gating mechanism to control the flow of information through the cell state, but it has fewer parameters and no output gate. It consists of two gates: a reset gate $r_t$ and an update gate $z_t$. The reset gate regulates how much new input flows into the previous memory, and the update gate determines how much of the previous memory to keep. The following equations are used in the GRU output calculation:
$$z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z) \qquad (9)$$
$$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r) \qquad (10)$$
$$\tilde{h}_t = \tanh(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h) \qquad (11)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t \qquad (12)$$
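Analogously, a single GRU step per Equations (9)–(12) can be sketched as follows; again, all sizes and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid = 1, 4
sigma = lambda a: 1.0 / (1.0 + np.exp(-a))

W_xz, W_xr, W_xh = (rng.normal(size=(n_hid, n_in)) * 0.1 for _ in range(3))
W_hz, W_hr, W_hh = (rng.normal(size=(n_hid, n_hid)) * 0.1 for _ in range(3))
b_z = b_r = b_h = np.zeros(n_hid)

def gru_step(x_t, h_prev):
    z = sigma(W_xz @ x_t + W_hz @ h_prev + b_z)                # update gate, Eq. (9)
    r = sigma(W_xr @ x_t + W_hr @ h_prev + b_r)                # reset gate, Eq. (10)
    h_tilde = np.tanh(W_xh @ x_t + W_hh @ (r * h_prev) + b_h)  # candidate, Eq. (11)
    return z * h_prev + (1.0 - z) * h_tilde                    # new state, Eq. (12)

h = gru_step(np.array([0.5]), np.zeros(n_hid))
```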
Previous studies [28,29] have noted that GRU is comparable to, or even outperforms, LSTM. To obtain high accuracy in prediction of the BDI, in this study we combined RNN, LSTM, and GRU into an ensemble method. The idea is to combine the predictions of multiple different sequential models. Each model has different strengths and weaknesses, meaning that its predictions are better than the others' under certain conditions. Importantly, the models must be good in different ways: they must make different prediction errors. Besides reducing the variance of the prediction, our ensemble can also yield better predictions than any single best model. For instance, Krizhevsky et al. used model averaging across multiple well-performing CNN models to achieve outstanding results [30].

3. Method

In this section, we explain our data pre-processing technique, followed by the system design of our proposed method and the metrics used to assess the accuracy of our approach.

3.1. Data Pre-Processing and Analysis

The BDI data plotted in Figure 1a shows a sharp increase and a dramatic decrease between 2007 and 2008. Therefore, we pre-processed the data to make it more stationary. Using a decomposition technique, the BDI data is separated into three components: trend, seasonality, and noise, as depicted in Figure 1c. A trend appears as a sustained increase or decrease in the time-series values; the BDI data does not show any significant increasing or decreasing trend, but rather a peak in 2008 and two long tails. The seasonality component captures the repeating short-term cycle in the BDI, and the noise corresponds to random variation in the series.
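As an illustration of this step, the sketch below applies an additive decomposition with statsmodels; the paper does not name the exact routine, so the use of seasonal_decompose, the weekly period of 52, and the synthetic stand-in series are assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic weekly stand-in for the BDI series (replace with the real data).
rng = np.random.default_rng(0)
idx = pd.date_range("1999-11-07", periods=954, freq="W")
bdi = pd.Series(2400 + 800 * np.sin(2 * np.pi * np.arange(954) / 52)
                + rng.normal(0, 200, 954), index=idx)

# Additive decomposition into trend + seasonality + noise (cf. Figure 1c).
result = seasonal_decompose(bdi, model="additive", period=52)
trend, seasonal, noise = result.trend, result.seasonal, result.resid
```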
Due to the complexity of the bulk shipping market and the non-linear nature of freight-rate series [31], in this study several data transformation techniques, including the difference transform, power transform, log transform, standardization, and normalization, were employed. As indicated in Table 1, for each transformation technique we conducted a Dickey-Fuller stationarity test to assess its effectiveness.
Normalization is a rescaling of the data so that all values fall within a certain range. As shown in Equations (1), (8), and (11), the value range of the tanh function is between −1 and 1; therefore, we rescaled the BDI to this range. Unlike normalization, standardization rescales the dataset according to the distribution of values so that the mean of the observed values is 0 and the standard deviation is 1. Simple transformation techniques such as the power transform and log transform were also applied. The 1st difference transform applied to a time series $x$ creates a new series $z$ whose value at time $t$ is the difference between $x(t+1)$ and $x(t)$; this method works very well in removing trends and cycles. As shown in Table 1, the 1st difference transform yields the smallest p-value, indicating that it generates more stationary time-series data than the original. Therefore, in this research, we transformed the BDI data into a more stationary form using the 1st difference transform before feeding it into the model. The 1st-difference-transformed data is plotted in Figure 1b.
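The stationarity check of Table 1 can be reproduced in outline as follows; the augmented Dickey-Fuller test from statsmodels is one standard implementation, and the random-walk stand-in series is an assumption used only to make the snippet runnable.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Random-walk stand-in for the (non-stationary) BDI series.
rng = np.random.default_rng(0)
bdi = pd.Series(2400 + rng.normal(0, 200, 954).cumsum())

diff = bdi.diff().dropna()                      # 1st difference transform
stat, p_value, _, _, crit, _ = adfuller(diff)   # augmented Dickey-Fuller test
print(f"test statistic = {stat:.4f}, p-value = {p_value:.3g}")
print("critical values:", crit)                 # 1%, 5%, 10% levels as in Table 1
```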

3.2. Deep Ensemble Recurrent Network

The system design of our approach is depicted in Figure 2, and the training process is expressed in Algorithm 1. First, the pre-processed data is used to train Deep RNN, LSTM, and GRU independently. After those models converge in learning the BDI data, in the testing phase, the predictions of RNN, LSTM, and GRU, represented as $P_1$, $P_2$, and $P_3$, are combined using weights $w_{p_1}$, $w_{p_2}$, and $w_{p_3}$, respectively, to produce $P_f$, the predicted next value of the BDI. The values of $w_{p_1}$, $w_{p_2}$, and $w_{p_3}$ are learned in a supervised fashion using basic forward- and back-propagation, as in a standard neural network. As explained in Section 2.2, all three are powerful sequence models. Deep RNN is able to learn time dependencies to predict the next value, but it cannot cope with long-term dependencies; LSTM is equipped with a complex memory cell to handle long-term dependencies; GRU harnesses a memory cell as well, but uses a simpler cell than LSTM. Therefore, to obtain more accurate results, we combined all of these models in an ensemble called the deep ensemble recurrent network (DERN). The purpose is to minimize the error rate: in terms of memory cells, RNN is the simplest model (prone to under-fitting), while GRU is simpler than LSTM albeit more complex (prone to over-fitting) than RNN, a complexity ordering that can be denoted as RNN < GRU < LSTM. Moreover, in machine-learning theory, no method is universally better than any other (the "no free lunch" theorem), and each method may make mistakes in different facets of operation. Stacking multiple different sequential models may therefore improve performance over the individual models. The multi-model ensemble is a technique by which the predictions of a collection of models are given as inputs to a second-stage learning model, which is trained to combine the first-stage predictions optimally in order to obtain the final prediction.
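A minimal sketch of this second-stage combiner is given below, assuming a Keras implementation (the paper does not name its framework); the stand-in predictions $P_1$–$P_3$, their noise levels, and all shapes are illustrative.

```python
import numpy as np
from tensorflow import keras

# Stand-ins for the base models' predictions P1, P2, P3 on training data;
# in the real pipeline these come from the converged Deep RNN, LSTM, and GRU.
rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=(200, 1))               # true (normalized) BDI values
preds = np.hstack([y + rng.normal(0.0, s, y.shape) for s in (0.05, 0.04, 0.04)])

# Second-stage combiner: one dense layer with sigmoid activation, MSE loss,
# and the Adam optimizer; its weights play the role of w_p1, w_p2, w_p3.
head = keras.Sequential([
    keras.layers.Dense(1, activation="sigmoid", input_shape=(3,)),
])
head.compile(loss="mse", optimizer="adam")
head.fit(preds, y, epochs=50, verbose=0)

p_f = head.predict(preds[:5])                          # combined prediction P_f
```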
The mechanism used to make short- and long-term predictions in the present study is depicted in Figure 3. Figure 3a is the one-step-ahead (short-term) prediction model: we transform the univariate BDI time series into $x_i$ as the predictor variable and $x_{i+1}$ as the response variable; afterwards, to predict $x_{i+2}$, $x_{i+1}$ is appended to the training data. In Figure 3b, meanwhile, the model predicts multiple BDI values at a time, producing the sequence from $x_{i+1}$ to $x_n$ at once, where n is the number of prediction steps. The challenging part of this technique is creating a model that performs long-sequence prediction at once.
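The sliding-window framing behind Figure 3 can be sketched as follows; the helper make_windows and the lag length are hypothetical names and values chosen for illustration (n_steps = 1 reproduces Figure 3a, n_steps > 1 Figure 3b).

```python
import numpy as np

def make_windows(series, lag, n_steps):
    """Frame a 1-D series into (past `lag` values -> next `n_steps` values)."""
    X, y = [], []
    for i in range(len(series) - lag - n_steps + 1):
        X.append(series[i : i + lag])                      # predictor window
        y.append(series[i + lag : i + lag + n_steps])      # response window
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)
X1, y1 = make_windows(series, lag=4, n_steps=1)   # one-step-ahead, Figure 3a
Xn, yn = make_windows(series, lag=4, n_steps=3)   # multi-step-ahead, Figure 3b
```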
Algorithm 1: DERN
We conducted extensive experiments to decide the hyper-parameters of our models. After some trial and error, the optimal architectures of RNN, LSTM, and GRU are as described in Table 2. We utilized two stacked recurrent layers, i.e., two RNN, LSTM, or GRU layers in the RNN, LSTM, and GRU models, respectively. Each layer consists of 500 hidden units with 20% dropout and tanh as the gate activation. We set the recurrent layers to return a sequence whose length corresponds to the number of prediction steps; for instance, if the task is five-steps-ahead prediction, the sequence length is five. The sequence is wrapped by one TimeDistributed layer, whose number of hidden units corresponds to the sequence length and which is activated by the Sigmoid function. Each model is trained independently until convergence using Mean Squared Error (MSE) as the loss function and Adam as the optimizer. In the ensemble layer, the outputs of the models are combined by a standard neural network with one dense layer, Sigmoid activation, MSE loss, and the Adam optimizer to decide the final prediction. A sketch of one such base model is given below.
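Under the configuration in Table 2, one base model might be built as in the following Keras sketch; the framework choice, the input shape, and the reading of the TimeDistributed wrapper as one sigmoid unit per time step are assumptions, since the text leaves these details open.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_base_model(n_steps, cell=layers.GRU):
    """Two stacked recurrent layers (500 units, tanh, 20% dropout) that
    return sequences, wrapped by a TimeDistributed sigmoid layer; pass
    cell=layers.SimpleRNN or cell=layers.LSTM for the other base models."""
    model = keras.Sequential([
        cell(500, activation="tanh", dropout=0.2, return_sequences=True,
             input_shape=(n_steps, 1)),    # sequence length = prediction steps
        cell(500, activation="tanh", dropout=0.2, return_sequences=True),
        layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
    ])
    model.compile(loss="mse", optimizer="adam")    # MSE loss, Adam optimizer
    return model

model = build_base_model(n_steps=5)                # five-steps-ahead setting
```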
To assess how well our models predicted short- and long-term BDI, the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), given in Equations (13)–(15), respectively, were employed, where n is the number of data points, $y_t$ is the actual BDI value at time t, and $\hat{y}_t$ is the predicted BDI at time t.
$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2} \qquad (13)$$
$$MAE = \frac{1}{n}\sum_{t=1}^{n}\left| y_t - \hat{y}_t \right| \qquad (14)$$
$$MAPE = \frac{100\%}{n}\sum_{t=1}^{n}\left| \frac{y_t - \hat{y}_t}{y_t} \right| \qquad (15)$$
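Equations (13)–(15) translate directly into the following NumPy helpers; the sample arrays are illustrative, and the arrays are assumed to have matching shapes.

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))          # Eq. (13)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))                  # Eq. (14)

def mape(y, y_hat):
    return 100.0 * np.mean(np.abs((y - y_hat) / y))    # Eq. (15), y must be nonzero

y_true = np.array([1500.0, 1520.0, 1490.0])
y_pred = np.array([1495.0, 1530.0, 1485.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```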

4. Experiments

The original BDI data is recorded on a daily basis from November 1999 to February 2018. To simplify long-term prediction, we resampled the data to a weekly basis using average values. Summary statistics of the BDI data are shown in Table 3. In the experiments, no data shuffling was applied; instead, we employed a sliding-window technique to split our training and testing data. The data was divided into 70–30%, 80–20%, and 90–10% slices for training and testing. We compared the results with those of deep-learning models such as Deep RNN, LSTM, and GRU, and further compared our proposed method with ARIMA and a Multi-Layer Perceptron (MLP). To obtain the best parameters and architectures, the grid-search technique was employed. A sketch of this setup is given below.
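The weekly resampling and chronological split can be outlined as follows; the synthetic daily series is a stand-in, since only the real BDI data was used in the experiments.

```python
import numpy as np
import pandas as pd

# Synthetic daily stand-in for the BDI (the experiments used the real series).
rng = np.random.default_rng(0)
idx = pd.date_range("1999-11-01", "2018-02-28", freq="D")
bdi_daily = pd.Series(2400 + rng.normal(0, 50, len(idx)).cumsum(), index=idx)

weekly = bdi_daily.resample("W").mean()       # weekly averages
split = int(len(weekly) * 0.70)               # 70-30%; 80-20% and 90-10% analogous
train, test = weekly.iloc[:split], weekly.iloc[split:]
```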

4.1. Short-Term Prediction

Short-term prediction finds one-step-ahead BDI values; the prediction mechanism follows the model in Figure 3a. Since the data is weekly, the model predicts the BDI one week ahead. Figure 4 depicts the prediction of the BDI in the testing phase using ARIMA, MLP, LSTM, Deep RNN, GRU, and DERN. As we can see, all of them predict the BDI approximately correctly; the worst-performing method is ARIMA. As shown in Table 4, the ARIMA error rate is three times higher than DERN's in terms of RMSE. Notice that the error rate varies with the portion of training data, since the BDI data is volatile, as indicated in Table 3: relative to the variance, the data swings enormously between the 25th and 75th percentiles.
Table 4 shows that Deep RNN, LSTM, and GRU have roughly similar error rates. GRU slightly outperforms LSTM, while LSTM outperforms Deep RNN. This is because GRU and LSTM have memory-cell gates to handle long-term dependencies; both share the same mechanism for effectively tracking long-term dependencies while mitigating the vanishing/exploding-gradient problems. LSTM uses more complex gates than GRU; therefore, in this case, the LSTM model tended to over-fit more than GRU. Figure 5 shows that DERN produces the result nearest to the original data, effectively an averaged value of RNN, GRU, and LSTM, since we combined them into an ensemble using a weighting technique. Moreover, in short-term prediction of the BDI, our approach outperforms the previous Artificial Neural Network (ANN) model, whose average MAPE was never lower than five [5]. In the ensemble layer, we obtained the weights $w_{P_1}$, $w_{P_2}$, and $w_{P_3}$ of RNN, LSTM, and GRU as 0.337, 0.330, and 0.333, respectively. We can infer that in short-term prediction, each model has an approximately equal effect on the prediction value.

4.2. Long-Term Prediction

In long-term prediction, we predict the BDI more than one step ahead; in the present study, BDI values three, five, and seven weeks ahead were predicted. The experiments showed that long-term prediction results in a higher error rate than short-term prediction in terms of RMSE, MAE, and MAPE; we conducted experiments up to seven-steps-ahead prediction. The experimental results are shown in Table 5, Table 6, and Table 7 for three-, five-, and seven-steps-ahead prediction of the BDI. As indicated in all of those tables, DERN obtained the best error rate among the methods. In this study, ARIMA failed to predict long-term BDI, its average error rate being more than seven times higher than that of our proposed method. A visual comparison of RNN, LSTM, and GRU with DERN is provided in Figure 6, Figure 7, and Figure 8 for three-, five-, and seven-steps-ahead prediction, respectively. Notice that the error rate increases with the prediction horizon; even so, the models follow the trend of the testing data. The overall error-rate averages in terms of RMSE, MAE, and MAPE are plotted in Figure 9 (ARIMA is omitted due to its large error). From the data, we can infer that one-step-ahead prediction yields a much lower error rate than long-term prediction. Long-term prediction is therefore more challenging than short-term prediction; nonetheless, ship-owners and stakeholders are commonly more interested in long-term prediction. The ensemble-layer weights in three-steps-ahead prediction are 0.281, 0.357, and 0.362 for RNN ($w_{P_1}$), LSTM ($w_{P_2}$), and GRU ($w_{P_3}$), respectively; in five-steps-ahead prediction they are 0.300, 0.350, and 0.350; and in seven-steps-ahead prediction they are 0.294, 0.362, and 0.344. Unlike in short-term prediction, RNN has a smaller effect on the final prediction, while LSTM and GRU have approximately equal effects on it.

5. Conclusions and Outlook

The Baltic Dry Index (BDI) is a parameter representative of international shipping activities. It is an essential tool with which ship-owners and stakeholders plan their maritime businesses and avoid market risk. Unlike common time-series data, the BDI is characterized by volatility, non-stationarity, and complexity; therefore, its prediction is very challenging indeed. In previous work, most researchers have used Artificial Neural Networks (ANN) and statistical models. In keeping with the popularity of deep learning in this decade, in this paper we propose a deep-learning approach whereby deep sequential models (RNN, LSTM, and GRU) are combined in an ensemble called the Deep Ensemble Recurrent Network (DERN) for accurate short- and long-term prediction of the BDI. In short-term prediction by the RMSE indicator, DERN had an error rate roughly half that of GRU, LSTM, Deep RNN, and MLP, and approximately a third that of ARIMA. In long-term prediction, however, the error rate was not as good as in short-term prediction: the results showed that the error rate grew with increasing prediction steps. Long-term prediction is therefore more challenging than short-term prediction; nonetheless, DERN still outperforms the conventional methods in long-term prediction, and ship-owners, stakeholders, and investors are prevalently more interested in long-term prediction. In future work, we will propose a more fine-grained approach entailing sequence-to-sequence learning for more accurate long-term prediction.

Author Contributions

Conceptualization, I.M.K. and H.B.; Data curation, I.M.K., S.S. and H.Y.; Formal analysis, I.M.K., H.B., S.S. and H.S.; Methodology, I.M.K.; Funding acquisition, H.Y.; Investigation, S.S.; Supervision, H.B. and H.Y.; Writing—original draft, I.M.K.; Writing—review & editing, H.B. and H.Y.; Project administration, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by principal cooperative research of KMI (the Korea Maritime Institute) and the project titled 'Development of IoT Infrastructure Technology for Smart Port' funded by the Ministry of Oceans and Fisheries, Korea.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bakshi, G.; Panayotov, G.; Skoulakis, G. The Baltic Dry Index as a Predictor of Global Stock Returns, Commodity Returns, and Global Economic Activity. In Proceedings of the AFA 2012 Chicago Meetings, Austin, TX, USA, 2011.
  2. Jacks, D.S.; Pendakur, K. Global Trade and the Maritime Transport Revolution. Rev. Econ. Stat. 2010, 92, 745–755.
  3. Sovilj, D.; Sorjamaa, A.; Yu, Q.; Miche, Y.; Séverin, E. OPELM and OPKNN in long-term prediction of time series using projected input data. Neurocomputing 2010, 73, 1976–1986.
  4. Cullinane, K.; Cape, M.K. A Comparison of Models for Forecasting the Baltic Freight Index: Box-Jenkins Revisited. Int. J. Marit. Econ. 1999, 1, 15–39.
  5. Chou, C.; Lin, K.S. A Fuzzy Neural Network Model for Analysing Baltic Dry Index in the Bulk Maritime Industry. Int. J. Marit. Eng. 2017, 159.
  6. Kamal, I.M.; Bae, H.; Sim, S.; Kim, H.; Kim, D.; Choi, Y.; Yun, H. Forecasting High-Dimensional Multivariate Regression of Baltic Dry Index (BDI) using Deep Neural Networks (DNN). ICIC Express Lett. 2019, 13, 427–434.
  7. Sahin, B.; Gürgen, S.; Ünver, B.; Altin, I. Forecasting the Baltic Dry Index by using an artificial neural network approach. Turk. J. Electr. Eng. Comput. Sci. 2018, 26, 1673–1684.
  8. Zeng, Q.; Qu, C.; Ng, A.K.; Zhao, X. A new approach for Baltic Dry Index forecasting based on empirical mode decomposition and neural networks. Marit. Econ. Logist. 2016, 18, 192–210.
  9. Zhang, X.; Xue, T.; Eugene Stanley, H. Comparison of Econometric Models and Artificial Neural Networks Algorithms for the Prediction of Baltic Dry Index. IEEE Access 2019, 7, 1647–1657.
  10. Ohyver, M.; Pudjihastuti, H. Arima Model for Forecasting the Price of Medium Quality Rice to Anticipate Price Fluctuations. Procedia Comput. Sci. 2018, 135, 707–711.
  11. Henrique, B.M.; Sobreiro, V.A.; Kimura, H. Stock price prediction using support vector regression on daily and up to the minute prices. J. Financ. Data Sci. 2018, 4, 183–201.
  12. Yolcu, O.C.; Lam, H.K. A combined robust fuzzy time series method for prediction of time series. Neurocomputing 2017, 247, 87–101.
  13. Sagheer, A.; Kotb, M. Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing 2019, 323, 203–213.
  14. Sun, X.; Li, T.; Li, Q.; Huang, Y.; Li, Y. Deep belief echo-state network and its application to time series prediction. Knowl. Based Syst. 2017, 130, 17–29.
  15. Wang, L.; Wang, Z.; Qu, H.; Liu, S. Optimal Forecast Combination Based on Neural Networks for Time Series Forecasting. Appl. Soft Comput. 2018, 66, 1–17.
  16. Tealab, A. Time series forecasting using artificial neural networks methodologies: A systematic review. Future Comput. Inform. J. 2018, 3, 334–340.
  17. Salles, R.; Belloze, K.; Porto, F.; Gonzalez, P.H.; Ogasawara, E. Nonstationary time series transformation methods: An experimental review. Knowl. Based Syst. 2019, 164, 274–291.
  18. Mabrouk, A.B.; Abdallah, N.B.; Dhifaoui, Z. Wavelet decomposition and autoregressive model for time series prediction. Appl. Math. Comput. 2008, 199, 334–340.
  19. Joo, T.W.; Kim, S.B. Time series forecasting based on wavelet filtering. Expert Syst. Appl. 2015, 42, 3868–3874.
  20. Godfrey, L.B.; Gashler, M.S. Neural Decomposition of Time-Series Data for Effective Generalization. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2973–2985.
  21. Galicia, A.; Talavera-Llames, R.; Troncoso, A.; Koprinska, I.; Martinez-Alvarez, F. Multi-step forecasting for big data time series based on ensemble learning. Knowl. Based Syst. 2019, 163, 830–841.
  22. Adhikari, R. A neural network based linear ensemble framework for time series forecasting. Neurocomputing 2015, 157, 231–242.
  23. Williams, R.J.; Peng, J. An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories. Neural Comput. 1990, 2, 490–501.
  24. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
  25. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  26. Deng, Y.; Jiao, Y.; Lu, B.L. Driver Sleepiness Detection Using LSTM Neural Network. In Neural Information Processing; Cheng, L., Leung, A.C.S., Ozawa, S., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 622–633.
  27. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); Association for Computational Linguistics: Doha, Qatar, 2014; pp. 1724–1734.
  28. Chung, J.; Gülçehre, Ç.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
  29. Wang, Y.; Liao, W.; Chang, Y. Gated Recurrent Unit Network-Based Short-Term Photovoltaic Forecasting. Energies 2018, 11, 2163.
  30. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems—Volume 1; NIPS'12; Curran Associates Inc.: Red Hook, NY, USA, 2012; pp. 1097–1105.
  31. Goulielmos, A.M.; Psifia, M.E. Forecasting weekly freight rates for one-year time charter 65,000 dwt bulk carrier, 1989–2008, using nonlinear methods. Marit. Policy Manag. 2009, 36, 411–436.
Figure 1. (a) BDI index data; (b) Difference transform of BDI data; (c) Decomposition of BDI data.
Figure 2. Ensemble deep RNN, LSTM and GRU model.
Figure 3. (a) Short-term (one-step-ahead) prediction; (b) Long-term (multi-steps-ahead) prediction.
Figure 4. Short-term or one-step-ahead prediction of BDI in testing phase.
Figure 5. Zoom-in of Figure 4: DERN results in more accurate prediction.
Figure 6. Long-term prediction: three-steps-ahead prediction of BDI: (a) Deep RNN; (b) LSTM; (c) GRU; (d) DERN.
Figure 7. Long-term prediction: five-steps-ahead prediction of BDI: (a) Deep RNN; (b) LSTM; (c) GRU; (d) DERN.
Figure 8. Long-term prediction: seven-steps-ahead prediction of BDI: (a) Deep RNN; (b) LSTM; (c) GRU; (d) DERN.
Figure 9. Averages of RMSE, MAE and MAPE for one-to-seven-steps-ahead prediction of BDI using MLP, Deep RNN, LSTM, GRU, and DERN, respectively.
Table 1. Dickey-Fuller test of pre-processed data.

| Data | Test Statistic | p-Value | Critical Value 1% | Critical Value 5% | Critical Value 10% |
|---|---|---|---|---|---|
| Original data | −2.190442 | 0.209733 | −3.437455 | −2.864676 | −2.568440 |
| Normalization | −2.190442 | 0.209733 | −3.437455 | −2.864676 | −2.568440 |
| Standardization | −2.190442 | 0.209733 | −3.437455 | −2.864676 | −2.568440 |
| Log transform | −2.434086 | 0.132346 | −3.437348 | −2.864630 | −2.568415 |
| Power transform | −2.546063 | 0.104665 | −3.437348 | −2.864630 | −2.568415 |
| 1st difference transform | −7.690264 | 1.42 × 10^−11 | −3.430000 | −2.864676 | −2.568440 |
Table 2. Model configuration.

| Hyper-Parameter | RNN | LSTM | GRU |
|---|---|---|---|
| Recurrent layers | 2 RNN layers | 2 LSTM layers | 2 GRU layers |
| Hidden units | 500 per layer | 500 per layer | 500 per layer |
| Gate activation | tanh | tanh | tanh |
| Dropout | 20% per layer | 20% per layer | 20% per layer |
| Wrapper layer | 1 TimeDistributed layer | 1 TimeDistributed layer | 1 TimeDistributed layer |
| Wrapper hidden units | # of prediction steps | # of prediction steps | # of prediction steps |
| Wrapper activation | Sigmoid | Sigmoid | Sigmoid |
| Loss function | MSE | MSE | MSE |
| Optimizer | Adam | Adam | Adam |

Ensemble layer (shared across models): 1 Dense layer, Sigmoid activation, MSE loss function, Adam optimizer.
Table 3. Summary statistics of BDI data.

| Parameter | Value |
|---|---|
| Count | 4567.00 |
| Mean | 2417.19 |
| Standard Deviation | 2127.62 |
| Minimum | 290.00 |
| 25% | 993.00 |
| 50% | 1588.00 |
| 75% | 3020.00 |
| Maximum | 11,793.00 |
Table 4. Experimental results of one-step-ahead (short-term) prediction of BDI. Columns are grouped by training–testing split.

| Method | 70–30%: RMSE | MAE | MAPE | 80–20%: RMSE | MAE | MAPE | 90–10%: RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|---|---|
| ARIMA | 47.46 | 33.34 | 3.29 | 37.69 | 28.99 | 3.24 | 39.75 | 31.84 | 3.19 |
| MLP | 5.99 | 4.56 | 0.47 | 18.84 | 16.11 | 2.58 | 27.22 | 27.20 | 2.94 |
| Deep RNN | 5.92 | 4.45 | 0.46 | 4.23 | 3.89 | 0.59 | 3.31 | 2.84 | 0.36 |
| LSTM | 5.55 | 4.43 | 0.55 | 5.36 | 4.31 | 0.70 | 10.27 | 8.84 | 0.99 |
| GRU | 4.75 | 3.74 | 0.47 | 3.62 | 3.39 | 0.50 | 3.95 | 3.57 | 0.44 |
| DERN | 2.53 | 2.13 | 0.44 | 1.98 | 1.58 | 0.45 | 3.52 | 2.99 | 0.31 |
Table 5. Experimental results of three-steps-ahead prediction of BDI.

| Portion | Accuracy Measure | Model | t+1 | t+2 | t+3 |
|---|---|---|---|---|---|
| 70–30% | RMSE | ARIMA | 701.29 | 789.60 | 989.10 |
| | | MLP | 121.19 | 199.15 | 229.12 |
| | | Deep RNN | 94.09 | 141.05 | 208.98 |
| | | LSTM | 71.79 | 139.65 | 208.80 |
| | | GRU | 70.93 | 133.71 | 204.14 |
| | | DERN | 64.12 | 123.98 | 234.25 |
| | MAE | ARIMA | 693.11 | 732.51 | 932.43 |
| | | MLP | 89.62 | 148.18 | 189.19 |
| | | Deep RNN | 57.72 | 117.78 | 168.06 |
| | | LSTM | 55.62 | 107.58 | 166.01 |
| | | GRU | 56.63 | 108.76 | 146.91 |
| | | DERN | 58.00 | 99.96 | 126.03 |
| | MAPE | ARIMA | 98.81 | 101.08 | 131.43 |
| | | MLP | 15.31 | 17.91 | 25.93 |
| | | Deep RNN | 9.08 | 13.61 | 20.93 |
| | | LSTM | 6.45 | 12.73 | 20.46 |
| | | GRU | 6.13 | 11.99 | 20.09 |
| | | DERN | 4.05 | 9.87 | 18.46 |
| 80–20% | RMSE | ARIMA | 694.93 | 872.92 | 991.80 |
| | | MLP | 152.71 | 195.15 | 299.92 |
| | | Deep RNN | 101.01 | 155.05 | 274.97 |
| | | LSTM | 91.79 | 149.11 | 278.82 |
| | | GRU | 91.10 | 134.31 | 269.22 |
| | | DERN | 89.04 | 132.93 | 258.21 |
| | MAE | ARIMA | 901.97 | 1002.22 | 1032.43 |
| | | MLP | 68.92 | 199.98 | 208.01 |
| | | Deep RNN | 65.02 | 147.38 | 196.91 |
| | | LSTM | 63.92 | 127.38 | 186.91 |
| | | GRU | 62.97 | 128.48 | 176.91 |
| | | DERN | 60.57 | 126.35 | 167.76 |
| | MAPE | ARIMA | 103.91 | 140.08 | 181.13 |
| | | MLP | 17.78 | 25.71 | 36.44 |
| | | Deep RNN | 12.28 | 17.73 | 33.04 |
| | | LSTM | 9.78 | 15.73 | 23.99 |
| | | GRU | 9.62 | 14.95 | 22.09 |
| | | DERN | 10.01 | 11.85 | 20.17 |
| 90–10% | RMSE | ARIMA | 894.03 | 1012.42 | 1300.10 |
| | | MLP | 119.89 | 197.04 | 309.92 |
| | | Deep RNN | 99.91 | 179.14 | 292.82 |
| | | LSTM | 99.11 | 169.14 | 288.81 |
| | | GRU | 101.07 | 159.25 | 278.91 |
| | | DERN | 98.22 | 147.50 | 268.62 |
| | MAE | ARIMA | 981.07 | 1302.22 | 1687.03 |
| | | MLP | 108.12 | 187.01 | 206.11 |
| | | Deep RNN | 72.32 | 160.01 | 199.61 |
| | | LSTM | 64.32 | 145.33 | 189.51 |
| | | GRU | 63.92 | 144.99 | 189.01 |
| | | DERN | 60.61 | 154.19 | 175.11 |
| | MAPE | ARIMA | 114.21 | 180.18 | 221.90 |
| | | MLP | 16.95 | 24.95 | 33.91 |
| | | Deep RNN | 11.65 | 18.55 | 29.20 |
| | | LSTM | 10.64 | 16.63 | 24.01 |
| | | GRU | 11.80 | 15.03 | 24.91 |
| | | DERN | 10.71 | 12.01 | 19.82 |
Table 6. Experimental results of five-steps-ahead prediction of BDI.

| Portion | Accuracy Measure | Model | t+1 | t+2 | t+3 | t+4 | t+5 |
|---|---|---|---|---|---|---|---|
| 70–30% | RMSE | ARIMA | 731.07 | 799.10 | 999.08 | 1009.98 | 1020.05 |
| | | MLP | 145.10 | 209.31 | 241.97 | 255.06 | 259.99 |
| | | Deep RNN | 91.96 | 160.28 | 214.43 | 257.35 | 290.34 |
| | | LSTM | 79.40 | 142.33 | 201.69 | 248.45 | 293.33 |
| | | GRU | 79.92 | 155.84 | 219.02 | 276.65 | 325.09 |
| | | DERN | 75.11 | 139.05 | 200.15 | 245.23 | 293.31 |
| | MAE | ARIMA | 698.81 | 772.81 | 952.69 | 978.01 | 981.97 |
| | | MLP | 94.92 | 158.19 | 191.09 | 201.44 | 210.03 |
| | | Deep RNN | 64.38 | 111.64 | 153.25 | 188.69 | 214.41 |
| | | LSTM | 58.09 | 99.40 | 141.41 | 180.80 | 212.31 |
| | | GRU | 57.54 | 108.50 | 154.50 | 198.63 | 231.78 |
| | | DERN | 57.90 | 99.00 | 136.03 | 178.81 | 212.05 |
| | MAPE | ARIMA | 100.01 | 121.18 | 141.03 | 152.87 | 164.09 |
| | | MLP | 17.93 | 19.08 | 23.93 | 27.87 | 29.65 |
| | | Deep RNN | 9.46 | 13.19 | 19.30 | 25.75 | 27.46 |
| | | LSTM | 5.73 | 10.86 | 15.90 | 19.64 | 22.89 |
| | | GRU | 5.58 | 10.62 | 15.17 | 19.51 | 22.95 |
| | | DERN | 4.95 | 9.85 | 12.96 | 17.00 | 19.32 |
| 80–20% | RMSE | ARIMA | 724.03 | 842.92 | 1000.80 | 1010.75 | 1024.62 |
| | | MLP | 155.29 | 197.45 | 309.62 | 320.74 | 350.97 |
| | | Deep RNN | 98.81 | 138.35 | 167.67 | 212.59 | 224.87 |
| | | LSTM | 67.56 | 122.51 | 160.39 | 190.65 | 213.23 |
| | | GRU | 62.96 | 119.82 | 160.87 | 191.39 | 214.23 |
| | | DERN | 62.04 | 118.98 | 160.99 | 190.01 | 210.00 |
| | MAE | ARIMA | 900.17 | 1005.21 | 1042.33 | 1050.39 | 1072.11 |
| | | MLP | 73.48 | 201.08 | 218.01 | 240.55 | 255.12 |
| | | Deep RNN | 60.96 | 104.67 | 129.80 | 167.03 | 179.80 |
| | | LSTM | 53.18 | 92.75 | 123.78 | 151.54 | 171.19 |
| | | GRU | 49.10 | 88.78 | 122.20 | 148.33 | 170.18 |
| | | DERN | 48.56 | 87.39 | 121.76 | 144.72 | 170.00 |
| | MAPE | ARIMA | 123.31 | 143.18 | 183.83 | 189.11 | 192.99 |
| | | MLP | 19.18 | 26.17 | 35.41 | 37.11 | 39.98 |
| | | Deep RNN | 9.07 | 14.24 | 20.76 | 24.53 | 27.81 |
| | | LSTM | 6.93 | 11.87 | 17.74 | 20.26 | 23.93 |
| | | GRU | 6.54 | 10.98 | 16.94 | 19.39 | 21.48 |
| | | DERN | 6.50 | 8.81 | 14.17 | 16.98 | 19.52 |
| 90–10% | RMSE | ARIMA | 890.01 | 1042.12 | 1370.07 | 1490.10 | 1640.97 |
| | | MLP | 126.39 | 207.45 | 259.11 | 298.55 | 301.54 |
| | | Deep RNN | 93.99 | 137.53 | 166.94 | 175.15 | 221.83 |
| | | LSTM | 71.00 | 131.49 | 178.91 | 216.40 | 241.35 |
| | | GRU | 70.43 | 127.65 | 172.92 | 210.40 | 243.12 |
| | | DERN | 67.29 | 127.51 | 168.92 | 209.55 | 240.76 |
| | MAE | ARIMA | 991.47 | 1402.21 | 1707.09 | 1811.87 | 1991.08 |
| | | MLP | 118.22 | 197.53 | 216.10 | 223.09 | 245.12 |
| | | Deep RNN | 58.25 | 98.50 | 132.02 | 162.15 | 182.85 |
| | | LSTM | 55.67 | 103.69 | 142.64 | 174.16 | 194.45 |
| | | GRU | 56.07 | 102.74 | 139.42 | 172.02 | 205.66 |
| | | DERN | 55.61 | 101.19 | 135.19 | 171.04 | 200.11 |
| | MAPE | ARIMA | 124.01 | 189.08 | 231.10 | 256.98 | 279.14 |
| | | MLP | 18.35 | 27.91 | 36.04 | 39.43 | 42.76 |
| | | Deep RNN | 8.99 | 13.98 | 20.28 | 25.15 | 28.20 |
| | | LSTM | 6.73 | 11.95 | 16.01 | 22.25 | 24.06 |
| | | GRU | 6.88 | 10.97 | 17.90 | 20.21 | 25.67 |
| | | DERN | 5.01 | 9.13 | 14.87 | 17.02 | 19.09 |
Table 7. Experimental results of seven-steps-ahead prediction of BDI.

| Portion | Accuracy Measure | Model | t+1 | t+2 | t+3 | t+4 | t+5 | t+6 | t+7 |
|---|---|---|---|---|---|---|---|---|---|
| 70–30% | RMSE | ARIMA | 739.21 | 889.88 | 1008.98 | 1208.98 | 1568.72 | 1759.66 | 1973.26 |
| | | MLP | 131.09 | 197.15 | 229.82 | 267.11 | 299.56 | 337.98 | 353.31 |
| | | Deep RNN | 90.57 | 157.58 | 212.35 | 255.03 | 288.60 | 310.13 | 324.06 |
| | | LSTM | 73.67 | 143.11 | 200.43 | 249.52 | 290.83 | 317.09 | 329.46 |
| | | GRU | 77.74 | 150.71 | 214.46 | 270.16 | 323.02 | 364.61 | 390.65 |
| | | DERN | 69.53 | 143.08 | 198.09 | 240.92 | 287.43 | 300.41 | 320.04 |
| | MAE | ARIMA | 699.93 | 734.08 | 962.73 | 974.23 | 999.00 | 1020.87 | 1127.11 |
| | | MLP | 90.62 | 147.18 | 179.69 | 199.04 | 244.42 | 261.81 | 280.00 |
| | | Deep RNN | 63.38 | 109.58 | 151.40 | 186.30 | 212.95 | 232.66 | 250.13 |
| | | LSTM | 57.11 | 101.85 | 142.55 | 181.92 | 211.23 | 230.28 | 243.53 |
| | | GRU | 56.63 | 107.01 | 153.64 | 196.62 | 232.58 | 264.67 | 290.33 |
| | | DERN | 56.00 | 99.93 | 136.03 | 167.39 | 201.04 | 225.01 | 237.33 |
| | MAPE | ARIMA | 98.01 | 112.78 | 134.49 | 156.03 | 175.53 | 187.67 | 199.90 |
| | | MLP | 14.91 | 15.91 | 18.93 | 22.09 | 25.00 | 26.22 | 29.34 |
| | | Deep RNN | 7.45 | 12.17 | 16.48 | 19.99 | 21.93 | 24.14 | 31.27 |
| | | LSTM | 5.37 | 10.34 | 14.40 | 18.44 | 21.51 | 23.41 | 24.79 |
| | | GRU | 5.52 | 10.81 | 15.62 | 20.07 | 23.82 | 27.05 | 29.73 |
| | | DERN | 4.05 | 10.87 | 13.96 | 18.01 | 20.11 | 23.48 | 23.69 |
| 80–20% | RMSE | ARIMA | 699.23 | 887.92 | 1004.21 | 1220.89 | 1400.67 | 1502.77 | 1701.12 |
| | | MLP | 159.01 | 181.25 | 209.92 | 220.90 | 250.11 | 270.99 | 300.86 |
| | | Deep RNN | 87.46 | 123.43 | 166.68 | 192.01 | 219.95 | 237.83 | 255.86 |
| | | LSTM | 62.74 | 117.33 | 155.29 | 186.05 | 209.26 | 227.93 | 242.47 |
| | | GRU | 61.07 | 117.77 | 156.74 | 188.74 | 214.68 | 237.32 | 257.57 |
| | | DERN | 60.94 | 102.33 | 154.21 | 178.00 | 208.11 | 217.43 | 240.12 |
| | MAE | ARIMA | 900.06 | 1001.22 | 1202.49 | 1442.11 | 1558.00 | 1872.71 | 1991.91 |
| | | MLP | 69.91 | 198.18 | 218.31 | 236.16 | 257.12 | 277.09 | 296.66 |
| | | Deep RNN | 69.08 | 93.11 | 129.07 | 153.14 | 177.12 | 191.42 | 211.68 |
| | | LSTM | 55.76 | 86.98 | 118.25 | 147.24 | 168.10 | 182.84 | 194.88 |
| | | GRU | 47.63 | 87.48 | 118.09 | 146.51 | 170.46 | 187.73 | 207.20 |
| | | DERN | 46.51 | 84.35 | 118.76 | 146.13 | 165.10 | 180.33 | 194.00 |
| | MAPE | ARIMA | 102.90 | 141.08 | 185.03 | 198.01 | 210.77 | 240.11 | 290.19 |
| | | MLP | 17.68 | 26.71 | 38.14 | 40.11 | 43.03 | 45.32 | 47.71 |
| | | Deep RNN | 8.48 | 13.71 | 16.04 | 20.88 | 22.03 | 26.12 | 30.80 |
| | | LSTM | 5.52 | 10.19 | 14.02 | 17.87 | 20.77 | 22.84 | 24.71 |
| | | GRU | 5.52 | 10.27 | 14.09 | 17.95 | 21.30 | 23.67 | 26.11 |
| | | DERN | 5.51 | 10.12 | 13.17 | 17.05 | 19.09 | 23.08 | 24.50 |
| 90–10% | RMSE | ARIMA | 880.53 | 1002.41 | 1040.90 | 1321.11 | 1509.97 | 1610.11 | 1891.09 |
| | | MLP | 129.80 | 199.19 | 319.95 | 342.65 | 364.12 | 390.17 | 402.11 |
| | | Deep RNN | 71.73 | 126.45 | 164.00 | 195.55 | 219.68 | 243.05 | 264.41 |
| | | LSTM | 67.87 | 127.67 | 172.43 | 207.99 | 230.10 | 253.48 | 276.82 |
| | | GRU | 66.25 | 127.80 | 173.42 | 207.92 | 232.08 | 254.53 | 276.37 |
| | | DERN | 65.21 | 126.20 | 161.23 | 206.01 | 241.08 | 251.53 | 267.11 |
| | MAE | ARIMA | 981.07 | 1312.12 | 1487.03 | 1700.76 | 1880.12 | 1976.32 | 2019.19 |
| | | MLP | 109.12 | 180.31 | 233.11 | 258.88 | 298.12 | 320.11 | 379.11 |
| | | Deep RNN | 56.73 | 97.33 | 129.47 | 160.01 | 180.24 | 197.67 | 220.86 |
| | | LSTM | 53.68 | 100.67 | 139.20 | 168.14 | 188.49 | 201.61 | 217.05 |
| | | GRU | 51.61 | 98.23 | 134.01 | 165.39 | 189.76 | 209.58 | 231.24 |
| | | DERN | 50.71 | 99.45 | 133.11 | 160.09 | 180.01 | 195.11 | 202.11 |
| | MAPE | ARIMA | 124.91 | 190.38 | 251.93 | 271.33 | 296.39 | 301.11 | 329.81 |
| | | MLP | 14.45 | 24.05 | 33.31 | 43.11 | 56.65 | 71.12 | 84.49 |
| | | Deep RNN | 6.91 | 13.01 | 17.13 | 20.16 | 22.14 | 27.01 | 33.34 |
| | | LSTM | 5.57 | 10.61 | 14.71 | 17.91 | 19.91 | 21.10 | 22.46 |
| | | GRU | 5.22 | 9.95 | 13.47 | 16.60 | 19.14 | 21.15 | 23.23 |
| | | DERN | 5.67 | 9.13 | 11.82 | 15.84 | 18.01 | 19.22 | 21.21 |
