Explaining When Deep Learning Models Are Better for Time Series Forecasting †
Abstract
1. Introduction
2. Literature Review
3. Materials and Methods
3.1. Data
3.2. Models
3.3. Training and Testing Process
3.4. Features for Time Series Characterization
3.5. Linear Regression
4. Results
4.1. General Results
4.2. Comparison between Deep Learning Models and Statistical Models
4.3. Comparison between Deep Learning Models and Machine Learning Models
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Papacharalampous, G.; Tyralis, H.; Koutsoyiannis, D. Comparison of stochastic and machine learning methods for multi-step ahead forecasting of hydrological processes. Stoch. Environ. Res. Risk Assess. 2019, 33, 481–514. [Google Scholar] [CrossRef]
- Petropoulos, F.; Makridakis, S.; Assimakopoulos, V.; Nikolopoulos, K. ‘Horses for Courses’ in demand forecasting. Eur. J. Oper. Res. 2014, 237, 152–163. [Google Scholar] [CrossRef]
- Makridakis, S.; Hibon, M. The M3-Competition: Results, conclusions and implications. Int. J. Forecast. 2000, 16, 451–476. [Google Scholar] [CrossRef]
- Crone, S.F.; Hibon, M.; Nikolopoulos, K. Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction. Int. J. Forecast. 2011, 27, 635–660. [Google Scholar] [CrossRef]
- Sharma, A.; Jain, S.K. Deep Learning Approaches to Time Series Forecasting. In Recent Advances in Time Series Forecasting; CRC Press: Boca Raton, FL, USA, 2021. [Google Scholar] [CrossRef]
- Gamboa, J.C.B. Deep learning for time-series analysis. arXiv 2017, arXiv:1701.01887. [Google Scholar]
- Parmezan, A.R.S.; Souza, V.M.; Batista, G.E. Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model. Inf. Sci. 2019, 484, 302–337. [Google Scholar] [CrossRef]
- Kiefer, D.; Grimm, F.; Bauer, M.; Van Dinther, C. Demand forecasting intermittent and lumpy time series: Comparing statistical, machine learning and deep learning methods. In Proceedings of the 54th Hawaii International Conference on System Sciences, Online, 4–9 January 2021. [Google Scholar] [CrossRef]
- Solís, M.; Calvo-Valverde, L.A. Performance of Deep Learning models with transfer learning for multiple-step-ahead forecasts in monthly time series. Intel. Artif. 2022, 25, 110–125. [Google Scholar] [CrossRef]
- Elsayed, S.; Thyssens, D.; Rashed, A.; Jomaa, H.S.; Schmidt-Thieme, L. Do we really need deep learning models for time series forecasting? arXiv 2021, arXiv:2101.02118. [Google Scholar]
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. A comparison of ARIMA and LSTM in forecasting time series. In Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; IEEE: New York, NY, USA, 2018; pp. 1394–1401. [Google Scholar] [CrossRef]
- Kanavos, A.; Kounelis, F.; Iliadis, L.; Makris, C. Deep learning models for forecasting aviation demand time series. Neural Comput. Appl. 2021, 33, 329–343. [Google Scholar] [CrossRef]
- M4 Team. M4 Competitor’s Guide: Prizes and Rules. 2018. Available online: https://www.m4.unic.ac.cy/wpcontent/uploads/2018/03/M4-CompetitorsGuide.pdf (accessed on 25 January 2020).
- Wang, X.; Smith, K.; Hyndman, R. Characteristic-based clustering for time series data. Data Min. Knowl. Discov. 2006, 13, 335–364. [Google Scholar] [CrossRef]
- Narajewski, M.; Kley-Holsteg, J.; Ziel, F. tsrobprep—An R package for robust preprocessing of time series data. SoftwareX 2021, 16, 100809. [Google Scholar] [CrossRef]
- Komsta, L. Package ‘Outliers’. 2022. Available online: https://cran.r-project.org/web/packages/outliers/outliers.pdf (accessed on 25 January 2023).
- Yang, Y.; Hyndman, R. Introduction to the Tsfeatures Package. 2022. Available online: https://pkg.robjhyndman.com/tsfeatures/articles/tsfeatures.html (accessed on 30 August 2023).
- Day, O.; Khoshgoftaar, T.M. A survey on heterogeneous transfer learning. J. Big Data 2017, 4, 1–42. [Google Scholar] [CrossRef]
[Figure: architectures of the deep learning models compared (CNN, TCN, LSTM)]
[Figure: machine learning models compared (XGBoost, Support Vector Machines, Random Forest)]
Name | Explanation | Package |
---|---|---|
size | Number of time series points. | - |
stl_features_trend | The trend is a long-term change in the mean level [14]. This metric measures the strength of the trend; higher values mean a stronger trend. Following [14], the series is decomposed as y_t = T_t + S_t + R_t, where T_t = smoothed trend component, S_t = seasonal component, and R_t = remainder component. Trend = max(0, 1 − Var(R_t)/Var(T_t + R_t)). | stl_features function from package tsfeatures |
alpha_tsf | Alpha parameter of simple exponential smoothing. This feature measures the relevance of recent periods. High values give more weight to recent lags in the forecast; lower values imply a more even distribution of weight. | tsfeatures |
kurtosis_fisher | A measure that characterizes the probability distribution of the time series. Higher values indicate heavier tails and a sharper concentration around the mean. | PerformanceAnalytics |
out | Averages the proportion of outliers according to two different measures. | detect_outliers of package tsrobprep and scores from outliers package |
pearson_test | The p-value of Pearson's normality test. Higher values indicate that the distribution of the time series is closer to the normal assumption. | pearson.test from nortest library |
adf | The p-value of the augmented Dickey–Fuller test for the null hypothesis of a non-stationary time series. Higher values suggest the series is non-stationary (the null cannot be rejected). | adf.test from tseries package. K = 12 |
stl_features_linearity | Measures the linearity of a time series based on the coefficients of an orthogonal quadratic regression [17]. | tsfeatures |
stl_features_seasonal_strength | Measures the strength of the seasonality; higher values mean stronger seasonality. Following the decomposition described for stl_features_trend, Seasonal = max(0, 1 − Var(R_t)/Var(S_t + R_t)). | tsfeatures |
gamma_tsf | Gamma parameter from the ETS model. High values give more weight to the recent seasonal period. | tsfeatures |
white_noise | The p-value of the Box test. The null hypothesis is that the time series points are independently distributed. Higher values indicate that the points are independent, and therefore the data may be white noise. | Box.test from the stats package. We apply lag = 12 |
nonlinearity_tsf | Based on Teräsvirta's nonlinearity test, which takes larger values when the time series is nonlinear. | nonlinearity from tsfeatures |
distance | A feature we created that measures the average Euclidean distance between the feature vector of a target time series and the 2000 feature vectors of the training dataset, i.e., how far a target series lies from the training dataset. Higher values represent larger distances. | - |
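The two strength features in the table share one formula applied to different components. The tsfeatures package computes them from an STL decomposition; the sketch below substitutes a dependency-free classical moving-average decomposition (an assumption for illustration, not the package's exact method) to show how Trend = max(0, 1 − Var(R)/Var(T + R)) and its seasonal analogue are applied.

```python
import numpy as np

def decompose(y, m=12):
    """Classical additive decomposition with a moving-average trend,
    a stand-in here for the STL decomposition used by tsfeatures."""
    n = len(y)
    # centered 2xm moving average for the trend estimate
    w = np.ones(m + 1)
    w[0] = w[-1] = 0.5
    w /= m
    trend = np.convolve(y, w, mode="valid")   # defined on interior points only
    half = m // 2
    idx = np.arange(half, n - half)           # indices where the trend exists
    detrended = y[idx] - trend
    # seasonal component: per-season mean of the detrended series, centered at zero
    seas_means = np.array([detrended[(idx % m) == k].mean() for k in range(m)])
    seas_means -= seas_means.mean()
    seasonal = seas_means[idx % m]
    remainder = detrended - seasonal
    return trend, seasonal, remainder

def strength(remainder, component):
    # strength = max(0, 1 - Var(R) / Var(component + R)), as in the table above
    return max(0.0, 1.0 - np.var(remainder) / np.var(component + remainder))

# synthetic monthly series: upward trend + yearly cycle + small noise
rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120)
trend, seasonal, remainder = decompose(y)
print(strength(remainder, trend), strength(remainder, seasonal))
```

For this strongly trended, strongly seasonal series both strengths come out close to 1; a pure-noise series would score near 0 on both.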
Features | ARIMA TCN | ARIMA LSTM | ARIMA CNN | ETS TCN | ETS LSTM | ETS CNN | THETA TCN | THETA LSTM | THETA CNN |
---|---|---|---|---|---|---|---|---|---|
distance | 0.19 | 0.27 | 0.49 | 0.11 | 0.16 | 0.44 | 0.02 | 0.01 | 0.18 |
size | −0.01 | −0.05 | 0.04 | 0.03 | 0.00 | 0.11 | 0.03 | 0.02 | 0.08 |
trend | 0.08 | 0.09 | 0.14 | 0.03 | 0.02 | 0.08 | 0.04 | 0.02 | 0.08 |
alpha_tsf | 0.03 | 0.05 | 0.08 | 0.02 | 0.04 | 0.08 | −0.02 | 0.00 | 0.00 |
kurtosis_fisher | −0.04 | −0.05 | −0.21 | −0.07 | −0.09 | −0.23 | 0.01 | 0.03 | −0.09 |
out | −0.07 | −0.11 | −0.17 | −0.05 | −0.07 | −0.14 | 0.00 | 0.00 | −0.06 |
pearson_test | −0.07 | −0.09 | −0.18 | −0.04 | −0.06 | −0.16 | 0.00 | 0.01 | −0.06 |
adf | −0.02 | 0.00 | −0.03 | 0.00 | 0.02 | 0.00 | −0.03 | −0.01 | −0.04 |
linearity | −0.01 | −0.04 | 0.07 | 0.00 | −0.02 | 0.07 | −0.02 | −0.02 | 0.04 |
seasonal_strength | 0.03 | 0.00 | 0.16 | 0.01 | −0.01 | 0.13 | 0.06 | 0.06 | 0.18 |
gamma | 0.05 | 0.15 | 0.15 | 0.07 | 0.18 | 0.18 | −0.02 | 0.04 | 0.00 |
white_noise | −0.10 | −0.15 | −0.26 | −0.07 | −0.09 | −0.24 | −0.01 | −0.01 | −0.10 |
nonlinearity | −0.05 | −0.08 | −0.14 | −0.01 | −0.02 | −0.09 | 0.02 | 0.03 | 0.01 |
Features | XGBoost TCN | XGBoost LSTM | XGBoost CNN | SVM TCN | SVM LSTM | SVM CNN | RF TCN | RF LSTM | RF CNN |
---|---|---|---|---|---|---|---|---|---|
distance | −0.15 | −0.12 | −0.03 | −0.14 | −0.14 | −0.13 | −0.21 | −0.21 | −0.14 |
size | 0.08 | 0.06 | 0.12 | 0.02 | 0.01 | 0.03 | 0.11 | 0.10 | 0.17 |
trend | −0.19 | −0.23 | −0.22 | −0.28 | −0.29 | −0.30 | −0.23 | −0.27 | −0.28 |
alpha_tsf | 0.04 | 0.07 | 0.07 | 0.03 | 0.04 | 0.03 | 0.04 | 0.08 | 0.09 |
kurtosis_fisher | 0.08 | 0.07 | 0.01 | 0.09 | 0.10 | 0.08 | 0.10 | 0.11 | 0.05 |
out | 0.04 | 0.03 | 0.00 | 0.05 | 0.05 | 0.04 | 0.06 | 0.06 | 0.03 |
pearson_test | 0.04 | 0.03 | 0.00 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.03 |
adf | 0.02 | 0.04 | 0.04 | 0.01 | 0.01 | 0.02 | 0.02 | 0.03 | 0.03 |
linearity | −0.02 | −0.02 | 0.00 | −0.03 | −0.03 | −0.03 | 0.01 | 0.00 | 0.03 |
seasonal_strength | 0.05 | 0.05 | 0.12 | −0.02 | −0.02 | 0.00 | 0.04 | 0.04 | 0.11 |
gamma | 0.12 | 0.22 | 0.23 | 0.03 | 0.04 | 0.04 | 0.10 | 0.18 | 0.18 |
white_noise | 0.08 | 0.07 | 0.02 | 0.07 | 0.07 | 0.06 | 0.12 | 0.12 | 0.09 |
nonlinearity | 0.12 | 0.13 | 0.11 | 0.05 | 0.06 | 0.05 | 0.15 | 0.17 | 0.15 |
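The tables report per-feature coefficients linking time series characteristics to the relative performance of each deep learning model against a benchmark. The exact specification of the Section 3.5 regression is not reproduced here; the sketch below is a hypothetical illustration on synthetic data, assuming standardized linear regression of a per-series performance gap (benchmark error minus deep learning error) on the features, so that coefficients are comparable across features as in the tables.

```python
import numpy as np

rng = np.random.default_rng(1)
n_series = 500

# synthetic feature values; names taken from the feature table above
features = {
    "distance": rng.uniform(0, 2, n_series),
    "trend": rng.uniform(0, 1, n_series),
    "white_noise": rng.uniform(0, 1, n_series),
}
X = np.column_stack(list(features.values()))

# synthetic performance gap: here the deep model is assumed to help on series
# far from the training set and to hurt on near-white-noise series
gap = (0.3 * features["distance"]
       - 0.2 * features["white_noise"]
       + rng.normal(0, 0.1, n_series))

# standardize features and target, then fit ordinary least squares
Xz = (X - X.mean(0)) / X.std(0)
yz = (gap - gap.mean()) / gap.std()
design = np.column_stack([np.ones(n_series), Xz])  # intercept + features
coef, *_ = np.linalg.lstsq(design, yz, rcond=None)

for name, b in zip(features, coef[1:]):
    print(f"{name}: {b:+.2f}")
```

Under this sign convention a positive coefficient means the feature favors the deep learning model, matching how the tables are read (e.g., positive distance coefficients for the statistical benchmarks, negative ones for the machine learning benchmarks).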
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Solís, M.; Calvo-Valverde, L.-A. Explaining When Deep Learning Models Are Better for Time Series Forecasting. Eng. Proc. 2024, 68, 1. https://doi.org/10.3390/engproc2024068001