An improved self-attention for long-sequence time-series data forecasting with missing values

  • Review
  • Published in: Neural Computing and Applications

Abstract

Long-sequence time-series forecasting based on deep learning has been applied in many practical scenarios. However, time-series sequences collected in the real world inevitably contain missing values caused by sensor failures or network fluctuations. Current research typically imputes the incomplete sequence during the data preprocessing stage, which leads to unsynchronized prediction and error accumulation. In this article, we propose DecayAttention, an improved multi-head self-attention mechanism that can be applied to existing X-former models so that they handle missing values in time-series sequences without losing prediction accuracy. We apply DecayAttention to Transformer and two state-of-the-art X-former models, improving the best prediction accuracy by 8.2%.
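The paper's exact formulation of DecayAttention is not reproduced on this page, so the following PyTorch sketch only illustrates the general idea the abstract describes: a multi-head scaled dot-product attention whose weights are down-modulated by a learned exponential decay driven by how stale each key position's last actual observation is (in the spirit of the GRU-D decay mechanism). All names here (`DecayAttentionSketch`, `staleness`, the per-head decay parameter) are assumptions for illustration, not the authors' API; the real implementation is in the repository linked below.

```python
# Hypothetical sketch of a decay-modulated self-attention, inspired by the
# abstract's description of DecayAttention. Not the authors' implementation;
# see https://github.com/newbeezzc/DecayAttention for the real code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecayAttentionSketch(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learnable decay rate per head (an assumption for this sketch).
        self.decay_rate = nn.Parameter(torch.ones(n_heads))

    def forward(self, x: torch.Tensor, staleness: torch.Tensor) -> torch.Tensor:
        # x:         (batch, seq_len, d_model), missing steps filled e.g. with zeros
        # staleness: (batch, seq_len), time elapsed since each position's last
        #            actual observation (0 where the value was observed)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq_len, d_head).
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (b, h, t, t)
        # Down-weight attention toward keys whose values are stale:
        # gamma = exp(-relu(w) * staleness) lies in (0, 1], per key position.
        gamma = torch.exp(-F.relu(self.decay_rate)[None, :, None, None]
                          * staleness[:, None, None, :])
        attn = F.softmax(scores, dim=-1) * gamma
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)  # renormalize
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(out)
```

In this sketch, positions with `staleness == 0` (actually observed) keep their full attention weight, while long-missing positions contribute less; this is the kind of mechanism that lets a model consume an incomplete sequence directly instead of trusting a preprocessing-stage imputation.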


Data and code availability

The data and code analyzed during the current study are available from the corresponding author on reasonable request. The implementation source code is available in the GitHub repository at https://github.com/newbeezzc/DecayAttention.

Notes

  1. Available at https://github.com/zhouhaoyi/ETDataset.

  2. Available at https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014.

  3. Available at https://www.ncei.noaa.gov/data/local-climatological-data/.

  4. Available at https://github.com/laiguokun/multivariate-time-series-data.

  5. Available at https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html.
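As a hedged illustration of how one of these benchmarks might be prepared for a missing-value experiment (the masking protocol below is an assumption for demonstration, not the paper's documented setup), the sketch loads `ETTh1.csv` from the ETDataset repository, drops values uniformly at random, and derives a steps-since-last-observation signal that a decay-based attention could consume:

```python
# Minimal sketch: simulate missing values on ETTh1 and derive a "staleness"
# signal (steps since the last observation). The 20% masking rate and uniform
# protocol are assumptions, not necessarily the paper's experimental setup.
import numpy as np
import pandas as pd

df = pd.read_csv("ETTh1.csv")                  # from the ETDataset repository
values = df.drop(columns=["date"]).to_numpy(dtype=np.float32)

rng = np.random.default_rng(0)
mask = rng.random(values.shape) < 0.2          # drop 20% of points at random
values[mask] = np.nan

# Steps since the last observed value, computed per variable.
staleness = np.zeros_like(values)
for j in range(values.shape[1]):
    gap = 0
    for i in range(values.shape[0]):
        gap = 0 if not np.isnan(values[i, j]) else gap + 1
        staleness[i, j] = gap
```

Per-variable staleness can then be pooled (e.g., averaged across variables) into a per-time-step signal if the attention module expects one value per position.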


Author information

Corresponding author

Correspondence to Yong Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1 Model implementation details

See Table 7 and Fig. 10.

Table 7 Transformer and Transformer* model implementation details

Fig. 10 Model structure details of DecayAttention on Transformer

See Table 8 and Fig. 11.

Table 8 Informer and Informer* model implementation details

Fig. 11 Model structure details of DecayAttention on Informer

See Table 9 and Fig. 12.

Table 9 Autoformer and Autoformer* model implementation details

Fig. 12 Model structure details of DecayAttention on Autoformer

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, Zc., Wang, Y., Peng, Jj. et al. An improved self-attention for long-sequence time-series data forecasting with missing values. Neural Comput & Applic 36, 3921–3940 (2024). https://doi.org/10.1007/s00521-023-09347-6
