An improved self-attention for long-sequence time-series data forecasting with missing values

  • Review
  • Published in: Neural Computing and Applications

Abstract

Long-sequence time-series forecasting based on deep learning has been applied in many practical scenarios. However, time-series sequences collected in the real world inevitably contain missing values caused by sensor failures or network fluctuations. Current research typically imputes the incomplete sequence during the data preprocessing stage, which leads to unsynchronized prediction and error accumulation. In this article, we propose DecayAttention, an improved multi-head self-attention mechanism that can be applied to existing X-former models so that they handle missing values in time-series sequences without losing prediction accuracy. We apply DecayAttention to Transformer and two state-of-the-art X-former models, improving the best prediction accuracy by 8.2%.
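The paper's exact formulation of DecayAttention is not reproduced on this page, so the following PyTorch sketch only illustrates the general idea the abstract describes: a multi-head scaled dot-product attention whose weights are down-modulated by a learned exponential decay driven by how stale each key position's last actual observation is (in the spirit of the GRU-D decay mechanism). All names here (`DecayAttentionSketch`, `staleness`, the per-head decay parameter) are assumptions for illustration, not the authors' API; the real implementation is in the repository linked below.

```python
# Hypothetical sketch of a decay-modulated self-attention, inspired by the
# abstract's description of DecayAttention. Not the authors' implementation;
# see https://github.com/newbeezzc/DecayAttention for the real code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecayAttentionSketch(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learnable decay rate per head (an assumption for this sketch).
        self.decay_rate = nn.Parameter(torch.ones(n_heads))

    def forward(self, x: torch.Tensor, staleness: torch.Tensor) -> torch.Tensor:
        # x:         (batch, seq_len, d_model), missing steps filled e.g. with zeros
        # staleness: (batch, seq_len), time elapsed since each position's last
        #            actual observation (0 where the value was observed)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq_len, d_head).
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (b, h, t, t)
        # Down-weight attention toward keys whose values are stale:
        # gamma = exp(-relu(w) * staleness) lies in (0, 1], per key position.
        gamma = torch.exp(-F.relu(self.decay_rate)[None, :, None, None]
                          * staleness[:, None, None, :])
        attn = F.softmax(scores, dim=-1) * gamma
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)  # renormalize
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(out)
```

In this sketch, positions with `staleness == 0` (actually observed) keep their full attention weight, while long-missing positions contribute less; this is the kind of mechanism that lets a model consume an incomplete sequence directly instead of trusting a preprocessing-stage imputation.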


Data and code availability

The data and code analyzed during the current study are available from the corresponding author on reasonable request. The implementation source code is available in the GitHub repository at https://github.com/newbeezzc/DecayAttention.

Notes

  1. Available at https://github.com/zhouhaoyi/ETDataset.

  2. Available at https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014.

  3. Available at https://www.ncei.noaa.gov/data/local-climatological-data/.

  4. Available at https://github.com/laiguokun/multivariate-time-series-data.

  5. Available at https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html.
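As a hedged illustration of how one of these benchmarks might be prepared for a missing-value experiment (the masking protocol below is an assumption for demonstration, not the paper's documented setup), the sketch loads `ETTh1.csv` from the ETDataset repository, drops values uniformly at random, and derives a steps-since-last-observation signal that a decay-based attention could consume:

```python
# Minimal sketch: simulate missing values on ETTh1 and derive a "staleness"
# signal (steps since the last observation). The 20% masking rate and uniform
# protocol are assumptions, not necessarily the paper's experimental setup.
import numpy as np
import pandas as pd

df = pd.read_csv("ETTh1.csv")                  # from the ETDataset repository
values = df.drop(columns=["date"]).to_numpy(dtype=np.float32)

rng = np.random.default_rng(0)
mask = rng.random(values.shape) < 0.2          # drop 20% of points at random
values[mask] = np.nan

# Steps since the last observed value, computed per variable.
staleness = np.zeros_like(values)
for j in range(values.shape[1]):
    gap = 0
    for i in range(values.shape[0]):
        gap = 0 if not np.isnan(values[i, j]) else gap + 1
        staleness[i, j] = gap
```

Per-variable staleness can then be pooled (e.g., averaged across variables) into a per-time-step signal if the attention module expects one value per position.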


Author information

Corresponding author

Correspondence to Yong Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1 Model implementation details

See Table 7 and Fig. 10.

Table 7 Transformer and Transformer* model implementation details

Fig. 10 Model structure details of DecayAttention on Transformer

See Table 8 and Fig. 11.

Table 8 Informer and Informer* model implementation details

Fig. 11 Model structure details of DecayAttention on Informer

See Table 9 and Fig. 12.

Table 9 Autoformer and Autoformer* model implementation details

Fig. 12 Model structure details of DecayAttention on Autoformer

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, Zc., Wang, Y., Peng, Jj. et al. An improved self-attention for long-sequence time-series data forecasting with missing values. Neural Comput & Applic 36, 3921–3940 (2024). https://doi.org/10.1007/s00521-023-09347-6
