TS-Fastformer: Fast Transformer for Time-series Forecasting

Published: 22 February 2024
    Abstract

    Many real-world applications require precise and fast time-series forecasting. Recent trends in time-series forecasting are shifting from LSTM-based models to Transformer-based models. However, Transformer-based models have a limited ability to represent sequential relationships in time-series data. In addition, they suffer from slow training and inference due to the bottlenecks incurred by a deep encoder and step-by-step decoder inference. To address these problems, we propose a Transformer model optimized for time-series forecasting, called TS-Fastformer. TS-Fastformer introduces three new optimizations. First, we propose a Sub Window Tokenizer that compresses the input in a simple manner. The Sub Window Tokenizer reduces the length of input sequences to mitigate the complexity of self-attention and enables both single- and multi-sequence learning. Second, we propose a Time-series Pre-trained Encoder that extracts effective representations through pre-training. This optimization enables TS-Fastformer to capture both seasonal and trend representations and to mitigate the bottlenecks of conventional Transformer models. Third, we propose a Past Attention Decoder that forecasts the target by incorporating past long- and short-term dependency patterns. Furthermore, the Past Attention Decoder achieves a significant performance improvement by removing a trend distribution that changes over a long period. We evaluate our model with extensive experiments on seven real-world datasets and compare it against six representative time-series forecasting approaches. The results show that TS-Fastformer reduces MSE by 10.1% compared to the state-of-the-art model and trains 21.6% faster than the fastest existing Transformer.
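
    To make the Sub Window Tokenizer idea concrete, the following is a minimal sketch, not the paper's implementation: it assumes non-overlapping sub-windows and a single linear projection per window, and the class name, tensor layout, and window size are illustrative choices only.

    import torch
    import torch.nn as nn

    class SubWindowTokenizer(nn.Module):
        """Illustrative sub-window tokenizer: split a length-L series into
        non-overlapping sub-windows of size w and project each window to one
        token, so self-attention runs over L/w tokens instead of L points."""

        def __init__(self, window_size: int, d_model: int):
            super().__init__()
            self.window_size = window_size
            self.proj = nn.Linear(window_size, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, length, n_series); length assumed divisible by window_size
            b, length, n = x.shape
            num_tokens = length // self.window_size
            # Treat each series independently (multi-sequence learning) and fold
            # every sub-window into the feature dimension of one token.
            x = x.permute(0, 2, 1).reshape(b * n, num_tokens, self.window_size)
            return self.proj(x)  # (batch * n_series, num_tokens, d_model)

    # Example: a 96-step input over 7 series with window size 8 yields 12 tokens
    # per series, shrinking the quadratic self-attention cost accordingly.
    tokens = SubWindowTokenizer(window_size=8, d_model=64)(torch.randn(2, 96, 7))
    print(tokens.shape)  # torch.Size([14, 12, 64])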



      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 2
      April 2024
      481 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/3613561
      Editor: Huan Liu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 February 2024
      Online AM: 30 October 2023
      Accepted: 19 October 2023
      Revised: 06 October 2023
      Received: 08 December 2022
      Published in TIST Volume 15, Issue 2


      Author Tags

      1. Deep learning
      2. transformer
      3. time-series forecasting
      4. time-series representation

      Qualifiers

      • Research-article

      Funding Sources

      • Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea
      • Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)
      • Artificial Intelligence Convergence Innovation Human Resources Development (Inha University)
      • USA NSF CISE
      • IBM Faculty Award, and a CISCO Edge AI grant
