TS-Fastformer: Fast Transformer for Time-series Forecasting

Published: 22 February 2024
    Abstract

    Many real-world applications require precise and fast time-series forecasting. Recent trends in time-series forecasting are shifting from LSTM-based models to Transformer-based models. However, Transformer-based models have a limited ability to represent sequential relationships in time-series data. In addition, they suffer from slow training and inference due to the bottlenecks incurred by a deep encoder and step-by-step decoder inference. To address these problems, we propose a Transformer model optimized for time-series forecasting, called TS-Fastformer. TS-Fastformer introduces three new optimizations. First, we propose a Sub Window Tokenizer that compresses the input in a simple manner. The Sub Window Tokenizer reduces the length of input sequences to mitigate the complexity of self-attention and enables both single- and multi-sequence learning. Second, we propose a Time-series Pre-trained Encoder that extracts effective representations through pre-training. This optimization enables TS-Fastformer to capture both seasonal and trend representations and to mitigate the bottlenecks of conventional Transformer models. Third, we propose a Past Attention Decoder that forecasts the target by incorporating past long- and short-term dependency patterns. Furthermore, the Past Attention Decoder achieves a significant performance improvement by removing a trend distribution that changes over a long period. We evaluate our model with extensive experiments on seven real-world datasets and compare it against six representative time-series forecasting approaches. The results show that TS-Fastformer reduces MSE by 10.1% compared to the state-of-the-art model and trains 21.6% faster than the fastest existing Transformer.
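
    To make the Sub Window Tokenizer idea concrete, the following is a minimal sketch, not the paper's implementation: it assumes non-overlapping sub-windows and a single linear projection per window, and the class name, tensor layout, and window size are illustrative choices only.

    import torch
    import torch.nn as nn

    class SubWindowTokenizer(nn.Module):
        """Illustrative sub-window tokenizer: split a length-L series into
        non-overlapping sub-windows of size w and project each window to one
        token, so self-attention runs over L/w tokens instead of L points."""

        def __init__(self, window_size: int, d_model: int):
            super().__init__()
            self.window_size = window_size
            self.proj = nn.Linear(window_size, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, length, n_series); length assumed divisible by window_size
            b, length, n = x.shape
            num_tokens = length // self.window_size
            # Treat each series independently (multi-sequence learning) and fold
            # every sub-window into the feature dimension of one token.
            x = x.permute(0, 2, 1).reshape(b * n, num_tokens, self.window_size)
            return self.proj(x)  # (batch * n_series, num_tokens, d_model)

    # Example: a 96-step input over 7 series with window size 8 yields 12 tokens
    # per series, shrinking the quadratic self-attention cost accordingly.
    tokens = SubWindowTokenizer(window_size=8, d_model=64)(torch.randn(2, 96, 7))
    print(tokens.shape)  # torch.Size([14, 12, 64])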



      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 2
      April 2024
      481 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/3613561
      Editor: Huan Liu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 February 2024
      Online AM: 30 October 2023
      Accepted: 19 October 2023
      Revised: 06 October 2023
      Received: 08 December 2022
      Published in TIST Volume 15, Issue 2


      Author Tags

      1. Deep learning
      2. transformer
      3. time-series forecasting
      4. time-series representation

      Qualifiers

      • Research-article

      Funding Sources

      • Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea
      • Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)
      • Artificial Intelligence Convergence Innovation Human Resources Development (Inha University)
      • USA NSF CISE
      • IBM Faculty Award, and a CISCO Edge AI grant
