Transformers in time series: a survey

Published: 19 August 2023
Abstract

    Transformers have achieved superior performance in many tasks in natural language processing and computer vision, which has also triggered great interest in the time series community. Among the many advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review Transformer schemes for time series modeling by highlighting their strengths as well as their limitations. In particular, we examine the development of time series Transformers from two perspectives. From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers to accommodate the challenges of time series analysis. From the perspective of applications, we categorize time series Transformers based on common tasks, including forecasting, anomaly detection, and classification. Empirically, we perform robustness analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform on time series. Finally, we discuss and suggest future directions to provide useful research guidance.
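
    To make the modeling idea above concrete, the following is a minimal, illustrative sketch (assuming PyTorch; not any specific model covered by the survey) of applying a vanilla Transformer encoder to a window of multivariate time series for one-step-ahead forecasting. The class name TinyTimeSeriesTransformer and all hyperparameters are hypothetical choices for illustration only; the self-attention over the input window is what provides the long-range dependency modeling highlighted in the abstract.

```python
# Minimal illustrative sketch (assumptions: PyTorch available; names and
# hyperparameters are hypothetical, not taken from the surveyed models).
import torch
import torch.nn as nn


class TinyTimeSeriesTransformer(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4,
                 n_layers: int = 2, max_len: int = 512):
        super().__init__()
        # Project each time step's feature vector into the model dimension.
        self.input_proj = nn.Linear(n_features, d_model)
        # Learned positional embeddings stand in for the positional encodings
        # discussed in the survey; max_len = 512 is an arbitrary assumption.
        self.pos_emb = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Self-attention lets every time step attend to every other step,
        # i.e. the long-range dependency modeling mentioned in the abstract.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Read out a one-step-ahead forecast from the last position.
        self.head = nn.Linear(d_model, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features)
        positions = torch.arange(x.size(1), device=x.device)
        h = self.input_proj(x) + self.pos_emb(positions)
        h = self.encoder(h)                # (batch, seq_len, d_model)
        return self.head(h[:, -1, :])      # (batch, n_features)


# Usage on random data, just to show the expected shapes.
model = TinyTimeSeriesTransformer(n_features=7)
window = torch.randn(32, 96, 7)            # 32 series, 96 past steps, 7 variables
forecast = model(window)                   # one-step-ahead forecast: (32, 7)
print(forecast.shape)
```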

    Cited By

    • (2024) A Survey of Multimodal Controllable Diffusion Models. Journal of Computer Science and Technology, 39(3):509-541. https://doi.org/10.1007/s11390-024-3814-0. Online publication date: 1 May 2024.
    • (2023) DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models. Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, pages 1-12. https://doi.org/10.1145/3589132.3625614. Online publication date: 13 Nov 2023.

    Published In

    IJCAI '23: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
    August 2023
    7242 pages
    ISBN: 978-1-956792-03-4

    Sponsors

    • International Joint Conferences on Artificial Intelligence (IJCAI)
