Transformers in time series: a survey

Published: 19 August 2023
Abstract

    Transformers have achieved superior performance in many tasks in natural language processing and computer vision, which has also triggered great interest in the time series community. Among the many advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review Transformer schemes for time series modeling by highlighting their strengths as well as their limitations. In particular, we examine the development of time series Transformers from two perspectives. From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers to accommodate the challenges of time series analysis. From the perspective of applications, we categorize time series Transformers based on common tasks, including forecasting, anomaly detection, and classification. Empirically, we perform robustness analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform on time series. Finally, we discuss and suggest future directions to provide useful research guidance.
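
    To make the modeling idea above concrete, the following is a minimal, illustrative sketch (assuming PyTorch; not any specific model covered by the survey) of applying a vanilla Transformer encoder to a window of multivariate time series for one-step-ahead forecasting. The class name TinyTimeSeriesTransformer and all hyperparameters are hypothetical choices for illustration only; the self-attention over the input window is what provides the long-range dependency modeling highlighted in the abstract.

```python
# Minimal illustrative sketch (assumptions: PyTorch available; names and
# hyperparameters are hypothetical, not taken from the surveyed models).
import torch
import torch.nn as nn


class TinyTimeSeriesTransformer(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4,
                 n_layers: int = 2, max_len: int = 512):
        super().__init__()
        # Project each time step's feature vector into the model dimension.
        self.input_proj = nn.Linear(n_features, d_model)
        # Learned positional embeddings stand in for the positional encodings
        # discussed in the survey; max_len = 512 is an arbitrary assumption.
        self.pos_emb = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Self-attention lets every time step attend to every other step,
        # i.e. the long-range dependency modeling mentioned in the abstract.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Read out a one-step-ahead forecast from the last position.
        self.head = nn.Linear(d_model, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features)
        positions = torch.arange(x.size(1), device=x.device)
        h = self.input_proj(x) + self.pos_emb(positions)
        h = self.encoder(h)                # (batch, seq_len, d_model)
        return self.head(h[:, -1, :])      # (batch, n_features)


# Usage on random data, just to show the expected shapes.
model = TinyTimeSeriesTransformer(n_features=7)
window = torch.randn(32, 96, 7)            # 32 series, 96 past steps, 7 variables
forecast = model(window)                   # one-step-ahead forecast: (32, 7)
print(forecast.shape)
```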

    Cited By

    • (2024) A Survey of Multimodal Controllable Diffusion Models. Journal of Computer Science and Technology, 39(3):509-541. https://doi.org/10.1007/s11390-024-3814-0. Online publication date: 1 May 2024.
    • (2023) DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models. Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, pages 1-12. https://doi.org/10.1145/3589132.3625614. Online publication date: 13 Nov 2023.

    Published In

    IJCAI '23: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
    August 2023
    7242 pages
    ISBN: 978-1-956792-03-4

    Sponsors

    • International Joint Conferences on Artificial Intelligence (IJCAI)
