Abstract
Transformer architectures have widespread applications, particularly in Natural Language Processing and Computer Vision. Recently, Transformers have been employed in various aspects of time-series analysis. This tutorial provides an overview of the Transformer architecture, its applications, and a collection of examples from recent research in time-series analysis. We explain the core components of the Transformer, including the self-attention mechanism, positional encoding, multi-head attention, and the encoder/decoder structure. Several enhancements to the original Transformer architecture that target time-series tasks are highlighted. The tutorial also provides best practices and techniques for overcoming the challenge of effectively training Transformers for time-series analysis.
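As a concrete illustration of the components named above, the following is a minimal NumPy sketch of scaled dot-product self-attention combined with sinusoidal positional encoding. It is an illustrative sketch only, not code from the tutorial; the dimensions and names (seq_len, d_model, W_q, W_k, W_v) are assumptions chosen for the toy example.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as in Vaswani et al. (2017)."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model)[None, :]                    # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions: cosine
    return pe

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)     # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)          # row-wise softmax
    return weights @ V

# Toy usage: a time series of 24 steps embedded into 16 dimensions,
# with positional information added before attention is applied.
seq_len, d_model = 24, 16
x = np.random.randn(seq_len, d_model) + positional_encoding(seq_len, d_model)
W_q, W_k, W_v = (np.random.randn(d_model, d_model) * 0.1 for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (24, 16)
```

In the full architecture, multi-head attention runs several such attention operations in parallel over learned projections and concatenates their outputs, and the encoder/decoder stacks interleave attention with feed-forward layers, residual connections, and layer normalization.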
Data Availability
The manuscript has no associated data.
Acknowledgements
This work was partly supported by the National Science Foundation Awards ECCS-1903466, OAC-2008690, and OAC-2234836.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest or competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ahmed, S., Nielsen, I.E., Tripathi, A. et al. Transformers in Time-Series Analysis: A Tutorial. Circuits Syst Signal Process 42, 7433–7466 (2023). https://doi.org/10.1007/s00034-023-02454-8