Dominant Shuffle: A Simple Yet Powerful Data Augmentation for Time-series Prediction
Abstract
Recent studies have suggested that frequency-domain data augmentation (DA) is effective for time series prediction. Existing frequency-domain augmentations disturb the original data with various full-spectrum noises, leading to an excessive domain gap between augmented and original data. Although impressive performance has been achieved in certain cases, frequency-domain DA has yet to generalize across time series prediction datasets. In this paper, we find that frequency-domain augmentations can be significantly improved by two modifications that limit the perturbations. First, limiting the perturbation to only the dominant frequencies significantly outperforms full-spectrum perturbations. Dominant frequencies represent the main periodicity and trends of the signal and are more important than other frequencies. Second, simply shuffling the dominant frequency components is superior to sophisticatedly designed random perturbations. Shuffling rearranges the original components (magnitudes and phases) and limits the external noise. With these two modifications, we propose dominant shuffle, a simple yet effective data augmentation for time series prediction that can be implemented with just a few lines of code. Extensive experiments with eight datasets and six popular time series models demonstrate that our method consistently improves the baseline performance under various settings and significantly outperforms other DA methods. Code can be accessed at https://kaizhao.net/time-series.
1 Introduction
Time-series prediction aims to forecast multivariate future values based on historical observations. It is a long-standing problem with various applications, such as electricity pricing, weather forecasting, and traffic prediction [10, 35]. Recently, impressive results have been achieved with various deep learning architectures, e.g. recurrent neural networks (RNNs) [21, 22, 14], Transformers [35, 31, 37, 13], and temporal convolutional networks (TCNs) [27, 12, 30]. Neural networks require a large volume of training data to effectively fit their numerous parameters. Unfortunately, the data acquired from real-world sensors are limited in many time-series applications. Moreover, the patterns of a time series depend heavily on the specific dynamic system that generates the data, so data from other sources are not applicable [3, 23].
To mitigate the impact of insufficient data in time series analysis, several data augmentation techniques have been explored [28, 7, 1, 23, 3, 33, 4, 8, 34, 20, 26, 9, 24, 11, 16, 10]. Most of these data augmentation techniques in time series analysis focus on classification [20, 26, 9, 24, 16, 10, 33, 4] and anomaly detection [11, 10, 8]. These augmentations alter the time series sequences while preserving the class labels. However, the prediction task requires more fine-grained temporal information to accurately estimate future dynamics [34, 3]. These perturbations designed for classification can disrupt the data-label coherence and lead to performance degradation [34, 3].
Coherence is a key factor in effective data augmentation [28, 34, 25]. It measures the semantic connection between the augmented data and the label. Augmentations designed for classification often struggle with prediction tasks because their unilateral perturbations disrupt the data-label coherence. Recently, to preserve data-label coherence, Chen et al. [3] proposed to simultaneously perturb the data (historical sequence) and labels (future sequences) in the frequency domain. Unlike common data augmentations that introduce slight perturbations only to the data while keeping the labels unchanged, this approach enables more radical perturbations, such as frequency mix (FreqMix) and frequency mask (FreqMask), to be applied without severely disrupting the data-label coherence. The method indeed generates new data-label pairs that are significantly different from the originals.
However, the full-spectrum perturbations in FreqMask and FreqMix introduce external randomization and enlarge the domain gap between the augmented and original data. This can lead to unstable and suboptimal results on some benchmarks, especially with a larger number of augmented samples. As shown in Fig. 5, the performance of FrAug [3] degrades significantly as the number of augmented samples rises, which suggests that the augmented samples are out-of-distribution with respect to the original samples.
In this paper, to reduce the domain gap between the augmented and original data, we propose to limit the perturbation and randomization in data augmentation. First, we limit the perturbation to specific frequencies instead of perturbing the full spectrum. Several recent studies have pointed out that a few frequency components dominate the periodicity and main trends of a time series, while the other frequencies correspond to minor trends or noise [30, 37, 36]. Following [30], we perturb the top-$k$ frequencies with the highest magnitudes. Second, to avoid excess external noise, we use random shuffling for the perturbation. Shuffling rearranges existing components without injecting any external noise.
Extensive comparisons were made among nine different data augmentation methods on eight public datasets using six state-of-the-art time-series prediction network architectures. These comparisons demonstrate that, despite its simplicity, our method outperforms its competitors by a substantial margin. As shown in Fig. 1, our method consistently improves the performance across various datasets and outperforms other augmentations in most cases.
Comprehensive ablation studies demonstrate that perturbing dominant frequencies yields significantly better performance than various full-spectrum perturbations, and that shuffling is superior to other randomization techniques. In addition, our augmentation exhibits a smaller augmented-original gap than other augmentations, as indicated by higher performance with an increased number of augmented samples (Fig. 5).
2 Related Work
In the last decade, deep learning has emerged as a powerful tool in time-series prediction and has shown superior performance over traditional statistical methods such as ARIMA and Exponential Smoothing [15]. A rich line of studies has introduced various deep-learning architectures, including recurrent neural networks (RNNs) [21, 22, 14], temporal convolution neural networks (TCNs) [27, 12, 30], and Transformers [31, 17, 18, 13, 37]. These models learn to predict the future from large volumes of historical data.
Various data augmentations have been proposed for time series data, and many of them target classification tasks [28, 20, 26, 9, 24, 16, 10, 33, 4]. Many of these methods regard time series data as one-dimensional images and borrow data augmentations from computer vision, e.g. cropping [9, 5], flipping [28], and noise injection [29]. Window warping [28] is a time series-specific data augmentation that upsamples (or downsamples) a random range of the time series while keeping other time ranges unchanged.
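As a rough illustration of window warping as described above, the following sketch stretches or compresses one window by linear interpolation and leaves the rest of the series untouched. The helper names (`resample`, `window_warp`), the fixed window bounds, and the choice of linear interpolation are assumptions made for illustration, not details taken from [28].

```python
def resample(values, new_len):
    """Linearly interpolate a 1-D list onto new_len evenly spaced points."""
    if new_len == 1:
        return [values[0]]
    scale = (len(values) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        lo = min(int(pos), len(values) - 1)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def window_warp(series, start, end, factor):
    """Resample series[start:end] by `factor`; samples outside the window are kept."""
    window = series[start:end]
    new_len = max(2, round(len(window) * factor))
    return series[:start] + resample(window, new_len) + series[end:]


series = [float(v) for v in range(10)]
warped = window_warp(series, start=2, end=6, factor=2.0)  # window of 4 samples becomes 8
```

Note that the warped series changes length (here from 10 to 14 samples); whether and how it is resized afterwards is left open in this sketch.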
In addition to time-domain augmentations, there are also methods that perturb the original data in the frequency domain. Gao [8] proposed to add noise on both magnitude and phase in the frequency domain. Zhang [33] proposed to add single or multiple frequency components in the first half of the frequency spectrum. Chen [4] proposed to perform pooling or smoothing operations in the frequency domain.
While most of the augmentations focus on classification tasks, a few methods for forecasting have also been explored. Bandara [1] introduces two DA methods for forecasting: (i) Average selected with distance (ASD), which generates augmented time series using the weighted sum of multiple time series, with the weights determined by the dynamic time warping (DTW) distance [7]; (ii) Moving block bootstrapping (MBB), which generates augmented data by manipulating the residual part of the time series after STL Decomposition [23] and recombining it with the other series. Zhang [34] proposed to simultaneously augment in the frequency and time domains. Recently, Chen et al. [3] proposed to augment both the data (historical sequence) and the label (future sequence) in the frequency domain to improve the data-label coherence. Although this method generally achieves decent results, full-spectrum randomization imposes a large domain gap between the augmented and the original data, sometimes leading to degraded performance.
3 Dominant Frequency Shuffle for Time-series
3.1 Time-series Prediction and Frequency Domain Augmentation
Time-series prediction is a sequence-to-sequence problem in which the model estimates a future multivariate sequence from a sequence of historical measurements. Each element of either sequence is the measurement at one timestep, a vector with one entry per variate. The historical sequence (the data) and the future sequence (the label) are the input and output of the deep learning model, respectively.
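To make the input-output structure concrete, a minimal sketch of how (historical, future) pairs could be cut from one long multivariate series is given below. The sliding-window slicing and the names `make_pairs`, `hist_len`, and `pred_len` are illustrative assumptions; the text above only specifies that the model maps a historical sequence to a future one.

```python
def make_pairs(series, hist_len, pred_len):
    """Slice a multivariate series into (history, future) pairs.

    series: list of timesteps, each a list with one value per variate.
    """
    pairs = []
    last_start = len(series) - hist_len - pred_len
    for start in range(last_start + 1):
        history = series[start:start + hist_len]
        future = series[start + hist_len:start + hist_len + pred_len]
        pairs.append((history, future))
    return pairs


# Toy series: 10 timesteps, 2 variates.
series = [[float(t), float(10 * t)] for t in range(10)]
pairs = make_pairs(series, hist_len=4, pred_len=2)
first_history, first_future = pairs[0]
```

With 10 timesteps, a history of 4 and a horizon of 2, this yields 5 pairs; the first future window holds timesteps 4 and 5.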
3.2 Dominant Frequency Shuffle
Deep neural networks learn the mapping from a large volume of data-label pairs, and data augmentation is an efficient way of expanding the training data. Frequency-domain augmentation is a family of augmentation methods that perturb time series in the frequency domain. These methods first convert the time series to the frequency domain, apply perturbations there, and then convert the modified data back to the time domain.
Following FrAug [3], we augment the concatenation of data and label to preserve the data-label consistency. We first compute the discrete Fourier transform (DFT) of the concatenated time series (we used torch.fft.rfft() and torch.fft.irfft() for the time-to-frequency and inverse conversions). We then shuffle only the $k$ dominant frequencies, i.e. those with the highest magnitudes. The frequency-domain data with the dominant frequencies shuffled is converted back to the time domain using the inverse DFT (iDFT), which yields the augmented data-label pair. Fig. 2 illustrates an example of the process of dominant shuffle. The prediction models were trained on a combined training set with both augmented and original data.
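The procedure described above can be sketched in a few lines. The version below is a minimal, hedged illustration rather than the released implementation: it uses NumPy's `np.fft.rfft`/`np.fft.irfft` as stand-ins for the `torch.fft` calls mentioned in the text, and it assumes that the zero-frequency (mean) term is excluded from the shuffle and that dominant frequencies are selected per variate. The function name `dominant_shuffle` and the parameter `k` (number of dominant frequencies) are chosen for illustration.

```python
import numpy as np


def dominant_shuffle(x, y, k, rng):
    """Shuffle the k highest-magnitude frequency components of the pair (x, y).

    x: history, shape (len_x, n_variates); y: future, shape (len_y, n_variates).
    Returns an augmented (x, y) pair with the same shapes.
    """
    xy = np.concatenate([x, y], axis=0)      # augment data and label jointly
    spec = np.fft.rfft(xy, axis=0)           # complex, shape (n_freq, n_variates)
    mag = np.abs(spec)
    mag[0] = 0.0                             # assumption: leave the mean (DC) term alone
    out = spec.copy()
    for v in range(xy.shape[1]):             # assumption: top-k chosen per variate
        top = np.argsort(mag[:, v])[-k:]     # indices of the k largest magnitudes
        out[top, v] = spec[rng.permutation(top), v]  # permute components among them
    xy_aug = np.fft.irfft(out, n=xy.shape[0], axis=0)
    return xy_aug[:len(x)], xy_aug[len(x):]


rng = np.random.default_rng(0)
t = np.arange(64)
wave = (1.0 + 3 * np.sin(2 * np.pi * 2 * t / 64)
        + 2 * np.sin(2 * np.pi * 5 * t / 64)
        + np.sin(2 * np.pi * 9 * t / 64))
series = np.stack([wave, wave[::-1]], axis=1)    # toy series with two variates
x, y = series[:48], series[48:]
x_aug, y_aug = dominant_shuffle(x, y, k=3, rng=rng)
```

Because the shuffle only moves existing complex components between frequency bins, the set of magnitudes in the spectrum is unchanged; only their assignment to frequencies differs. Generating one augmented pair per original pair and training on both doubles the training set.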
4 Experiments
In this section, we first introduce the implementation details in Sec. 4.1 and then compare the performance of various SOTA models with and without dominant shuffle in Sec. 4.2. In Sec. 4.3, we thoroughly compare dominant shuffle with various data augmentation methods. Finally, we conduct ablation studies to verify hyperparameter sensitivity and justify design choices in Sec. 4.4.
4.1 Experimental Setups
Implementation details All the experiments were conducted with the PyTorch [19] framework on a single NVIDIA RTX 3090 GPU. Some of the experimental results were from respective original papers, and some were reproduced using official code with default configurations. We only changed the data augmentation for fair comparisons. Please refer to Sec. A.2 for the details about our reimplementations. Following the practice of [3], we performed data augmentations to double the size of the original training dataset unless otherwise specified.
Evaluation protocols We tested our method with short-term and long-term prediction protocols. In the long-term protocol, the prediction length is 96, 192, 336, or 720. In the short-term protocol, the prediction length is 12, 24, 36, or 48. Following the common practice of previous works [35, 31, 37, 13, 27, 30], we quantified the prediction performance using the mean squared error (MSE) between the ground truth and the prediction.
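For reference, the metric and the two sets of prediction lengths can be written down as follows. This is a plain-Python sketch; the function name `mse`, the constant names, and the toy values are illustrative, and the error is assumed to be averaged over all timesteps and variates.

```python
LONG_TERM_LENGTHS = (96, 192, 336, 720)
SHORT_TERM_LENGTHS = (12, 24, 36, 48)


def mse(pred, target):
    """Mean squared error over all timesteps and variates."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy example: 2 timesteps, 2 variates.
pred = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.0, 0.0], [3.0, 2.0]]
error = mse(pred, target)  # (0 + 4 + 0 + 4) / 4 = 2.0
```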
Datasets For long-term prediction, we experimented on eight well-established benchmarks: the ETT datasets (ETTh1, ETTh2, ETTm1, ETTm2) [35], and the Weather, Electricity, Exchange, and Traffic datasets [31]. For short-term prediction, following iTransformer [13], we used four public traffic network datasets (PEMS03, PEMS04, PEMS07, PEMS08) from PEMS [2].
Each dataset is divided into training, testing, and evaluation subsets in specific ratios. The training, testing, and evaluation ratio is 6:2:2 for the ETT and PEMS datasets and 7:1:2 for the Electricity, Traffic, Weather, and Exchange-rate datasets. Detailed statistics of these datasets are summarized in Sec. A.1. For each setting (dataset and prediction length), we tuned the number of dominant frequencies $k$ on the evaluation set. The optimal $k$ on various datasets can be found in Sec. B.3.
Baseline Models We selected diverse models as the baselines in our experiments, including two Transformer-based methods (iTransformer [13], Autoformer [31]), two MLP-based methods (TiDE [6], LightTS [32]), and two temporal convolutional network (TCN) based methods (MICN [27], SCINet [12]). iTransformer (Liu et al., 2024) is the state-of-the-art Transformer-based model, TiDE (Das et al., 2023) is the state-of-the-art MLP-based model, and MICN (Wang et al., 2023) is the state-of-the-art TCN-based model. For short-term prediction, we used the SOTA iTransformer [13] model on the PEMS [2] datasets as the baseline model.
Other data augmentation methods We compared the proposed method with nine existing data augmentation methods, including three time-domain augmentations (ASD [7], MSB [1], Upsample [23]), five frequency-domain methods (FreqMix [3], FreqMask [3], FreqAdd [33], FreqPool [4], Robusttad [8]), and a temporal-frequency method, STAug [34].
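A small sketch of the splitting and tuning steps follows. It assumes that the subsets are consecutive (chronological) segments of the series and that the number of dominant frequencies is chosen by minimum evaluation-set MSE; the function names and the candidate scores are hypothetical placeholders, not reported results.

```python
def chronological_split(n_steps, ratio):
    """Index ranges of three consecutive subsets for a ratio such as (6, 2, 2)."""
    total = sum(ratio)
    first_end = n_steps * ratio[0] // total
    second_end = first_end + n_steps * ratio[1] // total
    return range(0, first_end), range(first_end, second_end), range(second_end, n_steps)


def select_k(candidates, eval_mse):
    """Pick the number of dominant frequencies with the lowest evaluation-set MSE."""
    return min(candidates, key=eval_mse)


ett_split = chronological_split(1000, (6, 2, 2))
electricity_split = chronological_split(1000, (7, 1, 2))

# Hypothetical evaluation-set MSE for three candidate values of k (made-up numbers).
toy_scores = {2: 0.41, 4: 0.39, 8: 0.40}
best_k = select_k(toy_scores, toy_scores.get)  # 4
```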
4.2 Comparison With State-of-the-arts
We first compared our method with other state-of-the-art time series prediction models published in top-tier venues. We compared the performance of recent models (iTransformer [13] (ICLR 2024), SCINet [12] (NIPS 2022), AutoFormer [31] (NIPS 2021)) with and without dominant shuffle. The mean squared error (MSE) averaged across the prediction lengths (96, 192, 336, 720) is calculated for each dataset.
The results in Fig. 3 demonstrate that our method consistently reduces the prediction error in all cases. In some cases, a simpler model trained with dominant shuffle even surpasses a highly sophisticated model. For example, on the ETTh1 dataset, our approach significantly improves the performance of AutoFormer [31] and MICN [27] and helps them outperform the latest iTransformer [13] model. On the Exchange and Weather datasets, our approach enables AutoFormer to outperform SCINet [12] and assists MICN [27] in surpassing iTransformer [13].
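As a worked example of this averaging, the snippet below uses the iTransformer results on ETTh1 reported in the long-term comparison of Sec. 4.3 (baseline and dominant shuffle). The relative-improvement formula, (baseline − augmented) / baseline, is an assumption about how such percentages are typically computed, and the helper names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def relative_improvement(baseline, augmented):
    """Percentage reduction in error relative to the baseline (positive is better)."""
    return 100.0 * (baseline - augmented) / baseline


# iTransformer on ETTh1 at prediction lengths 96, 192, 336, 720.
baseline = [0.392, 0.447, 0.483, 0.516]
ours = [0.383, 0.438, 0.473, 0.492]

avg_baseline = mean(baseline)                        # about 0.4595
avg_ours = mean(ours)                                # about 0.4465
gain = relative_improvement(avg_baseline, avg_ours)  # about 2.83 percent
```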
4.3 Comparisons With Other Data Augmentations
We compared different data augmentation methods on various datasets and baseline models under short-term and long-term protocols. Fig. 1 demonstrates the relative improvements (%) of various augmentation methods over the baseline. Tab. 1, 2, and 3 summarize the average performance over 5 runs with distinct random seeds, and the standard deviations across runs can be found in Sec. B.4. The best values in each column are highlighted with color. Example predictions can be found in Fig. 6 in Appendix B.
We first compared different data augmentation methods for long-term prediction. Tab. 1 summarizes the mean squared errors (MSE) on the ETT datasets and Tab. 2 summarizes the MSE on the Weather, Electricity, and Exchange-rate datasets. Due to space limits, we only report the results of six subsets (ETTh1, ETTh2, ETTm1, Electricity, Weather, and Exchange rate) in Tab. 1 and 2; the results of the other two subsets (ETTm2 and Traffic) can be found in Appendix B. We also merged the results of FreqMix and FreqMask by selecting the superior one in each case. The merged results are denoted as ‘MixMask’.
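The merging rule amounts to an element-wise minimum over the two result rows, since lower MSE is better. The sketch below illustrates it on the separate Mask and Mix rows reported for PEMS04 in the short-term results, purely because those are the rows listed individually; the helper name `merge_best` is illustrative.

```python
def merge_best(*result_rows):
    """Element-wise minimum across result rows (lower MSE is better)."""
    return [min(values) for values in zip(*result_rows)]


# PEMS04, prediction lengths 12, 24, 36, 48.
mask = [0.086, 0.119, 0.158, 0.346]
mix = [0.085, 0.119, 0.154, 0.205]
mix_mask = merge_best(mask, mix)  # [0.085, 0.119, 0.154, 0.205]
```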
Model | Method | ETTh1-96 | ETTh1-192 | ETTh1-336 | ETTh1-720 | ETTh2-96 | ETTh2-192 | ETTh2-336 | ETTh2-720 | ETTm1-96 | ETTm1-192 | ETTm1-336 | ETTm1-720
iTransformer [13] | Baseline | 0.392 | 0.447 | 0.483 | 0.516 | 0.303 | 0.381 | 0.412 | 0.434 | 0.344 | 0.383 | 0.421 | 0.494
iTransformer [13] | ASD [7] | 0.398 | 0.456 | 0.483 | 0.512 | 0.310 | 0.388 | 0.432 | 0.452 | 0.340 | 0.382 | 0.454 | 0.492
iTransformer [13] | MSB [1] | 0.387 | 0.460 | 0.494 | 0.531 | 0.309 | 0.382 | 0.447 | 0.433 | 0.339 | 0.386 | 0.467 | 0.510
iTransformer [13] | Upsample [23] | 0.391 | 0.445 | 0.481 | 0.519 | 0.305 | 0.381 | 0.419 | 0.430 | 0.351 | 0.381 | 0.432 | 0.489
iTransformer [13] | FreqAdd [33] | 0.389 | 0.446 | 0.475 | 0.510 | 0.300 | 0.384 | 0.416 | 0.438 | 0.350 | 0.385 | 0.422 | 0.490
iTransformer [13] | FreqPool [4] | 0.433 | 0.456 | 0.497 | 0.532 | 0.313 | 0.392 | 0.415 | 0.450 | 0.347 | 0.392 | 0.430 | 0.499
iTransformer [13] | Robusttad [8] | 0.390 | 0.445 | 0.497 | 0.510 | 0.312 | 0.388 | 0.412 | 0.439 | 0.353 | 0.382 | 0.421 | 0.498
iTransformer [13] | STAug [34] | 0.390 | 0.445 | 0.489 | 0.511 | 0.323 | 0.428 | 0.486 | 0.483 | 0.339 | 0.383 | 0.417 | 0.485
iTransformer [13] | MixMask [3] | 0.388 | 0.440 | 0.477 | 0.504 | 0.301 | 0.380 | 0.414 | 0.434 | 0.334 | 0.375 | 0.421 | 0.485
iTransformer [13] | Ours | 0.383 | 0.438 | 0.473 | 0.492 | 0.298 | 0.382 | 0.411 | 0.428 | 0.332 | 0.374 | 0.424 | 0.492
AutoFormer [31] | Baseline | 0.429 | 0.440 | 0.495 | 0.498 | 0.381 | 0.443 | 0.471 | 0.475 | 0.467 | 0.610 | 0.529 | 0.773
AutoFormer [31] | ASD | 0.450 | 0.485 | 0.523 | 0.556 | 0.370 | 0.465 | 0.476 | 0.503 | 0.480 | 0.620 | 0.502 | 0.633
AutoFormer [31] | MSB | 0.462 | 0.517 | 0.612 | 0.579 | 0.434 | 0.523 | 0.556 | 0.462 | 0.499 | 0.645 | 0.553 | 0.721
AutoFormer [31] | Upsample | 0.416 | 0.523 | 0.480 | 0.482 | 0.353 | 0.460 | 0.455 | 0.509 | 0.498 | 0.630 | 0.512 | 0.667
AutoFormer [31] | FreqAdd | 0.460 | 0.487 | 0.497 | 0.525 | 0.367 | 0.439 | 0.480 | 0.504 | 0.419 | 0.554 | 0.546 | 0.569
AutoFormer [31] | FreqPool | 0.446 | 0.457 | 0.523 | 0.512 | 0.392 | 0.442 | 0.470 | 0.493 | 0.479 | 0.623 | 0.510 | 0.754
AutoFormer [31] | Robusttad | 0.437 | 0.452 | 0.492 | 0.477 | 0.367 | 0.497 | 0.502 | 0.527 | 0.432 | 0.510 | 0.553 | 0.623
AutoFormer [31] | STAug | 0.429 | 0.478 | 0.505 | 0.506 | 0.354 | 0.443 | 0.496 | 0.495 | 0.415 | 0.581 | 0.588 | 0.693
AutoFormer [31] | MixMask | 0.420 | 0.445 | 0.467 | 0.474 | 0.358 | 0.421 | 0.470 | 0.467 | 0.415 | 0.510 | 0.491 | 0.588
AutoFormer [31] | Ours | 0.409 | 0.436 | 0.458 | 0.486 | 0.335 | 0.419 | 0.453 | 0.452 | 0.392 | 0.506 | 0.491 | 0.559
MICN [27] | Baseline | 0.384 | 0.425 | 0.464 | 0.574 | 0.358 | 0.518 | 0.566 | 0.827 | 0.313 | 0.360 | 0.389 | 0.461
MICN [27] | ASD | 0.380 | 0.430 | 0.472 | 0.523 | 0.377 | 0.539 | 0.620 | 0.843 | 0.315 | 0.362 | 0.399 | 0.457
MICN [27] | MSB | 0.423 | 0.423 | 0.501 | 0.559 | 0.402 | 0.623 | 0.790 | 1.126 | 0.330 | 0.358 | 0.402 | 0.459
MICN [27] | Upsample | 0.396 | 0.435 | 0.463 | 0.550 | 0.366 | 0.500 | 0.831 | 0.752 | 0.339 | 0.377 | 0.402 | 0.475
MICN [27] | FreqAdd | 0.390 | 0.430 | 0.477 | 0.643 | 0.370 | 0.521 | 0.626 | 0.975 | 0.316 | 0.360 | 0.407 | 0.478
MICN [27] | FreqPool | 0.399 | 0.465 | 0.473 | 0.572 | 0.365 | 0.553 | 0.550 | 0.812 | 0.336 | 0.372 | 0.397 | 0.466
MICN [27] | Robusttad | 0.392 | 0.436 | 0.491 | 0.556 | 0.339 | 0.529 | 0.553 | 0.998 | 0.339 | 0.359 | 0.396 | 0.472
MICN [27] | STAug | 0.374 | 0.429 | 0.489 | 0.608 | 0.413 | 0.760 | 1.330 | 2.608 | 0.313 | 0.360 | 0.418 | 0.483
MICN [27] | MixMask | 0.378 | 0.423 | 0.461 | 0.521 | 0.339 | 0.488 | 0.544 | 0.735 | 0.301 | 0.352 | 0.401 | 0.454
MICN [27] | Ours | 0.373 | 0.421 | 0.452 | 0.510 | 0.310 | 0.427 | 0.507 | 0.731 | 0.314 | 0.360 | 0.387 | 0.470
SCINet [12] | Baseline | 0.485 | 0.506 | 0.519 | 0.552 | 0.372 | 0.416 | 0.429 | 0.470 | 0.316 | 0.353 | 0.387 | 0.431
SCINet [12] | ASD | 0.494 | 0.480 | 0.491 | 0.559 | 0.362 | 0.402 | 0.432 | 0.499 | 0.331 | 0.367 | 0.389 | 0.453
SCINet [12] | MSB | 0.489 | 0.466 | 0.502 | 0.547 | 0.359 | 0.396 | 0.458 | 0.476 | 0.320 | 0.351 | 0.396 | 0.478
SCINet [12] | Upsample | 0.471 | 0.457 | 0.479 | 0.541 | 0.379 | 0.407 | 0.403 | 0.482 | 0.342 | 0.386 | 0.399 | 0.442
SCINet [12] | FreqAdd | 0.428 | 0.452 | 0.469 | 0.532 | 0.335 | 0.385 | 0.403 | 0.447 | 0.304 | 0.338 | 0.373 | 0.421
SCINet [12] | FreqPool | 0.499 | 0.510 | 0.557 | 0.549 | 0.410 | 0.453 | 0.432 | 0.475 | 0.331 | 0.362 | 0.379 | 0.432
SCINet [12] | Robusttad | 0.462 | 0.501 | 0.498 | 0.559 | 0.362 | 0.431 | 0.419 | 0.496 | 0.331 | 0.351 | 0.394 | 0.438
SCINet [12] | STAug | 0.457 | 0.500 | 0.524 | 0.534 | 0.538 | 0.636 | 0.681 | 0.648 | 0.319 | 0.357 | 0.389 | 0.445
SCINet [12] | MixMask | 0.427 | 0.452 | 0.465 | 0.548 | 0.335 | 0.377 | 0.400 | 0.438 | 0.302 | 0.341 | 0.376 | 0.423
SCINet [12] | Ours | 0.417 | 0.443 | 0.461 | 0.527 | 0.335 | 0.375 | 0.392 | 0.421 | 0.302 | 0.338 | 0.372 | 0.420
TiDE [6] | Baseline | 0.401 | 0.434 | 0.521 | 0.558 | 0.304 | 0.350 | 0.331 | 0.399 | 0.311 | 0.340 | 0.366 | 0.420
TiDE [6] | ASD | 0.417 | 0.441 | 0.513 | 0.556 | 0.320 | 0.351 | 0.367 | 0.422 | 0.319 | 0.341 | 0.399 | 0.432
TiDE [6] | MSB | 0.422 | 0.476 | 0.529 | 0.579 | 0.331 | 0.379 | 0.334 | 0.401 | 0.302 | 0.356 | 0.382 | 0.451
TiDE [6] | Upsample | 0.431 | 0.452 | 0.533 | 0.604 | 0.346 | 0.372 | 0.350 | 0.456 | 0.324 | 0.339 | 0.378 | 0.463
TiDE [6] | FreqAdd | 0.385 | 0.420 | 0.477 | 0.505 | 0.289 | 0.336 | 0.330 | 0.390 | 0.309 | 0.339 | 0.365 | 0.417
TiDE [6] | FreqPool | 0.423 | 0.455 | 0.510 | 0.592 | 0.312 | 0.376 | 0.339 | 0.397 | 0.319 | 0.352 | 0.397 | 0.453
TiDE [6] | Robusttad | 0.396 | 0.432 | 0.521 | 0.537 | 0.331 | 0.352 | 0.337 | 0.398 | 0.321 | 0.346 | 0.382 | 0.437
TiDE [6] | STAug | 0.515 | 0.535 | 0.521 | 0.558 | 0.390 | 0.437 | 0.403 | 0.508 | 0.310 | 0.337 | 0.364 | 0.417
TiDE [6] | MixMask | 0.385 | 0.420 | 0.478 | 0.507 | 0.289 | 0.339 | 0.330 | 0.391 | 0.299 | 0.332 | 0.367 | 0.416
TiDE [6] | Ours | 0.385 | 0.414 | 0.467 | 0.498 | 0.283 | 0.332 | 0.324 | 0.388 | 0.297 | 0.328 | 0.365 | 0.412
LightTS [32] | Baseline | 0.448 | 0.444 | 0.663 | 0.706 | 0.369 | 0.476 | 0.738 | 1.165 | 0.323 | 0.347 | 0.428 | 0.476
LightTS [32] | ASD | 0.451 | 0.476 | 0.633 | 0.681 | 0.392 | 0.469 | 0.701 | 0.998 | 0.356 | 0.352 | 0.441 | 0.478
LightTS [32] | MSB | 0.467 | 0.463 | 0.627 | 0.652 | 0.378 | 0.472 | 0.652 | 1.123 | 0.371 | 0.349 | 0.430 | 0.479
LightTS [32] | Upsample | 0.449 | 0.472 | 0.610 | 0.637 | 0.401 | 0.487 | 0.714 | 1.245 | 0.329 | 0.366 | 0.453 | 0.492
LightTS [32] | FreqAdd | 0.417 | 0.430 | 0.578 | 0.622 | 0.351 | 0.453 | 0.689 | 1.125 | 0.322 | 0.352 | 0.400 | 0.450
LightTS [32] | FreqPool | 0.463 | 0.471 | 0.652 | 0.690 | 0.369 | 0.512 | 0.723 | 1.264 | 0.336 | 0.351 | 0.442 | 0.497
LightTS [32] | Robusttad | 0.445 | 0.442 | 0.590 | 0.654 | 0.372 | 0.468 | 0.699 | 0.982 | 0.331 | 0.352 | 0.441 | 0.462
LightTS [32] | STAug | 0.445 | 0.441 | 0.669 | 0.714 | 0.520 | 0.807 | 2.101 | 2.467 | 0.320 | 0.343 | 0.427 | 0.476
LightTS [32] | MixMask | 0.417 | 0.429 | 0.575 | 0.620 | 0.337 | 0.426 | 0.643 | 0.993 | 0.316 | 0.340 | 0.398 | 0.447
LightTS [32] | Ours | 0.405 | 0.423 | 0.565 | 0.603 | 0.335 | 0.395 | 0.575 | 0.827 | 0.322 | 0.340 | 0.391 |
As demonstrated in Tab. 1 and 2, our method improves the baseline in 96% of the cases, while other augmentation methods, e.g. FreqMix, outperform the baseline in around 87% of the cases.
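Such percentages are win rates over individual (model, dataset, prediction length) settings. The sketch below shows the counting on a single slice of the results, the iTransformer baseline and dominant-shuffle rows for ETTh1, ETTh2, and ETTm1, so its output differs from the figure quoted above, which covers all models and datasets. Treating ties as non-wins is an assumption, and `win_rate` is an illustrative name.

```python
def win_rate(reference, candidate):
    """Share of settings (in %) where the candidate has strictly lower MSE."""
    wins = sum(c < r for r, c in zip(reference, candidate))
    return 100.0 * wins / len(reference)


# iTransformer on ETTh1, ETTh2, ETTm1 at prediction lengths 96, 192, 336, 720.
baseline = [0.392, 0.447, 0.483, 0.516, 0.303, 0.381, 0.412, 0.434,
            0.344, 0.383, 0.421, 0.494]
ours = [0.383, 0.438, 0.473, 0.492, 0.298, 0.382, 0.411, 0.428,
        0.332, 0.374, 0.424, 0.492]

rate = win_rate(baseline, ours)  # 10 of 12 settings, about 83.3 percent
```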
Model | Method | Electricity-96 | Electricity-192 | Electricity-336 | Electricity-720 | Weather-96 | Weather-192 | Weather-336 | Weather-720 | Exchange Rate-96 | Exchange Rate-192 | Exchange Rate-336 | Exchange Rate-720
iTransformer [13] | Baseline | 0.152 | 0.159 | 0.179 | 0.230 | 0.175 | 0.224 | 0.281 | 0.362 | 0.086 | 0.180 | 0.335 | 0.856
iTransformer [13] | ASD [7] | 0.173 | 0.179 | 0.201 | 0.234 | 0.191 | 0.223 | 0.280 | 0.364 | 0.088 | 0.183 | 0.343 | 0.872
iTransformer [13] | MSB [1] | 0.182 | 0.182 | 0.194 | 0.267 | 0.185 | 0.235 | 0.284 | 0.359 | 0.089 | 0.189 | 0.359 | 0.907
iTransformer [13] | Upsample [23] | 0.166 | 0.188 | 0.216 | 0.221 | 0.204 | 0.257 | 0.291 | 0.373 | 0.086 | 0.180 | 0.338 | 0.834
iTransformer [13] | FreqAdd [33] | 0.150 | 0.157 | 0.172 | 0.204 | 0.181 | 0.230 | 0.285 | 0.362 | 0.087 | 0.181 | 0.333 | 0.837
iTransformer [13] | FreqPool [4] | 0.169 | 0.170 | 0.194 | 0.237 | 0.184 | 0.223 | 0.279 | 0.378 | 0.088 | 0.183 | 0.330 | 0.832
iTransformer [13] | Robusttad [8] | 0.150 | 0.157 | 0.176 | 0.210 | 0.172 | 0.225 | 0.281 | 0.357 | 0.087 | 0.179 | 0.329 | 0.833
iTransformer [13] | STAug [34] | 0.160 | 0.173 | 0.218 | 0.372 | 0.206 | 0.264 | 0.319 | 0.385 | 0.086 | 0.178 | 0.335 | 0.866
iTransformer [13] | MixMask [3] | 0.151 | 0.158 | 0.173 | 0.205 | 0.175 | 0.224 | 0.279 | 0.354 | 0.089 | 0.178 | 0.328 | 0.845
iTransformer [13] | Ours | 0.150 | 0.156 | 0.171 | 0.199 | 0.171 | 0.221 | 0.276 | 0.351 | 0.086 | 0.176 | 0.313 | 0.821
AutoFormer [31] | Baseline | 0.203 | 0.208 | 0.231 | 0.239 | 0.241 | 0.314 | 0.341 | 0.425 | 0.143 | 0.305 | 0.470 | 1.056
AutoFormer [31] | ASD | 0.247 | 0.216 | 0.221 | 0.235 | 0.652 | 0.392 | 0.416 | 0.513 | 0.141 | 0.280 | 0.579 | 1.240
AutoFormer [31] | MSB | 0.237 | 0.256 | 0.295 | 0.236 | 0.256 | 0.379 | 0.402 | 0.468 | 0.156 | 0.254 | 0.513 | 1.339
AutoFormer [31] | Upsample | 0.201 | 0.209 | 0.232 | 0.268 | 0.281 | 0.294 | 0.329 | 0.385 | 0.141 | 0.292 | 0.553 | 1.295
AutoFormer [31] | FreqAdd | 0.193 | 0.197 | 0.212 | 0.225 | 0.255 | 0.323 | 0.370 | 0.419 | 0.143 | 0.369 | 0.716 | 1.173
AutoFormer [31] | FreqPool | 0.213 | 0.224 | 0.234 | 0.257 | 0.237 | 0.339 | 0.372 | 0.446 | 0.142 | 0.336 | 0.532 | 1.014
AutoFormer [31] | Robusttad | 0.230 | 0.242 | 0.261 | 0.231 | 0.27 | 0.334 | 0.351 | 0.429 | 0.142 | 0.309 | 0.462 | 1.123
AutoFormer [31] | STAug | 0.191 | 0.206 | 0.217 | 0.234 | 0.250 | 0.300 | 0.347 | 0.418 | 0.140 | 0.326 | 0.594 | 1.176
AutoFormer [31] | MixMask | 0.177 | 0.194 | 0.206 | 0.224 | 0.240 | 0.302 | 0.330 | 0.422 | 0.141 | 0.284 | 0.453 | 0.778
AutoFormer [31] | Ours | 0.171 | 0.191 | 0.203 | 0.219 | 0.214 | 0.273 | 0.327 | 0.383 | 0.136 | 0.243 | 0.418 | 0.695
MICN [27] | Baseline | 0.171 | 0.183 | 0.198 | 0.224 | 0.188 | 0.241 | 0.278 | 0.350 | 0.091 | 0.185 | 0.355 | 0.941
MICN [27] | ASD | 0.165 | 0.174 | 0.190 | 0.237 | 0.189 | 0.242 | 0.276 | 0.354 | 0.087 | 0.175 | 0.337 | 1.203
MICN [27] | MSB | 0.179 | 0.182 | 0.201 | 0.225 | 0.201 | 0.250 | 0.291 | 0.365 | 0.088 | 0.176 | 0.360 | 0.995
MICN [27] | Upsample | 0.182 | 0.180 | 0.203 | 0.220 | 0.193 | 0.249 | 0.279 | 0.372 | 0.084 | 0.171 | 0.313 | 0.702
MICN [27] | FreqAdd | 0.160 | 0.169 | 0.182 | 0.199 | 0.180 | 0.234 | 0.282 | 0.350 | 0.087 | 0.174 | 0.349 | 0.923
MICN [27] | FreqPool | 0.182 | 0.203 | 0.241 | 0.256 | 0.192 | 0.257 | 0.278 | 0.351 | 0.089 | 0.179 | 0.394 | 0.923
MICN [27] | Robusttad | 0.179 | 0.220 | 0.234 | 0.227 | 0.192 | 0.239 | 0.292 | 0.343 | 0.085 | 0.179 | 0.336 | 0.932
MICN [27] | STAug | 0.180 | 0.195 | 0.210 | 0.224 | 0.272 | 0.356 | 0.433 | 0.559 | 0.092 | 0.183 | 0.313 | 0.790
MICN [27] | MixMask | 0.159 | 0.165 | 0.178 | 0.195 | 0.185 | 0.239 | 0.281 | 0.344 | 0.086 | 0.174 | 0.337 | 0.796
MICN [27] | Ours | 0.157 | 0.168 | 0.178 | 0.211 | 0.179 | 0.232 | 0.275 | 0.342 | 0.084 | 0.169 | 0.303 | 0.750
SCINet [12] | Baseline | 0.212 | 0.237 | 0.255 | 0.286 | 0.229 | 0.282 | 0.334 | 0.402 | 0.099 | 0.191 | 0.356 | 0.916
SCINet [12] | ASD | 0.229 | 0.241 | 0.239 | 0.282 | 0.254 | 0.276 | 0.356 | 0.462 | 0.095 | 0.204 | 0.379 | 1.230
SCINet [12] | MSB | 0.232 | 0.237 | 0.228 | 0.274 | 0.279 | 0.265 | 0.374 | 0.454 | 0.093 | 0.267 | 0.402 | 0.965
SCINet [12] | Upsample | 0.250 | 0.232 | 0.271 | 0.309 | 0.243 | 0.299 | 0.361 | 0.431 | 0.092 | 0.196 | 0.311 | 0.932
SCINet [12] | FreqAdd | 0.176 | 0.195 | 0.212 | 0.237 | 0.208 | 0.258 | 0.309 | 0.385 | 0.092 | 0.186 | 0.343 | 0.920
SCINet [12] | FreqPool | 0.230 | 0.221 | 0.242 | 0.339 | 0.261 | 0.290 | 0.337 | 0.456 | 0.096 | 0.183 | 0.551 | 0.938
SCINet [12] | Robusttad | 0.189 | 0.202 | 0.210 | 0.243 | 0.229 | 0.281 | 0.331 | 0.410 | 0.093 | 0.186 | 0.334 | 0.957
SCINet [12] | STAug | 0.210 | 0.239 | 0.282 | 0.411 | 0.277 | 0.329 | 0.372 | 0.435 | 0.098 | 0.191 | 0.342 | 0.931
SCINet [12] | MixMask | 0.171 | 0.188 | 0.204 | 0.230 | 0.205 | 0.250 | 0.310 | 0.374 | 0.093 | 0.179 | 0.336 | 0.928
SCINet [12] | Ours | 0.172 | 0.188 | 0.200 | 0.225 | 0.197 | 0.246 | 0.299 | 0.379 | 0.091 | 0.175 | 0.342 | 0.890
TiDE [6] | Baseline | 0.207 | 0.197 | 0.211 | 0.238 | 0.177 | 0.220 | 0.265 | 0.323 | 0.093 | 0.184 | 0.330 | 0.860
TiDE [6] | ASD | 0.232 | 0.220 | 0.231 | 0.265 | 0.189 | 0.221 | 0.297 | 0.332 | 0.095 | 0.206 | 0.351 | 0.962
TiDE [6] | MSB | 0.210 | 0.219 | 0.253 | 0.261 | 0.199 | 0.254 | 0.273 | 0.339 | 0.092 | 0.179 | 0.358 | 0.941
TiDE [6] | Upsample | 0.206 | 0.199 | 0.223 | 0.274 | 0.203 | 0.267 | 0.331 | 0.355 | 0.091 | 0.182 | 0.331 | 0.852
TiDE [6] | FreqAdd | 0.150 | 0.163 | 0.177 | 0.209 | 0.173 | 0.216 | 0.263 | 0.322 | 0.088 | 0.180 | 0.330 | 0.848
TiDE [6] | FreqPool | 0.224 | 0.238 | 0.233 | 0.270 | 0.189 | 0.224 | 0.292 | 0.334 | 0.092 | 0.334 | 0.521 | 1.124
TiDE [6] | Robusttad | 0.176 | 0.166 | 0.182 | 0.229 | 0.182 | 0.231 | 0.279 | 0.330 | 0.099 | 0.232 | 0.331 | 0.924
TiDE [6] | STAug | 0.230 | 0.210 | 0.192 | 0.225 | 0.205 | 0.247 | 0.292 | 0.364 | 0.092 | 0.184 | 0.330 | 0.859
TiDE [6] | MixMask | 0.143 | 0.155 | 0.164 | 0.210 | 0.173 | 0.216 | 0.263 | 0.323 | 0.089 | 0.180 | 0.329 | 0.861
TiDE [6] | Ours | 0.143 | 0.150 | 0.165 | 0.202 | 0.177 | 0.219 | 0.261 | 0.322 | 0.088 | 0.179 | 0.324 | 0.847
LightTS [32] | Baseline | 0.210 | 0.169 | 0.182 | 0.212 | 0.168 | 0.210 | 0.260 | 0.320 | 0.139 | 0.252 | 0.412 | 0.840
LightTS [32] | ASD | 0.225 | 0.179 | 0.198 | 0.232 | 0.179 | 0.210 | 0.271 | 0.321 | 0.132 | 0.320 | 0.436 | 1.036
LightTS [32] | MSB | 0.233 | 0.182 | 0.204 | 0.228 | 0.170 | 0.214 | 0.259 | 0.332 | 0.117 | 0.294 | 0.502 | 0.964
LightTS [32] | Upsample | 0.246 | 0.179 | 0.211 | 0.254 | 0.182 | 0.223 | 0.257 | 0.336 | 0.099 | 0.251 | 0.369 | 0.702
LightTS [32] | FreqAdd | 0.213 | 0.159 | 0.177 | 0.210 | 0.164 | 0.207 | 0.258 | 0.317 | 0.098 | 0.522 | 0.565 | 1.583
LightTS [32] | FreqPool | 0.219 | 0.174 | 0.197 | 0.236 | 0.193 | 0.254 | 0.267 | 0.339 | 0.099 | 0.275 | 0.394 | 0.793
LightTS [32] | Robusttad | 0.212 | 0.169 | 0.181 | 0.223 | 0.172 | 0.223 | 0.259 | 0.324 | 0.092 | 0.279 | 0.451 | 0.796
LightTS [32] | STAug | 0.224 | 0.267 | 0.294 | 0.351 | 0.214 | 0.263 | 0.382 | 0.371 | 0.096 | 0.212 | 0.380 | 0.690
LightTS [32] | MixMask | 0.192 | 0.158 | 0.175 | 0.211 | 0.163 | 0.206 | 0.257 | 0.318 | 0.099 | 0.384 | 0.518 | 0.774
LightTS [32] | Ours | 0.210 | 0.156 | 0.173 | 0.206 | 0.165 | 0.205 | 0.249 | 0.312 | 0.088 | 0.243 | 0.361 |
Our method also outperforms other augmentation methods in more than 77% of the cases. Moreover, our method achieves larger relative improvements as the prediction length increases, highlighting its strong capacity in long-term prediction. Tab. 3 summarizes the MSE of short-term prediction using the iTransformer [13] model on the PEMS datasets [2]. The prediction errors are generally lower than those in long-term prediction. Our method outperforms other augmentations in most cases, although the improvements are marginal compared to long-term prediction. This is because short-term prediction is relatively easy, and the performance has already reached saturation.
Method | PEMS03-12 | PEMS03-24 | PEMS03-36 | PEMS03-48 | PEMS04-12 | PEMS04-24 | PEMS04-36 | PEMS04-48 | PEMS07-12 | PEMS07-24 | PEMS07-36 | PEMS07-48
Baseline | 0.070 | 0.097 | 0.134 | 0.164 | 0.088 | 0.124 | 0.160 | 0.196 | 0.067 | 0.097 | 0.128 | 0.156
ASD [7] | 0.072 | 0.096 | 0.152 | 0.239 | 0.098 | 0.132 | 0.156 | 0.190 | 0.069 | 0.099 | 0.154 | 0.181
MSB [1] | 0.096 | 0.131 | 0.129 | 0.214 | 0.087 | 0.134 | 0.167 | 0.219 | 0.098 | 0.096 | 0.137 | 0.165
Upsample [23] | 0.069 | 0.096 | 0.128 | 0.179 | 0.087 | 0.124 | 0.158 | 0.199 | 0.072 | 0.099 | 0.127 | 0.155
FreqAdd [33] | 1.036 | 0.104 | 0.251 | 0.362 | 0.088 | 0.125 | 0.159 | 0.201 | 0.067 | 0.097 | 0.127 | 0.155
FreqPool [4] | 1.234 | 0.178 | 0.296 | 0.451 | 0.099 | 0.145 | 0.178 | 0.226 | 0.079 | 0.104 | 0.152 | 0.172
Robusttad [8] | 0.082 | 0.098 | 0.132 | 1.520 | 0.089 | 0.123 | 0.161 | 0.195 | 0.067 | 0.097 | 0.129 | 0.157
STAug [34] | 0.079 | 0.112 | 0.195 | 0.456 | 0.087 | 0.120 | 0.162 | 0.304 | 0.066 | 0.096 | 0.132 | 0.165
Mask [3] | 0.443 | 1.205 | 0.233 | 1.510 | 0.086 | 0.119 | 0.158 | 0.346 | 0.065 | 0.095 | 0.125 | 0.156
Mix [3] | 1.018 | 0.097 | 0.877 | 1.501 | 0.085 | 0.119 | 0.154 | 0.205 | 0.065 | 0.094 | 0.134 | 0.152
Ours | 0.067 | 0.095 | 0.126 | 0.235 | 0.085 | 0.118 | 0.149 | 0.182 | 0.065 | 0.094 | 0.123 |
4.4 Ablation Study
Our method includes one hyper-parameter, the number of dominant frequencies $k$, and two unique designs: 1) perturbing only the dominant frequencies and 2) shuffling the dominant frequency components. We conducted ablation studies to investigate the impact of the hyper-parameter and to justify our design choices.
4.4.1 Number of Dominant Frequencies
4.4.2 Shuffle the Dominant Frequencies
In this experiment, we compared combinations of different perturbation ranges and operations.
We first compared perturbing different portions of the spectrum, including the dominant frequencies, the minor frequencies, and the full spectrum. The results in Tab. 4 indicate that perturbing the dominant frequencies significantly outperforms the other options, while perturbing the minor frequencies yields the worst performance. Tab. 5 compares different perturbation operations, including masking [3], adding noise [8, 10], randomization, and shuffling (ours). Shuffling surpasses the other operations in most cases.
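To make the ablation axes concrete, the sketch below combines a frequency range (`dom`, `min`, or `full`) with an operation (`shuffle` or `mask`) on a single univariate signal. It is only an illustration under several assumptions: NumPy's FFT stands in for `torch.fft`, the zero-frequency term is never perturbed, "minor" is taken to mean the lowest-magnitude components, and masking zeroes every selected component. The exact ablation settings may differ, and all names are illustrative.

```python
import numpy as np


def band_indices(magnitude, k, band):
    """Select non-DC frequency bins: k largest ('dom'), k smallest ('min'), or all ('full')."""
    order = np.argsort(magnitude[1:])[::-1] + 1   # bins sorted by magnitude, largest first
    if band == "dom":
        return order[:k]
    if band == "min":
        return order[-k:]
    if band == "full":
        return order
    raise ValueError(f"unknown band: {band}")


def perturb(signal, k, band, op, rng):
    """Apply 'shuffle' or 'mask' to the selected band and return the time-domain result."""
    spec = np.fft.rfft(signal)
    idx = band_indices(np.abs(spec), k, band)
    if op == "shuffle":
        spec[idx] = spec[rng.permutation(idx)]    # rearrange existing components
    elif op == "mask":
        spec[idx] = 0.0                           # remove the selected components
    else:
        raise ValueError(f"unknown operation: {op}")
    return np.fft.irfft(spec, n=len(signal))


rng = np.random.default_rng(0)
t = np.arange(64)
minor = 0.1 * np.sin(2 * np.pi * 9 * t / 64)
signal = 1.0 + 3 * np.sin(2 * np.pi * 2 * t / 64) + 2 * np.sin(2 * np.pi * 5 * t / 64) + minor
shuffled = perturb(signal, k=2, band="dom", op="shuffle", rng=rng)
masked = perturb(signal, k=2, band="dom", op="mask", rng=rng)
```

In this toy signal, masking the two dominant components leaves only the mean and the small component at bin 9, whereas shuffling keeps all of the original spectral content and only reassigns it between the two dominant bins.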
ETTh1 | ETTm2 | Weather | ||||||||||||
96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | |||
iTrans [13] | Shuffle | full | 0.391 | 0.447 | 0.486 | 0.509 | 0.182 | 0.247 | 0.311 | 0.403 | 0.175 | 0.223 | 0.278 | 0.355 |
min | 0.389 | 0.445 | 0.494 | 0.505 | 0.181 | 0.251 | 0.310 | 0.413 | 0.174 | 0.225 | 0.282 | 0.355 | ||
dom | 0.383 | 0.438 | 0.473 | 0.492 | 0.178 | 0.246 | 0.309 | 0.409 | 0.171 | 0.221 | 0.276 | 0.351 | ||
Mask | full | 0.390 | 0.442 | 0.475 | 0.503 | 0.179 | 0.251 | 0.311 | 0.411 | 0.178 | 0.228 | 0.284 | 0.359 |
min | 0.389 | 0.444 | 0.487 | 0.499 | 0.183 | 0.252 | 0.311 | 0.412 | 0.180 | 0.226 | 0.282 | 0.361 | ||
dom | 0.388 | 0.442 | 0.486 | 0.505 | 0.180 | 0.251 | 0.309 | 0.410 | 0.173 | 0.224 | 0.280 | 0.356 | ||
MICN [27] | Shuffle | full | 0.385 | 0.427 | 0.466 | 0.604 | 0.184 | 0.293 | 0.375 | 0.594 | 0.182 | 0.239 | 0.280 | 0.348 |
min | 0.390 | 0.430 | 0.480 | 0.565 | 0.191 | 0.281 | 0.365 | 0.580 | 0.197 | 0.236 | 0.283 | 0.349 | ||
dom | 0.373 | 0.421 | 0.452 | 0.510 | 0.174 | 0.263 | 0.348 | 0.502 | 0.179 | 0.232 | 0.275 | 0.342 | ||
Mask | full | 0.381 | 0.424 | 0.460 | 0.543 | 0.184 | 0.265 | 0.353 | 0.510 | 0.190 | 0.236 | 0.281 | 0.345 |
min | 0.385 | 0.426 | 0.472 | 0.553 | 0.187 | 0.276 | 0.359 | 0.542 | 0.179 | 0.240 | 0.281 | 0.344 | ||
dom | 0.377 | 0.421 | 0.454 | 0.543 | 0.175 | 0.268 | 0.337 | 0.505 | 0.178 | 0.239 | 0.283 | 0.342 | ||
Lightts [32] | Shuffle | full | 0.415 | 0.426 | 0.577 | 0.621 | 0.202 | 0.235 | 0.325 | 0.445 | 0.163 | 0.205 | 0.251 | 0.317 |
min | 0.418 | 0.432 | 0.577 | 0.619 | 0.206 | 0.239 | 0.326 | 0.444 | 0.164 | 0.212 | 0.259 | 0.317 | ||
dom | 0.405 | 0.423 | 0.565 | 0.603 | 0.195 | 0.245 | 0.312 | 0.422 | 0.165 | 0.205 | 0.249 | 0.312 | ||
Mask | full | 0.418 | 0.432 | 0.573 | 0.621 | 0.204 | 0.238 | 0.321 | 0.435 | 0.163 | 0.206 | 0.258 | 0.317 |
min | 0.419 | 0.433 | 0.578 | 0.621 | 0.205 | 0.233 | 0.324 | 0.452 | 0.163 | 0.208 | 0.260 | 0.317 | ||
dom | 0.418 | 0.424 | 0.579 | 0.618 | 0.198 | 0.240 | 0.312 | 0.430 | 0.162 | 0.201 | 0.250 | 0.317 | ||
ETTh1 | ETTm2 | Weather | |||||||||||
96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | ||
iTrans [13] | Mask | 0.388 | 0.442 | 0.486 | 0.505 | 0.180 | 0.251 | 0.309 | 0.410 | 0.173 | 0.224 | 0.280 | 0.356 |
Noise | 0.387 | 0.445 | 0.482 | 0.510 | 0.180 | 0.256 | 0.312 | 0.409 | 0.177 | 0.222 | 0.281 | 0.359 | |
Random | 0.386 | 0.440 | 0.479 | 0.499 | 0.183 | 0.254 | 0.311 | 0.407 | 0.171 | 0.222 | 0.280 | 0.358 | |
Shuffle | 0.383 | 0.438 | 0.473 | 0.492 | 0.178 | 0.246 | 0.309 | 0.409 | 0.171 | 0.221 | 0.276 | 0.351 | |
MICN [27] | Mask | 0.377 | 0.421 | 0.454 | 0.543 | 0.175 | 0.268 | 0.337 | 0.505 | 0.178 | 0.239 | 0.283 | 0.342 |
Noise | 0.393 | 0.430 | 0.479 | 0.531 | 0.201 | 0.331 | 0.366 | 0.561 | 0.201 | 0.236 | 0.281 | 0.351 | |
Random | 0.381 | 0.423 | 0.476 | 0.670 | 0.183 | 0.284 | 0.367 | 0.614 | 0.182 | 0.233 | 0.282 | 0.349 | |
Shuffle | 0.373 | 0.421 | 0.452 | 0.510 | 0.174 | 0.263 | 0.348 | 0.502 | 0.179 | 0.232 | 0.275 | 0.342 | |
Lightts [32] | Mask | 0.418 | 0.424 | 0.579 | 0.618 | 0.198 | 0.240 | 0.312 | 0.430 | 0.162 | 0.201 | 0.250 | 0.317 |
Noise | 0.432 | 0.451 | 0.566 | 0.636 | 0.221 | 0.236 | 0.351 | 0.433 | 0.169 | 0.219 | 0.259 | 0.321 | |
Random | 0.414 | 0.431 | 0.570 | 0.610 | 0.206 | 0.244 | 0.324 | 0.442 | 0.171 | 0.213 | 0.263 | 0.323 | |
Shuffle | 0.405 | 0.423 | 0.565 | 0.603 | 0.195 | 0.245 | 0.312 | 0.422 | 0.165 | 0.205 | 0.249 | 0.312 | |
The results in Tab. 4 and 5 justify the design decisions in dominant shuffle and confirm that both perturbing the dominant frequencies and the shuffle operation are superior to the alternatives. More details about the experiments, including how we defined minor frequencies and how we implemented the mask, noise, and randomization perturbations, can be found in Sec. A.2.
4.4.3 Different Augmentation Sizes
In prior experiments, we explored data augmentation that doubled the original datasets. In this experiment, we assessed the performance of various augmentation sizes. The performance with a larger augmentation size reflects the domain gap between augmented and original data. A larger augmentation size indicates more augmented samples in the training set. If these augmented samples are out of distribution compared to the original data, larger augmentation sizes could lead to degraded performance due to a training/test gap.
As shown in Fig. 5, the performance of FreqMix and FreqMask declines significantly beyond an augmentation size of two, which we attribute to the domain gap between augmented and original data. Our method is only slightly affected by the augmentation size, and even benefits from larger augmentation sizes on the Weather dataset. The results in Fig. 5 suggest that our method has a smaller gap between augmented and original data.
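The role of the augmentation size can be illustrated with a minimal sketch. It assumes that an augmentation size of s means the training set holds s times as many samples as the original data, so that s = 2 corresponds to the doubled datasets used in the earlier experiments; this reading is an assumption. The names `build_training_set`, `augmented_fraction`, and the `jitter` placeholder are hypothetical and are not taken from the released code.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[List[float], List[float]]  # (look-back window, forecast target)


def jitter(sample: Sample, rng: random.Random) -> Sample:
    """Placeholder augmentation (small Gaussian jitter), NOT the paper's method."""
    x, y = sample
    return ([v + rng.gauss(0.0, 0.01) for v in x],
            [v + rng.gauss(0.0, 0.01) for v in y])


def build_training_set(originals: List[Sample],
                       augment: Callable[[Sample, random.Random], Sample],
                       size: int,
                       seed: int = 0) -> List[Sample]:
    """Return the originals plus (size - 1) augmented copies of each sample.

    Assumption: size = 1 means no augmentation, size = 2 doubles the data.
    """
    if size < 1:
        raise ValueError("augmentation size must be at least 1")
    rng = random.Random(seed)
    train = list(originals)
    for _ in range(size - 1):
        train.extend(augment(s, rng) for s in originals)
    return train


def augmented_fraction(size: int) -> float:
    """Share of the training set made up of augmented samples."""
    return (size - 1) / size


originals = [([1.0, 2.0, 3.0], [4.0]), ([2.0, 3.0, 4.0], [5.0])]
for size in (1, 2, 4):
    train = build_training_set(originals, jitter, size)
    print(size, len(train), augmented_fraction(size))
```

Under this reading, the share of augmented samples grows as (s - 1)/s, so any distribution shift introduced by the augmentation weighs more heavily on training as s increases.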
5 Conclusion
We proposed dominant shuffle, a simple yet highly effective data augmentation technique for time series prediction. Our method mitigates the domain gap between augmented and original data by limiting the perturbation to the dominant frequencies, and it uses shuffling to avoid introducing external noise. Although simple and effective, our method is primarily heuristic and lacks a theoretical explanation. In place of theoretical justification, we conducted extensive experiments using a wide range of datasets, baseline models, and augmentation methods to validate its consistent improvements across various configurations. Since dominant shuffle introduces significant perturbation to the original data and therefore disrupts the sample-wise class labels, our method is limited to prediction tasks and cannot be extended to classification tasks. Exploring theoretical justifications and principles of the proposed method would be a promising future direction that helps to better understand it.
References
- [1] Kasun Bandara, Hansika Hewamalage, Yuan-Hao Liu, Yanfei Kang, and Christoph Bergmeir. Improving the accuracy of global forecasting models using time series data augmentation. Pattern Recognition, 120:108148, 2021.
- [2] Chao Chen, Karl Petty, Alexander Skabardonis, Pravin Varaiya, and Zhanfeng Jia. Freeway performance measurement system: mining loop detector data. Transportation research record, 1748(1):96–102, 2001.
- [3] Muxi Chen, Zhijian Xu, Ailing Zeng, and Qiang Xu. Fraug: Frequency domain augmentation for time series forecasting. arXiv preprint arXiv:2302.09292, 2023.
- [4] Xi Chen, Cheng Ge, Ming Wang, and Jin Wang. Supervised contrastive few-shot learning for high-frequency time series. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 7069–7077, 2023.
- [5] Zhicheng Cui, Wenlin Chen, and Yixin Chen. Multi-scale convolutional neural networks for time series classification. arXiv preprint arXiv:1603.06995, 2016.
- [6] Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan K Mathur, Rajat Sen, and Rose Yu. Long-term forecasting with tiDE: Time-series dense encoder. Transactions on Machine Learning Research, 2023.
- [7] Germain Forestier, François Petitjean, Hoang Anh Dau, Geoffrey I Webb, and Eamonn Keogh. Generating synthetic time series to augment sparse datasets. In 2017 IEEE international conference on data mining (ICDM), pages 865–870. IEEE, 2017.
- [8] Jingkun Gao, Xiaomin Song, Qingsong Wen, Pichao Wang, Liang Sun, and Huan Xu. Robusttad: Robust time series anomaly detection via decomposition and convolutional neural networks. In MileTS’20: 6th KDD Workshop on Mining and Learning from Time Series, pages 1–6, 2020.
- [9] Arthur Le Guennec, Simon Malinowski, and Romain Tavenard. Data augmentation for time series classification using convolutional neural networks. In ECML/PKDD workshop on advanced analytics and learning on temporal data, 2016.
- [10] Bryan Lim and Stefan Zohren. Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A, 379(2194):20200209, 2021.
- [11] Swee Kiat Lim, Yi Loo, Ngoc-Trung Tran, Ngai-Man Cheung, Gemma Roig, and Yuval Elovici. Doping: Generative data augmentation for unsupervised anomaly detection with gan. In 2018 IEEE international conference on data mining (ICDM), pages 1122–1127. IEEE, 2018.
- [12] Minhao Liu, Ailing Zeng, Muxi Chen, Zhijian Xu, Qiuxia Lai, Lingna Ma, and Qiang Xu. Scinet: Time series modeling and forecasting with sample convolution and interaction. Advances in Neural Information Processing Systems, 35:5816–5828, 2022.
- [13] Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting. In The Twelfth International Conference on Learning Representations, 2024.
- [14] Yunshan Ma, Yujuan Ding, Xun Yang, Lizi Liao, Wai Keung Wong, and Tat-Seng Chua. Knowledge enhanced neural fashion trend forecasting. In Proceedings of the 2020 international conference on multimedia retrieval, pages 82–90, 2020.
- [15] ED McKenzie. General exponential smoothing and the equivalent arma process. Journal of Forecasting, 3(3):333–344, 1984.
- [16] Gue-Hwan Nam, Seok-Jun Bu, Na-Mu Park, Jae-Yong Seo, Hyeon-Cheol Jo, and Won-Tae Jeong. Data augmentation using empirical mode decomposition on neural networks to classify impact noise in vehicle. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 731–735. IEEE, 2020.
- [17] Zelin Ni, Hang Yu, Shizhan Liu, Jianguo Li, and Weiyao Lin. Basisformer: Attention-based time series forecasting with learnable and interpretable basis. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- [18] Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2023.
- [19] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
- [20] Hangwei Qian, Tian Tian, and Chunyan Miao. What makes good contrastive learning on small-scale wearable-based tasks? In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pages 3761–3771, 2022.
- [21] Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. Advances in neural information processing systems, 31, 2018.
- [22] David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. Deepar: Probabilistic forecasting with autoregressive recurrent networks. International journal of forecasting, 36(3):1181–1191, 2020.
- [23] Artemios-Anargyros Semenoglou, Evangelos Spiliotis, and Vassilios Assimakopoulos. Data augmentation for univariate time series forecasting with neural networks. Pattern Recognition, 134:109132, 2023.
- [24] Odongo Steven Eyobu and Dong Seog Han. Feature representation and data augmentation for human activity classification based on wearable imu sensor data using a deep lstm neural network. Sensors, 18(9):2892, 2018.
- [25] Jianhua Sun, Hao-Shu Fang, Yuxuan Li, Runzhong Wang, Minghao Gou, and Cewu Lu. Instaboost++: Visual coherence principles for unified 2d/3d instance level data augmentation. International Journal of Computer Vision, 131(10):2665–2681, 2023.
- [26] Terry T Um, Franz MJ Pfister, Daniel Pichler, Satoshi Endo, Muriel Lang, Sandra Hirche, Urban Fietzek, and Dana Kulić. Data augmentation of wearable sensor data for parkinson’s disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM international conference on multimodal interaction, pages 216–220, 2017.
- [27] Huiqiang Wang, Jian Peng, Feihu Huang, Jince Wang, Junhui Chen, and Yifei Xiao. MICN: Multi-scale local and global context modeling for long-term series forecasting. In The Eleventh International Conference on Learning Representations, 2023.
- [28] Qingsong Wen, Liang Sun, Fan Yang, Xiaomin Song, Jingkun Gao, Xue Wang, and Huan Xu. Time series data augmentation for deep learning: A survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 4653–4660. International Joint Conferences on Artificial Intelligence Organization, 8 2021. Survey Track.
- [29] Tailai Wen and Roy Keyes. Time series anomaly detection using convolutional neural networks and transfer learning. In IJCAI Workshop on AI4IoT, 2019.
- [30] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. Timesnet: Temporal 2d-variation modeling for general time series analysis. In The Eleventh International Conference on Learning Representations, 2023.
- [31] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in neural information processing systems, 34:22419–22430, 2021.
- [32] Tianping Zhang, Yizhuo Zhang, Wei Cao, Jiang Bian, Xiaohan Yi, Shun Zheng, and Jian Li. Less is more: Fast multivariate time series forecasting with light sampling-oriented mlp structures. arXiv preprint arXiv:2207.01186, 2022.
- [33] Xiang Zhang, Ziyuan Zhao, Theodoros Tsiligkaridis, and Marinka Zitnik. Self-supervised contrastive pre-training for time series via time-frequency consistency. Advances in Neural Information Processing Systems, 35:3988–4003, 2022.
- [34] Xiyuan Zhang, Ranak Roy Chowdhury, Jingbo Shang, Rajesh Gupta, and Dezhi Hong. Towards diverse and coherent augmentation for time-series forecasting. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
- [35] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11106–11115, 2021.
- [36] Tian Zhou, Ziqing Ma, Qingsong Wen, Liang Sun, Tao Yao, Wotao Yin, Rong Jin, et al. Film: Frequency improved legendre memory model for long-term time series forecasting. Advances in Neural Information Processing Systems, 35:12677–12690, 2022.
- [37] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International conference on machine learning, pages 27268–27286. PMLR, 2022.
Appendix A More Details
A.1 Datasets
We evaluate the performance of different models and different augmentations for long-term forecasting on 8 well-established datasets, including Weather, Traffic, Electricity, Exchange Rate [31], and the ETT datasets (ETTh1, ETTh2, ETTm1, ETTm2) [35]. Furthermore, we adopt the PEMS [2] datasets for short-term forecasting. The datasets are described in Tab. 6.
Dataset | Variates | Prediction length | Dataset size (Train, Validation, Test) | Frequency | Information |
ETTh1, ETTh2 | 7 | {96, 192, 336, 720} | (8545, 2881, 2881) | Hourly | Temperature |
ETTm1, ETTm2 | 7 | {96, 192, 336, 720} | (34465, 11521, 11521) | 15min | Temperature |
Exchange | 8 | {96, 192, 336, 720} | (5120, 665, 1422) | Daily | Economy |
Weather | 21 | {96, 192, 336, 720} | (36792, 5271, 10540) | 10min | Weather |
ECL | 321 | {96, 192, 336, 720} | (18317, 2633, 5261) | Hourly | Electricity |
Traffic | 862 | {96, 192, 336, 720} | (12185, 1757, 3509) | Hourly | Transportation |
PEMS03 | 358 | {12, 24, 36, 48} | (15617, 5135, 5135) | 5min | Traffic network |
PEMS04 | 307 | {12, 24, 36, 48} | (10172, 3375, 3375) | 5min | Traffic network |
PEMS07 | 883 | {12, 24, 36, 48} | (16911, 5622, 5622) | 5min | Traffic network |
PEMS08 | 170 | {12, 24, 36, 48} | (10690, 3548, 3548) | 5min | Traffic network |
A.2 Implementation Details
A.2.1 Reimplementation of other methods
We reproduce ASD [7], MSB [1], and Upsample [23] based on the descriptions in their original papers. For STAug [34] and MixMask [3], we use the official code. For Robusttad [8], we reproduce it by adding Gaussian noise to the frequency components of a time series. For FreqAdd [33], we perturb a single low-frequency component by setting its magnitude to half of the maximum magnitude. For FreqPool [4], we apply max pooling with size=4 over the entire spectrum. For a fair comparison, all frequency-domain methods are applied to both the data and the label of each data-label pair.
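As an illustration of the last two points, the sketch below shows one possible reading of the FreqAdd reimplementation and of applying a frequency-domain operation to the data-label pair. It assumes that the look-back window and the label are concatenated before the transform and split again afterwards, that "low-frequency" means one of the first few non-DC bins, and that the phase of the perturbed bin is kept. The names `freq_add`, `augment_pair`, and `low_bins` are hypothetical, and NumPy is used only for brevity.

```python
import numpy as np


def freq_add(spec: np.ndarray, rng: np.random.Generator, low_bins: int = 5) -> np.ndarray:
    """Set the magnitude of one low-frequency bin to half the maximum magnitude.

    Assumptions: the bin is drawn uniformly from bins 1..low_bins (DC excluded)
    and its original phase is preserved.
    """
    out = spec.copy()
    k = int(rng.integers(1, low_bins + 1))
    target = 0.5 * np.abs(spec).max()
    out[k] = target * np.exp(1j * np.angle(spec[k]))
    return out


def augment_pair(x: np.ndarray, y: np.ndarray, spec_op, rng: np.random.Generator):
    """Apply a spectral operation to the concatenated (data, label) series.

    Assumption: data and label are perturbed jointly so that the augmented
    label stays consistent with the augmented input.
    """
    series = np.concatenate([x, y])
    spec = np.fft.rfft(series)
    aug = np.fft.irfft(spec_op(spec, rng), n=len(series))
    return aug[: len(x)], aug[len(x):]


rng = np.random.default_rng(0)
x = rng.normal(size=96)   # look-back window (univariate for simplicity)
y = rng.normal(size=24)   # forecast target
x_aug, y_aug = augment_pair(x, y, freq_add, rng)
```

Perturbing the concatenated series keeps the input and target on the same underlying signal; perturbing them separately could break the relation the model is asked to learn.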
A.2.2 Different perturbations
In our ablation study, we define minor frequencies as all frequency components other than those with the top 10 magnitudes. In Tab. 4, Mask on the full spectrum is similar to FrAug [3]. Mask on dominant frequencies means masking within the frequency components with the top 10 magnitudes, and Mask on minor frequencies is the opposite. In Tab. 5, Noise means adding Gaussian noise to the selected frequency components. For Random, we first obtain the maximum and minimum magnitudes of the selected frequency components and then randomly assign each a magnitude within this range.
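The band selection and the four operations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: the DC component is excluded from every band, Mask zeroes each selected component with probability `mask_rate`, the Noise scale is a fraction `sigma` of the mean selected magnitude, and Random keeps the original phases. The names `select_bins`, `perturb`, `mask_rate`, and `sigma` are hypothetical.

```python
import numpy as np


def select_bins(mag: np.ndarray, band: str, k: int = 10) -> np.ndarray:
    """Indices of the frequency bins in the requested band (DC excluded)."""
    order = np.argsort(mag[1:])[::-1] + 1   # non-DC bins, largest magnitude first
    if band == "dom":                        # top-k magnitudes
        return order[:k]
    if band == "min":                        # everything except the top-k
        return order[k:]
    if band == "full":                       # the whole (non-DC) spectrum
        return order
    raise ValueError(f"unknown band: {band}")


def perturb(x, band="dom", op="shuffle", k=10, rng=None, mask_rate=0.5, sigma=0.1):
    """Perturb the selected frequency components of a 1-D series x."""
    rng = np.random.default_rng() if rng is None else rng
    spec = np.fft.rfft(x)
    idx = select_bins(np.abs(spec), band, k)
    if op == "shuffle":                      # rearrange the existing components
        spec[idx] = spec[rng.permutation(idx)]
    elif op == "mask":                       # zero a random subset (assumed rate)
        spec[idx[rng.random(len(idx)) < mask_rate]] = 0
    elif op == "noise":                      # additive complex Gaussian noise
        scale = sigma * np.abs(spec[idx]).mean()
        spec[idx] += rng.normal(0, scale, len(idx)) + 1j * rng.normal(0, scale, len(idx))
    elif op == "random":                     # new magnitudes within [min, max]
        mag = np.abs(spec[idx])
        new_mag = rng.uniform(mag.min(), mag.max(), len(idx))
        spec[idx] = new_mag * np.exp(1j * np.angle(spec[idx]))
    else:
        raise ValueError(f"unknown operation: {op}")
    return np.fft.irfft(spec, n=len(x))


t = np.arange(96)
x = 3 * np.sin(2 * np.pi * 2 * t / 96) + 2 * np.sin(2 * np.pi * 5 * t / 96) + np.sin(2 * np.pi * 9 * t / 96)
x_shuffled = perturb(x, band="dom", op="shuffle", k=3, rng=np.random.default_rng(0))
```

In this sketch, Shuffle is the only operation that leaves the set of spectral components unchanged and only reassigns them to different frequencies, which matches the motivation of limiting external noise.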
Appendix B More Results
B.1 Full forecasting results
The tables in this section show the full results of the forecasting task. Specifically, our method improves the performance of iTransformer by 13% on Electricity when the prediction length is 720, and it improves the performance of Autoformer by 28% on ETTm1 when the prediction length is 720. Our method also improves the performance of MICN by 18% on ETTh2 when the prediction length is 192 and the performance of SCINet by 21% on Electricity when the prediction length is 720. Similarly, our method improves the performance of Lightts by 29% on ETTh2 when the prediction length is 720 and the performance of TiDE by 24% on Electricity when the prediction length is 192. It is worth noting that the strong baseline MixMask falls short on Exchange Rate, where the main goal is to predict trends. In contrast, our method improves the performance of Autoformer by 34% on Exchange Rate when the prediction length is 720, and it improves the performance of Lightts by 37% on Exchange Rate when the prediction length is 96. These results demonstrate the effectiveness of our method for long-term prediction, as it consistently improves the performance of state-of-the-art models on different datasets.
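The improvements quoted above can be checked directly against the tables. The short sketch below assumes that the improvement is the relative reduction of the reported error with respect to the baseline; the helper name `relative_improvement` is hypothetical, and the values are copied from the Baseline and Ours rows of the full-results tables.

```python
def relative_improvement(baseline: float, augmented: float) -> float:
    """Percentage reduction of the error relative to the baseline."""
    return 100.0 * (baseline - augmented) / baseline


# (model, dataset, prediction length): (baseline error, error with dominant shuffle)
cases = {
    ("iTransformer", "Electricity", 720): (0.230, 0.199),
    ("Autoformer", "ETTm1", 720): (0.773, 0.559),
    ("MICN", "ETTh2", 192): (0.518, 0.427),
    ("SCINet", "Electricity", 720): (0.286, 0.225),
    ("LightTS", "ETTh2", 720): (1.165, 0.827),
    ("TiDE", "Electricity", 192): (0.197, 0.150),
    ("Autoformer", "Exchange Rate", 720): (1.056, 0.695),
}

for (model, dataset, horizon), (base, ours) in cases.items():
    print(f"{model:12s} {dataset:13s} {horizon:4d}  {relative_improvement(base, ours):.1f}%")
```

Rounded to the nearest integer, these seven cases give 13, 28, 18, 21, 29, 24, and 34 percent, which agrees with the figures stated in the text.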
Method | ETTh1 | ETTh2 | ETTm1 | ETTm2 | |||||||||||||
96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | ||
iTransformer [13] | Baseline | 0.392 | 0.447 | 0.483 | 0.516 | 0.303 | 0.381 | 0.412 | 0.434 | 0.344 | 0.383 | 0.421 | 0.494 | 0.183 | 0.251 | 0.311 | 0.412 |
ASD [7] | 0.398 | 0.456 | 0.483 | 0.512 | 0.310 | 0.388 | 0.432 | 0.452 | 0.340 | 0.382 | 0.454 | 0.492 | 0.199 | 0.254 | 0.341 | 0.423 | |
MSB [1] | 0.387 | 0.460 | 0.494 | 0.531 | 0.309 | 0.382 | 0.447 | 0.433 | 0.339 | 0.386 | 0.467 | 0.510 | 0.187 | 0.267 | 0.332 | 0.452 | |
Upsample [23] | 0.391 | 0.445 | 0.481 | 0.519 | 0.305 | 0.381 | 0.419 | 0.430 | 0.351 | 0.381 | 0.432 | 0.489 | 0.196 | 0.279 | 0.320 | 0.411 | |
FreqAdd [33] | 0.389 | 0.446 | 0.475 | 0.510 | 0.300 | 0.384 | 0.416 | 0.438 | 0.350 | 0.385 | 0.422 | 0.490 | 0.187 | 0.253 | 0.311 | 0.415 | |
FreqPool [4] | 0.433 | 0.456 | 0.497 | 0.532 | 0.313 | 0.392 | 0.415 | 0.450 | 0.347 | 0.392 | 0.430 | 0.499 | 0.187 | 0.256 | 0.324 | 0.449 | |
Robusttad [8] | 0.390 | 0.445 | 0.497 | 0.510 | 0.312 | 0.388 | 0.412 | 0.439 | 0.353 | 0.382 | 0.421 | 0.498 | 0.189 | 0.255 | 0.309 | 0.428 | |
STAug [34] | 0.390 | 0.445 | 0.489 | 0.511 | 0.323 | 0.428 | 0.486 | 0.483 | 0.339 | 0.383 | 0.417 | 0.485 | 0.196 | 0.267 | 0.339 | 0.449 | |
MixMask [3] | 0.388 | 0.440 | 0.477 | 0.504 | 0.301 | 0.380 | 0.414 | 0.434 | 0.334 | 0.375 | 0.421 | 0.485 | 0.178 | 0.248 | 0.311 | 0.407 | |
Ours | 0.383 | 0.438 | 0.473 | 0.492 | 0.298 | 0.382 | 0.411 | 0.428 | 0.332 | 0.374 | 0.424 | 0.492 | 0.178 | 0.246 | 0.309 | 0.409 | |
AutoFormer [31] | Baseline | 0.429 | 0.440 | 0.495 | 0.498 | 0.381 | 0.443 | 0.471 | 0.475 | 0.467 | 0.610 | 0.529 | 0.773 | 0.233 | 0.278 | 0.383 | 0.488 |
ASD | 0.450 | 0.485 | 0.523 | 0.556 | 0.370 | 0.465 | 0.476 | 0.503 | 0.480 | 0.620 | 0.502 | 0.633 | 0.231 | 0.282 | 0.379 | 0.499 | |
MSB | 0.462 | 0.517 | 0.612 | 0.579 | 0.434 | 0.523 | 0.556 | 0.462 | 0.499 | 0.645 | 0.553 | 0.721 | 0.232 | 0.285 | 0.389 | 0.487 | |
Upsample | 0.416 | 0.523 | 0.480 | 0.482 | 0.353 | 0.460 | 0.455 | 0.509 | 0.498 | 0.630 | 0.512 | 0.667 | 0.234 | 0.291 | 0.382 | 0.521 | |
FreqAdd | 0.460 | 0.487 | 0.497 | 0.525 | 0.367 | 0.439 | 0.480 | 0.504 | 0.419 | 0.554 | 0.546 | 0.569 | 0.223 | 0.268 | 0.330 | 0.458 | |
FreqPool | 0.446 | 0.457 | 0.523 | 0.512 | 0.392 | 0.442 | 0.470 | 0.493 | 0.479 | 0.623 | 0.510 | 0.754 | 0.250 | 0.291 | 0.394 | 0.482 | |
Robusttad | 0.437 | 0.452 | 0.492 | 0.477 | 0.367 | 0.497 | 0.502 | 0.527 | 0.432 | 0.510 | 0.553 | 0.623 | 0.235 | 0.291 | 0.375 | 0.478 | |
STAug | 0.429 | 0.478 | 0.505 | 0.506 | 0.354 | 0.443 | 0.496 | 0.495 | 0.415 | 0.581 | 0.588 | 0.693 | 0.224 | 0.291 | 0.338 | 0.431 | |
MixMask | 0.420 | 0.445 | 0.467 | 0.474 | 0.358 | 0.421 | 0.470 | 0.467 | 0.415 | 0.510 | 0.491 | 0.588 | 0.211 | 0.267 | 0.340 | 0.451 | |
Ours | 0.409 | 0.436 | 0.458 | 0.486 | 0.335 | 0.419 | 0.453 | 0.452 | 0.392 | 0.506 | 0.491 | 0.559 | 0.210 | 0.266 | 0.329 | 0.429 | |
MICN [27] | Baseline | 0.384 | 0.425 | 0.464 | 0.574 | 0.358 | 0.518 | 0.566 | 0.827 | 0.313 | 0.360 | 0.389 | 0.461 | 0.200 | 0.282 | 0.375 | 0.606 |
ASD | 0.380 | 0.430 | 0.472 | 0.523 | 0.377 | 0.539 | 0.620 | 0.843 | 0.315 | 0.362 | 0.399 | 0.457 | 0.189 | 0.331 | 0.399 | 0.617 | |
MSB | 0.423 | 0.423 | 0.501 | 0.559 | 0.402 | 0.623 | 0.790 | 1.126 | 0.330 | 0.358 | 0.402 | 0.459 | 0.192 | 0.279 | 0.376 | 0.651 | |
Upsample | 0.396 | 0.435 | 0.463 | 0.550 | 0.366 | 0.500 | 0.831 | 0.752 | 0.339 | 0.377 | 0.402 | 0.475 | 0.203 | 0.291 | 0.372 | 0.595 | |
FreqAdd | 0.390 | 0.430 | 0.477 | 0.643 | 0.370 | 0.521 | 0.626 | 0.975 | 0.316 | 0.360 | 0.407 | 0.478 | 0.176 | 0.273 | 0.378 | 0.614 | |
FreqPool | 0.399 | 0.465 | 0.473 | 0.572 | 0.365 | 0.553 | 0.550 | 0.812 | 0.336 | 0.372 | 0.397 | 0.466 | 0.212 | 0.287 | 0.390 | 0.623 | |
Robusttad | 0.392 | 0.436 | 0.491 | 0.556 | 0.339 | 0.529 | 0.553 | 0.998 | 0.339 | 0.359 | 0.396 | 0.472 | 0.200 | 0.296 | 0.356 | 0.617 | |
STAug | 0.374 | 0.429 | 0.489 | 0.608 | 0.413 | 0.760 | 1.330 | 2.608 | 0.313 | 0.360 | 0.418 | 0.483 | 0.180 | 0.264 | 0.323 | 0.670 | |
MixMask | 0.378 | 0.423 | 0.461 | 0.521 | 0.339 | 0.488 | 0.544 | 0.735 | 0.301 | 0.352 | 0.401 | 0.454 | 0.183 | 0.278 | 0.356 | 0.528 | |
Ours | 0.373 | 0.421 | 0.452 | 0.510 | 0.310 | 0.427 | 0.507 | 0.731 | 0.314 | 0.360 | 0.387 | 0.470 | 0.174 | 0.263 | 0.346 | 0.502 | |
SCINet [12] | Baseline | 0.485 | 0.506 | 0.519 | 0.552 | 0.372 | 0.416 | 0.429 | 0.470 | 0.316 | 0.353 | 0.387 | 0.431 | 0.184 | 0.240 | 0.295 | 0.385 |
ASD | 0.494 | 0.480 | 0.491 | 0.559 | 0.362 | 0.402 | 0.432 | 0.499 | 0.331 | 0.367 | 0.389 | 0.453 | 0.197 | 0.238 | 0.296 | 0.432 | |
MSB | 0.489 | 0.466 | 0.502 | 0.547 | 0.359 | 0.396 | 0.458 | 0.476 | 0.320 | 0.351 | 0.396 | 0.478 | 0.182 | 0.237 | 0.289 | 0.449 | |
Upsample | 0.471 | 0.457 | 0.479 | 0.541 | 0.379 | 0.407 | 0.403 | 0.482 | 0.342 | 0.386 | 0.399 | 0.442 | 0.179 | 0.254 | 0.292 | 0.401 | |
FreqAdd | 0.428 | 0.452 | 0.469 | 0.532 | 0.335 | 0.385 | 0.403 | 0.447 | 0.304 | 0.338 | 0.373 | 0.421 | 0.174 | 0.228 | 0.286 | 0.380 | |
FreqPool | 0.499 | 0.510 | 0.557 | 0.549 | 0.410 | 0.453 | 0.432 | 0.475 | 0.331 | 0.362 | 0.379 | 0.432 | 0.185 | 0.239 | 0.302 | 0.399 | |
Robusttad | 0.462 | 0.501 | 0.498 | 0.559 | 0.362 | 0.431 | 0.419 | 0.496 | 0.331 | 0.351 | 0.394 | 0.438 | 0.182 | 0.247 | 0.299 | 0.402 | |
STAug | 0.457 | 0.500 | 0.524 | 0.534 | 0.538 | 0.636 | 0.681 | 0.648 | 0.319 | 0.357 | 0.389 | 0.445 | 0.323 | 0.407 | 0.514 | 0.668 | |
MixMask | 0.427 | 0.452 | 0.465 | 0.548 | 0.335 | 0.377 | 0.400 | 0.438 | 0.302 | 0.341 | 0.376 | 0.423 | 0.174 | 0.230 | 0.289 | 0.368 | |
Ours | 0.417 | 0.443 | 0.461 | 0.527 | 0.335 | 0.375 | 0.392 | 0.421 | 0.302 | 0.338 | 0.372 | 0.420 | 0.174 | 0.228 | 0.283 | 0.372 | |
TiDE [6] | Baseline | 0.401 | 0.434 | 0.521 | 0.558 | 0.304 | 0.350 | 0.331 | 0.399 | 0.311 | 0.340 | 0.366 | 0.420 | 0.166 | 0.220 | 0.273 | 0.356 |
ASD | 0.417 | 0.441 | 0.513 | 0.556 | 0.320 | 0.351 | 0.367 | 0.422 | 0.319 | 0.341 | 0.399 | 0.432 | 0.177 | 0.241 | 0.291 | 0.371 | |
MSB | 0.422 | 0.476 | 0.529 | 0.579 | 0.331 | 0.379 | 0.334 | 0.401 | 0.302 | 0.356 | 0.382 | 0.451 | 0.182 | 0.232 | 0.287 | 0.359 | |
Upsample | 0.431 | 0.452 | 0.533 | 0.604 | 0.346 | 0.372 | 0.350 | 0.456 | 0.324 | 0.339 | 0.378 | 0.463 | 0.203 | 0.246 | 0.306 | 0.366 | |
FreqAdd | 0.385 | 0.420 | 0.477 | 0.505 | 0.289 | 0.336 | 0.330 | 0.390 | 0.309 | 0.339 | 0.365 | 0.417 | 0.164 | 0.219 | 0.273 | 0.355 | |
FreqPool | 0.423 | 0.455 | 0.510 | 0.592 | 0.312 | 0.376 | 0.339 | 0.397 | 0.319 | 0.352 | 0.397 | 0.453 | 0.179 | 0.231 | 0.299 | 0.371 | |
Robusttad | 0.396 | 0.432 | 0.521 | 0.537 | 0.331 | 0.352 | 0.337 | 0.398 | 0.321 | 0.346 | 0.382 | 0.437 | 0.180 | 0.225 | 0.282 | 0.371 | |
STAug | 0.515 | 0.535 | 0.521 | 0.558 | 0.390 | 0.437 | 0.403 | 0.508 | 0.310 | 0.337 | 0.364 | 0.417 | 0.222 | 0.343 | 0.515 | 0.847 | |
MixMask | 0.385 | 0.420 | 0.478 | 0.507 | 0.289 | 0.339 | 0.330 | 0.391 | 0.299 | 0.332 | 0.367 | 0.416 | 0.165 | 0.219 | 0.271 | 0.347 | |
Ours | 0.385 | 0.414 | 0.467 | 0.498 | 0.283 | 0.332 | 0.324 | 0.388 | 0.297 | 0.328 | 0.365 | 0.412 | 0.165 | 0.218 | 0.271 | 0.350 | |
LightTS [32] | Baseline | 0.448 | 0.444 | 0.663 | 0.706 | 0.369 | 0.476 | 0.738 | 1.165 | 0.323 | 0.347 | 0.428 | 0.476 | 0.212 | 0.237 | 0.350 | 0.473 |
ASD | 0.451 | 0.476 | 0.633 | 0.681 | 0.392 | 0.469 | 0.701 | 0.998 | 0.356 | 0.352 | 0.441 | 0.478 | 0.258 | 0.251 | 0.351 | 0.483 | |
MSB | 0.467 | 0.463 | 0.627 | 0.652 | 0.378 | 0.472 | 0.652 | 1.123 | 0.371 | 0.349 | 0.430 | 0.479 | 0.236 | 0.242 | 0.359 | 0.471 | |
Upsample | 0.449 | 0.472 | 0.610 | 0.637 | 0.401 | 0.487 | 0.714 | 1.245 | 0.329 | 0.366 | 0.453 | 0.492 | 0.241 | 0.255 | 0.366 | 0.492 | |
FreqAdd | 0.417 | 0.430 | 0.578 | 0.622 | 0.351 | 0.453 | 0.689 | 1.125 | 0.322 | 0.352 | 0.400 | 0.450 | 0.206 | 0.237 | 0.327 | 0.455 | |
FreqPool | 0.463 | 0.471 | 0.652 | 0.690 | 0.369 | 0.512 | 0.723 | 1.264 | 0.336 | 0.351 | 0.442 | 0.497 | 0.233 | 0.259 | 0.372 | 0.453 | |
Robusttad | 0.445 | 0.442 | 0.590 | 0.654 | 0.372 | 0.468 | 0.699 | 0.982 | 0.331 | 0.352 | 0.441 | 0.462 | 0.232 | 0.227 | 0.342 | 0.446 | |
STAug | 0.445 | 0.441 | 0.669 | 0.714 | 0.520 | 0.807 | 2.101 | 2.467 | 0.320 | 0.343 | 0.427 | 0.476 | 0.230 | 0.266 | 0.372 | 0.475 | |
MixMask | 0.417 | 0.429 | 0.575 | 0.620 | 0.337 | 0.426 | 0.643 | 0.993 | 0.316 | 0.340 | 0.398 | 0.447 | 0.199 | 0.233 | 0.322 | 0.440 | |
Ours | 0.405 | 0.423 | 0.565 | 0.603 | 0.335 | 0.395 | 0.575 | 0.827 | 0.322 | 0.340 | 0.391 | 0.440 | 0.195 | 0.245 | 0.312 | 0.422 | |
Method | Electricity | Weather | Exchange Rate | Traffic | |||||||||||||
96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | 96 | 192 | 336 | 720 | ||
iTransformer [13] | Baseline | 0.152 | 0.159 | 0.179 | 0.230 | 0.175 | 0.224 | 0.281 | 0.362 | 0.086 | 0.180 | 0.335 | 0.856 | 0.399 | 0.418 | 0.428 | 0.463 |
ASD [7] | 0.173 | 0.179 | 0.201 | 0.234 | 0.191 | 0.223 | 0.280 | 0.364 | 0.088 | 0.183 | 0.343 | 0.872 | 0.431 | 0.428 | 0.430 | 0.478 | |
MSB [1] | 0.182 | 0.182 | 0.194 | 0.267 | 0.185 | 0.235 | 0.284 | 0.359 | 0.089 | 0.189 | 0.359 | 0.907 | 0.417 | 0.416 | 0.422 | 0.471 | |
Upsample [23] | 0.166 | 0.188 | 0.216 | 0.221 | 0.204 | 0.257 | 0.291 | 0.373 | 0.086 | 0.180 | 0.338 | 0.834 | 0.433 | 0.419 | 0.433 | 0.476 | |
FreqAdd [33] | 0.150 | 0.157 | 0.172 | 0.204 | 0.181 | 0.230 | 0.285 | 0.362 | 0.087 | 0.181 | 0.333 | 0.837 | 0.480 | 0.441 | 0.450 | 0.501 | |
FreqPool [4] | 0.169 | 0.170 | 0.194 | 0.237 | 0.184 | 0.223 | 0.279 | 0.378 | 0.088 | 0.183 | 0.330 | 0.832 | 0.410 | 0.429 | 0.433 | 0.476 | |
Robusttad [8] | 0.150 | 0.157 | 0.176 | 0.210 | 0.172 | 0.225 | 0.281 | 0.357 | 0.087 | 0.179 | 0.329 | 0.833 | 0.406 | 0.417 | 0.429 | 0.458 | |
STAug [34] | 0.160 | 0.173 | 0.218 | 0.372 | 0.206 | 0.264 | 0.319 | 0.385 | 0.086 | 0.178 | 0.335 | 0.866 | 0.413 | 0.432 | 0.449 | 0.481 | |
MixMask [3] | 0.151 | 0.158 | 0.173 | 0.205 | 0.175 | 0.224 | 0.279 | 0.354 | 0.089 | 0.178 | 0.328 | 0.845 | 0.395 | 0.401 | 0.418 | 0.450 | |
Ours | 0.150 | 0.156 | 0.171 | 0.199 | 0.171 | 0.221 | 0.276 | 0.351 | 0.086 | 0.176 | 0.313 | 0.821 | 0.394 | 0.412 | 0.423 | 0.448 | |
AutoFormer [31] | Baseline | 0.203 | 0.208 | 0.231 | 0.239 | 0.241 | 0.314 | 0.341 | 0.425 | 0.143 | 0.305 | 0.470 | 1.056 | 0.640 | 0.645 | 0.611 | 0.658 |
ASD | 0.247 | 0.216 | 0.221 | 0.235 | 0.652 | 0.392 | 0.416 | 0.513 | 0.141 | 0.280 | 0.579 | 1.240 | 0.631 | 0.602 | 0.607 | 0.643 | |
MSB | 0.237 | 0.256 | 0.295 | 0.236 | 0.256 | 0.379 | 0.402 | 0.468 | 0.156 | 0.254 | 0.513 | 1.339 | 0.652 | 0.665 | 0.643 | 0.650 | |
Upsample | 0.201 | 0.209 | 0.232 | 0.268 | 0.281 | 0.294 | 0.329 | 0.385 | 0.141 | 0.292 | 0.553 | 1.295 | 0.653 | 0.676 | 0.702 | 0.694 | |
FreqAdd | 0.193 | 0.197 | 0.212 | 0.225 | 0.255 | 0.323 | 0.370 | 0.419 | 0.143 | 0.369 | 0.716 | 1.173 | 0.613 | 0.598 | 0.617 | 0.639 | |
FreqPool | 0.213 | 0.224 | 0.234 | 0.257 | 0.237 | 0.339 | 0.372 | 0.446 | 0.142 | 0.336 | 0.532 | 1.014 | 0.630 | 0.598 | 0.603 | 0.639 | |
Robusttad | 0.230 | 0.242 | 0.261 | 0.231 | 0.270 | 0.334 | 0.351 | 0.429 | 0.142 | 0.309 | 0.462 | 1.123 | 0.621 | 0.614 | 0.612 | 0.646 | |
STAug | 0.191 | 0.206 | 0.217 | 0.234 | 0.250 | 0.300 | 0.347 | 0.418 | 0.140 | 0.326 | 0.594 | 1.176 | 0.632 | 0.619 | 0.632 | 0.640 | |
MixMask | 0.177 | 0.194 | 0.206 | 0.224 | 0.240 | 0.302 | 0.330 | 0.422 | 0.141 | 0.284 | 0.453 | 0.778 | 0.560 | 0.584 | 0.594 | 0.635 | |
Ours | 0.171 | 0.191 | 0.203 | 0.219 | 0.214 | 0.273 | 0.327 | 0.383 | 0.136 | 0.243 | 0.418 | 0.695 | 0.577 | 0.581 | 0.592 | 0.638 | |
MICN [27] | Baseline | 0.171 | 0.183 | 0.198 | 0.224 | 0.188 | 0.241 | 0.278 | 0.350 | 0.091 | 0.185 | 0.355 | 0.941 | 0.522 | 0.540 | 0.553 | 0.573 |
ASD | 0.165 | 0.174 | 0.190 | 0.237 | 0.189 | 0.242 | 0.276 | 0.354 | 0.087 | 0.175 | 0.337 | 1.203 | 0.505 | 0.534 | 0.541 | 0.539 | |
MSB | 0.179 | 0.182 | 0.201 | 0.225 | 0.201 | 0.250 | 0.291 | 0.365 | 0.088 | 0.176 | 0.360 | 0.995 | 0.513 | 0.532 | 0.528 | 0.556 | |
Upsample | 0.182 | 0.180 | 0.203 | 0.220 | 0.193 | 0.249 | 0.279 | 0.372 | 0.084 | 0.171 | 0.313 | 0.702 | 0.533 | 0.559 | 0.556 | 0.590 | |
FreqAdd | 0.160 | 0.169 | 0.182 | 0.199 | 0.180 | 0.234 | 0.282 | 0.350 | 0.087 | 0.174 | 0.349 | 0.923 | 0.503 | 0.527 | 0.520 | 0.571 | |
FreqPool | 0.182 | 0.203 | 0.241 | 0.256 | 0.192 | 0.257 | 0.278 | 0.351 | 0.089 | 0.179 | 0.394 | 0.923 | 0.531 | 0.539 | 0.556 | 0.592 | |
Robusttad | 0.179 | 0.220 | 0.234 | 0.227 | 0.192 | 0.239 | 0.292 | 0.343 | 0.085 | 0.179 | 0.336 | 0.932 | 0.510 | 0.532 | 0.547 | 0.597 | |
STAug | 0.180 | 0.195 | 0.210 | 0.224 | 0.272 | 0.356 | 0.433 | 0.559 | 0.092 | 0.183 | 0.313 | 0.790 | 0.512 | 0.533 | 0.529 | 0.585 | |
MixMask | 0.159 | 0.165 | 0.178 | 0.195 | 0.185 | 0.239 | 0.281 | 0.344 | 0.086 | 0.174 | 0.337 | 0.796 | 0.490 | 0.512 | 0.519 | 0.538 | |
Ours | 0.157 | 0.168 | 0.178 | 0.211 | 0.179 | 0.232 | 0.275 | 0.342 | 0.084 | 0.169 | 0.303 | 0.750 | 0.501 | 0.507 | 0.518 | 0.556 | |
SCINet [12] | Baseline | 0.212 | 0.237 | 0.255 | 0.286 | 0.229 | 0.282 | 0.334 | 0.402 | 0.099 | 0.191 | 0.356 | 0.916 | 0.550 | 0.526 | 0.545 | 0.596 |
ASD | 0.229 | 0.241 | 0.239 | 0.282 | 0.254 | 0.276 | 0.356 | 0.462 | 0.095 | 0.204 | 0.379 | 1.230 | 0.537 | 0.521 | 0.541 | 0.570 | |
MSB | 0.232 | 0.237 | 0.228 | 0.274 | 0.279 | 0.265 | 0.374 | 0.454 | 0.093 | 0.267 | 0.402 | 0.965 | 0.520 | 0.510 | 0.537 | 0.565 | |
Upsample | 0.250 | 0.232 | 0.271 | 0.309 | 0.243 | 0.299 | 0.361 | 0.431 | 0.092 | 0.196 | 0.311 | 0.932 | 0.519 | 0.536 | 0.528 | 0.576 | |
FreqAdd | 0.176 | 0.195 | 0.212 | 0.237 | 0.208 | 0.258 | 0.309 | 0.385 | 0.092 | 0.186 | 0.343 | 0.920 | 0.492 | 0.497 | 0.512 | 0.550 | |
FreqPool | 0.230 | 0.221 | 0.242 | 0.339 | 0.261 | 0.290 | 0.337 | 0.456 | 0.096 | 0.183 | 0.551 | 0.938 | 0.557 | 0.519 | 0.533 | 0.562 | |
Robusttad | 0.189 | 0.202 | 0.210 | 0.243 | 0.229 | 0.281 | 0.331 | 0.410 | 0.093 | 0.186 | 0.334 | 0.957 | 0.523 | 0.519 | 0.522 | 0.569 | |
STAug | 0.210 | 0.239 | 0.282 | 0.411 | 0.277 | 0.329 | 0.372 | 0.435 | 0.098 | 0.191 | 0.342 | 0.931 | 0.560 | 0.517 | 0.521 | 0.566 | |
MixMask | 0.171 | 0.188 | 0.204 | 0.230 | 0.205 | 0.250 | 0.310 | 0.374 | 0.093 | 0.179 | 0.336 | 0.928 | 0.495 | 0.492 | 0.511 | 0.551 | |
Ours | 0.172 | 0.188 | 0.200 | 0.225 | 0.197 | 0.246 | 0.299 | 0.379 | 0.091 | 0.175 | 0.342 | 0.890 | 0.500 | 0.495 | 0.509 | 0.544 | |
TiDE [6] | Baseline | 0.207 | 0.197 | 0.211 | 0.238 | 0.177 | 0.220 | 0.265 | 0.323 | 0.093 | 0.184 | 0.330 | 0.860 | 0.452 | 0.450 | 0.451 | 0.479 |
ASD | 0.232 | 0.220 | 0.231 | 0.265 | 0.189 | 0.221 | 0.297 | 0.332 | 0.095 | 0.206 | 0.351 | 0.962 | 0.477 | 0.462 | 0.450 | 0.506 | |
MSB | 0.210 | 0.219 | 0.253 | 0.261 | 0.199 | 0.254 | 0.273 | 0.339 | 0.092 | 0.179 | 0.358 | 0.941 | 0.461 | 0.451 | 0.455 | 0.510 | |
Upsample | 0.206 | 0.199 | 0.223 | 0.274 | 0.203 | 0.267 | 0.331 | 0.355 | 0.091 | 0.182 | 0.331 | 0.852 | 0.490 | 0.466 | 0.472 | 0.493 | |
FreqAdd | 0.150 | 0.163 | 0.177 | 0.209 | 0.173 | 0.216 | 0.263 | 0.322 | 0.088 | 0.180 | 0.330 | 0.848 | 0.429 | 0.441 | 0.440 | 0.471 | |
FreqPool | 0.224 | 0.238 | 0.233 | 0.270 | 0.189 | 0.224 | 0.292 | 0.334 | 0.092 | 0.334 | 0.521 | 1.124 | 0.453 | 0.466 | 0.479 | 0.503 | |
Robusttad | 0.176 | 0.166 | 0.182 | 0.229 | 0.182 | 0.231 | 0.279 | 0.330 | 0.099 | 0.232 | 0.331 | 0.924 | 0.449 | 0.430 | 0.438 | 0.482 | |
STAug | 0.230 | 0.210 | 0.192 | 0.225 | 0.205 | 0.247 | 0.292 | 0.364 | 0.092 | 0.184 | 0.330 | 0.859 | 0.466 | 0.455 | 0.471 | 0.480 | |
MixMask | 0.143 | 0.155 | 0.164 | 0.210 | 0.173 | 0.216 | 0.263 | 0.323 | 0.089 | 0.180 | 0.329 | 0.861 | 0.421 | 0.427 | 0.434 | 0.466 | |
Ours | 0.143 | 0.150 | 0.165 | 0.202 | 0.177 | 0.219 | 0.261 | 0.322 | 0.088 | 0.179 | 0.324 | 0.847 | 0.423 | 0.426 | 0.433 | 0.466 | |
LightTS [32] | Baseline | 0.210 | 0.169 | 0.182 | 0.212 | 0.168 | 0.210 | 0.260 | 0.320 | 0.139 | 0.252 | 0.412 | 0.840 | 0.505 | 0.515 | 0.539 | 0.587 |
ASD | 0.225 | 0.179 | 0.198 | 0.232 | 0.179 | 0.21 | 0.271 | 0.321 | 0.132 | 0.320 | 0.436 | 1.036 | 0.510 | 0.514 | 0.534 | 0.579 | |
MSB | 0.233 | 0.182 | 0.204 | 0.228 | 0.170 | 0.214 | 0.259 | 0.332 | 0.117 | 0.294 | 0.502 | 0.964 | 0.532 | 0.510 | 0.539 | 0.584 | |
Upsample | 0.246 | 0.179 | 0.211 | 0.254 | 0.182 | 0.223 | 0.257 | 0.336 | 0.099 | 0.251 | 0.369 | 0.702 | 0.522 | 0.547 | 0.532 | 0.597 | |
FreqAdd | 0.213 | 0.159 | 0.177 | 0.210 | 0.164 | 0.207 | 0.258 | 0.317 | 0.098 | 0.522 | 0.565 | 1.583 | 0.492 | 0.500 | 0.530 | 0.572 | |
FreqPool | 0.219 | 0.174 | 0.197 | 0.236 | 0.193 | 0.254 | 0.267 | 0.339 | 0.099 | 0.275 | 0.394 | 0.793 | 0.501 | 0.519 | 0.533 | 0.592 | |
Robusttad | 0.212 | 0.169 | 0.181 | 0.223 | 0.172 | 0.223 | 0.259 | 0.324 | 0.092 | 0.279 | 0.451 | 0.796 | 0.499 | 0.502 | 0.521 | 0.572 | |
STAug | 0.224 | 0.267 | 0.294 | 0.351 | 0.214 | 0.263 | 0.382 | 0.371 | 0.096 | 0.212 | 0.380 | 0.690 | 0.520 | 0.534 | 0.520 | 0.596 | |
MixMask | 0.192 | 0.158 | 0.175 | 0.211 | 0.163 | 0.206 | 0.257 | 0.318 | 0.099 | 0.384 | 0.518 | 0.774 | 0.486 | 0.499 | 0.517 | 0.555 | |
Ours | 0.210 | 0.156 | 0.173 | 0.206 | 0.165 | 0.205 | 0.249 | 0.312 | 0.088 | 0.243 | 0.361 | 0.676 | 0.483 | 0.497 | 0.515 |
Methods | PEMS03 (12) | PEMS03 (24) | PEMS03 (36) | PEMS03 (48) | PEMS04 (12) | PEMS04 (24) | PEMS04 (36) | PEMS04 (48) | PEMS07 (12) | PEMS07 (24) | PEMS07 (36) | PEMS07 (48) | PEMS08 (12) | PEMS08 (24) | PEMS08 (36) | PEMS08 (48)
Baseline | 0.070 | 0.097 | 0.134 | 0.164 | 0.088 | 0.124 | 0.160 | 0.196 | 0.067 | 0.097 | 0.128 | 0.156 | 0.088 | 0.136 | 0.191 | 0.248 |
ASD [7] | 0.072 | 0.096 | 0.152 | 0.239 | 0.098 | 0.132 | 0.156 | 0.190 | 0.069 | 0.099 | 0.154 | 0.181 | 0.089 | 0.138 | 0.196 | 0.247 |
MSB [1] | 0.096 | 0.131 | 0.129 | 0.214 | 0.087 | 0.134 | 0.167 | 0.219 | 0.098 | 0.096 | 0.137 | 0.165 | 0.096 | 0.137 | 0.210 | 0.256 |
Upsample [23] | 0.069 | 0.096 | 0.128 | 0.179 | 0.087 | 0.124 | 0.158 | 0.199 | 0.072 | 0.099 | 0.127 | 0.155 | 0.088 | 0.140 | 0.192 | 0.245 |
FreqAdd [33] | 1.036 | 0.104 | 0.251 | 0.362 | 0.088 | 0.125 | 0.159 | 0.201 | 0.067 | 0.097 | 0.127 | 0.155 | 0.089 | 0.135 | 0.192 | 0.253 |
FreqPool [4] | 1.234 | 0.178 | 0.296 | 0.451 | 0.099 | 0.145 | 0.178 | 0.226 | 0.079 | 0.104 | 0.152 | 0.172 | 0.099 | 0.155 | 0.203 | 0.264 |
Robusttad [8] | 0.082 | 0.098 | 0.132 | 1.520 | 0.089 | 0.123 | 0.161 | 0.195 | 0.067 | 0.097 | 0.129 | 0.157 | 0.092 | 0.135 | 0.189 | 0.26 |
STAug [34] | 0.079 | 0.112 | 0.195 | 0.456 | 0.087 | 0.120 | 0.162 | 0.304 | 0.066 | 0.096 | 0.132 | 0.165 | 0.092 | 0.147 | 0.192 | 0.276 |
Mask [3] | 0.443 | 1.205 | 0.233 | 1.510 | 0.086 | 0.119 | 0.158 | 0.346 | 0.065 | 0.095 | 0.125 | 0.156 | 0.089 | 0.131 | 0.186 | 0.239 |
Mix [3] | 1.018 | 0.097 | 0.877 | 1.501 | 0.085 | 0.119 | 0.154 | 0.205 | 0.065 | 0.094 | 0.134 | 0.152 | 0.089 | 0.131 | 0.184 | 0.234 |
Ours | 0.067 | 0.095 | 0.126 | 0.235 | 0.085 | 0.118 | 0.149 | 0.182 | 0.065 | 0.094 | 0.123 | 0.148 | 0.087 | 0.134 | 0.184 |
B.2 Example predictions
We provide example prediction results on different datasets in Fig. 6.
B.3 Optimal hyperparameter
We provide the optimal hyperparameter value for all long-term prediction datasets using iTransformer [13] in Tab. 10 and 11. As the tables show, the optimal value is small (between 2 and 4 in all but two settings), so our method needs little tuning effort.
Hyperparameter | ETTh1 (96) | ETTh1 (192) | ETTh1 (336) | ETTh1 (720) | ETTh2 (96) | ETTh2 (192) | ETTh2 (336) | ETTh2 (720) | ETTm1 (96) | ETTm1 (192) | ETTm1 (336) | ETTm1 (720) | ETTm2 (96) | ETTm2 (192) | ETTm2 (336) | ETTm2 (720)
Optimal value | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 4 | 3 | 3 | 2 | 2 | 4 | 4 | 2 | 4
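Hyperparameter | Electricity (96) | Electricity (192) | Electricity (336) | Electricity (720) | Traffic (96) | Traffic (192) | Traffic (336) | Traffic (720) | Weather (96) | Weather (192) | Weather (336) | Weather (720) | Exchange Rate (96) | Exchange Rate (192) | Exchange Rate (336) | Exchange Rate (720)
Optimal value | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 2 | 4 | 2 | 2 | 8 | 8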
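To make the role of the tabulated hyperparameter concrete, the following is a minimal sketch of shuffling dominant frequency components, written from the description in the abstract rather than from the released code. It assumes that the hyperparameter is the number of dominant frequencies being shuffled (called `k` here, a name chosen for illustration), that the series is univariate, and that the DC term is left untouched; none of these details are confirmed by this section.

```python
import numpy as np


def dominant_shuffle(x, k, rng):
    """Shuffle the k largest-magnitude non-DC frequency components of a 1-D series.

    Assumptions of this sketch (not taken from the paper's code):
    `k` is the tabulated hyperparameter, and the DC term is excluded.
    """
    spec = np.fft.rfft(x)
    mag = np.abs(spec)
    mag[0] = 0.0                      # exclude the DC (mean) component from the ranking
    top = np.argsort(mag)[-k:]        # indices of the k dominant frequencies
    perm = rng.permutation(top)       # random rearrangement of those indices
    out = spec.copy()
    out[top] = spec[perm]             # magnitudes and phases move together
    return np.fft.irfft(out, n=len(x))


# Hypothetical toy signal with three clear periodic components.
n = 64
t = np.arange(n)
x = 1.0 + 3 * np.sin(2 * np.pi * 2 * t / n) + 2 * np.sin(2 * np.pi * 5 * t / n) + np.sin(2 * np.pi * 9 * t / n)
augmented = dominant_shuffle(x, k=3, rng=np.random.default_rng(0))
```

Because the components are only rearranged, no external noise enters the spectrum, and with `k = 1` the series is returned unchanged. That is consistent with the small optimal values reported in the tables.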
B.4 Standard deviations
Tab. 12, 13, 14 and 15 show the standard deviations across different runs. The deviations of our method are small (at most 0.006), indicating that its performance is stable.
Model | Method | ETTh1 (96) | ETTh1 (192) | ETTh1 (336) | ETTh1 (720) | ETTh2 (96) | ETTh2 (192) | ETTh2 (336) | ETTh2 (720)
iTransformer | Baseline | 0.392±0.001 | 0.447±0.002 | 0.483±0.003 | 0.516±0.003 | 0.303±0.001 | 0.381±0.000 | 0.412±0.001 | 0.434±0.002
iTransformer | Mask [3] | 0.390±0.001 | 0.442±0.002 | 0.475±0.001 | 0.503±0.003 | 0.301±0.001 | 0.385±0.003 | 0.414±0.001 | 0.438±0.005
iTransformer | Mix [3] | 0.388±0.002 | 0.440±0.002 | 0.477±0.000 | 0.504±0.004 | 0.301±0.001 | 0.380±0.001 | 0.414±0.001 | 0.434±0.003
iTransformer | Ours | 0.383±0.001 | 0.438±0.001 | 0.473±0.002 | 0.492±0.002 | 0.298±0.002 | 0.382±0.003 | 0.411±0.004 | 0.428±0.001
Model | Method | ETTm1 (96) | ETTm1 (192) | ETTm1 (336) | ETTm1 (720) | ETTm2 (96) | ETTm2 (192) | ETTm2 (336) | ETTm2 (720)
iTransformer | Baseline | 0.344±0.002 | 0.383±0.003 | 0.421±0.001 | 0.494±0.003 | 0.183±0.001 | 0.251±0.002 | 0.311±0.001 | 0.412±0.001
iTransformer | Mask [3] | 0.347±0.002 | 0.383±0.005 | 0.420±0.001 | 0.494±0.004 | 0.179±0.003 | 0.251±0.001 | 0.311±0.001 | 0.411±0.002
iTransformer | Mix [3] | 0.334±0.005 | 0.375±0.002 | 0.421±0.000 | 0.485±0.002 | 0.178±0.002 | 0.248±0.001 | 0.311±0.000 | 0.407±0.002
iTransformer | Ours | 0.332±0.001 | 0.374±0.001 | 0.424±0.001 | 0.492±0.002 | 0.178±0.002 | 0.246±0.001 | 0.309±0.001 | 0.409±0.000
Model | Method | Electricity (96) | Electricity (192) | Electricity (336) | Electricity (720) | Traffic (96) | Traffic (192) | Traffic (336) | Traffic (720)
iTransformer | Baseline | 0.152±0.000 | 0.159±0.001 | 0.179±0.003 | 0.230±0.013 | 0.399±0.001 | 0.418±0.000 | 0.428±0.000 | 0.463±0.000
iTransformer | Mask [3] | 0.153±0.001 | 0.157±0.001 | 0.173±0.001 | 0.208±0.005 | 0.395±0.001 | 0.401±0.005 | 0.418±0.001 | 0.450±0.002
iTransformer | Mix [3] | 0.151±0.000 | 0.158±0.001 | 0.173±0.000 | 0.205±0.003 | 0.400±0.003 | 0.414±0.004 | 0.424±0.002 | 0.453±0.003
iTransformer | Ours | 0.150±0.000 | 0.156±0.001 | 0.171±0.000 | 0.199±0.002 | 0.394±0.000 | 0.412±0.002 | 0.423±0.002 | 0.448±0.001
Model | Method | Weather (96) | Weather (192) | Weather (336) | Weather (720) | Exchange Rate (96) | Exchange Rate (192) | Exchange Rate (336) | Exchange Rate (720)
iTransformer | Baseline | 0.175±0.001 | 0.224±0.001 | 0.281±0.000 | 0.362±0.003 | 0.086±0.000 | 0.180±0.000 | 0.335±0.002 | 0.856±0.004
iTransformer | Mask [3] | 0.178±0.001 | 0.228±0.002 | 0.284±0.002 | 0.359±0.001 | 0.090±0.002 | 0.178±0.001 | 0.329±0.006 | 0.845±0.008
iTransformer | Mix [3] | 0.175±0.001 | 0.224±0.000 | 0.279±0.000 | 0.354±0.000 | 0.089±0.001 | 0.178±0.001 | 0.328±0.006 | 0.868±0.008
iTransformer | Ours | 0.171±0.001 | 0.221±0.000 | 0.276±0.000 | 0.351±0.002 | 0.086±0.001 | 0.176±0.001 | 0.313±0.006 | 0.821±0.003
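Entries of this kind can be produced by aggregating the test error of repeated runs. The sketch below shows one way to do so; the function name `summarize_runs` and the three error values are hypothetical, and the use of the sample standard deviation is an assumption, since this section does not state the number of runs or which estimator was used.

```python
import statistics


def summarize_runs(errors):
    """Format the errors of repeated runs as 'mean±std' with three decimals."""
    mean = statistics.fmean(errors)
    std = statistics.stdev(errors)  # sample standard deviation (an assumption)
    return f"{mean:.3f}±{std:.3f}"


# Hypothetical errors from three runs with different random seeds.
print(summarize_runs([0.382, 0.383, 0.384]))  # prints 0.383±0.001
```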