Self-Supervised Spatiotemporal Masking Strategy-Based Models for Traffic Flow Forecasting
Abstract
1. Introduction
- We propose a spatiotemporal context mask reconstruction task that forces the model to reconstruct masked traffic features from spatiotemporal contextual information, thereby enhancing existing STGNNs’ understanding of spatiotemporal contextual associations and improving their prediction capability (an illustrative joint objective is sketched after this list);
- A specific spatiotemporal masking strategy is proposed to help the model understand the spatiotemporal associations of each local part of the traffic network, and the effects of different masking strategies and masking ratios on model performance are compared comprehensively;
- We validate the proposed method on two real-world traffic datasets; the experimental results show that introducing the spatiotemporal context mask reconstruction task as an auxiliary task improves the prediction performance of STGNNs at prediction horizons of 30, 45, and 60 min.
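As an illustration of how such an auxiliary task is typically attached to a forecasting backbone, one common formulation trains the model on a weighted sum of the forecasting loss and the mask-reconstruction loss. This is only a sketch of the general shape of such an objective, not a reproduction of the exact loss in Section 3.4; the weighting $\lambda$ is an assumption:

$$\mathcal{L} = \mathcal{L}_{\mathrm{pred}}\left(\hat{Y}, Y\right) + \lambda\,\mathcal{L}_{\mathrm{recon}}\left(\hat{X}_{\mathcal{M}}, X_{\mathcal{M}}\right),$$

where $Y$ and $\hat{Y}$ denote the ground-truth and predicted future traffic features, $X_{\mathcal{M}}$ and $\hat{X}_{\mathcal{M}}$ denote the true and reconstructed traffic features at the masked spatiotemporal positions $\mathcal{M}$, and $\lambda$ balances the two tasks.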
2. Related Work
2.1. Spatial Modeling
2.2. Temporal Modeling
- RNN-based STGNNs. RNN-based STGNNs generally use a chain structure that combines a graph convolution module with a recurrent unit. DCRNN [5] adopts a variant of the RNN, the gated recurrent unit (GRU), to extract temporal features. The GRU achieves performance similar to that of LSTM with fewer parameters, which effectively reduces the parameter count and the training time [38]. T-GCN [7], on the other hand, combines GCNs with the GRU, tests the model on traffic datasets from two common scenarios, namely highways and urban roads, and obtains prediction performance that exceeds the baselines. However, because of the chain structure of RNNs, the input of each time step depends on the output of the preceding time step, so the parameters cannot be trained in parallel;
- One-dimensional convolution-based STGNNs. STGCN [6] applies one-dimensional causal convolution and gated linear units to extract temporal features from traffic flow. Graph WaveNet [10] applies one-dimensional dilated causal convolution to the temporal features, which makes the receptive field of the model grow exponentially with depth and thus helps the model capture long-range temporal dependence in the data (a minimal sketch of dilated causal convolution follows this list). Compared with RNN-based models, one-dimensional convolution-based models are more computationally efficient at modeling temporal dependence and also avoid vanishing or exploding gradients;
- Self-attention mechanism-based STGNNs. The traffic transformer [8] designs various positional encoding strategies to learn the periodic features in traffic flow. STTN [39] incorporates the graph convolution process into a spatial transformer, builds a spatiotemporal transformer block from the spatial and temporal transformers, and stacks these blocks to capture the dynamic spatiotemporal correlations in traffic data. The self-attention mechanism allows the model parameters to be trained in parallel while directly connecting every pair of time steps in the input sequence, which helps self-attention-based traffic flow prediction models better capture long-range temporal dependence in traffic data.
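To make the receptive-field argument in the one-dimensional convolution bullet concrete, below is a minimal PyTorch sketch of a stack of dilated causal convolutions. This is our own illustration, not code from Graph WaveNet [10]; the channel width, kernel size, and dilation schedule are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalConv(nn.Module):
    """One causal 1D convolution: the output at time t depends only on inputs at times <= t."""

    def __init__(self, channels: int, dilation: int, kernel_size: int = 2):
        super().__init__()
        # Left-pad by (kernel_size - 1) * dilation so the convolution never sees the future.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        return torch.relu(self.conv(F.pad(x, (self.pad, 0))))

# Dilations double layer by layer, so the receptive field grows exponentially:
# 1 + (2 - 1) * (1 + 2 + 4 + 8) = 16 time steps after only four layers.
stack = nn.Sequential(*[DilatedCausalConv(channels=32, dilation=d) for d in (1, 2, 4, 8)])

x = torch.randn(8, 32, 12)  # a batch of 8 sequences, 32 channels, 12 time steps
print(stack(x).shape)       # torch.Size([8, 32, 12]) -- length preserved by causal padding
```

With kernel size 2 and dilations 1, 2, 4, and 8, the receptive field spans 16 time steps after four layers, whereas a chain-structured RNN must unroll through all 16 steps sequentially.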
2.3. Learning Paradigms
3. Methodology
3.1. Overview
3.2. Spatiotemporal Context Masking
3.3. Temporal Shift
3.4. Loss Function
4. Experiments
4.1. Datasets
- METR-LA: A traffic speed dataset comprising 4 months of data from highway sensors in Los Angeles, covering 1 March 2012 to 30 June 2012;
- PEMS-BAY: A traffic speed dataset comprising 6 months of data from the Bay Area, covering 1 January 2017 to 30 June 2017.
4.2. Evaluation Metrics
- Root mean squared error (RMSE)
- Mean absolute error (MAE)
- Mean absolute percentage error (MAPE)
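All three metrics compare the ground truth $y_i$ with the prediction $\hat{y}_i$ over the $n$ test samples, and lower values indicate better performance. Their standard definitions are:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|.$$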
4.3. Backbone Models and Hyperparameter Settings
- Graph WaveNet [10] is a 1D convolution-based STGNN that uses adaptive graph convolution to capture spatial dependence and 1D convolution to capture temporal dependence. For Graph WaveNet, the batch size was set to 32, and the dropout probability in the graph convolution layer was set to 0.3. Adam [48] was chosen as the optimizer, with the learning rate set to 0.001.
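A minimal sketch of this training configuration in PyTorch is shown below; `GraphWaveNetStub` is a hypothetical stand-in for the actual Graph WaveNet implementation, and only the batch size, dropout probability, optimizer, and learning rate mirror the settings reported above.

```python
import torch
import torch.nn as nn

class GraphWaveNetStub(nn.Module):
    """Hypothetical stand-in for the Graph WaveNet backbone (illustrative only)."""

    def __init__(self, in_dim: int = 2, hidden: int = 32, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),  # stands in for the dropout inside the graph convolution layer
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = GraphWaveNetStub(dropout=0.3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam [48], learning rate 0.001
batch_size = 32  # mini-batch size used during training
```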
4.4. Experimental Results and Analysis
4.4.1. Accuracy
4.4.2. Masking Strategies
- Degree centrality masking. Nodes with higher degree centrality may play more important roles in the traffic network. Compared with the spatiotemporal context masking strategy, this strategy replaces random node sampling with degree centrality-based node sampling;
- Spatial masking. As shown in Figure 6a, after randomly sampling nodes, all the temporal features of the selected nodes are masked;
- Temporal masking. As shown in Figure 6b, this strategy masks, for all nodes, the temporal features near the current moment;
- Completely random masking. As shown in Figure 6c, this strategy randomly masks a fixed proportion of traffic feature points in the whole traffic feature matrix.
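The following NumPy sketch illustrates the baseline masking strategies above. It is our own illustration, not the authors' code: the function names, the T × N feature layout, the zero fill value for masked entries, and the masking ratio are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
T, N, ratio = 12, 207, 0.25     # 12 time steps, 207 sensors (as in METR-LA), 25% masking ratio
X = rng.random((T, N))          # toy traffic feature matrix: rows = time steps, columns = nodes

def spatial_mask(shape, ratio, rng):
    """Figure 6a: mask all temporal features of a randomly sampled subset of nodes."""
    T, N = shape
    mask = np.ones(shape, dtype=bool)              # True = visible, False = masked
    nodes = rng.choice(N, size=int(ratio * N), replace=False)
    mask[:, nodes] = False
    return mask

def temporal_mask(shape, ratio, rng):
    """Figure 6b: mask the temporal features of all nodes near the current moment
    (we assume this means the most recent time steps)."""
    T, N = shape
    mask = np.ones(shape, dtype=bool)
    steps = max(1, int(ratio * T))                 # guard against a zero-step (no-op) mask
    mask[-steps:, :] = False
    return mask

def completely_random_mask(shape, ratio, rng):
    """Figure 6c: mask a fixed proportion of individual feature points."""
    return rng.random(shape) >= ratio

def degree_centrality_mask(shape, ratio, degrees, rng):
    """Sample nodes with probability proportional to their degree centrality, then mask them."""
    T, N = shape
    mask = np.ones(shape, dtype=bool)
    nodes = rng.choice(N, size=int(ratio * N), replace=False, p=degrees / degrees.sum())
    mask[:, nodes] = False
    return mask

# Apply one strategy; masked entries are replaced by zeros here (the fill value is an assumption).
mask = spatial_mask(X.shape, ratio, rng)
X_masked = np.where(mask, X, 0.0)
```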
4.4.3. Hyperparameter Analysis
4.4.4. Visualization
5. Conclusions
- Compared with the backbone models, the models trained with the self-supervised spatiotemporal masking strategy achieve better prediction performance at horizons of 30, 45, and 60 min. The average improvement reaches 1.56% at horizons of more than 30 min, which shows that the proposed method strengthens the model's understanding of spatiotemporal dependence and is helpful for long-term prediction;
- Comparing different masking strategies shows that considering a single dimension, such as only spatial dependence or only temporal dependence, yields a relatively limited improvement in model performance, whereas considering the spatial and temporal perspectives jointly improves the prediction capability more effectively;
- The visualization results show that, in scenarios with large fluctuations, the proposed method produces predictions that fit the actual values more closely. However, the model is sometimes misled by confounding spurious spatiotemporal correlations, which leads to erroneous predictions.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
CNN | Convolutional neural network
GCN | Graph convolutional network
GNN | Graph neural network
RNN | Recurrent neural network
STGNN | Spatiotemporal graph neural network
GRU | Gated recurrent unit
MLP | Multi-layer perceptron
References
1. Yuan, H.; Li, G. A survey of traffic prediction: From spatio-temporal data to intelligent transportation. Data Sci. Eng. 2021, 6, 63–85.
2. Nagy, A.M.; Simon, V. Survey on traffic prediction in smart cities. Pervasive Mob. Comput. 2018, 50, 148–163.
3. Hashemi, S.M.; Botez, R.M.; Grigorie, T.L. New Reliability Studies of Data-Driven Aircraft Trajectory Prediction. Aerospace 2020, 7, 145.
4. Tedjopurnomo, D.A.; Bao, Z.; Zheng, B.; Choudhury, F.M.; Qin, A.K. A Survey on Modern Deep Neural Network for Traffic Prediction: Trends, Methods and Challenges (Extended Abstract). In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 3–7 April 2023; pp. 3795–3796.
5. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926.
6. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875.
7. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858.
8. Cai, L.; Janowicz, K.; Mai, G.; Yan, B.; Zhu, R. Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting. Trans. GIS 2020, 24, 736–755.
9. Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17804–17815.
10. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for deep spatial-temporal graph modeling. arXiv 2019, arXiv:1906.00121.
11. Zhang, J.; Zheng, Y.; Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
12. Chen, W.; Chen, L.; Xie, Y.; Cao, W.; Gao, Y.; Feng, X. Multi-Range Attentive Bicomponent Graph Convolutional Network for Traffic Forecasting. Proc. AAAI Conf. Artif. Intell. 2020, 34, 3529–3536.
13. Geng, X.; Li, Y.; Wang, L.; Zhang, L.; Yang, Q.; Ye, J.; Liu, Y. Spatiotemporal Multi-Graph Convolution Network for Ride-Hailing Demand Forecasting. Proc. AAAI Conf. Artif. Intell. 2019, 33, 3656–3663.
14. Qin, Y.; Fang, Y.; Luo, H.; Zhao, F.; Wang, C. DMGCRN: Dynamic Multi-Graph Convolution Recurrent Network for Traffic Forecasting. arXiv 2021, arXiv:2112.02264.
15. Li, M.; Zhu, Z. Spatial-Temporal Fusion Graph Neural Networks for Traffic Flow Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 4189–4196.
16. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial-Temporal Network Data Forecasting. Proc. AAAI Conf. Artif. Intell. 2020, 34, 914–921.
17. He, S.; Luo, Q.; Du, R.; Zhao, L.; He, G.; Fu, H.; Li, H. STGC-GNNs: A GNN-based traffic prediction framework with a spatial–temporal Granger causality graph. Phys. A Stat. Mech. Appl. 2023, 623, 128913.
18. Ta, X.; Liu, Z.; Hu, X.; Yu, L.; Sun, L.; Du, B. Adaptive Spatio-temporal Graph Neural Network for traffic forecasting. Knowl.-Based Syst. 2022, 242, 108199.
19. Ji, J.; Wang, J.; Huang, C.; Wu, J.; Xu, B.; Wu, Z.; Zhang, J.; Zheng, Y. Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction. arXiv 2022, arXiv:2212.04475.
20. Liu, X.; Liang, Y.; Huang, C.; Zheng, Y.; Hooi, B.; Zimmermann, R. When do contrastive learning signals help spatio-temporal graph forecasting? In Proceedings of the 30th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 1–4 November 2022; pp. 1–12.
21. Shao, Z.; Zhang, Z.; Wang, F.; Xu, Y. Pre-Training Enhanced Spatial-Temporal Graph Neural Network for Multivariate Time Series Forecasting. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, Washington, DC, USA, 14–18 August 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1567–1577.
22. Hwang, D.; Park, J.; Kwon, S.; Kim, K.M.; Ha, J.W.; Kim, H.J. Self-Supervised Auxiliary Learning with Meta-Paths for Heterogeneous Graphs. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Vancouver, BC, Canada, 6–12 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020.
23. Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F.Y. Traffic flow prediction with big data: A deep learning approach. IEEE Trans. Intell. Transp. Syst. 2014, 16, 865–873.
24. Zhao, Z.; Chen, W.; Wu, X.; Chen, P.C.; Liu, J. LSTM network: A deep learning approach for short-term traffic forecast. IET Intell. Transp. Syst. 2017, 11, 68–75.
25. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 324–328.
26. Liu, Y.; Zheng, H.; Feng, X.; Chen, Z. Short-term traffic flow prediction with Conv-LSTM. In Proceedings of the 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 11–13 October 2017; pp. 1–6.
27. Ma, X.; Dai, Z.; He, Z.; Ma, J.; Wang, Y.; Wang, Y. Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction. Sensors 2017, 17, 818.
28. Zonoozi, A.; Kim, J.J.; Li, X.L.; Cong, G. Periodic-CRN: A convolutional recurrent model for crowd density prediction with recurring periodic patterns. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; Volume 18, pp. 3732–3738.
29. Jia, T.; Yan, P. Predicting citywide road traffic flow using deep spatiotemporal neural networks. IEEE Trans. Intell. Transp. Syst. 2020, 22, 3101–3111.
30. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810.
31. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
32. Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv 2013, arXiv:1312.6203.
33. Atwood, J.; Towsley, D. Diffusion-convolutional neural networks. Adv. Neural Inf. Process. Syst. 2016, 29.
34. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
35. Tian, C.; Chan, W.K.V. Spatial-temporal attention wavenet: A deep learning framework for traffic prediction considering spatial-temporal dependencies. IET Intell. Transp. Syst. 2021, 15, 549–561.
36. Zhou, Q.; Chen, N.; Lin, S. FASTNN: A Deep Learning Approach for Traffic Flow Prediction Considering Spatiotemporal Features. Sensors 2022, 22, 6921.
37. Jin, G.; Liang, Y.; Fang, Y.; Huang, J.; Zhang, J.; Zheng, Y. Spatio-temporal graph neural networks for predictive learning in urban computing: A survey. arXiv 2023, arXiv:2303.14483.
38. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555.
39. Xu, M.; Dai, W.; Liu, C.; Gao, X.; Lin, W.; Qi, G.J.; Xiong, H. Spatial-temporal transformer networks for traffic flow forecasting. arXiv 2020, arXiv:2001.02908.
40. Hashemi, S.M.; Hashemi, S.A.; Botez, R.M.; Ghazi, G. Aircraft Trajectory Prediction Enhanced through Resilient Generative Adversarial Networks Secured by Blockchain: Application to UAS-S4 Ehécatl. Appl. Sci. 2023, 13, 9503.
41. Hashemi, S.M.; Hashemi, S.A.; Botez, R.M.; Ghazi, G. A Novel Fault-Tolerant Air Traffic Management Methodology Using Autoencoder and P2P Blockchain Consensus Protocol. Aerospace 2023, 10, 357.
42. Khaled, A.; Elsir, A.M.T.; Shen, Y. TFGAN: Traffic forecasting using generative adversarial network with multi-graph convolutional network. Knowl.-Based Syst. 2022, 249, 108990.
43. Xu, B.; Wang, X.; Liu, Z.; Kang, L. A GAN Combined with Graph Contrastive Learning for Traffic Forecasting. In Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things, CNIOT ’23, Xiamen, China, 26–28 May 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 866–873.
44. Liu, X.; Zhang, F.; Hou, Z.; Mian, L.; Wang, Z.; Zhang, J.; Tang, J. Self-supervised learning: Generative or contrastive. IEEE Trans. Knowl. Data Eng. 2021, 35, 857–876.
45. Banville, H.; Chehab, O.; Hyvärinen, A.; Engemann, D.A.; Gramfort, A. Uncovering the structure of clinical EEG signals with self-supervised learning. J. Neural Eng. 2021, 18, 046020.
46. Chung, Y.A.; Hsu, W.N.; Tang, H.; Glass, J. An unsupervised autoregressive model for speech representation learning. arXiv 2019, arXiv:1904.03240.
47. Bai, J.; Wang, W.; Zhou, Y.; Xiong, C. Representation learning for sequence data with deep autoencoding predictive components. arXiv 2020, arXiv:2010.03135.
48. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Dataset | #Nodes | #Edges | Sparsity | Sampling Interval | #Sampling Points
---|---|---|---|---|---
METR-LA | 207 | 1722 | 4.02% | 5 min | 34,272
PEMS-BAY | 325 | 2694 | 2.55% | 5 min | 52,116
Prediction Horizon | Evaluation Metrics | T-GCN | STC-T-GCN | Graph WaveNet | STC-Graph WaveNet
---|---|---|---|---|---
15 min | RMSE | 5.11 | 5.10 | 4.85 | 4.85
15 min | MAE | 2.63 | 2.64 | 2.60 | 2.63
15 min | MAPE | 6.99% | 6.97% | 6.64% | 6.75%
30 min | RMSE | 5.95 | 5.91 | 5.98 | 5.84
30 min | MAE | 2.99 | 2.98 | 3.09 | 3.04
30 min | MAPE | 8.24% | 8.11% | 8.80% | 8.37%
45 min | RMSE | 6.48 | 6.45 | 6.67 | 6.58
45 min | MAE | 3.33 | 3.32 | 3.40 | 3.34
45 min | MAPE | 9.11% | 9.05% | 10.03% | 9.60%
60 min | RMSE | 6.88 | 6.84 | 7.63 | 7.35
60 min | MAE | 3.41 | 3.40 | 3.87 | 3.79
60 min | MAPE | 9.48% | 9.45% | 11.79% | 10.88%
Prediction Horizon | Evaluation Metrics | T-GCN | STC-T-GCN | Graph WaveNet | STC-Graph WaveNet
---|---|---|---|---|---
15 min | RMSE | 2.48 | 2.48 | 2.47 | 2.49
15 min | MAE | 1.25 | 1.25 | 1.17 | 1.18
15 min | MAPE | 2.57% | 2.58% | 2.34% | 2.44%
30 min | RMSE | 3.17 | 3.14 | 3.47 | 3.42
30 min | MAE | 1.49 | 1.48 | 1.52 | 1.52
30 min | MAPE | 3.26% | 3.24% | 3.40% | 3.35%
45 min | RMSE | 3.67 | 3.65 | 4.18 | 4.15
45 min | MAE | 1.67 | 1.66 | 1.82 | 1.81
45 min | MAPE | 3.81% | 3.77% | 4.06% | 4.25%
60 min | RMSE | 3.93 | 3.91 | 4.90 | 4.68
60 min | MAE | 1.79 | 1.78 | 2.08 | 2.05
60 min | MAPE | 4.14% | 4.08% | 5.20% | 4.97%
Masking Strategy | RMSE | MAE | MAPE
---|---|---|---
Spatiotemporal context masking | 7.35 | 3.79 | 10.88%
Degree centrality masking | 7.67 | 3.85 | 11.27%
Spatial masking | 7.52 | 3.81 | 11.59%
Temporal masking | 7.45 | 3.81 | 11.53%
Completely random masking | 7.55 | 3.81 | 11.14%