1. Introduction
The transportation system is a crucial piece of infrastructure in modern cities. As urbanization progresses, the growing urban population and the rising number of vehicles on the transportation network make the traffic system increasingly complex. Consequently, there is an urgent need to develop Intelligent Transportation Systems (ITS). Early intervention based on traffic flow prediction is a crucial prerequisite for implementing ITS, as it improves the efficiency of a transportation system, mitigates traffic-related problems, and facilitates the development of smart cities [1]. Traffic flow prediction analyzes historical traffic flow data to anticipate future traffic conditions on road networks. Accurate and prompt traffic flow forecasts make the intelligent management of metropolitan road networks possible, and they also decrease traffic congestion and improve traffic efficiency. To acquire the traffic flow status of a city, various sources of information can be utilized, including the transportation network, private vehicle movements, taxi trajectories, and public transportation transaction records that are captured by sensors [2].
Nonetheless, achieving accurate traffic prediction has become progressively more challenging. On the one hand, traffic data are intrinsically a time series characterized by intricate temporal dependencies, exhibiting periodicity, volatility, uncertainty, and nonlinearity along the time dimension. Traffic data at a specific location exhibit nonlinear variations at distinct points in time, rendering the long-term prediction of traffic flow challenging. On the other hand, traffic data exhibit intricate dynamic spatial correlations, and the variability of traffic flows across different regional patterns is significant.
Figure 1 illustrates an actual road network in a given region, showing the dynamic correlation of traffic flows across geography. The complex spatial and temporal relationships in a road traffic network can have a significant impact on the accuracy of a traffic flow prediction system. In reality, traffic flow can differ significantly across areas and time periods, with frequent congestion in office areas (such as business districts) and residential areas (such as dwelling districts) during morning and evening peak hours. Traffic flows in commercial areas also increase significantly during holidays. Moreover, the intricate relationship between vehicles and roads in the spatial dimension complicates the accurate prediction of traffic flow. For example, disruptions caused by temporary road closures for maintenance or by unpredictable traffic accidents can affect distant roads for some time. Finally, the complexity of roadway intersections poses additional challenges for the accurate prediction of traffic flow.
To tackle the aforementioned challenges, researchers have extensively investigated the issue. The currently available methods can be roughly divided into four categories: conventional statistical methods, machine learning methods, deep learning methods, and graph neural network methods. The predominant statistical methods include Autoregressive Integrated Moving Average (ARIMA) [3] and Vector Autoregression (VAR) [4], both of which rely on the assumption that the time series is stationary [5]. However, traffic road networks display dynamic and complicated behaviors, and statistical approaches fall short in capturing nonlinear correlations, leading to large errors when forecasting large volumes of traffic data with complex spatio-temporal properties. Machine learning techniques, by contrast, are better at capturing nonlinear relationships, which has led to the application of classical methods such as Support Vector Regression (SVR) [6] and K-Nearest Neighbor (KNN) [7] in traffic flow prediction. Nonetheless, these models still exhibit limited performance when mining intricate spatio-temporal correlations. Deep learning methods extract abstract features from raw data using multilayer neural networks. They can derive traffic flow information from historical data and produce precise forecasts, in contrast to traditional machine learning methods, which depend on manually engineered features. Consequently, deep learning methods for predicting traffic flow have become more popular recently. Notably, Recurrent Neural Networks (RNNs), including Long Short-Term Memory (LSTM) [8] and Gated Recurrent Unit (GRU) [9], effectively capture the temporal correlation within traffic data. Nevertheless, these RNN models ignore the spatial correlations present in spatio-temporal data and treat traffic sequences from different roadways as separate data streams. Convolutional neural networks (CNNs) [10] excel at capturing spatial correlations within regular spatial grids, but traffic road networks have a topologically complex non-Euclidean structure. Graph neural networks (GNNs) [11], on the other hand, can directly model non-Euclidean data; as such, graph convolutional networks (GCNs) [12] and graph attention networks (GATs) [13] are employed to capture the spatial correlations present in traffic road networks.
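To make the idea of convolution on non-Euclidean road graphs concrete, the following is a minimal sketch of a single graph convolution step using the widely used symmetric normalization D^{-1/2}(A + I)D^{-1/2}. It is an illustration only, not the layer used in any of the cited models; the four-sensor chain graph, the flow values, and the function names `normalized_adjacency` and `gcn_layer` are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1)                  # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, features, weight):
    """One graph convolution step followed by a ReLU activation."""
    return np.maximum(normalized_adjacency(adj) @ features @ weight, 0.0)


# Hypothetical road graph: four sensors connected in a chain 0-1-2-3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
# Hypothetical traffic flow reading at each sensor (one feature per node).
features = np.array([[120.0], [80.0], [60.0], [200.0]])
# Identity-like weight so the output shows only the neighborhood mixing.
weight = np.ones((1, 1))

smoothed = gcn_layer(adj, features, weight)
```

Each row of `smoothed` mixes a sensor's own reading with those of its direct neighbors, which is the sense in which a GCN captures spatial correlation along the road topology rather than on a regular grid.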
Because traffic flow data contain both temporal and spatial correlations, modeling only time or only space is inadequate for traffic flow prediction. Temporal modeling concentrates on the temporal patterns of traffic flow and ignores the influence of geographical location. For instance, during morning peak hours, the traffic flow on residential roads might be influenced by nearby office clusters, an aspect that temporal modeling fails to capture. In contrast, spatial modeling concentrates solely on how traffic flow varies in the spatial dimension while disregarding the influence of time. For example, within a commercial area, traffic surges significantly during particular time periods, such as holidays. This phenomenon cannot be sufficiently captured through spatial modeling alone. Consequently, it is essential to take into account both temporal and spatial correlations in order to improve the accuracy of traffic flow forecasts. Hence, several studies [14,15,16,17,18] in the field of spatio-temporal modeling employ a combination of RNNs and GNNs to capture the intricate spatio-temporal relationships within traffic data. The attention mechanism can further increase a network model’s accuracy by enabling the model to concentrate on the data that are most pertinent to the current task. It is therefore widely employed in spatio-temporal methods for traffic flow prediction. By utilizing the attention mechanism, the model can prioritize the significant locations and time points associated with traffic flow changes, which also provides additional information for interpreting the model’s predictions. To capture the spatio-temporal correlations in traffic data, several studies have combined attention mechanisms with independent modules, as demonstrated by models such as Spatio-Temporal Graph Convolutional Network (STGCN) [18] and Attention-Based Spatio-Temporal Graph Convolutional Network (ASTGCN) [19]. These models first capture temporal correlations through a dedicated module and then forward the extracted temporal features to a module that captures spatial correlations and incorporates an attention mechanism. However, this sequential approach diminishes the captured spatio-temporal dependence. Consequently, certain models devise novel graph structures to address the challenging problem of spatio-temporal correlation in traffic data. Spatio-Temporal Synchronous Graph Convolutional Network (STSGCN) [20] captures spatio-temporal correlations synchronously by constructing local spatio-temporal graphs. Spatial Temporal Graph Neural Network (STGNN) [21] integrates GRU and transformer [22] models to capture both local and global temporal dependencies, thereby demonstrating the effectiveness of the attention mechanism in capturing long-term temporal relationships. Spatio-Temporal Fusion Graph Neural Network (STFGNN) [23] captures spatio-temporal correlations simultaneously by generating spatio-temporal graphs and fusing features. However, these models do not account for the dynamic spatio-temporal dependencies among the nodes in the traffic road network. To capture dynamic spatio-temporal correlations, Adaptive Graph Convolutional Recurrent Network (AGCRN) [24] automatically infers the interdependencies among different traffic sequences through two adaptive modules and a learnable node embedding matrix. However, it lacks an attention mechanism for capturing both short-term and long-term temporal correlations. Graph Convolutional Dynamic Recurrent Network with Attention [25] integrates the attention mechanism into the graph convolution and dynamic GRU to capture the long-term temporal dependencies in traffic flow. However, its performance is highly dependent on the value of K in the k-hopGC module, which employs a k-hop neighbor matrix to extend the receptive field of the GCN; thus, it requires several manual trials to determine the optimal K. Multi-Attention Predictive Recurrent Neural Network [26] combines convolutional neural networks and predictive recurrent neural networks to extract spatio-temporal information from traffic flow data. However, it still lacks sufficient focus on the correlation between global information and comprehensive features of traffic flow.
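The idea of inferring interdependencies from a learnable node embedding matrix can be sketched as follows. This is a minimal illustration, assuming the commonly used form softmax(ReLU(E Eᵀ)) for building an adjacency matrix from node embeddings E; the function name `adaptive_adjacency`, the embedding size, and the random initialization are hypothetical, and in a trained model E would be learned by gradient descent rather than sampled.

```python
import numpy as np


def adaptive_adjacency(node_emb):
    """Build a row-normalized adjacency matrix from node embeddings.

    Computes softmax(ReLU(E @ E.T)) row by row, so every row is a
    probability distribution over the nodes that influence a given node.
    """
    scores = np.maximum(node_emb @ node_emb.T, 0.0)        # ReLU similarity
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


# Hypothetical setup: 5 sensors, each with a 3-dimensional embedding.
rng = np.random.default_rng(0)
node_emb = rng.normal(size=(5, 3))

adj = adaptive_adjacency(node_emb)
```

Because the adjacency matrix is a function of the embeddings alone, no predefined road topology is needed, which is what allows such models to adapt the graph to the data.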
To address the above challenges, this paper introduces a novel deep learning framework, named Adaptive Graph Convolutional Recurrent Network with Transformer and Whale Optimization Algorithm (WOA-AGCRTN), which combines the transformer algorithm and WOA. The proposed network model can automatically infer the interdependencies among various traffic sequences while capturing both short-range and long-range spatio-temporal correlations within traffic road networks. This enables the model to effectively capture global spatio-temporal correlations. The key contributions of this paper are summarized as follows:
We propose an adaptive graph convolutional recurrent network with the transformer algorithm. This network infers the interdependencies between traffic sequences and integrates the transformer technique to capture both long-term and short-term temporal dependencies.
We propose utilizing the whale optimization algorithm (WOA) to design an optimal network structure that aligns with the transportation domain, with the aim of significantly enhancing the accuracy of traffic flow prediction.
The feasibility and advantages of the proposed network model are validated using four real datasets. The results from experiments on these datasets affirm the effectiveness of our method. On PEMS03, our model reduces MAE by 2.6% and RMSE by 1.4%. On PEMS04, the improvements are 1.6% in MAE and 1.4% in RMSE. On PEMS07, the improvements are 4.1% in MAE and 2.2% in RMSE. Moreover, on PEMS08, our model surpasses the baseline with improvements of 3.4% in MAE and 1.6% in RMSE.
We effectively address the challenge of long-range time dependence and significantly improve the performance of the network model compared to several baseline methods, including the most recent state-of-the-art approaches.
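For reference, the error metrics and the percentage improvements quoted above can be computed as in the following sketch. It assumes that an improvement is measured as the relative reduction of the error with respect to the baseline; the function names and all numbers in the example are hypothetical toy values, not results from the experiments.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_improvement(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy values for illustration only (not taken from any PEMS dataset).
y_true = np.array([100.0, 120.0, 90.0, 110.0])
baseline_pred = np.array([110.0, 110.0, 100.0, 100.0])
model_pred = np.array([105.0, 115.0, 95.0, 105.0])

mae_gain = relative_improvement(mae(y_true, baseline_pred), mae(y_true, model_pred))
rmse_gain = relative_improvement(rmse(y_true, baseline_pred), rmse(y_true, model_pred))
```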
The remaining parts of this article are structured as follows: Section 2 presents work related to traffic prediction and the swarm intelligence optimization algorithm. Section 3 presents the methodology of the proposed WOA-AGCRTN, and Section 4 presents and analyzes the experimental results. Finally, Section 5 details the conclusions and the prospects of our research.
5. Conclusions
This paper introduces a novel deep learning framework for traffic flow prediction. Firstly, we propose an AGCRN that integrates the transformer algorithm to effectively capture long-range temporal correlations. Secondly, the whale optimization algorithm (WOA) is employed to automatically design a WOA-AGCRTN network structure that achieves optimal performance with limited computational resources. The results of experiments on four real datasets demonstrate the efficacy and superiority of our framework for traffic flow prediction. Specifically, on the PEMS03 dataset, it achieved a 2.6% improvement in MAE and a 1.4% improvement in RMSE. On the PEMS04 dataset, the improvements were 1.6% in MAE and 1.4% in RMSE, while the MAPE remained essentially the same as that of the best baseline. On the PEMS07 dataset, our approach achieved a 4.1% improvement in MAE and a 2.2% improvement in RMSE. Moreover, on the PEMS08 dataset, it surpassed the current best baseline approach with a 3.4% improvement in MAE and a 1.6% improvement in RMSE. These results show that the WOA-AGCRTN model performs well in traffic flow prediction across four public datasets.
However, using optimization methods to search the structure and hyperparameters of neural networks is computationally intensive. It is often difficult to find a set of structures and hyperparameters that performs well on larger datasets. In addition, for any neural network model, the optimal structure and hyperparameters vary from dataset to dataset, so the model often needs to be re-optimized whenever it is trained on a new dataset, and our computational resources are insufficient to repeat this search for every new dataset.
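The cost discussed above arises because a population-based search must evaluate the objective once per candidate per iteration. The following simplified sketch of a WOA-style update loop illustrates this. It is not the implementation used in this work: a toy sphere function stands in for the validation error of a candidate network, scalar coefficients A and C are drawn per whale, the leader is updated as soon as a better candidate is found, and the names `whale_optimization` and `sphere` and all parameter values are hypothetical.

```python
import numpy as np


def whale_optimization(objective, dim, bounds, n_whales=20, n_iter=100, b=1.0, seed=0):
    """Minimize `objective` over a box using a simplified WOA update rule."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    positions = rng.uniform(low, high, size=(n_whales, dim))
    fitness = np.array([objective(p) for p in positions])
    best_idx = int(np.argmin(fitness))
    best_pos, best_fit = positions[best_idx].copy(), float(fitness[best_idx])
    history = [best_fit]  # best objective value after each iteration

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                if abs(A) < 1.0:
                    # Exploitation: move toward the current leader.
                    target = best_pos
                else:
                    # Exploration: move relative to a randomly chosen whale.
                    target = positions[rng.integers(n_whales)]
                distance = np.abs(C * target - positions[i])
                new_pos = target - A * distance
            else:
                # Spiral update around the current leader.
                l = rng.uniform(-1.0, 1.0)
                distance = np.abs(best_pos - positions[i])
                new_pos = distance * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best_pos
            positions[i] = np.clip(new_pos, low, high)
            fit = float(objective(positions[i]))  # one costly evaluation per whale
            if fit < best_fit:
                best_pos, best_fit = positions[i].copy(), fit
        history.append(best_fit)
    return best_pos, best_fit, history


def sphere(x):
    """Toy objective standing in for a model's validation error."""
    return float(np.sum(x ** 2))


best_pos, best_fit, history = whale_optimization(sphere, dim=3, bounds=(-5.0, 5.0))
```

With 20 whales and 100 iterations, this loop calls the objective 2020 times, counting the initial population. When each call means training and validating a neural network, the total cost grows accordingly, which is why re-running the search for every new dataset is expensive.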
Finally, traffic networks exhibit distinct structural and dynamical properties that differentiate them from other types of networks, such as social and biological networks. Therefore, in future research, we plan to conduct a detailed analysis of the characteristics and dynamic correlations of various networks. We aim to leverage the strengths of the WOA-AGCRTN model in the temporal and spatial domains and to explore its applicability to other tasks related to network structure analysis and temporal prediction.