1. Introduction
Intersections, where vehicle flows converge and turn, can easily become bottlenecks that restrict the operating efficiency of the entire road network. Improving the traffic efficiency of intersections has therefore long been a concern of transportation researchers and engineers. Short-term forecasting of the intersection operating state can provide real-time traffic information, which helps traffic managers optimize signal control schemes to mitigate traffic delays.
The main traffic parameters of intersections, including passing time, traffic speed and waiting time, are used to assess an intersection's operating performance. Among these parameters, passing time and traffic speed most intuitively reflect the overall operating performance [1]. With the development of traffic sensors, and especially the emergence of the mobile Internet, it has become possible to extract traffic parameters from large-scale traffic data. Vehicles equipped with mobile sensors (i.e., floating cars) can monitor the traffic operation of large-scale intersection groups at low cost [2]. They transfer real-time traffic information to a database through the Global Positioning System (GPS) and modern communication technology, which has gradually become the mainstream approach to probing intersection operation performance. GPS is one of the Global Navigation Satellite Systems (GNSS) and can accurately locate a vehicle's trajectory in real time [3]. A GPS unit tracks the vehicle trajectory and collects traffic data that usually contain temporal information (i.e., timestamps), spatial information (i.e., longitude and latitude) and speed information [4]. We also consider external factors that affect intersection operation performance, such as weather conditions, temperature, wind speed and precipitation, since bad weather reduces vehicle speeds and causes delays at intersections [5].
At present, there are two main methods for extracting traffic parameters: digital-map-based methods [6,7] and grid-based methods [8]. Although the digital-map-based method achieves higher accuracy, it has two drawbacks: high computational complexity and dependence on a high-quality GIS digital map. For intersections in particular, it is often difficult to obtain a high-precision digital map that matches their more complex structures. Recently, we proposed an efficient grid-based method to visualize signalized intersections and evaluate their operating state [9]. The grid-based method discretizes intersections into grids to improve the efficiency of traffic parameter extraction, and it is extended to intersections in this study. Moreover, based on the grid model, a clustering method can be used to identify the affected areas and explore spatial traffic features.
In the traffic prediction stage, short-term traffic prediction was initially based on traditional statistical methods such as the Historical Average (HA), the Autoregressive Integrated Moving Average (ARIMA) [10] and the Kalman filter (KF) [11]. These mathematical-statistics-based methods have three main drawbacks: they rely on idealized assumptions, offer limited computational capacity, and struggle with massive multidimensional data. In particular, they are inadequate for predicting the complex real-time traffic state of intersections. With the development of computing technology, machine learning methods have made up for these statistical shortcomings. For instance, Support Vector Regression (SVR) [12] and artificial neural networks (ANN) [13] are typical machine learning methods that have been successfully applied to traffic flow forecasting. However, these machine learning models have difficulty handling large-scale spatio-temporal traffic data.
With the development of intelligent technology, deep learning methods have emerged. A typical time-series model is the long short-term memory (LSTM) network [14], which evolved from the Recurrent Neural Network (RNN) [15]. Compared with the RNN, the LSTM has a more advanced structure and greater advantages in processing long and short-term time-series data. The Gated Recurrent Unit (GRU) is a variant of the LSTM with a more concise structure and faster convergence [16]. However, neither the LSTM nor the GRU can capture the spatial traffic features of intersections. In our previous research, we used the grid method to extract the spatial features of floating car data on expressways [16]. Note that an intersection's spatial traffic features are more complex, being affected by travelers' route choices and management control. For spatial correlations, the convolutional neural network (CNN) is a mature method for extracting spatial traffic features [17]. Although the CNN can extract an intersection's spatial features, it cannot reveal the intersection's topological structure. The most plausible way to incorporate intersection topology into a deep learning model is to employ a graph convolutional network (GCN) [18,19]. Furthermore, to extract spatio-temporal traffic features simultaneously, fusion deep learning (FDL) models have been proposed, such as the Convolutional LSTM network (ConvLSTM) [20], the CNN-LSTM fusion model [21] and the GCN-GRU model [22]. Moreover, to distribute the weights of an FDL model reasonably, the attention mechanism has been adopted to capture the corresponding weights, and it has been applied to short-term traffic prediction [23]. The ResNet has also been applied to traffic prediction, for example, traffic flow prediction [24,25]. To the best of our knowledge, however, ResNets have rarely been applied to predicting the traffic performance of intersections. Therefore, we combine the residual network with the CNN model to extract the spatial features of intersections.
This research focuses on devising a multi-task fusion deep learning (MFDL) model to predict intersection operation performance. The multi-task learning method has been applied in traffic fields such as network traffic classification [26], traffic flow forecasting [27] and traffic demand prediction [28], but it has rarely been used for intersection prediction. The two main reasons are that the intersection's traffic state is more complex and that high-precision data are difficult to obtain. This paper selects the passing time and speed as the prediction targets, which reflect the process state and the visual state of the intersection, respectively. Compared with existing multi-task models, we adopt an attention mechanism that assigns weights to variables to achieve feature fusion automatically. The multi-task learning method considers both the difference between passing time and speed and the common features shared by the two tasks, yielding a more comprehensive prediction of the intersection operation state. The contributions of this work are as follows.
First, traffic parameters are extracted based on the grid model. The grid-based method can identify intersections rapidly without a digital map and is easily transferred to other cities.
Second, the residual network (ResNet) is incorporated into the MFDL model to increase the model's depth, which helps alleviate the vanishing gradient problem. Furthermore, based on the GCN model, the intersections' topological propagation patterns are also considered, which previous studies have rarely addressed.
Third, the MFDL framework integrates the deep learning methods, preprocesses the data, extracts the traffic features of the intersections, and finally predicts the passing time and speed of the intersections. Compared with the benchmark models, the MFDL model not only captures the spatio-temporal traffic features of intersections but also achieves better accuracy and robustness. Meanwhile, compared with the single-task model, the MFDL significantly improves prediction accuracy and efficiency. The MFDL can be easily transferred to other cities for predicting intersection traffic operation performance.
Using the intersection groups in the core area of Beijing as a case study, we demonstrate the proposed method's accuracy and stability. The rest of this paper is organized as follows.
Section 2 describes the floating car data and details the proposed methodology.
Section 3 presents a case study of the Beijing core area and an in-depth analysis of the experimental results. Finally,
Section 4 ends with major conclusions and discussions for future research.
2. Methodology
Figure 1 shows the framework of the multi-task fusion deep learning method, which includes the data processing procedures, the model construction and the model evaluation. In the first and second stages, according to the reconstructed intersection, the coordinates of the floating car data (FCD) are transformed into grid-based coordinates to extract traffic parameters. In the third stage, we construct the multi-task fusion deep learning model by fusing the CNN, GCN, LSTM, ResNet and an attention mechanism. In the fourth stage, after feeding the dataset into the model, we evaluate and verify the model's effectiveness.
2.1. Data Preprocessing
The floating car data were obtained from the DiDi company, one of the largest ride-hailing operators globally, with a sampling interval of 3 s. The FCD mainly contain three kinds of information: temporal, spatial and speed information. The variables and attributes of the FCD are listed in
Table 1.
The data used in this case study include about 110 million floating car trajectory points, with longitudes ranging from 116.399450 to 116.431700 and latitudes from 39.947934 to 39.966344. The data are the same as those used in the previous studies [9,29].
Figure 2a shows the trajectory points, from which the topology of the intersections can be obtained (see Figure 2b). We select ten intersections in this study area as the target objects, with IDs from No. 0 to No. 9 (see Figure 2b). The selected intersections cover two types: regular intersections with four legs (No. 0 to No. 7) and intersections with three legs (No. 8 and No. 9).
Due to blocked GPS signals and hardware/software errors during data collection and transmission, the original floating car data may contain errors. It is necessary to preprocess the FCD to reduce the impact of erroneous data and improve prediction accuracy. A more detailed data preprocessing procedure has been described in previous studies [9,16,29]. The two main steps are as follows:
Step 1: Remove FCD points that fall outside the intersection area; delete FCD points whose speed exceeds 90 km/h; and eliminate redundant FCD points that are similar or duplicated.
Step 2: Replenish data missing due to weak satellite signals or operation errors during FCD collection using an interpolation method (a minimal sketch is given below).
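For concreteness, the following is a minimal Python sketch of the interpolation in Step 2, assuming hypothetical column names (vehicle_id, timestamp, longitude, latitude, speed) rather than the exact schema of the original dataset:

```python
import pandas as pd

# Minimal sketch of Step 2: linearly interpolate missing GPS samples
# within each vehicle's trajectory (column names are assumptions).
fcd = pd.read_csv("fcd.csv", parse_dates=["timestamp"])
fcd = fcd.sort_values(["vehicle_id", "timestamp"])
fcd[["longitude", "latitude", "speed"]] = (
    fcd.groupby("vehicle_id")[["longitude", "latitude", "speed"]]
       .transform(lambda s: s.interpolate(method="linear"))
)
```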
2.2. The Grid Model
In this study, the side length of the intersection area is set to 300 m, covering the intersection range proposed in a previous study [30]. Within this area, the intersection is divided into discrete square grids with a fixed side length. If the grid size is too large, a grid may extend beyond the intersection area; if it is too small, a grid cannot adequately cover a floating car. A previous study suggested that the grid size for identifying intersections should range from 1 to 10 m [31]. Given these considerations, the grid side length is set to 5 m, so the 300 m × 300 m intersection area is composed of 60 × 60 grids in this study (see Figure 3).
After dividing the intersection area into square grids, the floating car data are matched to the grids by a basic arithmetic algorithm, which is efficient and simple. Thereby, the FCD are transformed into a grid-based dataset (GFCD). Mathematically, the algorithm is as follows:

$$ i = \left\lfloor \frac{lon - Lon_L}{Lon_R - Lon_L} \times n \right\rfloor, \qquad j = \left\lfloor \frac{lat - Lat_D}{Lat_U - Lat_D} \times m \right\rfloor $$

where $Lon_R$, $Lon_L$, $Lat_U$ and $Lat_D$ are the right, left, up and down boundaries of the intersection area; $m$ and $n$ are the numbers of rows and columns; and $(i, j)$ is the grid coordinate to which a trajectory point with longitude $lon$ and latitude $lat$ belongs.
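As an illustration, a minimal Python sketch of this grid-matching step is given below; the function and parameter names (e.g., lon_left, n_cols) are illustrative assumptions rather than the paper's implementation:

```python
import math

def match_to_grid(lon, lat, lon_left, lon_right, lat_down, lat_up, n_cols, n_rows):
    """Map a trajectory point (lon, lat) to its grid cell (i, j)."""
    i = math.floor((lon - lon_left) / (lon_right - lon_left) * n_cols)
    j = math.floor((lat - lat_down) / (lat_up - lat_down) * n_rows)
    # Clamp boundary points so every in-range observation gets a valid cell
    return min(max(i, 0), n_cols - 1), min(max(j, 0), n_rows - 1)

# Example: a 300 m x 300 m intersection area divided into 60 x 60 grids of 5 m
cell = match_to_grid(116.4105, 39.9571,
                     lon_left=116.4090, lon_right=116.4123,
                     lat_down=39.9558, lat_up=39.9585,
                     n_cols=60, n_rows=60)
```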
2.2.1. Identification of Traffic Intersection Area
Because each signalized intersection has unique spatio-temporal characteristics, the corresponding range of its influence area also differs. Defining the influence area of a signalized intersection as a fixed region therefore cannot sufficiently reflect its unique traffic characteristics. It is well known that intersections are bottlenecks in the road network, where the stop-and-go phenomenon is most common. Therefore, we extract the stop feature to define the range of the intersection. The stop frequency in a particular area of an intersection varies with the spatial features of the GFCD. In general, the entrance area is more distinguishable than other areas because its stop frequency is higher than that of the upstream and inner areas. Based on this feature, we constructed the stop dataset of the GFCD by counting stops in each grid of the intersection area.
Based on the stop dataset, the intersection area can be clustered into three groups: the upstream area, the entrance area, and the area near the stop line. To distinguish the three areas, this study adopts the fuzzy C-means (FCM) clustering method [32] to identify the clusters. FCM clustering incorporates the essence of fuzzy theory and provides more flexible clustering results [33]. In most cases, traffic areas cannot be divided into clearly separated clusters; the membership degree of FCM, ranging from 0 to 1, is therefore well suited to clustering the traffic stop scenario. The objective function of FCM is as follows:

$$ J_m = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m} \left\lVert x_i - v_j \right\rVert^2 $$

where $u_{ij}$ is the membership degree, restricted by the normalization rule $\sum_{j=1}^{c} u_{ij} = 1$; $N$ is the number of GFCD points and $c$ is the number of cluster centers; $\lVert x_i - v_j \rVert$ is the Euclidean distance; $x_i$ is the spatial vector of a GFCD point; and $v_j$ is the spatial vector of a cluster center. Under the normalization constraint, the objective function is minimized by taking the derivative of the corresponding Lagrangian. Then, $u_{ij}$ and $v_j$ are updated iteratively using the following equations:

$$ u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}} \right]^{-1}, \qquad v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} x_i}{\sum_{i=1}^{N} u_{ij}^{m}} $$
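To make the procedure concrete, the following is a minimal NumPy sketch of the FCM update equations above, applied to grid-level stop-frequency features; it is an illustrative implementation, not the exact code used in this study:

```python
import numpy as np

def fcm(X, c=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy C-means sketch.

    X: (N, d) array of grid-cell feature vectors (e.g., stop frequencies).
    Returns the membership matrix U (N, c) and cluster centers V (c, d).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                 # enforce sum_j u_ij = 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]      # center update
        dist = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (dist ** (2 / (m - 1)) *
                       np.sum(dist ** (-2 / (m - 1)), axis=1, keepdims=True))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```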
According to prior knowledge of the stop dataset, the number of clusters is set to three: low, medium and high stop frequency, corresponding to the upstream, entrance and near-stop-line areas, respectively. For the Beijing central area case, the intersections are clustered into these groups (see Figure 4).
Figure 4 presents the GFCD clustering results corresponding to the three areas, which demonstrates the effectiveness of the proposed method. It can be seen intuitively that clusters 1, 2 and 3 represent the upstream, entrance-lane and near-stop-line areas, respectively. Moreover, the central area of each intersection's stop lines can be determined by the highest stop frequency. Note that the central areas of intersections 0 and 2 contain part of cluster 2 because no protected left-turn phase is set there; the resulting conflict points between the through and left-turn movements lead to a high stop frequency in the central area. Furthermore, according to the clustering results, we constructed datasets for the upstream, approach and central areas of the intersection.
2.2.2. Identification of the Floating Car Trajectories Direction
After defining the range of the intersection area, the turning directions are identified based on the grid model. First, the legs of the intersection are identified from the GFCD and labeled (see Figure 5). The intersection is divided into five areas, and the FCD points are mapped to these areas. Then, the direction of each trajectory is identified from the order of its entering and exiting Area IDs. Taking the southbound through movement as an example, the entrance Area ID is 2, the exit Area ID is 1, and the trajectory also passes through the central area of the intersection. With the grid model's algorithm, it is thus simple to identify the direction of a floating car trajectory passing through the intersection.
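A minimal Python sketch of this direction-identification rule is shown below; the Area-ID-to-movement table and the central-area ID are hypothetical examples rather than the paper's exact labeling:

```python
# Hypothetical mapping from (entrance Area ID, exit Area ID) to movement.
MOVEMENTS = {
    (2, 1): "southbound through",
    (2, 4): "southbound left turn",
    (2, 3): "southbound right turn",
}

def identify_direction(area_sequence, central_area_id=0):
    """area_sequence: ordered Area IDs visited by one grid-mapped trajectory."""
    non_central = [a for a in area_sequence if a != central_area_id]
    if len(non_central) < 2:
        return "incomplete trajectory"
    entry, exit_ = non_central[0], non_central[-1]
    return MOVEMENTS.get((entry, exit_), f"movement {entry} -> {exit_}")
```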
2.3. The Multi-Task Fusion Deep Learning Model
The multi-task fusion deep learning (MFDL) model architecture is composed of the Residual Network, GCN, CNN, LSTM and an attention mechanism. Since the LSTM or GRU alone can extract only temporal information, the proposed architecture is designed to also capture the spatio-temporal features and the topological structure of the intersection.
2.3.1. The Residual Network
Deeper models can extract more road network features [16]. However, deeper models are prone to exploding and vanishing gradients. To address this problem, the Residual Network (ResNet) was proposed [34]; its core idea is to deepen the model through skip connections (see Figure 6a). In Figure 6, "Conv" indicates a convolutional layer, "BN" a batch-normalization layer, and "ReLU" an activation layer. In this study, we use the improved structure of ResNet, which mitigates vanishing or exploding gradients more effectively (see Figure 6b). The residual block can be written as:

$$ y = F\!\left(x, \{W_i\}\right) + x $$

where $x$ is the residual block input, $F(\cdot)$ is the residual mapping to be learned, and $y$ is the residual block output.
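For illustration, a generic Keras residual block of the Conv-BN-ReLU type described above might look as follows; the layer sizes and ordering are assumptions, not the paper's exact configuration:

```python
from tensorflow.keras import layers

def residual_block(x, filters, kernel_size=3):
    """Generic Conv-BN-ReLU residual block with a skip connection (sketch)."""
    shortcut = x
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # If the channel count changes, project the shortcut with a 1x1 convolution
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([shortcut, y])     # y = F(x) + x
    return layers.ReLU()(y)
```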
2.3.2. The GCN
The road network can be regarded as a topological structure composed of points and lines, in which the points are intersections and the lines are roads. When capturing spatial features, CNN models cannot process data with a non-Euclidean structure or extract the topological relationships of intersections. In contrast, the Graph Convolutional Network (GCN) makes up for this defect [22] and can capture the topological dependencies among intersections. In this study, the intersections are defined on a graph, and we focus on the structured traffic time series of intersection passing time (see Figure 7). At time step $t$, the intersection graph can be defined as $G_t = (V_t, E, W)$, where the set of vertices $V_t$ corresponds to the observations from the $N$ approaches of the intersections, $E$ is the set of edges indicating the connectivity between approaches, and $W$ is the weighted adjacency matrix of $G_t$. The GCN function can be defined as follows:

$$ H^{(l+1)} = \sigma\!\left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} \theta^{(l)} \right) \quad (8) $$

In Equation (8), $\sigma(\cdot)$ is an activation function; $\tilde{A} = A + I_N$, where $A$ is the adjacency matrix and $I_N$ is the identity matrix; $\tilde{D}$ denotes the diagonal node-degree matrix of $\tilde{A}$; and $\theta^{(l)}$ is the trainable weight matrix.
It should be noted that stacking multiple GCN layers increases computational complexity and makes the gradient more prone to vanishing [35]. Furthermore, as GCNs become deeper, over-smoothing makes the features of different vertices indistinguishable and degrades forecast accuracy [36]. Therefore, the ResNet GCN is proposed to compensate for these problems. The graph signal $X$ of the intersections is then transformed into $X_G$ as the ResNet input:

$$ X_G = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X \quad (9) $$

In Equation (9), $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is the normalized Laplacian-style propagation matrix; $X \in \mathbb{R}^{N \times T}$ is the input, where $N$ is the number of intersection approaches and $T$ is the number of time steps for the approaches. $X_G$ has the same shape as $X$ and contains the topological information among the intersections.
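A minimal NumPy sketch of this propagation step (Equation (9)) is given below for clarity; it is illustrative only:

```python
import numpy as np

def gcn_propagate(X, A):
    """Apply X_G = D^-1/2 (A + I) D^-1/2 X.

    X: (N, T) graph signal (N approaches, T time steps); A: (N, N) adjacency.
    """
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X
```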
2.3.3. Multi-Task Fusion Deep Learning Method
In this section, a novel multi-task fusion deep learning framework is proposed to forecast intersection operation performance. The framework integrates the historical pattern, real-time pattern, spatial pattern, topological structure and weather conditions to predict the passing time and speed of the intersections. There are four variable groups in the multi-task fusion deep learning method (see Figure 8). The first variable group uses the passing time as the input to capture temporal features. The second variable group extracts the intersections' topological information. The third variable group captures the spatio-temporal features of speed. The fourth variable group captures the effect of weather on prediction accuracy. The fusion section fuses the information from the four variable groups.
For the passing time variable group, the passing time is the most intuitive parameter representing the intersections' operation performance [9]. Note that the historical passing time reveals the normal propagation pattern and corroborates the real-time passing time pattern. Therefore, this section adopts both the historical and the real-time patterns of the intersection passing time as the input matrix. We extracted the passing time from the floating car data (see Figure 9).
According to the definition of the intersection influence area, for an intersection approach, the $k$-th trajectory's passing time is the difference between its exit and entry timestamps, $t_{out}^{k} - t_{in}^{k}$. Since the trajectory points do not coincide exactly with the boundaries of the intersection approach, the timestamps at the approach boundaries (i.e., $t_{in}^{k}$ and $t_{out}^{k}$) must be estimated based on the acceleration formula. In Equations (10) and (11), the great-circle distance between an FCD point and the approach boundary is computed in geodetic coordinates (where $R$ is the radius of the Earth) and combined with the sampling interval $T_{in}$ to obtain the time difference between the point and the boundary. Therefore, according to Equations (10) and (11), the average passing time $\bar{T}$ can be calculated by Equation (12):

$$ \bar{T} = \frac{1}{K} \sum_{k=1}^{K} \left( t_{out}^{k} - t_{in}^{k} \right) \quad (12) $$

where $(lon_{in}, lat_{in})$ and $(lon_{out}, lat_{out})$ are the coordinates of the entrance and exit boundaries of the intersection approach, respectively; $(lon_{in}^{k}, lat_{in}^{k})$ and $(lon_{out}^{k}, lat_{out}^{k})$ are the coordinates of the trajectory points that enter and exit, respectively; $K$ is the number of trajectories in a time slice; and $T_{in}$ is the sampling interval of the floating car data, which is 3 s in this study.
The passing-time matrix $M_T$ is given by:

$$ M_T = \begin{bmatrix} \bar{T}_{1,1} & \bar{T}_{1,2} & \cdots & \bar{T}_{1,T} \\ \vdots & \vdots & \ddots & \vdots \\ \bar{T}_{N,1} & \bar{T}_{N,2} & \cdots & \bar{T}_{N,T} \end{bmatrix} $$

The input of the passing time variable group, $X_{PT}$, is given by:

$$ X_{PT} = \left[ M_T^{r}, \; M_T^{h} \right] $$

where $N$ is the number of intersection entrances, $T$ is the number of time steps for each entrance, $M_T^{r}$ represents the real-time pattern, and $M_T^{h}$ represents the historical pattern. In the passing time variable group, three time steps (i.e., $t-3$, $t-2$, $t-1$) are used to predict the passing time at time $t$. For example, when the time granularity is 10 min, there are 96 time slices in the daytime (i.e., from 6:00 to 22:00), and the dataset covers 31 days. Therefore, for the 92 entrance lanes of the intersections, the input matrix has dimensions [92 × 2 × 96 × 31]. The passing time matrix is divided into two datasets: the training dataset with a proportion of 70% (i.e., [92 × 2 × 96 × 31 × 70%]) and the test dataset with a proportion of 30% (i.e., [92 × 2 × 96 × 31 × 30%]).
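For clarity, the following Python sketch shows how such lagged input samples might be assembled (three previous time slices predicting the next) and split 70/30; the array layout is an assumption, not the paper's exact pipeline:

```python
import numpy as np

def build_samples(passing_time, lags=3):
    """Build (input, target) pairs from a (num_entrances, num_slices) matrix.

    Each sample uses the passing times at t-3, t-2, t-1 to predict time t.
    """
    X, y = [], []
    for t in range(lags, passing_time.shape[1]):
        X.append(passing_time[:, t - lags:t])   # shape (entrances, lags)
        y.append(passing_time[:, t])            # shape (entrances,)
    return np.stack(X), np.stack(y)

# Example: 92 entrance lanes, 96 ten-minute slices for one day
X, y = build_samples(np.random.rand(92, 96))
split = int(len(X) * 0.7)                       # 70% train / 30% test
X_train, X_test = X[:split], X[split:]
```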
For the speed variable group, because of the spatial correlation among the upstream area, inner area and entrance area, we select three parameters as input variables: the upstream average speed $\bar{V}_{up}$, the inner average speed $\bar{V}_{in}$ and the entrance average speed $\bar{V}_{en}$. The corresponding speed matrices $M_{V_{up}}$, $M_{V_{in}}$ and $M_{V_{en}}$ are defined analogously to the passing-time matrix in Equations (16)–(18), and the input of the speed variable group, $X_V = \left[ M_{V_{up}}, M_{V_{in}}, M_{V_{en}} \right]$, is given by Equation (19).
In the graph variable group, the topology of the intersection group has a great influence on the passing time. The ResNet GCN model is adopted to capture the topology of the intersection group, and the passing time and average speed are fed into the ResNet GCN model as graph signals, respectively. According to Equations (9), (13) and (17), the graph variable $X_G$ can be defined as:

$$ X_G = \left[ \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} M_T, \;\; \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} M_{V_{en}} \right] $$
In the weather variable group, we consider four categories of weather variables: temperature (TE, in degrees Celsius), atmospheric pressure (PR, in Pascals), wind speed (WS, in miles per hour) and precipitation (RA, in millimeters). The weather conditions are recorded once per hour (see Table 2). The data are obtained from the free meteorological data website "Wheat A" [37]. To match the time granularity of the average passing time, each hourly weather record is assigned to the corresponding shorter time buckets (e.g., the weather condition from 6:00 to 6:10 is taken to equal the value recorded for 6:00 to 7:00; see the first row in Table 2). Similarly, according to the data division rules, the weather data are split into training and test datasets.
The input of the weather variable matrix is given by Equation (21).
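As an illustration, hourly weather records can be broadcast to the 10-minute time slices with a simple resampling step; the sketch below assumes a hypothetical CSV layout:

```python
import pandas as pd

# Broadcast hourly weather records (TE, PR, WS, RA) to 10-minute buckets so
# they align with the passing-time slices (forward-fill within each hour).
hourly = pd.read_csv("weather.csv", parse_dates=["time"]).set_index("time")
weather_10min = hourly.resample("10min").ffill()
```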
In the feature fusion layer, the attention mechanism distributes different weights to the features produced by the neural network layers. In this paper, the attention layer is used to capture weight scores for different time steps, usually assigning a heavier weight to adjacent time periods and a lower weight to distant ones [38]. The fused feature can be written as:

$$ F = \sum_{i=1}^{4} w_i \odot F_i $$

where $H = [h_1, h_2, \ldots, h_T]$ is a matrix consisting of the output vectors of the previous layers and $T$ is the number of these vectors; $F_i$ represents the feature variable from each of the four variable groups; $w_i$ is the attention weight assigned to the corresponding feature, computed from $H$; and $\odot$ is the Hadamard product.
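A minimal Keras sketch of such an attention-weighted fusion layer is shown below; it is a simplified, hypothetical implementation (one learned score per branch, all branches sharing the same feature dimension), not the paper's exact attention layer:

```python
import tensorflow as tf
from tensorflow.keras import layers

class AttentionFusion(layers.Layer):
    """Learn one score per feature branch, softmax-normalize the scores,
    and return the weighted sum of the branch features (illustrative)."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.score = layers.Dense(1)             # scoring layer shared across branches

    def call(self, features):                    # features: list of (batch, dim) tensors
        stacked = tf.stack(features, axis=1)     # (batch, branches, dim)
        weights = tf.nn.softmax(self.score(stacked), axis=1)   # (batch, branches, 1)
        return tf.reduce_sum(weights * stacked, axis=1)        # fused (batch, dim)
```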
2.4. Model Configuration
The model experiments were implemented in Python 3.6 with TensorFlow [39] and Keras [40] on Windows 10. The experimental platform is equipped with 32 CPU cores, 64 GB of RAM and an NVIDIA GeForce RTX 2080 GPU, which meets the requirements of this experiment.
The model framework consists of the four variable-group subsections, the feature fusion section and the output section. The Rectified Linear Unit (ReLU) is used to mitigate the exploding/vanishing gradient problem, the dropout rate is set to 0.5 to prevent over-fitting, and the Adam algorithm is used to update the parameters of the neural network. Part of the hyper-parameter settings are shown in
Table 3.
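To illustrate the shared configuration (ReLU activations, dropout of 0.5, the Adam optimizer and equally weighted MSE losses for the two tasks), a simplified Keras sketch is given below; the layer sizes and input shape are illustrative assumptions:

```python
from tensorflow.keras import layers, models, optimizers

inputs = layers.Input(shape=(92, 3))                  # e.g., 92 entrances x 3 time lags
x = layers.Dense(128, activation="relu")(inputs)
x = layers.Dropout(0.5)(x)
passing_time = layers.Dense(1, name="passing_time")(x)
speed = layers.Dense(1, name="speed")(x)

model = models.Model(inputs, [passing_time, speed])
model.compile(optimizer=optimizers.Adam(),
              loss={"passing_time": "mse", "speed": "mse"},
              loss_weights={"passing_time": 0.5, "speed": 0.5})
```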
2.5. Models to Be Compared
This section feeds the training and test datasets to the proposed MFDL model and the benchmark models, respectively. Three categories of benchmark models are selected for comparison with the proposed MFDL model: mathematical-statistics-based models (MS) (e.g., ARIMA), machine-learning-based models (ML) (e.g., SVR) and deep learning models (DL) (e.g., LSTM, GRU, CNN and ConvLSTM). To ensure fairness, the benchmark algorithms use the same input features (the same categories and time interval).
MS model: For the ARIMA model, we use the Akaike Information Criterion (AIC) as the standard to select the optimal model. Note that it is difficult for the ARIMA model to capture the spatio-temporal features of the intersections, so we constructed 92 models, one for each of the 92 intersection entrance lanes.
ML model: The two main parameters of the SVR (i.e., the penalty coefficient "C" and the parameter "Gamma") are selected by cross-validation, and the kernel function is set to the radial basis function.
DL models: The LSTM and GRU both have two hidden layers with 128 neurons each. For the CNN and ConvLSTM, the kernel size is 2 × 2, and the convolutional layers are set to 32 and 64 filters, respectively.
MFDL models: We consider four variants: the MFDL without weather (No Weather), the MFDL without graph information (No Graph), the MFDL without CNN (No CNN), and the full MFDL model.
The time lag is set to 10 min, and the hyper-parameters are set the same as those of the proposed model. The RMSE and MAE are used to measure the overall predictive accuracy on the whole test dataset, and the WMAPE is used to measure the models' predictive performance.
2.6. Loss Function and Evaluation Metrics
In order to compare the proposed fusion deep learning framework with the benchmark models, three indicators are used to evaluate model performance: the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE) and the Weighted Mean Absolute Percentage Error (WMAPE). The Mean Squared Error (MSE) is adopted as the loss function for both the speed and the passing time, and the loss weight of each task is set to 0.5. The metrics are defined as follows:

$$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| $$

$$ RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 } $$

$$ WMAPE = \frac{ \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| }{ \bar{y} } $$

where $n$ is the number of test samples, $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the average of the real values.
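For reference, straightforward NumPy implementations of the three metrics, assuming y_true and y_pred are arrays of test-set values, are:

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def wmape(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred)) / np.mean(y_true)
```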
4. Discussion
In this study, we constructed a multi-task fusion deep learning framework for intersection traffic operation performance prediction. The passing time and the speed are selected as the prediction targets, which reflect the process state and the visual state of the intersection operation performance, respectively.
In the data collection stage, floating car data are used as the data source to verify the prediction model's applicability; they reflect the traffic performance of large-scale intersection areas. The floating car data accurately describe the traffic state upstream and downstream of the intersection, compensating for the limited coverage of fixed sensor data [16].
In the parameter extraction stage, we adopt the grid model, which can identify intersections rapidly without a digital map. The novel grid model is proposed to extract traffic parameters from the floating car data. On the one hand, the grid model replaces the complex map-matching algorithm and improves efficiency without requiring a digital map. On the other hand, the intersection's affected area and the direction of the GFCD can be identified by the grid model together with the fuzzy C-means (FCM) clustering method, which goes beyond the limitation of a fixed intersection influence area [44]. This indicates that the grid model has strong universality and can be applied to other cities.
In the model construction stage, we design the MFDL framework with four variable groups. To increase the depth of the model, the ResNet is incorporated into the MFDL model, which alleviates the vanishing gradient problem [45]. The MFDL can capture the temporal, spatial and topological features of the passing time and speed, and the prediction results are promising. The two prediction targets are negatively correlated, and exploiting the interplay between them significantly improves prediction accuracy and efficiency, which is consistent with the findings of Kunpeng Zhang et al. [46]. The proposed method predicts intersection operation performance in real time and can provide valuable insights for traffic managers seeking to improve the operational efficiency of intersections.
There is more work ahead in the future development of this study. First, although accurate speed and passing time can be extracted from a single data source (i.e., FCD), it is difficult to estimate the actual traffic flow because the penetration rate of floating cars cannot be obtained [47]. In order to comprehensively detect the operation performance of signalized intersections, it is necessary to import multi-source data, such as induction loop data [48] and microwave data [49], to extract traffic flow information. Second, for the passing time we consider only the real-time and historical patterns; in future work, we will consider more passing time patterns to improve accuracy. Lastly, the existing amount of data is sufficient to support the construction and validation of the model; naturally, if floating car data covering a larger area can be obtained in the future, using the proposed model to predict and validate intersection traffic performance there will better demonstrate its universality.
5. Conclusions
In this paper, a multi-task fusion deep learning framework is proposed for intersection traffic operation performance prediction. The passing time and the speed are selected as the prediction targets, which reflect the intersection operation performance. The main conclusions of this paper are summarized as follows.
The MFDL model enables us to capture the spatio-temporal and topological features of the traffic state efficiently. Comparisons with benchmark models show that the fusion deep learning model achieves the best prediction accuracy and robustness among the baselines at different time granularities. In the comparison between the single-task learning (STL) models and the MFDL, when the time granularity is 10 min and the number of epochs is 50, the training time is reduced by 8.3 min and the efficiency is increased by 46.42%, which means that the MFDL is more efficient. In the analysis of weather factors, precipitation has a more significant impact on the prediction than the other weather factors across different time granularities.
Future work will concentrate on exploring novel deep learning structures based on the fusion method. Regarding the factors influencing the intersection operation state, we will consider multi-source input variables, including the signal timing scheme, traffic flow and waiting time, to further improve prediction accuracy.