2.1. Data-Driven Models for Traffic Prediction
Given that enough data are available, data-driven models are more appropriate for traffic prediction, specifically neural network approaches [1]. The function of data-driven approaches that relates the explanatory variables to the target variable is usually determined by statistical inference and machine learning techniques. In this paper, these traffic prediction methods fall into two categories: methods that consider departure time and methods that do not. The methods that consider departure time are SARIMA and LR. The methods that do not consider departure time are k-NNR, SVR and neural networks (NNs).
Seasonal Autoregressive Integrated Moving Average (SARIMA) is a time-series analysis method for traffic prediction. ARMA is a combination of the Autoregressive (AR) and Moving Average (MA) models, and is usually used for traffic prediction under stationary traffic dynamics, where the mean, variance and auto-correlation are unchanged. ARIMA is a generalization of ARMA: when the traffic dynamics are non-stationary, the initial differencing step of ARIMA is applied one or more times to eliminate the non-stationarity.
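As a minimal illustration of that differencing step (on a synthetic trending series, not any of the cited datasets), first-order differencing turns a non-stationary series into an approximately stationary one:

```python
import numpy as np

# Synthetic non-stationary series: linear trend plus noise
rng = np.random.default_rng(0)
t = np.arange(200)
y = 0.5 * t + rng.normal(scale=1.0, size=200)

# The "I" step of ARIMA: first-order differencing y'_t = y_t - y_{t-1}
dy = np.diff(y)

# The trend is removed: the differenced series fluctuates around a
# constant level close to the trend slope (roughly 0.5)
print(round(float(dy.mean()), 2))
```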
Ahmed and Cook [7] proposed using ARIMA to predict short-term highway traffic flow. Their experiments show that an ARIMA(p, d, q) model of order (0, 1, 3) best fits the given datasets. Many ARIMA-based models have been proposed for traffic prediction in the past decades; for example, an ARIMA model of order (0, 1, 1) was found to be the most adequate model for reproducing all the original time series [8].
If a seasonal component is added to ARIMA, we obtain the model called SARIMA. In [9], Williams et al. asserted that a one-week lagged first seasonal difference applied to discrete-interval traffic data yields a weakly stationary transformation. Resting on this assertion and the Wold decomposition theorem, they presented the theoretical hypothesis that a univariate traffic data stream can be modeled as a SARIMA process. To validate this hypothesis, they performed experiments on actual ITS datasets, and the empirical results are consistent with the theoretical hypothesis [9].
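The one-week seasonal difference can be sketched on a synthetic hourly series with a weekly period (illustrative data, not the ITS data of [9]):

```python
import numpy as np

rng = np.random.default_rng(1)
period = 168  # hours in one week
t = np.arange(6 * period)
# Synthetic hourly traffic volume: weekly periodic pattern plus noise
y = 100 + 30 * np.sin(2 * np.pi * t / period) + rng.normal(scale=2.0, size=t.size)

# One-week lagged seasonal difference: z_t = y_t - y_{t-168}
z = y[period:] - y[:-period]

# The periodic component cancels, leaving an approximately stationary series
print(round(float(z.std()), 1), round(float(y.std()), 1))
```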
For traffic prediction, LR is one of the most typical non-parametric methods. Rice et al. identified two naive predictors of travel time: the historical mean travel time and the current-status travel time [10]. They concluded that travel time has a linear relation with these two naive predictors. Based on a dataset gathered from 116 single-loop detectors along 48 miles of highway in Los Angeles, they compared their method with other methods, including the principal components method and the nearest-neighbor method, and the comparison validated their conclusion.
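The linear relation with the two naive predictors can be sketched with scikit-learn on synthetic data (the coefficients and data below are illustrative, not those of [10]):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 500
hist_mean = rng.uniform(10, 30, size=n)              # historical mean travel time (min)
current = hist_mean + rng.normal(scale=3.0, size=n)  # current-status travel time

# Synthetic target: a linear mix of the two naive predictors plus noise
travel_time = 0.4 * hist_mean + 0.6 * current + rng.normal(scale=1.0, size=n)

model = LinearRegression().fit(np.column_stack([hist_mean, current]), travel_time)
print(np.round(model.coef_, 1))  # should recover roughly [0.4, 0.6]
```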
In [11], the k-NNR method is suggested as a candidate forecaster and is used for traffic prediction. The output of k-NNR is the weighted average of its k nearest neighbors with respect to departure time. The empirical study assesses accuracy by comparing the k-NN regression method with simple univariate time-series forecasts. In [12], the k-NNR method is used to predict short-term traffic flow, and in [13] it is used to develop a model for dynamic multi-interval traffic volume prediction.
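A k-NNR forecaster of this kind can be sketched with scikit-learn (the departure-time feature and travel times below are synthetic illustrations):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
# Hypothetical feature: departure time as minutes after midnight
departure = rng.uniform(0, 1440, size=(300, 1))
# Synthetic travel time with a morning-peak bump around 8:00 (480 min)
travel_time = 20 + 15 * np.exp(-((departure[:, 0] - 480) / 60) ** 2)

# k-NNR: the prediction is the distance-weighted average of the
# k nearest neighbors in departure time
knn = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(departure, travel_time)
pred_peak = knn.predict([[480.0]])[0]   # departure near the morning peak
pred_night = knn.predict([[120.0]])[0]  # departure at 2:00 at night
print(round(pred_peak, 1), round(pred_night, 1))
```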
The idea of local weighting is effective for traffic prediction, and its effectiveness is validated in [14]. To develop an approach for large-scale travel time prediction, Nikovski et al. presented an experimental comparison of several non-parametric methods [14], including LR, locally weighted regression, regression trees, k-NNR and neural networks. Although the non-linear methods were expected to be superior to LR, locally weighted regression was the only non-linear method that consistently outperformed linear regression.
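Locally weighted regression itself is compact enough to sketch directly: each training point is weighted by a kernel of its distance to the query before a weighted least-squares fit (a generic sketch, not the setup of [14]):

```python
import numpy as np

def locally_weighted_predict(x_query, X, y, bandwidth=0.3):
    """Locally weighted linear regression at one query point.

    Each training point is weighted by a Gaussian kernel of its distance
    to the query; a weighted least-squares line is then fit and evaluated.
    """
    w = np.exp(-0.5 * ((X - x_query) / bandwidth) ** 2)  # kernel weights
    A = np.column_stack([np.ones_like(X), X])            # design matrix [1, x]
    W = np.diag(w)
    # Weighted normal equations: (A^T W A) beta = A^T W y
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return beta[0] + beta[1] * x_query

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(X) + rng.normal(scale=0.1, size=200)

# A single global line cannot track the sine, but the local fit can
pred = locally_weighted_predict(np.pi / 2, X, y)
print(round(float(pred), 2))  # close to sin(pi/2) = 1
```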
From the perspective of machine learning, the frequently used non-parametric methods usually fall into two categories: SVR and ANN.
The SVM method non-linearly maps input vectors to a very high-dimensional feature space, in which a linear decision surface for classification is constructed. The method does not depend on the dimension of the input vectors, and the high-dimensional feature space offers high generalizability and a big advantage for classification [15]. The SVR method is based on SVM; thus, SVR-based methods have been proposed for traffic prediction and achieve good performance. In [16], an SVR predictor with a radial basis function (RBF) kernel is proposed for travel time prediction; on real highway traffic data, it achieves better performance than the current-time predictor and the historical-mean predictor. In [17], an incremental SVR method is proposed for traffic flow prediction; based on data sequentially collected by probe vehicles or loop detectors, the experimental results show that it is superior to the back-propagation neural network. In [18], an online version of SVR is proposed for short-term traffic flow prediction under atypical conditions (such as vehicular crashes, inclement weather, work zones and holidays).
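An SVR predictor with an RBF kernel, in the spirit of [16], can be sketched with scikit-learn on synthetic data (all hyperparameters below are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)
# Hypothetical setup: predict travel time from departure time (hours)
departure = rng.uniform(0, 24, size=(400, 1))
travel_time = 20 + 15 * np.exp(-((departure[:, 0] - 8) ** 2) / 2)  # morning peak
travel_time += rng.normal(scale=0.5, size=400)

# SVR with a radial basis function kernel
svr = SVR(kernel="rbf", C=100.0, gamma=0.5).fit(departure, travel_time)
peak = svr.predict([[8.0]])[0]   # departure at the morning peak
off = svr.predict([[2.0]])[0]    # off-peak departure
print(round(peak, 1), round(off, 1))
```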
Traffic prediction is based on modeling the complex non-linear spatiotemporal traffic dynamics in a road network [1]. Many different types of neural networks have been proposed for traffic prediction, including auto-encoders (AEs) [19], multi-layer perceptrons (MLPs) [20] and recurrent neural networks [5,6,21].
In [19], Yisheng et al. rethought the traffic prediction problem with deep architecture models, because existing traffic prediction methods are shallow and cannot live up to many real-world applications. Auto-encoders (AEs) are proposed to learn generic traffic flow features and to predict traffic flow; the AEs are trained in a greedy layer-wise way. Experimental results demonstrate that the proposed method for traffic flow prediction has superior performance compared with the baseline methods.
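A single auto-encoder layer of this kind can be sketched in NumPy; [19] stacks several such layers and trains them greedily, whereas this sketch trains one layer with plain gradient descent on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(6)
# Synthetic "traffic flow" vectors: 20 detectors driven by 3 latent factors
latents = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
X = np.tanh(latents @ mixing)

# One auto-encoder layer: encode to 3 units, decode back, minimise MSE
W1 = rng.normal(scale=0.1, size=(20, 3)); b1 = np.zeros(3)
W2 = rng.normal(scale=0.1, size=(3, 20)); b2 = np.zeros(20)
lr = 0.05

def forward(X):
    H = np.tanh(X @ W1 + b1)        # encoder
    return H, H @ W2 + b2           # linear decoder

_, X_hat = forward(X)
err_before = np.mean((X_hat - X) ** 2)

for _ in range(300):                          # plain batch gradient descent
    H, X_hat = forward(X)
    G = 2 * (X_hat - X) / X.shape[0]          # gradient of the squared error
    dW2 = H.T @ G; db2 = G.sum(0)
    GH = (G @ W2.T) * (1 - H ** 2)            # back-propagate through tanh
    dW1 = X.T @ GH; db1 = GH.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

_, X_hat = forward(X)
err_after = np.mean((X_hat - X) ** 2)
print(err_after < err_before)  # reconstruction error decreased
```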
Polson et al. proposed a deep learning model for traffic flow prediction. The proposed model consists of a sequence of fully connected layers with activation functions to extract features, and is essentially an MLP. The experimental data come from 21 road segments that span thirteen miles along the major corridor connecting Chicago’s southwest suburbs to its central business district [20]. Based on these data, the effectiveness of the proposed model is validated; in the experiments, the number of input nodes equals the number of road segments.
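An MLP of this kind can be sketched with scikit-learn (the data and the one-input-node-per-segment convention are synthetic illustrations of the setup in [20]):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
n_segments = 21  # one input node per road segment, as in [20]
# Hypothetical data: standardized speeds on 21 segments -> speed on one target
X = rng.normal(size=(600, n_segments))
y = 0.7 * X[:, 0] + 0.3 * X.mean(axis=1) + rng.normal(scale=0.3, size=600)

# A small MLP: a sequence of fully connected layers with non-linear activations
mlp = MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                   max_iter=3000, random_state=0).fit(X[:500], y[:500])
rmse = float(np.sqrt(np.mean((mlp.predict(X[500:]) - y[500:]) ** 2)))
print(rmse < float(np.std(y[500:])))  # beats the trivial mean predictor
```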
The topology of a neural network should be derived from traffic-related considerations. Thus, the Elman RNN [22] is referred to as SSNN in [1]. The topology of the SSNN consists of an input layer, a hidden layer, an output layer and a context layer. The input layer receives traffic data (such as traffic flows and average speeds) on the main carriageway, on-ramps and off-ramps (if available). The output layer consists of one neuron that calculates the predicted travel time. The context layer stores the previous internal states of the model. The hidden layer receives inputs from the input layer and the context layer, stores its state in the context layer, and feeds the output layer.
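This SSNN topology can be sketched as an Elman-style forward pass in NumPy (layer sizes and weights below are illustrative, not those of [1]):

```python
import numpy as np

rng = np.random.default_rng(8)
n_in, n_hidden = 6, 10  # e.g. flows/speeds on main carriageway and ramps

# Illustrative random weights; a real SSNN would learn these
W_in = rng.normal(scale=0.3, size=(n_hidden, n_in))
W_ctx = rng.normal(scale=0.3, size=(n_hidden, n_hidden))  # context -> hidden
w_out = rng.normal(scale=0.3, size=n_hidden)              # hidden -> 1 output

def ssnn_step(x, context):
    """One Elman step: hidden state from input + context, then one output."""
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    travel_time = w_out @ hidden  # single output neuron
    return travel_time, hidden    # hidden becomes the next context

context = np.zeros(n_hidden)      # context layer: previous hidden state
for t in range(5):                # feed a short input sequence
    x = rng.uniform(0, 1, size=n_in)  # traffic data at interval t
    y_hat, context = ssnn_step(x, context)
print(float(y_hat))
```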
CNN methods may improve predictive accuracy by transforming traffic into images and exploiting the implicit correlations among nearest neighbors [2,3,4]. In [2], a CNN method is proposed for large-scale, network-wide traffic speed prediction. In [3], a fusion of CNN and LSTM is proposed for short-term passenger demand prediction. In [4], a CNN method with an error-feedback RNN is proposed for continuous traffic speed prediction.
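The basic building block of these CNN methods, a convolution over a traffic "image" of segments by time intervals, can be sketched in NumPy:

```python
import numpy as np

rng = np.random.default_rng(9)
# "Traffic image": rows = road segments, columns = time intervals (speeds)
image = rng.uniform(20, 60, size=(10, 12))

# A single 3x3 convolution filter with illustrative random weights
kernel = rng.normal(scale=0.1, size=(3, 3))

def conv2d_valid(img, k):
    """Plain 'valid' 2-D convolution (cross-correlation, as in CNNs)."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # each entry summarises a local spatiotemporal patch
```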
In [6], the LSTM NN is proposed for traffic prediction. Based on traffic speed data from two microwave traffic detectors deployed along an expressway without signal controls, the effectiveness of the LSTM NN for traffic prediction is validated. In [5], the LSTM NN is proposed for travel time prediction, evaluated on the travel time dataset provided by Highways England. The experimental results show that the travel time prediction error is smaller than that of the baseline methods, with an approximate median of 7.0% for the mean relative error over 66 links.
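One step of an LSTM cell, the building block of these predictors, can be sketched in NumPy (random illustrative parameters, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(10)
n_in, n_hidden = 4, 8

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative random parameters; a trained LSTM would learn these
W = rng.normal(scale=0.3, size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)

def lstm_step(x, h, c):
    """One LSTM step: input, forget and output gates plus candidate cell."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g          # gated long-term memory
    h_new = o * np.tanh(c_new)     # short-term output state
    return h_new, c_new

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for t in range(6):                 # e.g. six past speed measurements
    x = rng.uniform(0, 1, size=n_in)
    h, c = lstm_step(x, h, c)
print(h.shape, c.shape)
```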
In summary, various techniques have been applied to improve the performance of traffic prediction. These prediction methods are proposed and evaluated separately on specific traffic data, so it is difficult to say that one method is definitely superior to the others in every situation. Neural networks can better capture complex non-linear spatiotemporal relationships, and neural networks, especially the LSTM NN, are promising for traffic prediction.
The existing LSTM NNs for traffic prediction have two drawbacks: they do not use the departure time through the links for traffic prediction, and their way of modeling long-term dependence in time series is not direct in terms of traffic prediction. Thus, we assume that the modeling of long-term dependence in time series may be improved by more direct access. Travel time is correlated with departure time, because traffic dynamics show some periodicity with departure time when traffic flow is not free. Thus, the LSTM NN for traffic prediction may be improved by efficient use of departure time.
2.2. Attention Mechanism
The attention mechanism has recently succeeded in image classification [23], neural machine translation [24], multimedia recommendation [25] and many other tasks, because it can adaptively concentrate on the effective parts of the features.
In image classification tasks, it is computationally expensive to apply convolutional neural networks (CNNs) to large images, because the computational cost of a CNN scales linearly with the number of pixels in the input image. To address this enormous computational cost, a novel RNN model with attention is presented in [23]. The attention mechanism helps the proposed model adaptively select a sequence of regions from an image or video and process only the selected regions at high resolution.
In machine translation tasks, the attention mechanism selectively focuses on the effective parts of the input sentence during translation and improves the accuracy of machine translation. Luong et al. proposed two attention approaches for machine translation: a global approach that always attends to all source words, and a local approach that considers only a subset of the source words at a time [24].
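The global (dot-product) variant can be sketched in NumPy: score every source state against the decoder state, normalize with softmax, and form the weighted average (all vectors below are random illustrations):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(11)
T, d = 5, 8
source_states = rng.normal(size=(T, d))  # encoder states for 5 source words
query = rng.normal(size=d)               # current decoder state

# Global attention: score every source state, softmax, weighted average
scores = source_states @ query
weights = softmax(scores)                # sum to 1; one weight per source word
context = weights @ source_states        # attended summary of the source
print(round(float(weights.sum()), 6), context.shape)
```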
In the task of multimedia recommendation, existing collaborative filtering (CF) systems ignore the implicitness in users’ interactions with multimedia content. In [25], a two-layer attention mechanism is proposed to extract implicit feedback: the bottom layer adaptively selects informative implicit feedback at the component level, and the upper layer does so at the item level. The selected implicit feedback is incorporated into the classic CF model with implicit feedback.
In this paper, an attention mechanism is proposed to address the two drawbacks of the LSTM NN. The traditional recurrent way of constructing the depth of the LSTM NN is replaced by the attention mechanism, which is placed over the output layers of the LSTM NN to model long-term dependence. The departure time is used as the aspect of the attention mechanism, and the attention mechanism is used to integrate departure time into the proposed model.
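The general pattern, attention over LSTM outputs with an extra aspect vector, can be sketched in NumPy; this is a generic additive-attention sketch with random illustrative weights and a hypothetical departure-time embedding, not the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(12)
T, d_h, d_a = 6, 8, 4
lstm_outputs = rng.normal(size=(T, d_h))  # outputs of the LSTM over T steps
aspect = rng.normal(size=d_a)             # hypothetical departure-time embedding

# Additive attention with an aspect: each LSTM output is scored together
# with the departure-time embedding, so attention depends on departure time
W = rng.normal(scale=0.3, size=(d_h + d_a, d_h + d_a))
v = rng.normal(scale=0.3, size=d_h + d_a)

scores = np.array([v @ np.tanh(W @ np.concatenate([h, aspect]))
                   for h in lstm_outputs])
weights = softmax(scores)
summary = weights @ lstm_outputs          # direct access to all T outputs
print(round(float(weights.sum()), 6), summary.shape)
```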