Des Prediction
6 authors, including:
Chaocan Xiang
Chongqing University
All content following this page was uploaded by Chengwu Liao on 13 July 2021.
locations can be used to narrow down the candidates, but also show that they have different discriminative capabilities in determining the destination. Moreover, we can observe that the sequential context of each discriminative location also plays an important role in determining the destination. More specifically, on the one hand, the sequential context corresponding to the taxi's moving direction among the visited locations can be used to remove candidates lying in the opposite direction. On the other hand, the sequential context can provide complementary information for the decision. For example, both d2 and d3 are potential heading positions for the taxi at l2, but the sequential context of l2 (from l0 to l2) indicates that the taxi has already passed by d2. Thus, d2 can be excluded from the candidates for the taxi at l2.
In conclusion, these observations indicate that the heading destination has an inherent relationship with the sequence of locations visited by the taxi, in which the spatial and sequential context of some discriminative locations jointly determine the heading destination. However, because the collected GPS trajectory is raw and unfinished, it is difficult to identify those discriminative locations (e.g., l3, l4) and their different discriminative capabilities in determining the destination. Moreover, the correlations between the heading destination and the numerous sequential contexts of locations are complicated and difficult to learn. Meanwhile, how to represent the raw GPS trajectory is a preliminary yet fundamental issue.
In this paper, we first hierarchically divide the urban space into grid cells and propose two GPS embedding methods that represent the GPS trajectory as two embedding sequences, conveying the geographic proximity and the multi-scale spatial context of GPS points, respectively. On top of that, we propose a dual Bidirectional LSTM (BiLSTMs) neural network to further model the relationship between the heading destination and the bidirectional sequential context of visited locations, while capturing the different discriminative capabilities of locations in determining the destination by using the attention mechanism. Additionally, the OT (origin and time) information is aggregated into the neural network as auxiliary features. Finally, the neural network outputs the probabilities of candidate destinations (i.e., the clusters of historical passengers' destinations in the city).
In short, we propose a novel framework to predict the taxi passenger's destination based on the partial taxi trajectory observed so far. The major technical contributions of this paper are summarized as follows.
• We model the underlying relationship between the heading destination and the sequence of visited locations in the partial taxi trajectory to predict the passenger's destination. This study also attempts to extend existing drop-off location prediction studies, opening up more opportunities for developing smarter passenger-centered services.
• We propose two GPS embedding methods to model the spatial context of GPS points in the urban space. The grid-lnglat embedding reveals the robust geographic proximity of GPS points, and the Quadtree embedding hierarchically depicts the multi-scale spatiality of GPS points in the urban space.
• We propose a dual BiLSTMs neural network with an attention mechanism to further model the relationship between the heading destination and the bidirectional sequential context of visited locations while capturing the locations' discriminative capabilities in determining the destination. The two BiLSTM components correspond to the two kinds of GPS embedding sequences, and they share the learnable weights in the attention mechanism.
• Extensive experiments on real-world datasets are conducted to evaluate our proposed model on the passenger's destination prediction task. The results demonstrate the advancement and effectiveness of our model.
The rest of this paper is organized as follows. In Section II, we briefly discuss related studies. In Section III, we introduce several basic concepts and the main problem of this study. In Section IV, we elaborate on the methodology of our proposed framework. Then we present the experiments and a case study in Section V. Finally, we conclude our work and outline future research directions in Section VI.

II. RELATED WORK
In this section, we review related work on predicting taxi destinations from trajectory data, which can be broadly grouped into two categories, i.e., statistical methods and neural network methods.

A. Statistical Methods
Statistical methods are widely adopted to predict the destination of a given query trajectory and can be further divided into probability methods and index methods [2, 16, 17, 26, 28, 29, 36]. A typical probability solution combines a Markov model and a Bayesian inference model: the former captures the transition probabilities of locations from historical trajectories, while the latter estimates the probability of each candidate destination [27, 28, 29, 36]. In more detail, the study in [29] first employs Markov transition matrix multiplication to obtain the transition probabilities between two locations, and then determines the most likely future location to improve the final Bayesian inference. A different probability solution can be found in the T-DesP model [17], which proposes an Absorbing Markov Chain model to deduce the transition probability between each location pair in the temporal space, and then constructs an absorbing tensor for destination prediction with a theoretical model. As for index methods, the DESTPRE model [26] uses a sophisticated index based on the Bucket PR Quadtree and Minwise hashing to efficiently find trajectories in the historical data that are similar to a query partial trajectory. The destinations of the matched trajectories are then grouped into several clusters, and the cluster centers are taken as the final predicted destinations.
Note that the probability and index methods usually require an exact match between the query trajectory and historical trajectories. Such a matching schema easily suffers from the
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 3
Fig. 5: The GPS embedding based on Quadtree spatial structure: (a) area partition scheme; (b) corresponding Quadtree.
However, it is extremely difficult for our classification model to predict directly over the numerous small destination venues. Hence, in this study, the CPDs are obtained by clustering the historical passengers' destination venues in each UAR, as shown in Fig. 4(b). A two-step procedure is conducted to identify CPDs in Chengdu:
• Determine which UAR a given destination venue belongs to. Following the procedure for identifying UARs, each of the neatly arranged pixels in the image m0 can be mapped to a small cell in the city, and the UAR-label of each pixel is recorded in the vector Lu. We examine which pixel the given destination venue belongs to, and then use the label of this pixel as the target UAR-label of the destination venue. This process is simple and computationally light, so it can be executed efficiently on the whole dataset.
• Aggregate destination venues into clusters. The mean-shift clustering algorithm [9] is employed to aggregate the destination venues in each UAR. The CPD-labels of venues are recorded in Ld, and the centroids of clusters are recorded in Ctrs. By traversing all UARs, 941 clusters are obtained in Chengdu, which are used as the target candidate destinations (CPDs).
After obtaining the CPDs in the city, we use them to label our taxi trajectories for supervised learning. Specifically, we first assign the last GPS point of each given trajectory to a UAR, and then find the three nearest CPD clusters according to the geographic distance to their centroids Ctrs. We take the CPD-label of the nearest destination venue within these three CPD clusters as the label of the given trajectory. In particular, with this labelling approach, trajectories ending at different gates of a candidate destination are tagged with the same label. We acknowledge that these destination labels are not technically the ground truth. However, since the drop-off positions of taxis are normally very close to the passengers' destinations, our labelling approach is still reasonable. Finally, the labelled trajectories are used to evaluate the proposed prediction model.

C. GPS Embedding
A trajectory is typically represented as a sequence of discrete GPS points, which indicates the sequence of locations visited by a moving object over time. To reveal the spatial context of GPS points, we introduce two GPS embedding approaches that represent their spatiality in the 2-dim geographic domain and the multi-scale spatial domain, respectively.
1) Grid-lnglat Embedding: The longitude and latitude values are the most intuitive information for describing the position of a GPS point on the 2-dim earth surface, and they are also consistent and continuous in the coordinate space. The relative position of, or distance between, any two points on the surface can be revealed by their coordinate values. Also, the moving patterns of taxis can be directly depicted by the variation patterns of such coordinate values. Moreover, this kind of numerical variation in a sequence can be effectively captured by a recurrent neural network, so it is reasonable to represent GPS points by their longitude-latitude values.
However, an essential problem with raw GPS points is that the original longitude-latitude values deviate when uncertainty arises from a low sampling rate or noise. A simple solution is to map GPS points into a coarse-grained spatial space. As introduced above, the city is divided into $g \times g$ grid cells, and each GPS point $l_i$ falls in a cell $c_j$. Hence, to make the GPS representation more robust to uncertainty, all GPS points within a single cell can be considered the same object. In this manner, each GPS point can be represented by the center coordinate (longitude-latitude) of its cell. Finally, the original longitude-latitude point $l_i = (lng_{l_i}, lat_{l_i})$ is represented by the grid-lnglat embedding $e_i = (lng_{c_j}, lat_{c_j})$.
2) Quadtree Embedding: Significant patterns of taxi motion can be revealed at different spatial scales. Specifically, the trajectory at a micro scale provides precise location information about the moving taxi, while the trajectory at a macro scale reveals the taxi's global moving trend in the urban space. Such multi-scale motion characteristics have been shown to be meaningful in the destination prediction task [18, 33]. However, traditional GPS embedding methods, such as one-hot encoding, cell ids, and longitude & latitude values [31], cannot hierarchically depict where raw GPS points are located in the city. As a typical spatial data structure, a Quadtree gives each internal node four branches at every level, so that it describes the multi-scale properties of a location by recursively decomposing the two-dimensional space into four equal-area blocks [22].
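The two embeddings described above can be illustrated with a short sketch. It is not the authors' implementation and rests on explicit assumptions: the study area is a rectangle (the hypothetical `bbox`) split uniformly in longitude and latitude into $2^h \times 2^h = 4^h$ cells, and, because the exact form of the Quadtree vector is not specified in this section, the Quadtree embedding is assumed to be the root-to-leaf quadrant path encoded as one 4-dim one-hot vector per level. The function names `cell_index`, `grid_lnglat_embedding` and `quadtree_embedding` are hypothetical.

```python
from typing import List, Tuple

# Sketch only: names, the uniform split and the one-hot-per-level Quadtree code
# are assumptions, not the paper's exact definitions.
BBox = Tuple[float, float, float, float]  # (lng_min, lat_min, lng_max, lat_max)


def cell_index(lng: float, lat: float, bbox: BBox, h: int) -> Tuple[int, int]:
    """Return (col, row) of the level-h cell containing the point.

    The area is split uniformly into g x g cells with g = 2**h, i.e. 4**h cells.
    """
    lng_min, lat_min, lng_max, lat_max = bbox
    g = 2 ** h
    col = int((lng - lng_min) / (lng_max - lng_min) * g)
    row = int((lat - lat_min) / (lat_max - lat_min) * g)
    # Clamp so that points on the east/north border fall into the last cell.
    return min(max(col, 0), g - 1), min(max(row, 0), g - 1)


def grid_lnglat_embedding(lng: float, lat: float, bbox: BBox, h: int) -> Tuple[float, float]:
    """Grid-lnglat embedding: replace a raw GPS point by the centre of its cell."""
    lng_min, lat_min, lng_max, lat_max = bbox
    g = 2 ** h
    col, row = cell_index(lng, lat, bbox, h)
    cell_w = (lng_max - lng_min) / g
    cell_h = (lat_max - lat_min) / g
    return lng_min + (col + 0.5) * cell_w, lat_min + (row + 0.5) * cell_h


def quadtree_embedding(lng: float, lat: float, bbox: BBox, h: int) -> List[int]:
    """Assumed Quadtree code: quadrant path from level 1 (root split) to level h.

    Each level contributes a 4-dim one-hot quadrant vector, so the result has 4*h entries.
    """
    col, row = cell_index(lng, lat, bbox, h)
    code: List[int] = []
    for shift in range(h - 1, -1, -1):  # most significant bit = coarsest split
        east = (col >> shift) & 1    # 0: west half, 1: east half at this level
        north = (row >> shift) & 1   # 0: south half, 1: north half at this level
        quadrant = 2 * north + east  # 0=SW, 1=SE, 2=NW, 3=NE
        one_hot = [0, 0, 0, 0]
        one_hot[quadrant] = 1
        code.extend(one_hot)
    return code


if __name__ == "__main__":
    box = (0.0, 0.0, 1.0, 1.0)  # toy unit square, not real coordinates
    print(grid_lnglat_embedding(0.9, 0.1, box, 2))  # (0.875, 0.125)
    print(quadtree_embedding(0.9, 0.1, box, 2))     # [0, 1, 0, 0, 0, 1, 0, 0]
```

With h = 7, the level chosen for Chengdu in Section V, g = 128 and the assumed Quadtree code has 28 entries. Any two points falling in the same cell receive identical grid-lnglat and Quadtree embeddings, which is the robustness property argued for above, while cells inside the same larger block share the prefix of their Quadtree path.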
of a unit circle in coordinates centered at (0, 0), i.e., [0, 24) ⇒ [0, 2π). The hour time $H_o$ can then be represented by the coordinates of a point on the unit circle at radian θ, as in Eq. 5:

$$H_o(T) = (\cos\theta, \sin\theta), \quad \theta = 2\pi\left(\frac{\mathrm{hour}(T)}{24}\right) \tag{5}$$

where hour(T) ∈ [0, 24) is the hour at which trajectory T started. Compared to one-hot embedding, this approach significantly reduces the dimensionality and also preserves the temporal similarity between 23:00 and 00:00. Besides, to depict the position in the urban space where the trajectory started, we also apply the Quadtree GPS embedding to the starting location, denoted by $e'_1$, which carries hierarchical spatial characteristics in the multi-scale spatial domain.
4) Predicting Layer: Two fully connected layers (FC1 and FC2) and a Softmax classifier are employed to fuse the features and output the final probabilities. Specifically, the motion features and OT features are concatenated and fed into the two fully connected layers to obtain the original outputs $z'$, as in Eq. 6:

$$\begin{aligned} z &= \mathrm{ReLU}\left(W^{FC_1}[R, R', H_o, e'_1] + b^{FC_1}\right),\\ z' &= W^{FC_2} z + b^{FC_2} \end{aligned} \tag{6}$$

where $W^{FC_1}, W^{FC_2}$ and $b^{FC_1}, b^{FC_2}$ are the learnable parameters of the FC layers. Note that $z'$ has |CPD| dimensions, corresponding to the candidate destination regions. Softmax is adopted as the multi-class logistic regression classifier to obtain the probability distribution over candidate destinations. For an input partial trajectory $T_p$, the probability $\hat{p}$ that the j-th candidate destination $d_j$ is the true destination y of $T_p$ is obtained by applying the Softmax classifier to the original output $z'$. The final predicted result $\hat{y}$ is the candidate destination with the highest probability, as in Eq. 7:

$$\hat{p}\left(y = d_j \mid T_p\right) = \frac{e^{z'_j}}{\sum_{i=1}^{|D|} e^{z'_i}}, \quad 1 \le j \le |D|, \qquad \hat{y} = \arg\max_{j} \hat{p}\left(y = d_j \mid T_p\right) \tag{7}$$

We use cross-entropy as our loss function, which is commonly used to measure the distance between the probability distribution predicted by a softmax classifier and the true probability distribution, as given in Eq. 8:

$$L(\omega) = -\frac{1}{|D|}\sum_{i=1}^{|D|} s_i \log y'_i + \lambda \|\omega\|_2^2 \tag{8}$$

where ω denotes the learnable parameters of the neural network, λ is the regularization hyper-parameter for L2 regularization, $s_i$ is the i-th dimension of the one-hot representation of the true destination y, and $y'_i$ is the i-th dimension of the predicted probability distribution. In the training phase, we employ the Adam optimizer to minimize the loss function with learning rate $l_r$.

V. EVALUATION

A. Experiment Setup
1) Data Description: Our experiments are conducted on three real-world datasets.
• Chengdu Taxi Trajectory Data. This taxi GPS trajectory dataset was collected in Chengdu, China, from 3rd to 30th August 2014, with a sampling interval of 10 seconds. More than 300000 taxi trajectories were generated by over 13000 taxis each day. We select the data in a square area around the Second Ring of Chengdu, obtaining 813861 complete trajectories.
• Chengdu Passenger's Destination Data. The passengers' destination data is released by the Didi Chuxing company¹. This dataset contains over 110000 anonymous historical passengers' destinations generated in Chengdu within two months in 2018. Specifically, each record is generated by a passenger on the online ride-hailing platform and includes the location where the passenger is (i.e., the origin) and the venue the passenger heads for (i.e., the destination). Hence, these venues, tagged as passengers' destinations, reveal where people usually go in the city by paid ride.
• Porto Taxi Trajectory Data. This dataset is from the ECML-PKDD competition [14] and contains over 1.7 million complete trajectories collected from 442 taxis running in Porto from 1st July 2013 to 30th June 2014, with a sampling interval of 15 seconds. Since the entire dataset is very large and the trajectories are distributed very unevenly over the whole space, we further select the data located in the main areas of the city, obtaining 665989 complete trajectories.

¹ Didi Chuxing GAIA Initiative. https://gaia.didichuxing.com

We randomly partition each dataset into training, validation, and testing sets at a ratio of 6 : 1 : 1, following recently published papers on taxi destination prediction [10, 19, 20].
2) Evaluation Metrics: The destination prediction methods are evaluated with two metrics:
• Accuracy@k is the ratio of query trajectories whose destinations are correctly predicted within the top-k candidate destinations to all query trajectories. Specifically, for each partial trajectory $T_p$, a positive result is returned when the true destination of $T_p$ is within the top-k candidate destinations ranked by the predicted probabilities $\hat{P}$.
• Distance Error is the average distance between the labelled destinations and the predicted top-1 destinations over all query trajectories. In this study, the candidate destinations are location clusters, so the distance error is calculated between the two clusters' centroids. The Haversine distance is adopted, defined as in Eq. 9:

$$\mathrm{Haversine\_dis}(x, y) = 2 \cdot R \cdot \arctan\left(\sqrt{\frac{\delta}{1-\delta}}\right), \qquad \delta = \sin^2\frac{\phi_x - \phi_y}{2} + \cos\phi_x \cos\phi_y \sin^2\frac{\lambda_x - \lambda_y}{2} \tag{9}$$

where φ is the latitude, λ is the longitude (not the regularization parameter of Eq. 8), and R is the radius of the earth (R = 6371 km).
3) Parameter Setting: In our attention-based dual BiLSTMs neural network, the number of neurons in each BiLSTM component is set to 200, the size of attention is 30, and
50 neurons in the dense layer for OT features. The last two dense layers, which fuse the features and produce the outputs for the Softmax classifier, have 500 and 941 neurons, respectively. Both the learning rate $l_r$ and the regularization parameter λ are set to 0.001, and the batch size is 128.
4) Comparison Algorithms: We compare our destination prediction model with three baselines, described as follows.
• R_MLP: Uses the latitudes and longitudes of the first and last 5 points of raw GPS trajectories as the inputs of a Multi-Layer Perceptron [10]. This neural network was the winning model of the taxi destination prediction competition (ECML-PKDD [14]).
• R_LSTM: Uses raw GPS trajectories as the inputs of a regular Long Short-Term Memory network (LSTM).
• RW_BiLSTM: Uses a sliding window of 5 successive GPS points in every raw GPS trajectory as the input unit at each time step of a BiLSTM network [10].
5) Evaluation Environment: All the evaluations in the paper are implemented in Python 3.7 with the TensorFlow library and run on a PC with 4 NVIDIA GeForce RTX 2080 Ti GPUs and 192 GB RAM.

B. Parameter Sensitivity Study
In addition to the common hyper-parameters of the deep model, there are three important parameters regarding the training and testing processes:
• Partition level (h). The parameter h is used to hierarchically divide the urban space into $4^h$ grid cells. It not only determines the grid cell size, but also relates to the depth and width of the Quadtree. In particular, the spatial features in the GPS embedding vectors are sensitive to the grid cell size, and the prediction of our model relies strongly on these spatial features. As a result, the partition level h has a significant impact on our prediction results.
• Percentage of trajectory completion (p). The completion percentage p of a partial trajectory is the most predominant factor in our prediction task. A query trajectory with a high completion percentage provides more information and also lies closer to the destination. In particular, in the real-world prediction scenario, it is the unknown and gradually growing p that makes the prediction extremely challenging. Hence, to evaluate the model performance, the testing trajectories should be incremental and unfinished (p ≤ 100%). At the same time, training the model on unfinished input trajectories makes the model more robust in the prediction scenario.
• Number of output destinations (k). Because a single output destination (e.g., k = 1) may be insufficient, the default value of k is set to 5, following other studies [28, 33]. Furthermore, varying k in the accuracy metric can reveal more of a model's vulnerability: a more rapid increase in prediction accuracy as k increases indicates better robustness of the model [28].
Hence, in the evaluation stage, p and k are used as the main variables to evaluate the prediction performance of different models.
Additionally, h and p are two preset parameters regarding the input trajectories in the training stage. We conduct a group of comparison experiments to investigate their effects on our proposed model and then determine suitable values. Table I shows the prediction distance errors of our model trained on the Chengdu dataset with different values of h and p. The results in the first four rows show that performance improves as h increases (smaller cell size), but an overly small cell size leads to worse results. This is because when the grid cell is too large, two different GPS points might be wrongly represented by the same embedding vector; conversely, if the grid cell is too small, GPS points that should belong to the same place may be wrongly represented by different embedding vectors. The last five rows indicate two points: 1) the deep model performs particularly well when the p of the testing trajectories equals that of the training trajectories; 2) when p = 70%, the trained model achieves the lowest average distance error, showing more robustness in our test scenario. Hence, for experiments on the Chengdu dataset, the partition level h for training is set to 7 (cell size 120 m × 120 m), and the percentage of trajectory completion p is set to 70%.

TABLE I: The prediction distance error of the proposed deep model trained on the Chengdu dataset under different input parameters, i.e., (1) partition level (h); (2) percentage of training trajectory completion (p).
Parameters for       Distance Error (m) by testing trajectory completion
Input Trajectories      10%     30%     50%     70%     90%     AVG
h = 5, p = 70%         2927    2273    1546     906     775    1685
h = 6, p = 70%         3037    2265    1473     803     702    1656
h = 8, p = 70%         3068    2207    1418     784     703    1636
h = 7, p = 70%         3141    2156    1369     718     694    1615
h = 7, p = 50%         2936    1975    1283    1278    1678    1830
h = 7, p = 60%         3006    2086    1311     889    1100    1678
h = 7, p = 80%         3151    2284    1542     806     464    1649
h = 7, p = 90%         3166    2352    1663     925     336    1688

C. Effectiveness Evaluation
1) Effectiveness of GPS Embedding Methods: The GPS embedding methods in this study are intended to improve the prediction performance of our deep neural network by revealing representative spatiality of GPS points in the partial trajectory. To demonstrate the effectiveness of our embedding method (Quadtree & grid-lnglat embedding), three other methods are tested on the same neural network, namely raw GPS trajectories, single Quadtree embedding, and single grid-lnglat embedding.
Table II and Fig. 8(a) show the comparison results when varying the percentage of trajectory completion, and Fig. 8(b) shows the trends of prediction accuracy with respect to different k. We find that:
• Our Quadtree & grid-lnglat embedding method outperforms the other methods, and the gap is gradually
TABLE II: The prediction distance error of our deep model TABLE III: The prediction distance error of different predic-
with different GPS embedding methods in the Chengdu tion models in the Chengdu datasets, varying the percentage
datasets, varying the percentage of trajectory completion. of trajectory completion.
GPS Embedding Distance Error (m) Destination Distance Error (m)
Methods 10% 30% 50% 70% 90% AVG Prediction Models 10% 30% 50% 70% 90% AVG
Raw-lnglat 3031 2339 1624 1014 790 1760 + OT_F 3435 2179 1460 852 689 1723
R_MPL
Quadtree 3178 2198 1391 772 654 1639 − OT_F 3270 2207 1497 915 749 1727
Grid-lnglat 2979 2253 1471 861 803 1673 + OT_F 3127 2223 1481 850 764 1689
R_LSTM
Ours 3141 2156 1369 718 694 1615 − OT_F 3043 2262 1490 882 791 1693
RW_ + OT_F 3343 2123 1339 782 667 1650
0.7 BiLSTM − OT_F 3233 2087 1411 844 741 1671
Raw-lnglat
0.6 Raw-lnglat
0.6 Quadtree + OT_F 3141 2156 1369 718 694 1615
Quadtree
Grid-lnglat
Ours
Grid-lnglat 0.5 − OT_F 3222 2198 1403 772 687 1656
Accuracy
Accuracy
Ours
0.4 Ours
0.4
0.3 0.7
0.2 R_MLP
R_MLP
0.2 0.6 R_LSTM
R_LSTM 0.6
R+W_BiLSTM RW_BiLSTM
0
Accuracy
Accuracy
Ours Ours
10 30 50 70 90 1 2 3 4 5 0.4 0.5
Percentage of Trajectory Completion (%) Top-k
0.4
(a) (b) 0.2
0.3
0 0.2
Fig. 8: The prediction accuracy of our deep model with 10 30 50 70 90 1 2 3 4 5
Percentage of Trajectory Completion (%) Top-k
different embedding methods in the Chengdu datasets: a)
varying the percentage of trajectory completion when k is 5; b) (a) (b)
varying the number of predicted destinations when p is 70%.
Fig. 9: The prediction accuracy of different prediction models
in the Chengdu datasets: a) varying the percentage of trajectory
widening with the growth of the percentage of trajectory completion when k is 5; b) varying the number of predicted
completion. Since the deep model is trained with input destinations when p is 70%.
trajectories in p = 70%, the increasing slope of these
polylines are getting lower after p = 70% in Fig. 8(a).
• The grid-lnglat embedding shows better performance than
with/without OT features. By profiling these results, we can
raw GPS trajectories especially when p is growing, which observe that:
indicates that the grid-lnglat embedding can alleviate the • Our model outperforms baseline models in the prediction
uncertainty and sparse problem in GPS trajectories. scenario, achieving the lowest average prediction distance
• The Quadtree embedding is better than grid-lnglat em- error 1615m in Tab III. In Fig. 9 (b), our model is
bedding, which shows the multi-scale motion features are increasingly better than baseline models with the increas-
more discriminative than 2-dim spatial motion features in ing of parameter k, which shows the superior prediction
the destination prediction task. performance and better robustness of our model.
• Compared to the single Quadtree embedding or grid- • In Table III, we can observe that all the deep models
lnglat embedding, the combination of them namely our achieve performance improvements by using the OT
Quadtree & grid-lnglat method can achieve further im- features. Such results demonstrate the strong correlations
provements regarding the distance error and prediction between the passengers’ destination and where and when
accuracy. Such results demonstrate that it is of great the taxi trip started.
significance to consider both the 2-dim geographic and • The RW_BiLSTM model shows better performance than
multi-scale spatiality of GPS points in the destination the simple R_LSTM model. It demonstrates the signif-
predction task. icance of considering both the preceding and future
Based on the above observations, we can conclude that our sequential context of GPS points for the destination
GPS embedding method is effective in revealing representative prediction task.
spatial features from partial trajectories, thus enhancing the • As shown in Tab. III and Fig. 9 (a), with the growth
performance of our deep model. of trajectory completion p, the increasing trends of the
2) Effectiveness of Our Prediction Model: In this section, R_MPL model and ours, are faster than both the R_LSTM
we conduct comparison experiments with baselines, to demon- and RW_BiLSTM model. It can tell that the RNN based
strate the effectiveness of our destination prediction model models are more susceptible to the sparsity problem when
(GPS embedding methods with an attention-based dual BiL- dealing with the long-term raw trajectory data. But our
STMs). Moreover, we investigate the effect of our extracted model can effectively alleviate this problem due to the
OT features (i.e., where and when passengers took the taxi) adoption of GPS embedding methods.
on the prediction task. Table III and Figure 9 illustrate the The attention mechanism is employed to capture the differ-
prediction results of different models varying parameters p and ent discriminative capability of visited locations in determining
k. Tab III also presents the distance error of different models the heading destination. To evaluate its effectiveness, we visu-
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 11
alize the learned crucial locations in the Chengdu city, called TABLE V: The prediction distance error of different GPS em-
the attention map. The attention intensity ai of locations can be bedding methods and prediction models on the Porto dataset,
represented by the times of being identified as the locations varying the percentage of testing trajectory completion.
with the largest discriminative capability in determining the Distance Error (m)
Effectiveness Study
destination. After testing 100,000 trajectories in our model, 10% 30% 50% 70% 90% AVG
we obtain 6320 unique grid cells. Due to the 128 ∗ 128 grid GPS Raw_lnglat 3032 2304 1317 594 623 1574
cells (at the h = 7 level) in the urban space are too small to be Embedding Quadtree 2939 2198 1260 566 579 1508
visually examined, we present the attention map at the h = 5 Methods Grid_lnglat 3016 2239 1313 602 616 1557
level (32 ∗ 32 grid cells) in Fig. 10. These grid cells are further R_MPL 3119 2216 1340 656 573 1580
divided into three groups for the visualization, namely ai ≥ 50, Prediction R_LSTM 3018 2294 1365 604 676 1591
ai ≥ 200 and ai ≥ 500. As illustrated in the figure, the grid Models RW_BiLSTM 3155 2240 1312 556 623 1577
cells with the high attention intensity are mainly located in the Ours - OT_F 3107 2187 1303 580 574 1550
center of the city. As a typical example, the top-1 weighted Ours 2848 2175 1254 560 572 1481
cell is further highlighted with the red color, which is the
determinant location of 1066 trajectories. The corresponding
enlarged map is shown in the figure. We can find that this
place contains a bridge and a crossway of two primary roads, Porto dataset. By profiling the results shown in Tab. V and
which are consistent with the representative landmarks that we intend to capture by the attention mechanism. Such results demonstrate the effectiveness of the attention mechanism in our prediction task.
D. Model Evaluation on Porto Dataset
In this section, we conduct a group of experiments on the Porto dataset to investigate whether the proposed deep neural network and GPS embedding methods perform effectively on trajectory data from a different city and over a long-term period (one year). Since historical passenger destination data is not available for Porto, we follow previous studies [10, 19] and take drop-off location clusters as the outputs of our neural network. As before, the clusters are obtained by applying the mean-shift clustering algorithm [9] to the trajectory data, which yields 989 clusters. Note that the parameter settings of the neural network are the same as those used in the Chengdu experiments.
We first conduct a parameter sensitivity study on the Porto dataset to determine suitable values for h and p, i.e., the partition level and the completion percentage of training trajectories. As shown in Tab. IV, the suitable values are h = 7 (a cell size of 70 m × 70 m) and p = 70%. The variation of the model's performance as the two parameters change is also similar to that on the Chengdu dataset (Tab. I).
After the parameter study, we evaluate the effectiveness of the GPS embedding methods and the different neural networks on the Porto dataset. From Fig. 11, we can observe that:
• Our GPS embedding method outperforms the other three methods on this dataset, and the performance differences among them are similar to those on the Chengdu dataset. The results further verify two conclusions: 1) our GPS embedding methods can alleviate the uncertainty and sparsity problems of the prediction task on raw GPS trajectories; 2) combining the Quadtree embedding and the grid-lnglat embedding reveals more representative spatiality of GPS points and thus benefits prediction performance.
• Compared to the other neural networks, our proposed attention-based dual BiLSTMs show the best performance. These results demonstrate that our deep neural network can effectively model the spatiotemporal context of visited locations and extract more discriminative features from the partial trajectory. Additionally, the OT features also improve prediction performance on the Porto dataset.
In summary, the experimental results on the Porto dataset further demonstrate the feasibility and effectiveness of our proposed GPS embedding methods and deep neural network for the destination prediction task. From another perspective, they also indicate that the underlying relationship between the passenger's destination and the partial taxi trajectory exists in different cities and can be learned by deep models.
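Because the Porto outputs are drop-off clusters rather than recorded destinations, it may help to see how such clusters could be formed and used as class labels. The following is a minimal flat-kernel mean-shift sketch, not the implementation used in the paper or in [9]. It assumes drop-off points are already projected to planar coordinates in meters; the bandwidth, iteration limit, mode-merging threshold (half the bandwidth), and the helper names `mean_shift` and `assign_to_clusters` are all hypothetical. Labeling each drop-off point with its nearest cluster centre is one plausible way to turn clusters into classification targets.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # planar coordinates in meters (assumed)


def mean_shift(points: List[Point], bandwidth: float,
               max_iter: int = 100, tol: float = 1e-6) -> List[Point]:
    """Flat-kernel mean shift: move every point to the mean of its
    neighbours within `bandwidth` until it stops moving, then merge the
    converged modes that lie within bandwidth / 2 of each other."""
    modes: List[Point] = []
    for x, y in points:
        for _ in range(max_iter):
            nbrs = [(px, py) for px, py in points
                    if math.hypot(px - x, py - y) <= bandwidth]
            if not nbrs:  # defensive guard; the mean stays near the data
                break
            nx = sum(px for px, _ in nbrs) / len(nbrs)
            ny = sum(py for _, py in nbrs) / len(nbrs)
            moved = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if moved < tol:
                break
        modes.append((x, y))
    # Keep one representative centre per group of nearby modes.
    centres: List[Point] = []
    for mx, my in modes:
        if all(math.hypot(mx - cx, my - cy) > bandwidth / 2
               for cx, cy in centres):
            centres.append((mx, my))
    return centres


def assign_to_clusters(points: List[Point], centres: List[Point]) -> List[int]:
    """Label each drop-off point with the index of its nearest centre."""
    return [min(range(len(centres)),
                key=lambda i: math.hypot(px - centres[i][0], py - centres[i][1]))
            for px, py in points]


drop_offs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
             (100.0, 100.0), (101.0, 100.0), (100.0, 101.0)]
centres = mean_shift(drop_offs, bandwidth=5.0)
print(assign_to_clusters(drop_offs, centres))  # [0, 0, 0, 1, 1, 1]
```

This sketch compares every point with every other point in each iteration, so a city-scale dataset would call for a library implementation with seeding or binning.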
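The two parameters of the sensitivity study can be illustrated with a short sketch. It assumes, beyond what this section states, that a level-h quadtree splits a square bounding box into 2^h × 2^h equal cells and that p is the percentage of a training trajectory's points kept from its start; `quadtree_key`, `cell_size`, and `truncate_prefix` are hypothetical helpers, and the quadrant-digit encoding is an illustrative choice. Under that assumption, the 70 m × 70 m cells reported for h = 7 would correspond to a root square of 70 × 2^7 = 8,960 m per side, which the second print checks arithmetically.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # planar coordinates in meters (assumed)


def quadtree_key(pt: Point, origin: Point, side: float, h: int) -> str:
    """Level-h quadtree path of `pt` as quadrant digits
    (0 = SW, 1 = SE, 2 = NW, 3 = NE). Assumes a square root cell of width
    `side` whose south-west corner is `origin`; points outside the root
    square are not handled."""
    x0, y0 = origin
    x, y = pt
    digits = []
    for _ in range(h):
        side /= 2  # each level halves the cell width
        east = x >= x0 + side
        north = y >= y0 + side
        digits.append(str(int(east) + 2 * int(north)))
        x0 += side if east else 0.0
        y0 += side if north else 0.0
    return "".join(digits)


def cell_size(side: float, h: int) -> float:
    """Width of a leaf cell after h halvings of the root square."""
    return side / 2 ** h


def truncate_prefix(traj: List[Point], percent: int) -> List[Point]:
    """Keep the first `percent`% of points (at least one) to simulate a
    partial trajectory; integer arithmetic avoids rounding surprises."""
    k = max(1, len(traj) * percent // 100)
    return traj[:k]


print(quadtree_key((4.5, 0.5), origin=(0.0, 0.0), side=8.0, h=3))  # 100
print(cell_size(8960.0, 7))  # 70.0
print(len(truncate_prefix([(float(i), 0.0) for i in range(10)], 70)))  # 7
```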
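The attention pooling whose weights are shared between the two BiLSTM branches can be sketched as follows. This uses a generic additive-attention formulation (score_t = v · tanh(W h_t)), assumed for illustration rather than taken from the paper. The BiLSTMs themselves are omitted, so `H_quad` and `H_grid` stand for hypothetical per-step BiLSTM outputs of the Quadtree and grid-lnglat embedding sequences. The returned weights `alpha` are the quantities one would inspect to see which visited locations the model treats as discriminative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(H: Matrix, W: Matrix, v: Sequence[float]) -> Tuple[Vector, Vector]:
    """Additive attention pooling over hidden states H (one row per step):
    score_t = v . tanh(W h_t), alpha = softmax(scores), c = sum_t alpha_t h_t."""
    scores = []
    for h in H:
        u = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(vi * ui for vi, ui in zip(v, u)))
    alpha = softmax(scores)
    context = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(len(H[0]))]
    return context, alpha


def dual_attention(H_quad: Matrix, H_grid: Matrix,
                   W: Matrix, v: Sequence[float]) -> Vector:
    """Pool both branches with the SAME (W, v), then concatenate the two
    context vectors as input to a (hypothetical) output layer."""
    c_quad, _ = attend(H_quad, W, v)
    c_grid, _ = attend(H_grid, W, v)
    return c_quad + c_grid


W = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 0.0]
_, alpha = attend([[0.0, 0.0], [10.0, 0.0]], W, v)
print([round(a, 3) for a in alpha])  # [0.269, 0.731]
```

Sharing (W, v) means both embedding branches are scored by the same notion of which steps matter, while each branch still produces its own context vector.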
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 12
[Figure: Top-k accuracy for k = 1 to 5. (a) GPS embedding methods: Raw-lnglat, Quadtree, Grid-lnglat, Ours. (b) Neural networks: R_MLP, R_LSTM, RW_BiLSTM, Ours, Ours - OT_F.]
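The Top-k accuracy curves reported for k = 1 to 5 can be computed as sketched below, assuming Top-k accuracy counts a test trajectory as correct when its true destination cluster is among the k highest-scoring clusters. The function name and the toy scores are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true cluster index is among the k
    highest-scoring clusters (ties keep the lower index first)."""
    hits = 0
    for row, y in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        hits += y in ranked[:k]
    return hits / len(labels)


# Toy scores for 3 trajectories over 3 hypothetical destination clusters.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
for k in (1, 2, 3):
    print(k, round(top_k_accuracy(scores, labels, k), 3))
# 1 0.333
# 2 0.667
# 3 1.0
```

Top-k accuracy is non-decreasing in k, which is why each curve in such a plot can only rise from k = 1 to k = 5.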
[10] A. De Brébisson, É. Simon, A. Auvolat, P. Vincent, and Y. Bengio. Artificial neural networks applied to taxi destination prediction. arXiv preprint arXiv:1508.00021, 2015.
[11] Y. Endo, K. Nishida, H. Toda, and H. Sawada. Predicting destinations from partial trajectories using recurrent neural network. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 160–172, 2017.
[12] Y. Gao, D. Jiang, and Y. Xu. Optimize taxi driving strategies based on reinforcement learning. International Journal of Geographical Information Science, 32(8):1677–1696, 2018.
[13] M. Jaderberg, K. Simonyan, A. Zisserman, et al. Spatial transformer networks. In Advances in Neural Information Processing Systems, pages 2017–2025, 2015.
[14] Kaggle. Kaggle competition. https://www.kaggle.com/c/pkdd-15-predict-taxi-service-trajectory-i.
[15] E. Kauderer-Abrams. Quantifying translation-invariance in convolutional neural networks. arXiv preprint arXiv:1801.01450, 2017.
[16] J. Krumm and E. Horvitz. Predestination: Inferring destinations from partial trajectories. In International Conference on Ubiquitous Computing, pages 243–260, 2006.
[17] X. Li, M. Li, Y.-J. Gong, X.-L. Zhang, and J. Yin. T-DesP: Destination prediction based on big trajectory data. IEEE Transactions on Intelligent Transportation Systems, 17(8):2344–2354, 2016.
[18] J. Lv, Q. Li, Q. Sun, and X. Wang. T-conv: a convolutional neural network for multi-scale taxi trajectory prediction. In 2018 IEEE International Conference on Big Data and Smart Computing (BigComp), pages 82–89, 2018.
[19] J. Lv, Q. Sun, Q. Li, and L. Moreira-Matias. Multi-scale and multi-scope convolutional neural networks for destination prediction of trajectories. IEEE Transactions on Intelligent Transportation Systems, 21(8):3184–3195, 2020.
[20] A. Rossi, G. Barlacchi, M. Bianchini, and B. Lepri. Modelling taxi drivers' behaviour for the next destination prediction. IEEE Transactions on Intelligent Transportation Systems, 21(7):2980–2989, 2020.
[21] A. W. Smith, A. L. Kun, and J. Krumm. Predicting taxi pickups in cities: Which data sources should we use? In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers, pages 380–387, 2017.
[22] M. Sperber. Quadtree and Octree, pages 931–934. Springer US, 2008.
[23] J. Teevan, A. Karlson, S. Amini, A. Brush, and J. Krumm. Understanding the importance of location, time, and people in mobile local search behavior. In Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, pages 77–80, 2011.
[24] Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, J. Ye, and W. Lv. The simpler the better: a unified approach to predicting original taxi demands based on large-scale online platforms. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1653–1662, 2017.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[26] M. Xu, D. Wang, and J. Li. DESTPRE: a data-driven approach to destination prediction for taxi rides. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 729–739, 2016.
[27] A. Y. Xue, J. Qi, X. Xie, R. Zhang, J. Huang, and Y. Li. Solving the data sparsity problem in destination prediction. The International Journal on Very Large Data Bases, 24(2):219–243, 2015.
[28] A. Y. Xue, R. Zhang, Y. Zheng, X. Xie, J. Huang, and Z. Xu. Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 254–265, 2013.
[29] Z. Yang, H. Sun, J. Huang, Z. Sun, H. Xiong, S. Qiao, Z. Guan, and X. Jia. An efficient destination prediction approach based on future trajectory prediction and transition matrix optimization. IEEE Transactions on Knowledge and Data Engineering, 32(2):203–217, 2020.
[30] L. Zhang, G. Zhang, Z. Liang, and E. F. Ozioko. Multi-features taxi destination prediction with frequency domain processing. PLoS ONE, 13(3):e0194629, 2018.
[31] R. Zhang, J. Guo, H. Jiang, P. Xie, and C. Wang. Multi-task learning for location prediction with deep multi-model ensembles. In 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 1093–1100, 2019.
[32] X. Zhang, Z. Zhao, Y. Zheng, and J. Li. Prediction of taxi destinations using a novel data embedding method and ensemble learning. IEEE Transactions on Intelligent Transportation Systems, 21(1):68–78, 2020.
[33] J. Zhao, J. Xu, R. Zhou, P. Zhao, C. Liu, and F. Zhu. On prediction of user destination by sub-trajectory understanding: A deep learning based approach. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1413–1422, 2018.
[34] Y. Zheng, Y. Liu, J. Yuan, and X. Xie. Urban computing with taxicabs. In Proceedings of the 13th International Conference on Ubiquitous Computing, pages 89–98, 2011.
[35] C. Zhong, S. M. Arisona, X. Huang, M. Batty, and G. Schmitt. Detecting the dynamics of urban structure through spatial network analysis. International Journal of Geographical Information Science, 28(11):2178–2199, 2014.
[36] B. D. Ziebart, A. L. Maas, A. K. Dey, and J. A. Bagnell. Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior. In Proceedings of the 10th International Conference on Ubiquitous Computing, pages 322–331, 2008.

Chengwu Liao is pursuing his Ph.D. degree at the College of Computer Science, Chongqing University, China. He received the B.Sc. degree from the College of Information Engineering and Automation, Kunming University of Science and Technology, Yunnan, China, in 2017. His research interests include intelligent transportation systems, spatiotemporal trajectory mining, and urban data visualization.

Chao Chen is a Full Professor at the College of Computer Science, Chongqing University, Chongqing, China. He obtained his Ph.D. degree from Pierre and Marie Curie University and Institut Mines-Télécom/Télécom SudParis, France, in 2014. He received the B.Sc. and M.Sc. degrees in control science and control engineering from Northwestern Polytechnical University, Xi'an, China, in 2007 and 2010, respectively. Dr. Chen has published over 80 papers, including 20 in ACM/IEEE Transactions. His work on taxi trajectory data mining was featured by IEEE Spectrum in 2011, 2016, and 2020. He was also the recipient of the Best Paper Runner-Up Award at MobiQuitous 2011. His research interests include pervasive computing, mobile computing, urban logistics, data mining from large-scale GPS trajectory data, and big data analytics for smart cities.

Chaocan Xiang is an associate professor at the College of Computer Science, Chongqing University, Chongqing, China. He received the B.S. and Ph.D. degrees in computer science and engineering from the Nanjing Institute of Communication Engineering, China, in 2009 and 2014, respectively. He studied at the University of Michigan-Ann Arbor in 2017. His current research interests include wireless sensor networks, crowd-sensing networks, and IoT.