Article in IEEE Transactions on Intelligent Transportation Systems · January 2021
DOI: 10.1109/TITS.2020.3044943

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 1
Taxi-Passenger’s Destination Prediction via GPS Embedding and Attention-based BiLSTM Model
Chengwu Liao, Chao Chen, Chaocan Xiang, Hongyu Huang, Hong Xie, Songtao Guo
Abstract—The prediction of a taxi passenger’s destination from the partial GPS trajectory left by a moving taxi is an important yet challenging research issue. The high uncertainty of human mobility and the limited clues provided by the unfinished trajectory are two major barriers to developing effective predictors. In general, such a prediction task is converted into identifying one destination among given candidate destinations. Hence, how to extract discriminative knowledge from the partial trajectory becomes crucial. It is well recognized that the sequence of locations visited by the taxi has an inherent relationship with the heading destination. Inspired by this idea, we propose a novel approach that jointly combines GPS embedding and attention-based BiLSTM techniques for predicting the passenger’s destination. Specifically, we propose two GPS embedding methods to encode the geographic proximity and multi-scale spatiality of GPS points into embedding vectors, so as to reveal the spatial context of visited locations in the urban space. After converting GPS trajectories into embedding sequences, we further establish an attention-based dual BiLSTMs neural network to model the relationship between the heading destination and the bidirectional sequential context of visited locations. Meanwhile, the discriminative capability of visited locations in determining the destination is captured by the attention mechanism. In addition, the OT (origin and time) information is aggregated into the neural network as auxiliary features. Stepping closer to smarter passenger services, our proposed model outputs destinations in terms of historical passengers’ destination clusters rather than drop-off clusters. Finally, we evaluate the system performance on two real large-scale datasets. Results show the superior performance of our proposed model.

Index Terms—Destination prediction, partial trajectory, motion patterns, GPS embedding, BiLSTM, attention mechanism.

Chao Chen is with the State Key Laboratory of Mechanical Transmission (Chongqing University), and also with the College of Computer Science, Chongqing University, Chongqing 400044, China (Email: cschaochen@cqu.edu.cn). Chengwu Liao, Chaocan Xiang, Hongyu Huang, Hong Xie, Songtao Guo are with the Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Ministry of Education, and also with the College of Computer Science, Chongqing University, Chongqing 400044, China.

I. INTRODUCTION

As an important component of the urban transportation system, GPS-equipped taxis can be viewed as ubiquitous mobile sensors, constantly probing a city’s rhythm and pulse [34]. As a result, by mining big taxi GPS trajectory data during the past decade, an increasing number of smart urban services have been enabled for a wide spectrum of people, from drivers and passengers to urban planners [6, 7, 5, 12, 21, 24]. In particular, the prediction of a taxi passenger’s destination has received intensive attention in Location-Based Services (LBSs) [23]. This task aims to predict the place the passenger in an ongoing taxi is heading for (e.g., a commercial center). Predicting such destinations can strongly benefit passenger-centered services such as personalized in-car recommendations and advertisements.

However, it is extremely challenging to develop effective predictors. On the one hand, human mobility in the city is highly uncertain, and taxis can move almost anywhere in the city according to passengers’ requirements. Thus, the potential destinations can be widely scattered across the city. On the other hand, the prediction can rely only on the partial taxi trajectory, whose completion is unknown, since no personal information about the taxi passenger can be revealed due to privacy concerns. Unfortunately, the partial trajectory offers only limited clues, i.e., a sequence of visited locations between the origin and the current location. In general, such destination prediction is converted into an identification task among given candidate destinations. Hence, how to extract discriminative knowledge or features from the partial trajectory becomes the key issue.

Fig. 1: An illustrative example of the discriminative visited locations and their sequence in the partial trajectory in determining the taxi passenger’s destination.

Fig. 1 shows an example of how the taxi’s visited locations and their sequence in the partial trajectory help determine the passenger’s destination. We can observe that as the taxi passes through the locations ($l_1$ to $l_5$), the set of candidate destinations for the passenger gradually shrinks and the destination becomes more certain. For example, when the taxi reaches location $l_1$, candidate $d_1$ can be excluded because it lies in the opposite direction. Similarly, when the taxi reaches the bridge $l_4$, five candidates (i.e., $d_1$ to $d_5$) can be safely excluded from the candidate set according to the moving direction. These observations not only show that some discriminative
locations can be used to narrow down the candidates, but also reveal that these locations have different discriminative capabilities in determining the destination. Moreover, we can observe that the sequential context of each discriminative location also plays an important role in determining the destination. More specifically, on the one hand, the sequential context corresponding to the taxi’s moving direction among different visited locations can be used to remove the candidates in the opposite direction. On the other hand, the sequential context can provide complementary information for the determination. For example, both $d_2$ and $d_3$ are potential heading positions for the taxi at $l_2$, but the sequential context of $l_2$ (from $l_0$ to $l_2$) indicates that the taxi has already passed by $d_2$. Thus, $d_2$ can be excluded from the candidates when the taxi is at $l_2$. In conclusion, these observations indicate that the heading destination has an inherent relationship with the sequence of locations visited by the taxi, in which the spatial and sequential context of some discriminative locations can jointly determine the heading destination. However, because the collected GPS trajectory is raw and unfinished, it is difficult to identify those discriminative locations (e.g., $l_3$, $l_4$) and their different discriminative capabilities in determining the destination. Moreover, the correlations between the heading destination and the numerous sequential contexts of locations are complicated and difficult to learn. Meanwhile, how to represent the raw GPS trajectory is a preliminary yet fundamental issue.

In this paper, we first hierarchically divide the urban space into grid cells, and propose two different GPS embedding methods to represent the GPS trajectory as two embedding sequences, which convey the geographic proximity and the multi-scale spatial context of GPS points, respectively. On top of that, we propose a dual Bidirectional LSTMs (BiLSTMs) neural network to further model the relationship between the heading destination and the bidirectional sequential context of visited locations, while capturing the different discriminative capabilities of locations in determining the destination by using the attention mechanism. Additionally, the OT (origin and time) information is aggregated into the neural network as auxiliary features. Finally, the neural network outputs the probabilities of candidate destinations (i.e., the clusters of historical passengers’ destinations in the city).

In short, we propose a novel framework to predict the taxi passenger’s destination based on the partial taxi trajectory. The major technical contributions of this paper are summarized as follows.
• We model the underlying relationship between the heading destination and the sequence of visited locations in the partial taxi trajectory to predict the passenger’s destination. This study is also an attempt to extend existing drop-off location prediction studies, opening up more opportunities for developing smarter passenger-centered services.
• We propose two different GPS embedding methods to model the spatial context of GPS points in the urban space. The grid-lnglat embedding reveals the robust geographic proximity of GPS points, and the Quadtree embedding hierarchically depicts the multi-scale spatiality of GPS points in the urban space.
• We propose a dual BiLSTMs neural network with an attention mechanism to further model the relationship between the heading destination and the bidirectional sequential context of visited locations, while capturing the locations’ discriminative capabilities in determining the destination. The two BiLSTM components correspond to the two kinds of GPS embedding sequences, and they share the learnable weights in the attention mechanism.
• Extensive experiments using real-world datasets are conducted to evaluate our proposed model in the passenger’s destination prediction task. The results demonstrate the advancement and effectiveness of our model.

The rest of this paper is organized as follows. In Section II, we briefly discuss the related studies. In Section III, we introduce several basic concepts and the main problem of this study. In Section IV, we elaborate on the methodology of our proposed framework. Then we present the experiments and a case study in Section V. Finally, we conclude our work and outline future research directions in Section VI.

II. RELATED WORK

In this section, we review the related work on predicting taxi destinations by leveraging trajectory data, which can be broadly grouped into two categories, i.e., statistical methods and neural network methods.

A. Statistical Methods

Statistical methods are widely adopted to predict the destination of a given query trajectory, and can be further divided into probability methods and index methods [2, 16, 17, 26, 28, 29, 36]. A typical probability solution combines a Markov model with a Bayesian inference model, in which the former captures the transition probabilities of locations from historical trajectories while the latter estimates the probability of each candidate destination [28, 27, 29, 36]. In more detail, the study in [29] first employs Markov transition matrix multiplication to obtain the transition probabilities between two locations, then further determines the most likely future location to improve the final Bayesian inference. A different probability solution can be found in the T-DesP model [17]. It proposes an Absorbing Markov Chain model to deduce the transition probability between each location pair in the temporal space, then constructs an absorbing tensor for destination prediction with a theoretical model. In terms of index methods, the DESTPRE model in [26] uses a sophisticated index based on the Bucket PR Quadtree and Minwise hashing to efficiently find similar trajectories in the historical data for a query partial trajectory. After that, the matched trajectories’ destinations are grouped into several clusters, and the cluster centers are taken as the final predicted destinations.

Note that the probability and index methods usually require an exact match between the query trajectory and historical trajectories. Such a matching schema easily suffers from the
sparsity problem in real-world trajectory datasets. To address this problem, the SubSyn algorithm [28, 27] decomposes the long-term historical trajectories into sub-trajectories, and the T-DesP model [17] fills the missing values caused by the sparsity problem in the transition tensor through a context-aware tensor decomposition approach. Besides, clustering algorithms are also adopted to overcome the sparsity problem in
trajectory-based location prediction [1, 2, 8]. For example, the study in [2] groups the sparse historical trajectories into several clusters, then models the main traffic flow patterns within each trajectory cluster. For a query trajectory, the model first models its traffic patterns and then assigns it to a trajectory cluster by examining the similarity of their traffic patterns. Finally, the mean of the destination locations in this cluster is taken as the predicted destination.

B. Neural Network Methods

In recent years, neural networks including the multi-layer perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and their variants have achieved superior performance in the destination prediction task [10, 19, 20, 33]. In general, they are adopted to capture the latent spatial and temporal features in the partial trajectory for the prediction. In terms of spatial feature extraction, the MLP and DBN models [10, 32] directly take the longitude and latitude of GPS points as their inputs, so as to extract the original geographic features from each trajectory. The DBN model in [32] is further combined with support vector regression to predict the destination at different trajectory segments. To extract more intuitive spatial features for destination prediction, the CNN-based models convert GPS trajectories into images and extract two-dimensional spatial features [18, 19, 30]. The T-CONV models [18, 19] are able to capture multi-scale spatial features from multi-resolution trajectory images. However, it is difficult for a CNN-based model to distinguish two separately located trajectories with similar shapes in the image, due to the translation invariance property of CNNs [13, 15].

In terms of sequential feature extraction, RNN-based neural networks are proposed to model the sequential dependency of GPS points for destination prediction [10, 11, 33]. In more detail, the study in [10] proposes different kinds of RNN-based models and finds that the BiRNN network with sliding-window inputs achieves superior performance. This study also shows the effectiveness of bidirectional sequential features for destination prediction. The study in [11] employs an RNN model to estimate the sequential transition probabilities of locations at the next time step. Then it simulates the taxi’s movement with the learned transition patterns and stops the movement at the maximum step, outputting the current location as the taxi’s destination. The TALL model proposed in [33] employs BiLSTM networks to extract hierarchical sequential features from trajectories at different granularities, so as to enable fine-granularity destination prediction. Different from the other methods, the study in [20] takes the sequences of historical pick-up and drop-off locations as its input trajectories, then employs an LSTM network to extract the sequential features for the prediction.

In a nutshell, existing neural network methods still cannot effectively extract both the spatial and sequential features from the partial trajectory for destination prediction. In this study, we combine two novel GPS embedding methods with an attention-based dual BiLSTMs model to extract the spatial and sequential features simultaneously.

III. PRELIMINARY

In this section, we first introduce a few basic concepts used in this work, then state the problem formally.

A. Basic Concepts

Definition 1 (Taxi Trajectory). The trajectory $T$ of a completed taxi trip is a sequence of time-tagged GPS points from the pick-up location ($l_1$) to the drop-off location ($l_n$). $T$ is denoted by $T = \{l_1, l_2, \dots, l_i, l_{i+1}, \dots, l_n\}$ $(1 \le i \le n)$, where $l_i = (lng_i, lat_i, t_i, s_i)$ contains the longitude, latitude, timestamp and passenger status (occupied or idle) of the $i$-th GPS point.

Definition 2 (Partial Taxi Trajectory). A partial trajectory is the earlier part of a completed trajectory, i.e., the sequence of GPS points starting from $l_1$ but ending at a location $l_{n'}$ before the taxi reaches the destination. A partial trajectory is denoted by $T^p = \{l_1, l_2, \dots, l_{n'}\}$ $(1 \le n' \le n,\ p = n'/n)$, where $p$ is the percentage of the completed trajectory covered by the partial trajectory, varying from 0% to 100%.

Definition 3 (Hierarchical Grid Cells). We divide the city into $g \times g$ grid cells, where $g = 2^h$ $(h \ge 1)$ and $h$ denotes the granularity of the city partition. In this manner, each cell at level $h$ is further divided into four smaller cells at level $h+1$. Fig. 2 illustrates the hierarchical partition of a region at the first three levels.

Fig. 2: The hierarchical partition of a square region at the first three levels (from a to c), i.e., h = 1, 2, 3, respectively.

Definition 4 (Passenger’s Destination Data). Passenger’s destination data $PD$ contains a collection of historical destinations of passengers in the city. These data reveal where people in the city usually head for on a paid ride, such as a hotel.

B. Problem Statement

The problem of predicting the taxi passenger’s destination in a moving taxi can be viewed as predicting the probability of the passenger heading for each candidate destination in the urban space, which can be formulated as:
Given:
1) $T^p$ ($p$ is drawn randomly from 0% to 100%): the partial GPS trajectory collected from the moving taxi.
2) $CPD = \{d_1, d_2, \dots, d_M\}$: a set of candidate passenger’s destinations in the city.
Predict: $\hat{p}(y = d_m \mid T^p),\ d_m \in CPD$, the probability of each candidate $d_m$ being the passenger’s targeted destination $y$, conditioned on the partial trajectory $T^p$. Then return the top-$k$ destinations with the highest predicted probabilities.

Fig. 3: The overview of the proposed passenger’s destination prediction framework.

IV. METHODOLOGY

A. Overview

The framework of our prediction model is illustrated in Fig. 3. The methodology is threefold: ① candidate destination identification; ② feature extraction from the partial taxi trajectory; ③ probability computation for each candidate destination.

To be more specific, in the first stage, the candidate passenger’s destinations are obtained by applying image-processing and clustering algorithms to the passenger’s destination data and the road network data. In the second stage, discriminative features are extracted from the taxi trajectory by modeling the spatial and sequential context of GPS points in the trajectory. In more detail, we first employ two GPS embedding methods to represent raw GPS trajectories as a grid-lnglat embedding sequence and a Quadtree embedding sequence from different perspectives. Then we establish a dual BiLSTMs model with an attention mechanism to learn the latent motion features from the embedding sequences. The OT (origin and time) features of each trajectory are also extracted. This stage can be viewed as feature engineering. In the third stage, the extracted features are concatenated and further fused by dense layers, then a softmax classifier is adopted to output the probabilities of the candidate passenger’s destinations.

Fig. 4: (a) 158 urban activity regions (UARs) with different colors in Chengdu city; (b) 5 candidate passenger’s destinations (CPDs) with aggregated destination locations in a UAR.

B. Candidate Passenger’s Destination Identification

In this study, we intend to predict the passenger’s destination, i.e., the place the passenger actually heads for. Compared to drop-off locations along the roads, such predicted destinations carry more passenger-centered semantics. Human beings are known to be collective (i.e., most people naturally live and work together with others), so it is highly likely that people carry out activities in a small and scattered fraction of the whole city space, called Urban Activity Regions (UARs) [4]. The target candidate passenger’s destinations (CPDs in short) are the smaller places within these UARs that passengers in the city usually head for. Therefore, the preliminary step to obtain CPDs is to find all the scattered UARs in the urban space.

1) Urban Activity Region Identification: In the case of taxi trips, passengers prefer to get off at the closest location that does not require further walking across main roads, rivers and so on. These physical barriers usually serve as boundaries of UARs in the urban space [35]. Fortunately, such barriers can be extracted from the road network data. Hence, we conduct a two-step procedure to divide the urban space into a number of disjoint UARs, as shown in Fig. 4 (a).
• We download the road network data from the OpenStreetMap platform, which provides ample information on road level/type attributes. Then we filter the data to retain only the high-level road segments tagged as ‘primary’, ‘primary_link’, ‘secondary’, ‘secondary_link’, or ‘river’.
• The identification of UARs in the city can be transformed into the extraction of connected components in an image. Specifically, we first convert the trimmed road network into a binary image $m$, which consists of $K$ pixels. Then we perform morphology operations (dilate → erode → skel) to obtain the skeleton image $m'$ of the road segments while retaining the topology structure. Next, we use the connected component labelling operation to extract the UARs $U$ and return a vector $L_u = [l^u_1, l^u_2, \dots, l^u_K]$ recording the UAR-labels of all $K$ pixels in $m'$.

2) CPD Identification and Trajectory Labelling: As introduced in Definition 4, the passenger’s destination data $PD$ contains the historical passengers’ destinations in the city.
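Returning briefly to the connected-component step of 1), the following minimal Python sketch shows how the label vector $L_u$ could be produced. It assumes the skeleton image $m'$ is already available as a 0/1 grid (1 for a road pixel). The dilate/erode/skeletonize operations are omitted (a library routine such as `skimage.morphology.skeletonize` would typically produce the skeleton). The function name `label_uars` is hypothetical, and this is not the authors' code. Non-road pixels are flood-filled with 4-connectivity, so a one-pixel-wide, 8-connected road skeleton is enough to separate two regions.

```python
from collections import deque


def label_uars(m0):
    """Label 4-connected regions of non-road pixels in a binary road image.

    m0: list of rows; 1 marks a (skeletonized) road pixel, 0 marks background.
    Returns (Lu, count): Lu is a flat list with one UAR label per pixel in
    row-major order (0 for road pixels, 1..count for regions).
    """
    rows, cols = len(m0), len(m0[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if m0[r][c] == 1 or labels[r][c] != 0:
                continue  # road pixel, or already assigned to a region
            # Start a new region and flood-fill it with breadth-first search.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # 4-connectivity: a diagonal road line still blocks the fill.
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and m0[ny][nx] == 0 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    Lu = [labels[r][c] for r in range(rows) for c in range(cols)]
    return Lu, count
```

For instance, a 5 × 5 image crossed by one horizontal and one vertical road yields four UARs, one per quadrant. Passing the background mask to `scipy.ndimage.label`, whose default 2-D structuring element is 4-connected, could replace the hand-written loop.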
However, it is extremely difficult for our classification model to predict directly over the numerous small destination venues. Hence, in this study, the CPDs are obtained by clustering the historical passengers’ destination venues in each UAR, as shown in Fig. 4 (b). A two-step procedure is conducted to identify CPDs in Chengdu city.
• Determine which UAR a given destination venue belongs to. From the UAR identification procedure, each of the regularly arranged pixels in the image $m'$ can be mapped to a small cell in the city, and the UAR-label of each pixel is recorded in the vector $L_u$. We can examine which pixel a given destination venue belongs to, and then use the label of this pixel as the target UAR-label of the destination venue. This process is simple and computationally light, so it can be executed efficiently on the whole dataset.
• Aggregate destination venues into clusters. The mean-shift clustering algorithm [9] is employed to aggregate destination venues in each UAR. The CPD-labels of venues are recorded in $L_d$, and the centroids of clusters are recorded in $Ctrs$. By traversing the UARs, 941 clusters are obtained in Chengdu city, which are used as the target candidate destinations (CPDs).

After obtaining the CPDs in the city, we further use them to label our taxi trajectories for supervised learning. Specifically, we first assign the last GPS point of each given trajectory to a UAR, then find the three nearest CPD clusters through the centroids $Ctrs$ by geographic distance. We take the CPD-label of the nearest destination venue in these three CPD clusters as the label of the given trajectory. In particular, with this labelling approach, trajectories ending at different gates of a candidate destination are tagged with the same label. We acknowledge that these destination labels are not technically the ground truth. However, drop-off positions of taxis are normally very close to the passengers’ destinations; in this respect, our labelling approach is still reasonable. Finally, the labelled trajectories are used to evaluate the proposed prediction model.

Fig. 5: The GPS embedding based on Quadtree spatial structure: (a) area partition scheme; (b) corresponding Quadtree.

C. GPS Embedding

A trajectory is typically represented as a sequence of discrete GPS points, which indicates the sequence of locations visited by a moving object over time. In order to reveal the spatial context of GPS points, we introduce two GPS embedding approaches to represent their spatiality in the 2-D geographic domain and in the multi-scale spatial domain, respectively.

1) Grid-lnglat Embedding: The longitude and latitude values are the most perceptible information describing the position of a GPS point on the 2-D earth surface, and they are also consistent and continuous in the coordinate space. The relative position or distance between any points on the surface can be revealed by their coordinate values. Also, the moving patterns of taxis can be directly depicted by the variation patterns of such coordinate values. Moreover, this kind of numerical variation in a sequence can be effectively captured by a recurrent neural network, so it is reasonable to represent GPS points by their longitude-latitude values.

However, an essential problem with raw GPS points is that the original longitude-latitude values may deviate when uncertainty arises due to a low sampling rate or noise. A simple solution is to map GPS points into a coarse-grained spatial space. As introduced above, the city is divided into $g \times g$ grid cells, and each GPS point $l_i$ lies in a cell $c_j$. Hence, to make the GPS representation more robust to uncertainty, all the GPS points within a single cell can be considered as the same object. In this manner, each GPS point can be represented by the center coordinate (longitude-latitude) of its cell. Finally, the original longitude-latitude point $l_i = (lng_{l_i}, lat_{l_i})$ is represented by the grid-lnglat embedding $e_i = (lng_{c_j}, lat_{c_j})$.

2) Quadtree Embedding: Significant patterns of a taxi’s motion can be revealed at different spatial scales. Specifically, the trajectory at a micro scale provides precise location information of the moving taxi, and the trajectory at a macro scale reveals the taxi’s global moving trend in the urban space. Such multi-scale motion characteristics have been demonstrated to be meaningful in the destination prediction task [18, 33]. However, traditional GPS embedding methods, such as one-hot, cell ids, and longitude&latitude [31], cannot hierarchically depict where the raw GPS points are located in the city. As a typical spatial data structure, the Quadtree attaches four branches to a node at each level, so as to describe the multi-scale properties of a location by recursively decomposing the two-dimensional space into four equal-area blocks [22].
This spatial data structure has been widely used in location modeling and indexing [3, 26]. Hence, in order to reveal the underlying multi-scale spatiality of taxi trajectories, we apply the Quadtree structure to hierarchically describe the position of each GPS point in the city.
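To make the two embeddings concrete, the Python sketch below does three things for a GPS point: it maps the point to its level-$h$ cell (Definition 3), returns the cell center as the grid-lnglat embedding, and derives the root-to-leaf quadrant path that the Quadtree embedding encodes. The path uses the direction labels SW, SE, NW, NE and the one-hot codes 1000, 0100, 0010, 0001 adopted in this subsection. The city bounding box `bbox`, the convention that row 0 lies on the south edge, the treatment of coordinates as planar (no map projection), and all function names are illustrative assumptions rather than the authors' implementation.

```python
DIRECTIONS = ("SW", "SE", "NW", "NE")  # one-hot order: 1000, 0100, 0010, 0001


def grid_cell(lng, lat, bbox, h):
    """Return (col, row) of the level-h cell containing the point.

    bbox = (min_lng, min_lat, max_lng, max_lat) is a hypothetical city extent;
    the grid has g = 2**h cells per side, with row 0 at the south edge.
    """
    min_lng, min_lat, max_lng, max_lat = bbox
    g = 2 ** h
    col = int((lng - min_lng) / (max_lng - min_lng) * g)
    row = int((lat - min_lat) / (max_lat - min_lat) * g)
    # Clamp so points on the east/north boundary fall into the last cell.
    return min(max(col, 0), g - 1), min(max(row, 0), g - 1)


def grid_lnglat_embedding(lng, lat, bbox, h):
    """Grid-lnglat embedding: replace the point by the center of its cell."""
    min_lng, min_lat, max_lng, max_lat = bbox
    g = 2 ** h
    col, row = grid_cell(lng, lat, bbox, h)
    cell_w = (max_lng - min_lng) / g
    cell_h = (max_lat - min_lat) / g
    return (min_lng + (col + 0.5) * cell_w, min_lat + (row + 0.5) * cell_h)


def quadtree_path(lng, lat, bbox, h):
    """Quadrant direction of the point at each level 1..h (root to leaf)."""
    col, row = grid_cell(lng, lat, bbox, h)
    path = []
    for level in range(1, h + 1):
        shift = h - level  # this bit splits the parent cell at this level
        east = (col >> shift) & 1
        north = (row >> shift) & 1
        path.append(DIRECTIONS[2 * north + east])
    return path


def quadtree_embedding(lng, lat, bbox, h):
    """Concatenated one-hot codes of the quadrant path (4 * h values)."""
    vec = []
    for d in quadtree_path(lng, lat, bbox, h):
        idx = DIRECTIONS.index(d)
        vec.extend(1 if i == idx else 0 for i in range(4))
    return vec
```

With an 8 × 8 grid ($h = 3$) over a hypothetical bounding box from (0, 0) to (8, 8), the point (3.5, 4.5) falls in cell (col 3, row 4), its quadrant path is [NW, SE, SE], and its Quadtree embedding is 0010, 0100, 0100, the same pattern as the one given for $l_5$ in this subsection. Reading the quadrant at each level from one bit of the column and row indices works because $g = 2^h$, so every level halves the parent cell along both axes.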
The hierarchical grid cells introduced in Definition 3 can
be converted to the Quadtree structure. Specifically, grid cells
at different partition levels (i.e., different cell size) can be
represented by the Quadtree nodes in different layers. Thus Fig. 6: The diagram of sequential feature extraction from
the smaller grid cell means the higher partition level and trajectory by using BiLSTM (forward LSTM and backward
also the deeper Quadtree. Fig. 5(a) illustrates a hierarchically LSTM).
partitioned (h = 3) square area with a GPS trajectory (T =
{l1 , l2 , l3 , l4 , l5 }). As shown in the figure, four subdivided equal
cells are are located in the four directions within the former D. Attention-Based Dual BiLSTMs
cell, namely SW, SE, NW, NE. At the first partition level, the After employing the aforementioned embedding methods,
2-dimensional space is divided into four cells by gray lines. the descriptive spatial context of GPS trajectories have been
At the second and third levels, cells are further subdivided revealed in the GPS embedding sequences. In this section,
by the blue lines and orange lines, respectively. After all, we establish an attention-based BiLSTM network to further
the corresponding MX-Quadtree with three layers is shown model the the long-term sequential context of GPS points
in Fig. 5(b). Each node in the rectangle denotes a direction in in the trajectory, and predict the probabilities of candidate
the cell. All the GPS points are contained in the leaf nodes at destinations.
the L3 level of this tree. Based on that, we can hierarchically It is necessary to clarify the principle behind the com-
describe the locations of GPS points in the urban space by bination of BiLSTM and attention mechanism techniques,
enumerating the direction information in the tree from root while many studies simply introduce how to use new deep
to leaf like the address (country, state, district). For example, learning techniques to improve the performance. The detailed
the GPS point l5 can be represented by the embedding vector explanations are as follows.
e0 5 = [NW, SE, SE]. In this manner, the multi-scale spatial
information of each GPS point is hierarchically accumulated in the embedding vector. In order to feed the embedding vector into the neural network, we need numerical values to represent such hierarchical location information. In the multi-scale domain, the four cells in different directions are regarded as independent items at each partition level, and our proposed model does not further utilize their mutual relationships, because the spatial similarity between GPS points is already revealed by the longitude and latitude in the 2-D geographic domain. Hence, we employ one-hot encoding to represent the four independent subdivided cells in the directions {SW, SE, NW, NE} as {1000, 0100, 0010, 0001}. Finally, trajectory T in Fig. 5 can be converted to the Quadtree embedding sequence shown in Eq. 1.

$$
T = \begin{bmatrix} e'_1 \\ e'_2 \\ e'_3 \\ e'_4 \\ e'_5 \end{bmatrix}
  = \begin{bmatrix}
      1000, & 0100, & 0001, & \dots \\
      0100, & 1000, & 0010, & \dots \\
      0100, & 0010, & 1000, & \dots \\
      0100, & 0010, & 0010, & \dots \\
      0010, & 0100, & 0100, & \dots
    \end{bmatrix} \tag{1}
$$

In this way, numerical changes in each item of this vector sequence ($e'_1$ to $e'_5$) intuitively reveal the motion of the taxi in the multi-scale spatial domain, which paves the way for the subsequent neural network to capture multi-scale motion features. Note that we derive the Quadtree embedding sequences at the same partition level (i.e., the same cell size) as that used in the grid-lnglat embedding.

• As illustrated in Fig. 6, for the partial trajectory sequence (from $e_1$ to $e_5$), the forward LSTM extracts a hidden state $\overrightarrow{h}_i$ at each step. Traditional feature learning approaches mostly take $\overrightarrow{h}_5$ as the final hidden sequential feature of the trajectory sequence, so the extracted features are mainly contributed by the last few GPS points in the sequence. The situation becomes worse when the input sequence grows longer at a finer granularity. Moreover, in this study we find that visited locations, such as bridges and major intersections in a region, differ in their capability to discriminate the heading destination. The contribution weights of GPS points should therefore reflect this capability rather than their position in the trajectory. To that end, the attention mechanism is applied in the neural network to learn the different discriminative capabilities of GPS points in the trajectory.
• The attention mechanism captures the attention intensity of $\overrightarrow{h}_i$ in the sequence at each time step. However, in the forward LSTM, $\overrightarrow{h}_i$ only contains the sequential context of the preceding sequence, e.g., $\overrightarrow{h}_3 = \overrightarrow{\mathrm{LSTM}}(e_1 \dashrightarrow e_2 \dashrightarrow e_3)$, which lacks the future context of the following sequence. Conversely, in the backward LSTM the sequential context only covers the following sequence, e.g., $\overleftarrow{h}_3 = \overleftarrow{\mathrm{LSTM}}(e_3 \dashleftarrow e_4 \dashleftarrow e_5)$. Therefore, in this study we employ the bidirectional LSTM (BiLSTM) to model both the preceding and the future context of each GPS point.
• From the discussion above, we conclude that BiLSTM and the attention mechanism are mutually supportive in learning the latent sequential dependence of trajectory data.
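The Quadtree one-hot encoding behind Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the study-area bounding box `bbox`, the helper names `quadtree_embedding` and `embed_trajectory`, and the rule that a point lying exactly on a split line is assigned to the east/north child are all assumptions made here. At each of the $h$ partition levels the current cell is split into four children, the child containing the point is one-hot encoded in the order {SW, SE, NW, NE}, and the search descends into that child.

```python
import numpy as np


def quadtree_embedding(lng, lat, bbox, h):
    """One-hot Quadtree code of a single GPS point, flattened to length 4 * h.

    bbox = (min_lng, min_lat, max_lng, max_lat) is an assumed study-area box;
    h is the partition level (the area is split into 4**h leaf cells).
    Child order follows Eq. 1: SW -> 1000, SE -> 0100, NW -> 0010, NE -> 0001.
    """
    min_lng, min_lat, max_lng, max_lat = bbox
    code = np.zeros((h, 4), dtype=np.int8)
    for level in range(h):
        mid_lng = (min_lng + max_lng) / 2.0
        mid_lat = (min_lat + max_lat) / 2.0
        east = lng >= mid_lng    # assumption: points on a split line go east
        north = lat >= mid_lat   # assumption: points on a split line go north
        code[level, 2 * int(north) + int(east)] = 1  # 0=SW, 1=SE, 2=NW, 3=NE
        # Descend into the chosen child cell for the next (finer) level.
        if east:
            min_lng = mid_lng
        else:
            max_lng = mid_lng
        if north:
            min_lat = mid_lat
        else:
            max_lat = mid_lat
    return code.reshape(-1)


def embed_trajectory(points, bbox, h):
    """Stack the codes of a trajectory [(lng, lat), ...] into an (N, 4 * h) array."""
    return np.stack([quadtree_embedding(lng, lat, bbox, h) for lng, lat in points])


if __name__ == "__main__":
    codes = embed_trajectory([(0.1, 0.1), (0.9, 0.1)], (0.0, 0.0, 1.0, 1.0), h=2)
    print(codes[0].tolist())  # [1, 0, 0, 0, 1, 0, 0, 0]  (SW, then SW)
    print(codes[1].tolist())  # [0, 1, 0, 0, 0, 1, 0, 0]  (SE, then SE)
```

Each row contains exactly $h$ ones, one per partition level, so consecutive rows differ only in the levels where the taxi crossed a cell boundary; this is the "numerical change" that the embedding sequence exposes to the network.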
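Fig. 7: The attention mechanism on top of two BiLSTMs for two kinds of GPS embedding sequences.

1) Dual BiLSTMs: As an improved variant of the RNN, LSTM was proposed to solve the long-term dependency problem. It adds three gates to control when to forget the previous hidden state or to input the current signal. Different from the standard LSTM, at each time step BiLSTM (Fig. 6) integrates the preceding and future sequential features by performing the forward and backward processes simultaneously. Each input sequence can be denoted as $x = \{e_1, e_2, \dots, e_N\}$ ($e \in \mathbb{R}^{1 \times V}$), where N is the length of the sequence and V is the dimension of the vectors in the sequence. The forward process is formulated in Eq. 2.

$$
\begin{aligned}
\overrightarrow{i}_n &= \sigma\big(\overrightarrow{W}_i [e_n, \overrightarrow{h}_{n-1}] + \overrightarrow{b}_i\big) \\
\overrightarrow{f}_n &= \sigma\big(\overrightarrow{W}_f [e_n, \overrightarrow{h}_{n-1}] + \overrightarrow{b}_f\big) \\
\overrightarrow{o}_n &= \sigma\big(\overrightarrow{W}_o [e_n, \overrightarrow{h}_{n-1}] + \overrightarrow{b}_o\big) \\
\overrightarrow{g}_n &= \tanh\big(\overrightarrow{W}_c [e_n, \overrightarrow{h}_{n-1}] + \overrightarrow{b}_c\big) \\
\overrightarrow{c}_n &= \overrightarrow{f}_n \odot \overrightarrow{c}_{n-1} + \overrightarrow{i}_n \odot \overrightarrow{g}_n \\
\overrightarrow{h}_n &= \overrightarrow{o}_n \odot \tanh\big(\overrightarrow{c}_n\big)
\end{aligned} \tag{2}
$$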
where $\rightarrow$ denotes the forward process, σ is the sigmoid activation function, and $i_n$, $f_n$, $o_n$, $g_n$, $c_n$ and $h_n$ denote the input gate, forget gate, output gate, modulation gate, cell state and hidden state at the n-th step, respectively. The parameters $W_i$, $W_f$, $W_o$ and $W_c$ denote the weight matrices of these gates, $b_i$, $b_f$, $b_o$ and $b_c$ are the corresponding bias vectors, and ⊙ denotes the element-wise product. Since the backward and forward processes are the same in principle but run in opposite order, the equations of the backward process are derived by replacing $\rightarrow$ with $\leftarrow$. Finally, the overall output H of the BiLSTM for an input sequence x is given by Eq. 3.

$$
H = \{h_1, h_2, \dots, h_n, \dots, h_N\}, \qquad h_n = [\overrightarrow{h}_n, \overleftarrow{h}_n] \tag{3}
$$

where $h_n$ is the output of the BiLSTM at the n-th step, obtained by concatenating the forward and backward hidden states. In this study, the GPS embedding sequences are derived from different perspectives, i.e., $x = \{e_1, e_2, \dots, e_N\}$ from the 2-D geographic domain and $x' = \{e'_1, e'_2, \dots, e'_N\}$ from the multi-scale spatial domain. Hence, we build dual BiLSTMs for the two kinds of inputs and obtain two hidden feature sequences, i.e., $H = \{h_1, h_2, \dots, h_N\}$ and $H' = \{h'_1, h'_2, \dots, h'_N\}$.

2) Attention Mechanism: Attention mechanisms have been widely used in sequence modeling and transduction models, allowing dependencies to be modeled regardless of their distance in the input sequences [25]. In this study, the attention mechanism is placed on top of the dual BiLSTMs to capture the discriminative capability of visited locations in determining the destination, and to extract the most determinant hidden features from the two kinds of GPS embedding sequences. Because the two embedding sequences represent the same trajectory at the same partition level, the two hidden feature sequences share the learnable weights of the attention mechanism. In this way, the attention intensity of each GPS point in the trajectory is captured from both the 2-D geographic domain and the multi-scale spatial domain.

Fig. 7 illustrates the attention mechanism in this study, where ⊕ and ⊙ denote the element-wise addition and product, respectively. To be more specific, after the BiLSTMs are applied to the embedding sequences x and x', the outputs H and H' are sent into the same perceptron, and their results at each step are summed into $m_n$. The attention intensity α of each GPS point in the sequence is derived by applying the softmax function to the values $\{m_1, m_2, \dots, m_N\}$. The hidden features h and h' are then converted into the weighted representations r and r' through the corresponding α. Finally, the representations of the embedding sequences x and x', namely R and R', are obtained. The equations are shown in Eq. 4, where $W_h$ and $b_h$ are the weights of the perceptron.

$$
\begin{aligned}
m_i &= \tanh(W_h h_i + b_h) + \tanh(W_h h'_i + b_h) \\
\alpha_i &= \frac{\exp(m_i)}{\sum_{t=1}^{N} \exp(m_t)}, \qquad \sum_{i=1}^{N} \alpha_i = 1 \\
R &= \sum_{i=1}^{N} \alpha_i h_i, \qquad R' = \sum_{i=1}^{N} \alpha_i h'_i
\end{aligned} \tag{4}
$$

3) OT Features from Trajectories: Trajectories contain many fine-grained physical characteristics, such as time and speed information, but not all of them are helpful in the destination prediction task. In particular, in our prediction scenario the input is a partial trajectory whose percentage of completion is unknown. Thus, we should consider properties that do not change much with the percentage of trip completion.

Inspired by the regularity of people's mobility between origins and destinations in the city, we consider the OT features of the trajectory, namely when and where passengers took the taxi. We extract two kinds of time features at different time granularities from the initial GPS point of each trajectory. At the coarse level, the day type $D_o$ (workday/non-workday) is denoted by a binary value (0/1); at the fine level, the hour time $H_o$ of a trajectory is represented as a two-dimensional vector by the circular embedding method. The hour time value $t_o$ is converted to the radian
of a unit circle centered at (0, 0), i.e., [0, 24) ⇒ [0, 2π), and the hour time $H_o$ is then represented by the coordinates of the point on the unit circle at radian θ, as in Eq. 5.

$$
H_o(T) = (\cos\theta, \sin\theta), \qquad \theta = 2\pi\left(\frac{t_o}{24}\right) \tag{5}
$$

Compared with one-hot embedding, this approach significantly reduces the dimensionality and also preserves the time similarity between 23:00 and 00:00, which are adjacent on the circle. Besides, to depict the position where the trajectory started in the urban space, we also adopt the GPS embedding with the Quadtree method. The starting location is denoted by $e'_1$, which carries hierarchical spatial characteristics in the multi-scale spatial domain.

4) Predicting Layer: Two fully connected layers (FC1 and FC2) and a Softmax classifier are employed to fuse the features and output the final probabilities. Specifically, the motion features and OT features are concatenated and fed into the two fully connected layers to obtain the original output z' as in Eq. 6.

$$
\begin{aligned}
z &= \mathrm{ReLU}\big(W_{FC1}[R, R', H_o, e'_1] + b_{FC1}\big) \\
z' &= W_{FC2}\, z + b_{FC2}
\end{aligned} \tag{6}
$$

where $W_{FC}$ and $b_{FC}$ are the learnable parameters of the FC layers. Note that z' has |CPD| dimensions, corresponding to the candidate destination regions. Softmax is adopted as the multi-class logistic regression classifier to obtain the probability distribution over candidate destinations. For an input partial trajectory $T_p$, the probability $\hat{p}$ that the j-th candidate destination $d_j$ is the true destination y of $T_p$ is obtained by applying the Softmax classifier to the original output z'. The final predicted result $\hat{y}$ is the candidate destination with the highest probability, as in Eq. 7.

$$
\begin{aligned}
\hat{p}(y = d_j \mid T_p) &= \frac{e^{z'_j}}{\sum_{i=1}^{|D|} e^{z'_i}}, \qquad 1 \le j \le |D| \\
\hat{y} &= \arg\max_{j} \hat{p}(y = d_j \mid T_p)
\end{aligned} \tag{7}
$$

We use cross-entropy as the loss function, which is commonly used to measure the distance between the probability distribution predicted by the softmax classifier and the true probability distribution. The loss is given in Eq. 8.

$$
L(\omega) = -\frac{1}{|D|} \sum_{i=1}^{|D|} s_i \log y'_i + \lambda \lVert \omega \rVert_2^2 \tag{8}
$$

where ω denotes the learnable parameters of the neural network, λ is the regularization hyper-parameter for L2 regularization, $s_i$ is the i-th dimension of the one-hot representation of the true destination y, and $y'_i$ is the predicted probability of the i-th candidate destination. In the training phase, we use the Adam optimizer to minimize the loss function with learning rate $l_r$.

V. EVALUATION

A. Experiment Setup

1) Data Description: Our experiments are conducted on three real-world datasets.
• Chengdu Taxi Trajectory Data. This taxi GPS trajectory dataset was collected in the city of Chengdu, China, from 3rd to 30th August 2014, with a sampling interval of 10 seconds. More than 300000 taxi trajectories were generated by over 13000 taxis each day. We select the data in a square area around the Second Ring of Chengdu city, obtaining 813861 complete trajectories.
• Chengdu Passenger's Destination Data. The passengers' destination data is released by the Didi Chuxing company¹. This dataset contains over 110000 anonymous historical passenger destinations generated in Chengdu city within two months in 2018. Each record is generated by a passenger on the online ride-hailing platform and includes the location where the passenger is (i.e., the origin) and the venue the passenger heads for (i.e., the destination). Hence, these venues, tagged as passengers' destinations, reveal where people usually go in the city by a paid ride.
• Porto Taxi Trajectory Data. This dataset is from the ECML-PKDD competition [14] and contains over 1.7 million complete trajectories collected from 442 taxis running in the city of Porto from 1st July 2013 to 30th June 2014, with a sampling interval of 15 seconds. Since the entire dataset is very large and the trajectories are distributed very unevenly in space, we further select the data located in the main areas of the city, obtaining 665989 complete trajectories.

¹ Didi Chuxing GAIA Initiative. https://gaia.didichuxing.com

We randomly partition the data into training, validation and testing sets at a ratio of 6 : 1 : 1, following recently published studies on taxi destination prediction [10, 19, 20].

2) Evaluation Metrics: The destination prediction methods are evaluated with two metrics:
• Accuracy@k is the ratio of query trajectories whose destination is correctly predicted within the top-k candidate destinations to all query trajectories. Specifically, for each partial trajectory $T_p$, a positive result is returned when the true destination of $T_p$ is within the top-k candidate destinations ranked by the predicted probabilities $\hat{P}$.
• Distance Error is the average distance between the labelled destinations and the predicted top-1 destinations over all query trajectories. In this study, the candidate destinations are location clusters, so the distance error is calculated between the two clusters' centroids. The Haversine distance is adopted, defined in Eq. 9:

$$
\begin{aligned}
\mathrm{Haversine\_dis}(x, y) &= 2 \cdot R \cdot \arctan\left(\sqrt{\frac{\delta}{1 - \delta}}\right) \\
\delta &= \sin^2\left(\frac{\varphi_x - \varphi_y}{2}\right) + \cos\varphi_x \cos\varphi_y \sin^2\left(\frac{\lambda_x - \lambda_y}{2}\right)
\end{aligned} \tag{9}
$$

where φ is the latitude, λ is the longitude and R is the radius of the earth (R = 6371 km).

3) Parameter Setting: In our attention-based dual BiLSTMs neural network, the number of neurons in each BiLSTM component is set to 200, the size of attention is 30, and
50 neurons are used in the dense layer for OT features. The last two dense layers, which fuse the features and produce the output for the Softmax classifier, have 500 and 941 neurons, respectively. Both the learning rate $l_r$ and the regularization parameter λ are set to 0.001, and the batch size is 128.

4) Comparison Algorithms: In this study, we compare our destination prediction model with three baselines, detailed as follows.
• R_MLP: uses the latitudes and longitudes of the first and last 5 points of raw GPS trajectories as the inputs of a Multi-Layer Perceptron [10]. This neural network was the winning model in the taxi destination prediction competition (ECML-PKDD [14]).
• R_LSTM: uses raw GPS trajectories as the inputs of a regular Long Short-Term Memory (LSTM) network.
• RW_BiLSTM: uses a sliding window of 5 successive GPS points in each raw GPS trajectory as the input unit at each time step of a BiLSTM network [10].

5) Evaluation Environment: All evaluations in this paper are programmed in Python 3.7 with the TensorFlow library and run on a PC with 4 NVIDIA GeForce RTX 2080 Ti GPUs and 192 GB RAM.

B. Parameter Sensitivity Study

In addition to the common hyper-parameters of the deep model, there are three important parameters regarding the training and testing processes:
• Partition level (h). The parameter h is used to hierarchically divide the urban space into $4^h$ grid cells. It not only determines the grid cell size, but also relates to the depth and width of the Quadtree. In particular, the spatial features in GPS embedding vectors are sensitive to the grid cell size, and the prediction of our model strongly relies on the revealed spatial features. As a result, the partition level h has a significant impact on our prediction results.
• Percentage of trajectory completion (p). The completion percentage p of the partial trajectory is the most important factor in our prediction task. A query trajectory with a high completion percentage provides more information and is also closer to the destination. In the real-world prediction scenario, it is the unknown and gradually growing p that makes the prediction extremely challenging. Hence, to evaluate the model performance, the testing trajectories should be incremental and unfinished (p ≤ 100%). At the same time, training the model on unfinished input trajectories makes the model more robust in the prediction scenario.
• Number of output destinations (k). Since a single output destination (e.g., k = 1) may be insufficient, the default value of k is set to 5 following other studies [28, 33]. Furthermore, varying k in the accuracy metric reveals more about the vulnerability of models: the faster the prediction accuracy increases with k, the more robust the model [28].

Hence, in the evaluation stage, p and k are employed as the main variables to evaluate the prediction performance of different models.

Additionally, h and p are two preset parameters regarding the input trajectories in the training stage. We conduct a group of comparison experiments to investigate their effects on our proposed model and then determine suitable values. Tab. I lists the prediction distance errors of our model trained on the Chengdu dataset with different values of h and p. The results in the first four rows show that the performance can be improved by increasing h (smaller cell size), but an overly small cell size leads to worse results. This is because when the grid cell is too large, two different GPS points may be wrongly represented by the same embedding vector; conversely, when the grid cell is too small, GPS points that belong to the same place may be wrongly represented by different embedding vectors. The last five rows indicate two points: 1) the deep model performs well when the p of the testing trajectories equals that of the training trajectories; 2) with p = 70%, the trained model achieves the lowest average distance error, showing more robustness in our test scenario. Hence, for the experiments on the Chengdu dataset, the partition level h for training is set to 7 (cell size of 120 m × 120 m) and the percentage of trajectory completion p is set to 70%.

TABLE I: The prediction distance error of the proposed deep model trained on the Chengdu dataset under different input parameters, i.e., (1) partition level (h); (2) percentage of training trajectory completion (p). Columns give the percentage of testing trajectory completion.

Parameters for          Distance Error (m)
Input Trajectories     10%    30%    50%    70%    90%    AVG
h = 5, p = 70%        2927   2273   1546    906    775   1685
h = 6, p = 70%        3037   2265   1473    803    702   1656
h = 8, p = 70%        3068   2207   1418    784    703   1636
h = 7, p = 70%        3141   2156   1369    718    694   1615
h = 7, p = 50%        2936   1975   1283   1278   1678   1830
h = 7, p = 60%        3006   2086   1311    889   1100   1678
h = 7, p = 80%        3151   2284   1542    806    464   1649
h = 7, p = 90%        3166   2352   1663    925    336   1688

C. Effectiveness Evaluation

1) Effectiveness of GPS Embedding Methods: The GPS embedding methods in this study are intended to improve the prediction performance of our deep neural network by revealing the representative spatiality of GPS points in the partial trajectory. To demonstrate the effectiveness of our embedding method (Quadtree & grid-lnglat embedding), three other inputs are tested on the same neural network, namely raw GPS trajectories, the single Quadtree embedding and the single grid-lnglat embedding.

Table II and Figure 8(a) show the comparison results when varying the percentage of trajectory completion, and Figure 8(b) shows the trends of the prediction accuracy with respect to different k. We find that:
• Our Quadtree & grid-lnglat embedding method outperforms the other methods, and the gap is gradually
widening as the percentage of trajectory completion grows. Since the deep model is trained with input trajectories at p = 70%, the slopes of these polylines become flatter after p = 70% in Fig. 8(a).
• The grid-lnglat embedding performs better than raw GPS trajectories, especially as p grows, which indicates that the grid-lnglat embedding can alleviate the uncertainty and sparsity problems of GPS trajectories.
• The Quadtree embedding is better than the grid-lnglat embedding, which shows that multi-scale motion features are more discriminative than 2-D spatial motion features in the destination prediction task.
• Compared with the single Quadtree or grid-lnglat embedding, their combination, namely our Quadtree & grid-lnglat method, achieves further improvements in both distance error and prediction accuracy. These results demonstrate that it is of great significance to consider both the 2-D geographic and the multi-scale spatiality of GPS points in the destination prediction task.

Based on the above observations, we conclude that our GPS embedding method is effective in revealing representative spatial features from partial trajectories, thus enhancing the performance of our deep model.

TABLE II: The prediction distance error of our deep model with different GPS embedding methods on the Chengdu dataset, varying the percentage of trajectory completion.

GPS Embedding          Distance Error (m)
Methods                10%    30%    50%    70%    90%    AVG
Raw-lnglat            3031   2339   1624   1014    790   1760
Quadtree              3178   2198   1391    772    654   1639
Grid-lnglat           2979   2253   1471    861    803   1673
Ours                  3141   2156   1369    718    694   1615

[Fig. 8 plot: (a) Accuracy vs. Percentage of Trajectory Completion (%); (b) Accuracy vs. Top-k; curves: Raw-lnglat, Quadtree, Grid-lnglat, Ours]
Fig. 8: The prediction accuracy of our deep model with different embedding methods on the Chengdu dataset: a) varying the percentage of trajectory completion when k is 5; b) varying the number of predicted destinations when p is 70%.

2) Effectiveness of Our Prediction Model: In this section, we conduct comparison experiments with the baselines to demonstrate the effectiveness of our destination prediction model (GPS embedding methods with attention-based dual BiLSTMs). Moreover, we investigate the effect of our extracted OT features (i.e., where and when passengers took the taxi) on the prediction task. Table III and Figure 9 illustrate the prediction results of different models when varying the parameters p and k; Table III also presents the distance errors of the models with and without OT features.

TABLE III: The prediction distance error of different prediction models on the Chengdu dataset, varying the percentage of trajectory completion.

Destination            Distance Error (m)
Prediction Models      10%    30%    50%    70%    90%    AVG
R_MLP      + OT_F     3435   2179   1460    852    689   1723
           − OT_F     3270   2207   1497    915    749   1727
R_LSTM     + OT_F     3127   2223   1481    850    764   1689
           − OT_F     3043   2262   1490    882    791   1693
RW_BiLSTM  + OT_F     3343   2123   1339    782    667   1650
           − OT_F     3233   2087   1411    844    741   1671
Ours       + OT_F     3141   2156   1369    718    694   1615
           − OT_F     3222   2198   1403    772    687   1656

[Fig. 9 plot: (a) Accuracy vs. Percentage of Trajectory Completion (%); (b) Accuracy vs. Top-k; curves: R_MLP, R_LSTM, RW_BiLSTM, Ours]
Fig. 9: The prediction accuracy of different prediction models on the Chengdu dataset: a) varying the percentage of trajectory completion when k is 5; b) varying the number of predicted destinations when p is 70%.

By profiling these results, we observe that:
• Our model outperforms the baseline models in the prediction scenario, achieving the lowest average prediction distance error (1615 m) in Tab. III. In Fig. 9(b), the advantage of our model over the baselines grows as k increases, which shows the superior prediction performance and better robustness of our model.
• In Table III, all the deep models achieve performance improvements by using the OT features. These results demonstrate strong correlations between the passenger's destination and where and when the taxi trip started.
• The RW_BiLSTM model performs better than the simple R_LSTM model, which demonstrates the significance of considering both the preceding and the future sequential context of GPS points for the destination prediction task.
• As shown in Tab. III and Fig. 9(a), as the trajectory completion p grows, the performance of the R_MLP model and ours improves faster than that of both the R_LSTM and RW_BiLSTM models. This suggests that RNN-based models fed with raw trajectories are more susceptible to the sparsity problem when dealing with long raw trajectory data, whereas our model effectively alleviates this problem thanks to the GPS embedding methods.

The attention mechanism is employed to capture the different discriminative capabilities of visited locations in determining the heading destination. To evaluate its effectiveness, we visualize the learned crucial locations in Chengdu, called the attention map. The attention intensity $a_i$ of a location is measured by the number of times it is identified as the location with the largest discriminative capability in determining the destination. After testing 100,000 trajectories with our model, we obtain 6320 unique grid cells. Since the 128 × 128 grid cells at the h = 7 level are too small to be visually examined, we present the attention map at the h = 5 level (32 × 32 grid cells) in Fig. 10. These grid cells are further divided into three groups for visualization, namely $a_i \ge 50$, $a_i \ge 200$ and $a_i \ge 500$. As illustrated in the figure, the grid cells with high attention intensity are mainly located in the city center. As a typical example, the top-1 weighted cell, highlighted in red, is the determinant location of 1066 trajectories; the corresponding enlarged map is also shown in the figure. This place contains a bridge and a crossing of two primary roads, which is consistent with the representative landmarks that we intend to capture with the attention mechanism. These results demonstrate the effectiveness of the attention mechanism in our prediction task.

Fig. 10: The attention map of weighted cells in the city of Chengdu learnt by the attention mechanism.

D. Model Evaluation on Porto Dataset

In this section, we conduct a group of experiments on the Porto dataset to investigate whether the proposed deep neural network and GPS embedding methods perform effectively on trajectory data from a different city and over a long time period (one year). Since historical passenger destination data is not available for Porto, we take the drop-off location clusters as the outputs of our neural network, following previous studies [10, 19]. The clusters are also obtained by applying the mean-shift clustering algorithm [9] to the trajectory data, yielding 989 clusters. Note that the parameter settings of the neural network are the same as those in the Chengdu experiments.

We first conduct the parameter sensitivity study on the Porto dataset to determine suitable values for h and p, i.e., the partition level and the completion percentage of training trajectories. As shown in Tab. IV, the suitable value is 7 for h (cell size of 70 m × 70 m) and 70% for p. The variation of the model's performance with the two parameters is also similar to that on the Chengdu dataset (Tab. I).

TABLE IV: The prediction distance error of our deep model trained on the Porto dataset with different input parameters, varying the percentage of testing trajectory completion: (1) partition level (h); (2) percentage of training trajectory completion (p).

Parameters for          Distance Error (m)
Input Trajectories     10%    30%    50%    70%    90%    AVG
h = 6, p = 70%        2887   2186   1284    574    553   1496
h = 7, p = 70%        2848   2175   1254    560    572   1481
h = 8, p = 70%        3083   2194   1284    587    576   1545
h = 7, p = 60%        2900   2036   1086    736    914   1534
h = 7, p = 80%        2966   2313   1467    626    304   1535

TABLE V: The prediction distance error of different GPS embedding methods and prediction models on the Porto dataset, varying the percentage of testing trajectory completion.

                               Distance Error (m)
Effectiveness Study           10%    30%    50%    70%    90%    AVG
GPS Embedding Methods
  Raw_lnglat                 3032   2304   1317    594    623   1574
  Quadtree                   2939   2198   1260    566    579   1508
  Grid_lnglat                3016   2239   1313    602    616   1557
Prediction Models
  R_MLP                      3119   2216   1340    656    573   1580
  R_LSTM                     3018   2294   1365    604    676   1591
  RW_BiLSTM                  3155   2240   1312    556    623   1577
  Ours - OT_F                3107   2187   1303    580    574   1550
  Ours                       2848   2175   1254    560    572   1481

After the parameter study, we evaluate the effectiveness of the GPS embedding methods and the different neural networks on the Porto dataset. From the results shown in Tab. V and Fig. 11, we observe that:
• Our GPS embedding method outperforms the other three on this dataset, and their performance differences are similar to those on the Chengdu dataset. The results further verify two conclusions: 1) our GPS embedding methods can alleviate the uncertainty and sparsity problems of prediction on raw GPS trajectories; 2) combining the Quadtree embedding and the grid-lnglat embedding reveals more representative spatiality of GPS points and thus benefits the prediction performance.
• Compared with the other neural networks, our proposed attention-based dual BiLSTMs achieve the best performance. These results demonstrate that our deep neural network can effectively model the spatiotemporal context of visited locations and extract more discriminative features from the partial trajectory. Additionally, the OT features are also effective in improving the prediction performance on the Porto dataset.

In summary, the experimental results on the Porto dataset further demonstrate the feasibility and effectiveness of our proposed GPS embedding methods and deep neural network in the destination prediction task. From another perspective, the results also indicate that the underlying relationship between the passenger's destination and the partial taxi trajectory exists in different cities and can be learned by deep models.
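The Distance Error reported in Tables I–V is the Haversine distance of Eq. 9 between the centroid of the predicted top-1 cluster and the centroid of the labelled destination cluster, averaged over all query trajectories. A minimal sketch of that computation follows; the function names and the (latitude, longitude) in degrees input convention are assumptions for illustration, not the authors' code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # R in Eq. 9


def haversine_km(lat_x, lng_x, lat_y, lng_y):
    """Haversine distance in km between two points given in degrees (Eq. 9)."""
    phi_x, phi_y = math.radians(lat_x), math.radians(lat_y)
    d_phi = phi_x - phi_y
    d_lam = math.radians(lng_x - lng_y)
    delta = (math.sin(d_phi / 2) ** 2
             + math.cos(phi_x) * math.cos(phi_y) * math.sin(d_lam / 2) ** 2)
    delta = min(1.0, delta)  # guard against rounding slightly above 1
    # arctan(sqrt(delta / (1 - delta))) written with atan2, which stays finite at delta = 1
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(delta), math.sqrt(1 - delta))


def mean_distance_error_m(pred_centroids, true_centroids):
    """Average top-1 Distance Error in metres over all query trajectories.

    Both arguments are equal-length sequences of (lat, lng) cluster centroids
    (hypothetical input format).
    """
    errors = [haversine_km(p[0], p[1], t[0], t[1]) * 1000.0
              for p, t in zip(pred_centroids, true_centroids)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # One degree of longitude along the equator.
    print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 3))  # 111.195
```

Writing the arctangent with `math.atan2(sqrt(delta), sqrt(1 - delta))` is mathematically identical to the form in Eq. 9 for 0 ≤ δ ≤ 1, but it avoids a division by zero for antipodal points.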
[Fig. 11 plot: (a) and (b) Accuracy vs. Top-k; (a) curves: Raw-lnglat, Quadtree, Grid-lnglat, Ours; (b) curves: R_MLP, R_LSTM, RW_BiLSTM, Ours - OT_F, Ours]
Fig. 11: The prediction accuracy of different GPS embedding methods (a) and prediction models (b) on the Porto dataset, varying the number of predicted destinations when p is 70%.

Fig. 12: A case study regarding the passenger's destination prediction in Chengdu city at 3 PM on a non-workday.
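The Accuracy@k curves in Figs. 8, 9 and 11 follow the definition given under Evaluation Metrics in Section V-A: a query counts as a hit when its true destination is among the k candidates with the highest predicted probability. The NumPy sketch below assumes that the model's output probabilities for all query trajectories are stacked into one matrix (one row per query, one column per candidate destination) and that labels are candidate indices; both the layout and the function name are assumptions. Because softmax is monotonic, ranking the raw outputs z' gives the same top-k sets as ranking the probabilities of Eq. 7, and k = 1 corresponds to the prediction $\hat{y}$.

```python
import numpy as np


def accuracy_at_k(probs, true_idx, k=5):
    """Fraction of queries whose true destination is among the top-k candidates.

    probs:    (num_queries, num_candidates) predicted probabilities (e.g. softmax outputs)
    true_idx: (num_queries,) index of the labelled destination cluster per query
    """
    probs = np.asarray(probs)
    true_idx = np.asarray(true_idx)
    # Column indices of the k highest-probability candidates for every query.
    top_k = np.argsort(-probs, axis=1)[:, :k]
    hits = (top_k == true_idx[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    probs = [[0.1, 0.6, 0.3],
             [0.5, 0.2, 0.3]]
    labels = [2, 1]
    print([accuracy_at_k(probs, labels, k) for k in (1, 2, 3)])  # [0.0, 0.5, 1.0]
```

The faster this value rises as k grows, the more robust the model is in the sense used in the parameter study; ties between equal probabilities are broken arbitrarily by `np.argsort` in this sketch.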
E. Case Study

Fig. 12 shows a prediction case for a partial taxi trajectory (p = 70%) at 3 PM on a non-workday in Chengdu. The five blue landmarks in the figure denote the top five predicted candidate destinations with the highest probabilities, and the red landmark denotes the real drop-off position of this taxi. As we can see, for the query from this taxi, our prediction model directly presents the five places that are the most likely destinations of the passengers. In this case, the passengers might head for the Happy Valley region for entertainment, or for a company, school or residential quarter. Under such circumstances, advertisement or recommendation systems deployed in the taxi could present personalized promotion information to the passengers, such as discount messages or popular entertainment items in the Happy Valley park.

VI. CONCLUSION AND FUTURE WORK

This paper presents a novel taxi-passenger's destination prediction framework, which consists of candidate passenger destination identification, two GPS embedding methods and an attention-based dual BiLSTMs neural network. Different from existing studies, we attempt to predict the passengers' destinations instead of drop-off locations, opening up more opportunities for developing smarter passenger-centered services. In this study, we focus on modeling the underlying relationship between the passenger's destination and the visited locations in the partial taxi trajectory. Specifically, the combination of the GPS embedding methods and the attention-based dual BiLSTMs models the spatial and bidirectional sequential context of visited locations simultaneously, thereby obtaining discriminative motion features from the partial trajectory. We also fuse the motion features with the OT (origin and time) features in the neural network. Finally, the neural network outputs the probabilities of candidate destinations (i.e., the clusters of historical passenger destinations in the city). Extensive experiments on real-world datasets demonstrate the effectiveness of our proposed model in predicting the passenger's destination.

In the future, we plan to broaden and deepen this work in several directions. First, we plan to aggregate more spatial context, such as land use and road types, into the embedding of GPS points. Second, we plan to study the inference of passengers' trip purposes by analyzing the categories of human activities at the predicted destinations. Third, we plan to test the proposed destination prediction model in real-world taxis. Specifically, we could employ an electronic platform with computing and storage capabilities (e.g., a Raspberry Pi) as the data center in taxis to deploy our deep model and recommendation information. The GPS trajectory collection, destination prediction and recommendation processes could then work in operating taxis in real time. In this way, we could further explore the potential problems and opportunities of destination prediction studies.

ACKNOWLEDGMENTS

The work was supported by the National Natural Science Foundation of China (No. 61872050), the Chongqing Basic and Frontier Research Program (No. cstc2018jcyjAX0551), and the Fundamental Research Funds for the Central Universities (No. 21619310). Chengwu Liao and Chao Chen contributed equally to this work. Chao Chen is the corresponding author of this paper.

REFERENCES

[1] J. A. Alvarez-Garcia, J. A. Ortega, L. Gonzalez-Abril, and F. Velasco. Trip destination prediction based on past GPS log using a hidden Markov model. Expert Systems with Applications, 37(12):8166–8171, 2010.
[2] P. C. Besse, B. Guillouet, J.-M. Loubes, and F. Royer. Destination prediction by trajectory distribution-based model. IEEE Transactions on Intelligent Transportation Systems, 19(8):2470–2481, 2017.
[3] S. Buthpitiya, Y. Zhang, A. K. Dey, and M. Griss. N-gram geo-trace modeling. In International Conference on Pervasive Computing, pages 97–114, 2011.
[4] C. Chen, S. Jiao, S. Zhang, W. Liu, L. Feng, and Y. Wang. Tripimputor: real-time imputing taxi trip purpose leveraging multi-sourced urban data. IEEE Transactions on Intelligent Transportation Systems, 19(10):3292–3304, 2018.
[5] C. Chen, C. Liao, X. Xie, Y. Wang, and J. Zhao. Trip2vec: a deep embedding approach for clustering and profiling taxi trip purposes. Personal and Ubiquitous Computing, 23(1):53–66, 2019.
[6] C. Chen, D. Zhang, B. Guo, X. Ma, G. Pan, and Z. Wu. Tripplanner: Personalized trip planning leveraging heterogeneous crowdsourced digital footprints. IEEE Transactions on Intelligent Transportation Systems, 16(3):1259–1273, 2014.
[7] C. Chen, D. Zhang, X. Ma, B. Guo, L. Wang, Y. Wang, and E. Sha. CROWDDELIVER: planning city-wide package delivery paths leveraging the crowd of taxis. IEEE Transactions on Intelligent Transportation Systems, 18(6):1478–1496, 2016.
[8] M. Chen, Y. Liu, and X. Yu. Predicting next locations with object clustering and trajectory clustering. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 344–356. Springer, 2015.
[9] Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):790–799, 1995.
[10] A. De Brébisson, É. Simon, A. Auvolat, P. Vincent, and Y. Bengio. Artificial neural networks applied to taxi destination prediction. arXiv preprint arXiv:1508.00021, 2015.
[11] Y. Endo, K. Nishida, H. Toda, and H. Sawada. Predicting destinations from partial trajectories using recurrent neural network. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 160–172, 2017.
[12] Y. Gao, D. Jiang, and Y. Xu. Optimize taxi driving strategies based on reinforcement learning. International Journal of Geographical Information Science, 32(8):1677–1696, 2018.
[13] M. Jaderberg, K. Simonyan, A. Zisserman, et al. Spatial transformer networks. In Advances in Neural Information Processing Systems, pages 2017–2025, 2015.
[14] Kaggle. Kaggle competition. https://www.kaggle.com/c/pkdd-15-predict-taxi-service-trajectory-i.
[15] E. Kauderer-Abrams. Quantifying translation-invariance in convolutional neural networks. arXiv preprint arXiv:1801.01450, 2017.
[16] J. Krumm and E. Horvitz. Predestination: Inferring destinations from partial trajectories. In International Conference on Ubiquitous Computing, pages 243–260, 2006.
[17] X. Li, M. Li, Y.-J. Gong, X.-L. Zhang, and J. Yin. T-DesP: Destination prediction based on big trajectory data. IEEE Transactions on Intelligent Transportation Systems, 17(8):2344–2354, 2016.
[18] J. Lv, Q. Li, Q. Sun, and X. Wang. T-CONV: a convolutional neural network for multi-scale taxi trajectory prediction. In 2018 IEEE International Conference on Big Data and Smart Computing (BigComp), pages 82–89, 2018.
[19] J. Lv, Q. Sun, Q. Li, and L. Moreira-Matias. Multi-scale and multi-scope convolutional neural networks for destination prediction of trajectories. IEEE Transactions on Intelligent Transportation Systems, 21(8):3184–3195, 2020.
[20] A. Rossi, G. Barlacchi, M. Bianchini, and B. Lepri. Modelling taxi drivers' behaviour for the next destination prediction. IEEE Transactions on Intelligent Transportation Systems, 21(7):2980–2989, 2020.
[21] A. W. Smith, A. L. Kun, and J. Krumm. Predicting taxi pickups in cities: Which data sources should we use? In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers, pages 380–387, 2017.
[22] M. Sperber. Quadtree and Octree, pages 931–934. Springer US, 2008.
[23] J. Teevan, A. Karlson, S. Amini, A. Brush, and J. Krumm. Understanding the importance of location, time, and people in mobile local search behavior. In Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, pages 77–80, 2011.
[24] Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, J. Ye, and W. Lv. The simpler the better: a unified approach to predicting original taxi demands based on large-scale online platforms. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1653–1662, 2017.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[26] M. Xu, D. Wang, and J. Li. DESTPRE: a data-driven approach to destination prediction for taxi rides. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 729–739, 2016.
[27] A. Y. Xue, J. Qi, X. Xie, R. Zhang, J. Huang, and Y. Li. Solving the data sparsity problem in destination prediction. The International Journal on Very Large Data Bases, 24(2):219–243, 2015.
[28] A. Y. Xue, R. Zhang, Y. Zheng, X. Xie, J. Huang, and Z. Xu. Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 254–265, 2013.
[29] Z. Yang, H. Sun, J. Huang, Z. Sun, H. Xiong, S. Qiao, Z. Guan, and X. Jia. An efficient destination prediction approach based on future trajectory prediction and transition matrix optimization. IEEE Transactions on Knowledge and Data Engineering, 32(2):203–217, 2020.
[30] L. Zhang, G. Zhang, Z. Liang, and E. F. Ozioko. Multi-features taxi destination prediction with frequency domain processing. PLoS ONE, 13(3):e0194629, 2018.
[31] R. Zhang, J. Guo, H. Jiang, P. Xie, and C. Wang. Multi-task learning for location prediction with deep multi-model ensembles. In 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 1093–1100, 2019.
[32] X. Zhang, Z. Zhao, Y. Zheng, and J. Li. Prediction of taxi destinations using a novel data embedding method and ensemble learning. IEEE Transactions on Intelligent Transportation Systems, 21(1):68–78, 2020.
[33] J. Zhao, J. Xu, R. Zhou, P. Zhao, C. Liu, and F. Zhu. On prediction of user destination by sub-trajectory understanding: A deep learning based approach. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1413–1422, 2018.
[34] Y. Zheng, Y. Liu, J. Yuan, and X. Xie. Urban computing with taxicabs. In Proceedings of the 13th International Conference on Ubiquitous Computing, pages 89–98, 2011.
[35] C. Zhong, S. M. Arisona, X. Huang, M. Batty, and G. Schmitt. Detecting the dynamics of urban structure through spatial network analysis. International Journal of Geographical Information Science, 28(11):2178–2199, 2014.
[36] B. D. Ziebart, A. L. Maas, A. K. Dey, and J. A. Bagnell. Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior. In Proceedings of the 10th International Conference on Ubiquitous Computing, pages 322–331, 2008.

Chengwu Liao is pursuing his Ph.D. degree at the College of Computer Science, Chongqing University, China. He received the B.Sc. degree from the College of Information Engineering and Automation, Kunming University of Science and Technology, Yunnan, China, in 2017. His research interests include intelligent transportation systems, spatiotemporal trajectory mining, and urban data visualization.

Chao Chen is a Full Professor at the College of Computer Science, Chongqing University, Chongqing, China. He obtained his Ph.D. degree from Pierre and Marie Curie University and Institut Mines-Télécom/Télécom SudParis, France, in 2014. He received the B.Sc. and M.Sc. degrees in control science and control engineering from Northwestern Polytechnical University, Xi'an, China, in 2007 and 2010, respectively. Dr. Chen has published over 80 papers, including 20 in ACM/IEEE Transactions. His work on taxi trajectory data mining was featured by IEEE Spectrum in 2011, 2016, and 2020. He was also the recipient of the Best Paper Runner-Up Award at MobiQuitous 2011. His research interests include pervasive computing, mobile computing, urban logistics, data mining from large-scale GPS trajectory data, and big data analytics for smart cities.

Chaocan Xiang is an associate professor at the College of Computer Science, Chongqing University, Chongqing, China. He received the B.S. and Ph.D. degrees in computer science and engineering from the Nanjing Institute of Communication Engineering, China, in 2009 and 2014, respectively. He studied at the University of Michigan-Ann Arbor in 2017. His current research interests include wireless sensor networks, crowd-sensing networks, and IoT.
Hongyu Huang received his B.S. degree from Chongqing Normal University in 2002, the M.S. degree from Chongqing University in 2005, and the Ph.D. degree from Shanghai Jiao Tong University in 2009. He is currently an associate professor with the College of Computer Science, Chongqing University, Chongqing, China. His research interests include mobile crowd-sensing, privacy-preserving computing, and vehicular ad hoc networks.
Hong Xie received the B.Eng. degree in Computer Science from the University of Science and Technology of China and the Ph.D. degree in Computer Science from the Chinese University of Hong Kong, proudly under the supervision of Prof. John C.S. Lui. He is currently a researcher in the College of Computer Science, Chongqing University. His research interests include online learning and data science. He is a member of the ACM and IEEE.
Songtao Guo received the B.S., M.S., and Ph.D. degrees in computer software and theory from Chongqing University, Chongqing, China, in 1999, 2003, and 2008, respectively. He was a professor at Chongqing University from 2011 to 2012 and is currently a full professor at Chongqing University, Chongqing, China. He was a senior research associate at the City University of Hong Kong from 2010 to 2011, and a visiting scholar at Stony Brook University, New York, from 2011 to 2012. His research interests include wireless sensor networks, wireless ad hoc networks, data center networks, and mobile edge computing. He has published more than 100 scientific papers in leading refereed journals and conferences. He has received many research grants as a principal investigator from the National Science Foundations of China and of Chongqing, as well as from the Postdoctoral Science Foundation of China. He is a Senior Member of the IEEE, ACM, and CCF.