RouteKG: A knowledge graph-based framework for route prediction on road networks

Yihong Tang, Zhan Zhao, Weipeng Deng, Shuyu Lei, Yuebing Liang, Zhenliang Ma

Department of Urban Planning and Design, The University of Hong Kong, Hong Kong SAR, China; Urban Systems Institute, The University of Hong Kong, Hong Kong SAR, China; Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong SAR, China; Senseable City Lab, Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge MA, United States; Department of Civil and Architectural Engineering, KTH Royal Institute of Technology, Stockholm, Sweden

Abstract

Short-term route prediction on road networks allows us to anticipate the future trajectories of road users, enabling a plethora of intelligent transportation applications such as dynamic traffic control or personalized route recommendation. Despite recent advances in this area, existing methods focus primarily on learning sequential transition patterns, neglecting the inherent spatial structural relations in road networks that can affect human routing decisions. To fill this gap, this paper introduces RouteKG, a novel Knowledge Graph-based framework for route prediction. Specifically, we construct a Knowledge Graph on the road network, thereby learning and leveraging spatial relations, especially moving directions, which are crucial for human navigation. Moreover, an $n$-ary tree-based algorithm is introduced to efficiently generate top-$K$ routes in a batch mode, enhancing scalability and computational efficiency. To further optimize the prediction performance, a rank refinement module is incorporated to fine-tune the candidate route rankings. The model performance is evaluated using two real-world vehicle trajectory datasets from two Chinese cities, Chengdu and Shanghai, under various practical scenarios. The results demonstrate a significant improvement in accuracy over baseline methods. We further validate our model through a case study that utilizes the pre-trained model as a simulator for real-time traffic flow estimation at the link level. The proposed RouteKG promises wide-ranging applications in vehicle navigation, traffic management, and other intelligent transportation tasks.

keywords:

Route prediction, Knowledge graph, Road network representation, Trajectory data mining, Geospatial AI

1 Introduction

In intelligent transportation systems (ITS), with the increasing prevalence of mobile sensors (e.g., GPS devices) and vehicular communication technologies, the ability to predict road users’ future routes is not merely a convenience but a necessity to support a range of applications such as vehicle navigation (Ziebart et al. 2008), traffic management (Li et al. 2020) and location-based recommendation (Kong et al. 2017, Tang et al. 2022). There are generally two types of route prediction tasks. On the one hand, for transport planning applications, it is often required to predict the complete route (as a sequence of road links) from the origin to destination. This is typically referred to as route choice modeling in the literature (Prato 2009), where the destination information has to be given. On the other hand, for real-time ITS applications, the destination information may not be available, and it is usually adequate to predict the near-future route trajectory of a moving agent based on the observed trajectory so far. This study focuses on the latter, which we call the short-term route prediction problem.

Numerous methods have been proposed in the literature to tackle this problem. Recent works typically use Recurrent Neural Networks (RNNs) (Rumelhart et al. 1986), especially Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber 1997) and Gated Recurrent Unit (GRU) (Cho et al. 2014), to capture sequential dependencies in trajectory data (Alahi et al. 2016, Mo et al. 2023). Most existing models focus primarily on learning sequential patterns for route prediction, often overlooking the inherent spatial structure in road networks that can affect human routing decisions. To address this issue, some studies have started to leverage Graph Neural Networks (GNNs) to encode road networks for improved prediction of vehicle trajectories (Liang & Zhao 2021) and traffic conditions (Zhao et al. 2019, Li et al. 2017). However, these methods still treat road networks merely as generic graphs, oversimplifying their structure and disregarding crucial geographical attributes and spatial factors.

As a type of spatial network, road networks consist of a set of spatial entities (e.g., intersections, links, etc.) organized in a way to facilitate traffic flows in a mostly 2-dimensional space. The relationships between these entities can be described by a set of spatial factors such as direction, distance, and connectivity. For example, one of the important spatial factors to consider in routing problems is the direction of travel (i.e., goal direction). It has been widely recognized in the navigation and cognitive psychology literature that humans utilize directional cues to navigate their environment (Etienne & Jeffery 2004, Chrastil & Warren 2015). Existing short-term route prediction models, however, often ignore the directional factor or incorporate it as a side feature, thus possibly leading to sub-optimal model performance. These limitations highlight the need for a more spatially explicit model to learn and incorporate these spatial relations throughout road networks.

There are other challenges for short-term route prediction on road networks. Firstly, most existing methods focus on generating a single predicted route (Rathore et al. 2019, Yan et al. 2022). However, given the inherent uncertainty of human routing behavior, providing multiple route predictions is often more useful in practice. For instance, traffic managers can optimize real-time traffic flow by considering multiple potential routes of moving vehicles, and transportation system users can benefit from having a wider variety of routing options. Secondly, as road networks grow in size, scalability becomes a challenge for GNN-based methods (Hamilton et al. 2017), as they require substantial computational and memory resources, limiting their applicability to large-scale networks. Lastly, the prediction performance depends heavily on the availability of information about the destination (or goal). In general, incorporating goal information into route prediction models can improve performance. The availability of goal information can vary, including (1) no information, (2) goal direction only, and (3) complete goal information. These varying degrees of goal information availability can affect route prediction to different extents, but no existing study has carried out a comprehensive evaluation across all these scenarios.

To address the aforementioned challenges, we propose a novel model, "RouteKG", which leverages the potential of Knowledge Graphs (KGs) (Wang et al. 2014) to encode road networks for short-term route prediction. Unlike existing models that rely on sequence-to-sequence (seq2seq) structures (Sutskever et al. 2014), our approach interprets route prediction as a Knowledge Graph Completion (KGC) task (Chen et al. 2020). Specifically, we propose a Knowledge Graph Module that can predict the future links (tail entities) a user might traverse based on current links (head entities) and moving directions (relations) without relying solely on sequential structures. The module explicitly incorporates the goal moving direction (estimated or actual) into the future route prediction process, better aligning with the intrinsic nature of human navigation. In addition, we employ a Route Generation Module to efficiently generate top-$K$ route candidates, and a Rank Refinement Module that models the dependencies between different links within each predicted route to rerank the route candidates for their consistency, resulting in the final top-$K$ predictions. Our proposed KG-based framework can effectively model the spatial relations, thus outperforming existing baselines by a large margin, and could benefit a range of other transportation or routing tasks. To summarize, this paper contributes to the literature as follows:

• We introduce RouteKG, a novel KG-based modeling framework for short-term route prediction. In this approach, we adapt the KG to represent road networks and reformulate the route prediction problem as a KGC task. Therefore, we can leverage the road network and route representations learned from the KG to enhance prediction accuracy and interpretability.
• We propose an $n$-ary tree-based route generation algorithm that enables efficient batch generation of future routes based on predicted probabilities derived from the KG. Additionally, we employ a rank refinement module that effectively prioritizes routes for their consistency by modeling dependencies between their road links, resulting in more accurate, trustworthy, and reliable top-$K$ route predictions.
• Extensive experiments on two real-world vehicle trajectory datasets from Chengdu and Shanghai demonstrate the superior prediction performance of RouteKG over state-of-the-art baseline models across various scenarios of goal information availability, with low response latency. Furthermore, a case study using the trained RouteKG as a simulator to estimate real-time traffic flows at the link level demonstrates our method's effectiveness in diverse application scenarios.

2 Literature Review

2.1 Trajectory Prediction

2.1.1 Motion Prediction

Motion prediction, which anticipates an agent's future trajectory from past movements, is central to autonomous driving systems (Yurtsever et al. 2020, Lefèvre et al. 2014). Its importance has grown with advances in autonomous driving and robot navigation, where accurate prediction improves safety and efficiency by mitigating collision risks (Rudenko et al. 2020). However, the dynamic and uncertain nature of agents' movements presents unique challenges (Paravarzar & Mohammad 2020). Motion prediction methods can generally be divided into two broad categories, classic and deep learning-based, each with its own advantages and limitations.

Classic methods leverage mathematical models grounded in physics and geometry to focus on the deterministic aspects of an agent’s motion, offering simplicity, interpretability, and efficiency (Helbing & Molnar 1995). Yet, these methods struggle to capture the stochastic behavior of agents in complex environments (Huang et al. 2022).

On the other hand, deep learning-based motion prediction methods leverage the power of neural networks to model the complexities of agent behavior (Alahi et al. 2016). These methods aim to learn the intricate, often non-linear, relationships between different influencing factors from large-scale data. Approaches such as RNNs and Generative Adversarial Networks (GANs) are commonly used (Gupta et al. 2018, Sadeghian et al. 2019, Gu et al. 2021). Recent efforts employ diffusion processes (Ho et al. 2020) to simulate how human motion varies from indeterminate to determinate (Gu et al. 2022). The advantage of deep learning methods is their ability to capture the underlying patterns and subtleties that traditional mathematical models might miss. However, they require extensive computational resources and large amounts of training data, and often lack the interpretability of classic methods (Rudenko et al. 2020).

2.1.2 Route Prediction

Route prediction, distinct from motion prediction, forecasts the future trajectories of agents that typically operate within road network constraints, necessitating different problem formulations and solutions. Similar to motion prediction, models designed for short-term route prediction can also be broadly classified into traditional approaches and deep learning-based methods.

Traditional methods use shortest path algorithms such as Dijkstra's algorithm (Dijkstra 1959), the Bellman-Ford algorithm, and A* (Hart et al. 1968) for route prediction tasks. However, these shortest path methods require destination information to generate potential routes for trajectory prediction. As destination information is often unavailable for short-term route prediction, other works have employed Kalman Filters (Abbas et al. 2020) or Hidden Markov Models (HMMs) (Simmons et al. 2006, Ye et al. 2016) to predict users' destinations and routes. Nevertheless, these methods struggle to model long-term temporal dependencies due to their relatively simple model structures.

In comparison, deep learning-based methods have outperformed traditional methods in prediction tasks, exhibiting superior ability in modeling spatial-temporal dependencies. The RNN-based encoder-decoder trajectory representation learning framework (Fu & Lee 2020) can adapt to tasks such as trajectory similarity measurement, travel time prediction, and destination prediction. Other studies have utilized Graph Convolutional Networks (GCN) and attention mechanisms to refine trajectory representation for prediction purposes (Shao et al. 2021). Some studies have proposed models for tasks ranging from predicting the next link using historical trajectories (Liu et al. 2022) to enhancing route prediction through pre-training and contrastive learning (Yan et al. 2022). Furthermore, some models are designed for road network-constrained trajectory recovery, capable of recovering fine-grained points from low-sampling records (Ren et al. 2021, Chen, Zhang, Sun & Zheng 2022).

Despite significant progress in short-term route prediction on road networks, many existing methods view it as a sequence-to-sequence task, leveraging sequential models like RNNs or Transformers for prediction. These methods often overlook the crucial role of spatial relations within the road network, an essential aspect of routing tasks.

2.2 Knowledge Graph

2.2.1 Knowledge Graph Completion

The rapidly expanding interest in KGs has fueled advances in tasks such as recommender systems, question answering, and semantic search, given their ability to provide structured and machine-interpretable knowledge about real-world entities and their relations (Noy et al. 2019, Sheth et al. 2019, Paulheim 2017). Despite their immense potential, KGs are inherently incomplete, which makes KGC an important and burgeoning research area. KGC refers to inferring missing or incomplete information in a KG by predicting new relationships between entities based on existing information (Chen et al. 2020).

Earlier studies on KGC typically employed statistical relational learning (SRL) methods, such as Markov Logic Networks (MLN) (Richardson & Domingos 2006) and Probabilistic Soft Logic (PSL) (Bach et al. 2017). These methods are effective at capturing complex dependencies but scale poorly because all possible rules must be specified manually. In recent years, more scalable machine learning approaches, especially embedding-based ones, have been proposed to overcome these limitations. TransE is a seminal model in this line, which models relations as translations in the entity embedding space (Bordes et al. 2013). Follow-up models such as TransH (Wang et al. 2014), TransR (Lin et al. 2015), and TransD (Ji et al. 2015) were subsequently proposed to handle complex relational data by introducing hyperplanes, relation-specific spaces, or dynamic mapping matrices, respectively. Meanwhile, tensor factorization-based models like RESCAL (Nickel et al. 2011), DistMult (Yang et al. 2014), and ComplEx (Trouillon et al. 2016) have been developed, aiming to capture the complex correlations between entities and relations. These models generally perform well but can be computationally intensive. More recently, models based on GNNs have shown promising results for KGC. Models such as R-GCN (Schlichtkrull et al. 2018) and CompGCN (Vashishth et al. 2019) have achieved competitive results by modeling KGs as multi-relational graphs and learning from both the graph structure and node attributes.

To summarize, KGC uses machine learning to infer missing knowledge automatically. It exploits the rich structure of KGs, employing effective entity and relation representations for improved prediction.

2.2.2 Mobility Knowledge Graph

KGs have been increasingly used to address complex urban mobility problems. Mobility KGs have seen considerable growth and advancement in recent years, particularly in integrating multi-source transportation data, constructing KGs from GPS trajectory data, and using structured knowledge bases to augment urban mobility data analysis.

Tan et al. (2021) devised a KG for urban traffic systems to uncover the implicit relationships amongst traffic entities and thereby unearth valuable traffic knowledge. Similarly, Zhuang et al. (2017) constructed an urban movement KG using GPS trajectory data and affirmed the practicality of their model by predicting the level of user attention directed towards various city locations. Zhao et al. (2020) put forth a generalized framework for multi-source spatiotemporal data analysis, underpinned by KG embedding, intending to discern the network structure and semantic relationships embedded within multi-source spatiotemporal data. Several studies have focused on building KGs grounded on geographical information and human mobilities for various applications, such as predicting subsequent locations (i.e., Point of Interest recommendation) (Liu et al. 2021, Rao et al. 2022, Wang et al. 2021), modeling event streams (Wang et al. 2020), learning user similarity (Zhang et al. 2023), forecasting destinations (Li et al. 2022, Chi et al. 2022), and performing epidemic contact tracing (Chen, Zhang, Qian & Li 2022).

These approaches differ in methodology, and each relies on specific data sources to construct its mobility KG; this often yields superior task-specific outcomes but can sacrifice some generalizability. Notably, no existing work has addressed the design of KGs for route prediction or road network representation learning while retaining generalizability.

3 Preliminaries

In this section, we introduce the definitions and problem formulation in Section 3.1 and the background on knowledge graphs in Section 3.2. All notations used in this paper are listed in Appendix A.

3.1 Problem Formulation

Definition 1 (Road Network $\mathbf{G}$).

The road network can be modeled as a Multi-Directed Graph (MultiDiGraph) $\mathbf{G}=(\mathbf{V},\mathbf{E})$, where $\mathbf{V}$ is a set of vertices (or nodes) representing unique intersections or endpoints in the road network, and $\mathbf{E}$ is a set of directed edges, each representing a link. Each vertex $v\in\mathbf{V}$ is associated with a geographical coordinate $(lat_{v},lon_{v})$. Each edge $e\in\mathbf{E}$ carries attributes such as length and road type. Multiple edges may connect the same pair of vertices, accounting for multiple links connecting the same intersections (e.g., parallel roads). An edge $e^{k}$ is denoted as $e^{k}=(v^{s}_{k},v^{e}_{k},m)$, where $v^{s}_{k}$ and $v^{e}_{k}$ are its start and end vertices and $m$ distinguishes edges connecting the same pair of nodes.
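To make the keyed-edge convention concrete, the sketch below stores a road network as a minimal multi-directed graph in plain Python. The class `RoadNetwork`, its methods, and the toy coordinates and attributes are hypothetical illustrations rather than part of RouteKG; the point is only that the third key $m$ keeps parallel links between the same pair of intersections distinct, similar in spirit to the keyed multigraphs offered by graph libraries such as NetworkX.

```python
from dataclasses import dataclass, field


@dataclass
class RoadNetwork:
    """Hypothetical minimal MultiDiGraph for Definition 1.

    Vertices map to (lat, lon); each directed edge is keyed by (v_s, v_e, m),
    where m separates parallel links between the same ordered pair of vertices.
    """
    coords: dict = field(default_factory=dict)  # vertex id -> (lat, lon)
    edges: dict = field(default_factory=dict)   # (v_s, v_e, m) -> attribute dict

    def add_node(self, v, lat, lon):
        self.coords[v] = (lat, lon)

    def add_edge(self, v_s, v_e, **attrs):
        # Use the smallest m not yet taken for this ordered vertex pair.
        m = 0
        while (v_s, v_e, m) in self.edges:
            m += 1
        self.edges[(v_s, v_e, m)] = attrs
        return (v_s, v_e, m)


g = RoadNetwork()
g.add_node("a", 0.0, 0.0)      # toy coordinates
g.add_node("b", 0.0, 0.001)
main_road = g.add_edge("a", "b", length=110.0, road_type="primary")  # ('a', 'b', 0)
side_road = g.add_edge("a", "b", length=130.0, road_type="service")  # ('a', 'b', 1)
back = g.add_edge("b", "a", length=110.0, road_type="primary")       # ('b', 'a', 0)
```

Because edges are directed, the reverse link from `b` to `a` is a different ordered pair and therefore starts again at $m=0$.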

Definition 2 (Route $x$).

A map-matched route $x$ of length $m$ is a sequence of links, $x=\left\{e^{1},e^{2},...,e^{m}\right\}$, $x\in\mathcal{X}$. For every consecutive pair of links $(e^{i},e^{i+1})$, there exists a node $v$ in the graph $\mathbf{G}$ that connects the two edges. The set of all map-matched routes is denoted as $\mathcal{X}$. The $i$-th route can be partitioned into an observed route $x^{o}_{i}=\{e^{j}_{i}\}^{\Gamma}_{j=1}$ with length $\Gamma$ and a future route $x^{f}_{i}=\{e^{j}_{i}\}_{j=\Gamma+1}^{\Gamma+\Gamma^{\prime}}$ with length $\Gamma^{\prime}$, where $\mathcal{X}^{o}=\{x_{i}^{o}\}_{i=1}^{|\mathcal{X}|}$ and $\mathcal{X}^{f}=\{x_{i}^{f}\}_{i=1}^{|\mathcal{X}|}$.
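A short sketch of Definition 2 follows, assuming links are the $(v^{s},v^{e},m)$ tuples of Definition 1 and reading "a node connects the two edges" as "the end vertex of $e^{i}$ is the start vertex of $e^{i+1}$", which is the natural reading for directed links. The helper names `is_connected_route` and `split_route` are hypothetical.

```python
def is_connected_route(route):
    """Check Definition 2 for directed links given as (v_s, v_e, m) tuples:
    the end vertex of each link must be the start vertex of the next one."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(route, route[1:]))


def split_route(route, gamma_obs, gamma_fut):
    """Split a route into the observed part (first Γ links) and the
    future part (the next Γ' links)."""
    if len(route) < gamma_obs + gamma_fut:
        raise ValueError("route is shorter than Γ + Γ'")
    return route[:gamma_obs], route[gamma_obs:gamma_obs + gamma_fut]


route = [("a", "b", 0), ("b", "c", 0), ("c", "d", 1), ("d", "e", 0), ("e", "f", 0)]
assert is_connected_route(route)
x_obs, x_fut = split_route(route, gamma_obs=3, gamma_fut=2)  # Γ = 3, Γ' = 2
```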

Given the above definitions, the short-term route prediction (or route prediction for short) problem can be broadly defined as the task of predicting the future route based on observed routes. However, as discussed in Section 1, the availability of goal information plays a pivotal role in routing tasks. In some scenarios, no goal information is available. In other cases, we may know the rough direction of the destination, or its exact location. The degree of goal information inclusion can greatly influence the specific formulation of route prediction (Dendorfer et al. 2020). To our knowledge, no existing work has undertaken an exhaustive evaluation encompassing all these distinct scenarios. Consequently, in this study, we categorize the route prediction problem into three subproblems:

Problem 1 (Route Prediction $\mathcal{F}$).

Generally, the route prediction problem aims to learn a function $\mathcal{F}$ that maps observed routes to future routes. We identify three distinct subproblems that arise based on the availability of the goal information:

Subproblem 1 (Route prediction with unknown goal $\mathcal{F}_{1}$). The goal information is completely absent from the input. The mapping function $\mathcal{F}_{1}$ is designed to predict the future routes solely based on the observed routes, disregarding any goal information:

$$\left[\left\{x^{o}_{i}\right\}_{i=1}^{|\mathcal{X}|};\mathbf{G}\right]\stackrel{\mathcal{F}_{1}(\cdot;\Theta_{1})}{\longrightarrow}\left\{x^{f}_{i}\right\}_{i=1}^{|\mathcal{X}|}, \tag{1}$$

Subproblem 2 (Route prediction with goal direction only $\mathcal{F}_{2}$). The goal direction $r^{d}_{i}$ is known in addition to the observed routes. The mapping function $\mathcal{F}_{2}$ leverages the goal direction to predict the future routes more accurately:

$$\left[\left\{x^{o}_{i};r^{d}_{i}\right\}_{i=1}^{|\mathcal{X}|};\mathbf{G}\right]\stackrel{\mathcal{F}_{2}(\cdot;\Theta_{2})}{\longrightarrow}\left\{x^{f}_{i}\right\}_{i=1}^{|\mathcal{X}|}, \tag{2}$$

Subproblem 3 (Route prediction with complete goal information $\mathcal{F}_{3}$). Complete goal information is given in the input. The mapping function $\mathcal{F}_{3}$ leverages both the goal direction $r^{d}_{i}$ and the exact goal link $e^{\Gamma+\Gamma^{\prime}}_{i}$ to generate more accurate predictions of the future routes:

$$\left[\left\{x^{o}_{i};r^{d}_{i};e^{\Gamma+\Gamma^{\prime}}_{i}\right\}_{i=1}^{|\mathcal{X}|};\mathbf{G}\right]\stackrel{\mathcal{F}_{3}(\cdot;\Theta_{3})}{\longrightarrow}\left\{x^{f}_{i}\right\}_{i=1}^{|\mathcal{X}|}, \tag{3}$$

where $x_{i}^{o}$ is the $i$-th observed route, $r^{d}_{i}$ its goal direction, $e^{\Gamma+\Gamma^{\prime}}_{i}$ its goal link, and $\Theta_{1},\Theta_{2},\Theta_{3}$ are the parameter sets of the mapping functions $\mathcal{F}_{1},\mathcal{F}_{2},\mathcal{F}_{3}$.

In routing applications, it is crucial to account for different destination-specific requirements. By addressing the route prediction problem through the three identified subproblems, our study offers empirical evidence on how different degrees of goal information availability affect prediction in real-world scenarios.
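The three subproblems differ only in what enters the input on the left-hand side of Equations (1) to (3). The hypothetical helper below makes that difference explicit; it assumes the goal direction is given as a discretized class label (an integer) and the goal link as a $(v^{s},v^{e},m)$ tuple, neither of which is fixed by the formulation itself.

```python
def build_inputs(observed, subproblem, goal_direction=None, goal_link=None):
    """Hypothetical helper: assemble the input of F_1, F_2 or F_3.

    Subproblem 1 uses the observed route only; Subproblem 2 adds the goal
    direction r^d; Subproblem 3 adds both r^d and the goal link.
    """
    if subproblem not in (1, 2, 3):
        raise ValueError("subproblem must be 1, 2 or 3")
    inputs = {"observed": list(observed)}
    if subproblem >= 2:
        if goal_direction is None:
            raise ValueError("Subproblems 2 and 3 need the goal direction")
        inputs["goal_direction"] = goal_direction
    if subproblem == 3:
        if goal_link is None:
            raise ValueError("Subproblem 3 needs the goal link")
        inputs["goal_link"] = goal_link
    return inputs


obs = [("a", "b", 0), ("b", "c", 0)]
# Extra goal information is deliberately ignored under Subproblem 1.
x1 = build_inputs(obs, 1, goal_direction=2, goal_link=("e", "f", 0))
x3 = build_inputs(obs, 3, goal_direction=2, goal_link=("e", "f", 0))
```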

3.2 Knowledge Graph

A KG is a heterogeneous structured data representation containing entities (nodes) and their interrelations (edges). The edges carry precise semantic information about the relation type or associated attributes. Formally, the graph is often represented by triplets: $\mathcal{G}=\left\{(h,r,t)\mid h,t\in\mathcal{E},r\in\mathcal{R}\right\}$, where $h$ represents the head entity, $r$ the relation, and $t$ the tail entity. $\mathcal{E}$ is the set of entities, and $\mathcal{R}$ the set of relations. These triplets concisely encode factual information for efficient knowledge discovery, inference, and integration. The graph not only serves as a repository of existing knowledge but also facilitates the inference of missing information. This process, known as Knowledge Graph Completion (KGC), finds a tail entity $\hat{t}$ given a head entity and a relation, denoted as $(h,r,\hat{t})$, or a head entity given a relation and a tail entity, denoted as $(\hat{h},r,t)$, thereby completing a partial triplet.
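The triplet view and the two query forms $(h,r,?)$ and $(?,r,t)$ can be illustrated with a toy store. This is a hypothetical sketch (`TripletStore` and the example triplets are invented): exact lookup can only return facts already stored, which is precisely why KGC relies on the embeddings and scoring function introduced in the following paragraph to rank triplets that are missing from the graph.

```python
from collections import defaultdict


class TripletStore:
    """Toy triplet store: answers (h, r, ?) and (?, r, t) by exact lookup only."""

    def __init__(self, triplets):
        self.tails = defaultdict(set)  # (h, r) -> set of t
        self.heads = defaultdict(set)  # (r, t) -> set of h
        for h, r, t in triplets:
            self.tails[(h, r)].add(t)
            self.heads[(r, t)].add(h)

    def query_tail(self, h, r):
        """Known tails for the partial triplet (h, r, ?)."""
        return set(self.tails.get((h, r), ()))

    def query_head(self, r, t):
        """Known heads for the partial triplet (?, r, t)."""
        return set(self.heads.get((r, t), ()))


kg = TripletStore([("e1", "ConnectBy", "e2"), ("e1", "ConnectBy", "e3"),
                   ("e2", "ConnectBy", "e4")])
```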

To enhance KGC and provide quantitative measures of relations, KG embedding maps entities and relations to a low-dimensional space, preserving the relational structure. The embedding process can be formalized as two mapping functions $\mathcal{M}_{\mathcal{E}}:\mathcal{E}\rightarrow\mathbb{R}^{\delta_{\mathcal{E}}}$ and $\mathcal{M}_{\mathcal{R}}:\mathcal{R}\rightarrow\mathbb{R}^{\delta_{\mathcal{R}}}$, where $\delta_{\mathcal{E}}$ and $\delta_{\mathcal{R}}$ are the dimensions of the entity embedding space and relation embedding space. A scoring function $\phi:\mathbb{R}^{\delta_{\mathcal{E}}}\times\mathbb{R}^{\delta_{\mathcal{R}}}\times\mathbb{R}^{\delta_{\mathcal{E}}}\rightarrow\mathbb{R}$ computes the plausibility of a relation $r$ between entities $h$ and $t$ in the embedded space. The function is defined such that $\phi(\mathcal{M}_{\mathcal{E}}(h),\mathcal{M}_{\mathcal{R}}(r),\mathcal{M}_{\mathcal{E}}(t))$ returns a real number representing the score of the triplet $(h,r,t)$. KGC infers missing relations or entities by identifying triplets with high scores under the scoring function. This embedding mechanism, coupled with a scoring function, computes and extends the encoded relations within the KG, providing a robust knowledge discovery and integration tool.
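As one concrete instance of $\phi$, the sketch below uses the TransE score $\phi(\mathbf{h},\mathbf{r},\mathbf{t})=-\lVert\mathbf{h}+\mathbf{r}-\mathbf{t}\rVert_2$ (Bordes et al. 2013, discussed in Section 2.2.1) and completes $(h,r,?)$ by ranking every entity as a candidate tail. TransE is chosen here only for illustration; this section does not state which scoring function RouteKG uses, and the two-dimensional embeddings and the relation name `east` are made up.

```python
import math


def transe_score(h, r, t):
    """TransE plausibility: phi(h, r, t) = -||h + r - t||_2 (higher = more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(h, r, entity_emb):
    """Complete (h, r, ?) by scoring every entity as a candidate tail."""
    scored = [(name, transe_score(h, r, t)) for name, t in entity_emb.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings: the relation "east" is a translation by (1, 0).
entity_emb = {"e1": [0.0, 0.0], "e2": [1.0, 0.0], "e3": [0.0, 1.0]}
relation_emb = {"east": [1.0, 0.0]}
ranking = rank_tails(entity_emb["e1"], relation_emb["east"], entity_emb)
```

Here $\mathbf{h}+\mathbf{r}=(1,0)$ coincides with the embedding of `e2`, so `e2` is ranked first with score $0$, followed by `e1` ($-1$) and `e3` ($-\sqrt{2}$).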

4 Methodology

4.1 RouteKG Framework Overview

This section introduces RouteKG, the proposed solution to the route prediction problem. As depicted in Figure 1, the model comprises four modules, namely Data Preprocessing Module $\mathcal{M}_{d}$ , Knowledge Graph Module $\mathcal{M}_{kg}$ , Route Generation Module $\mathcal{M}_{g}$ , and Rank Refinement Module $\mathcal{M}_{r}$ , each serving a specific purpose and collectively working towards an effective solution.

Figure 1: The flowchart of RouteKG.

We start by processing the raw GPS trajectories $\mathcal{T}$ and the raw road network data $G$ with the Data Preprocessing Module. This module generates the direction label matrix $\mathbf{D}$, the node adjacency edges (NAE) matrix $\mathbf{A}$, and map-matched routes $\mathcal{X}$. We then divide $\mathcal{X}$ into the observed routes $\mathcal{X}^{o}$ and future routes $\mathcal{X}^{f}$. We represent the road network as a MultiDiGraph $\mathbf{G}$ and express the preprocessing step as $\left(\mathcal{X}^{o},\mathcal{X}^{f},\mathcal{X},\mathbf{D},\mathbf{A}\right)=\mathcal{M}_{d}\left(\mathcal{T},\mathbf{G}\right)$.

Next, the Knowledge Graph Module, the core component of the proposed model, takes the observed routes $\mathcal{X}^{o}$, the road network $\mathbf{G}$, and the direction label matrix $\mathbf{D}$ to predict future routes. It constructs a knowledge graph on the road network $\mathbf{G}$, learns spatial relations $\mathcal{R}$, and converts $\mathcal{X}^{o}$ to future route probabilities $\mathrm{Pr}(\widetilde{\mathcal{X}^{f}})$ via $\left(\mathrm{Pr}(\widetilde{\mathcal{X}^{f}}),\mathcal{R}\right)=\mathcal{M}_{kg}\left(\mathcal{X}^{o},\mathbf{G},\mathbf{D};\Theta_{kg}\right)$, where $\Theta_{kg}$ are the module's parameters and $\mathcal{R}$ represents the learned spatial relations.

With the future route probabilities $\mathrm{Pr}(\widetilde{\mathcal{X}^{f}})$ in hand, the Route Generation Module employs an $n$-ary tree algorithm to generate potential future routes, yielding the top-$K$ preliminary route predictions. This step is captured by $\left\{\widetilde{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K}=\mathcal{M}_{g}\left(\mathrm{Pr}(\widetilde{\mathcal{X}^{f}}),\mathbf{G},\mathbf{A}\right)$, where $\widetilde{\mathcal{X}^{f}_{k}}$ denotes the $k$-th generated future route.
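The details of the $n$-ary tree algorithm are not given in this overview, so the following is only a sketch of the general idea under stated assumptions, not RouteKG's algorithm. The tree is rooted at the last observed link; depth $\gamma$ corresponds to the $\gamma$-th future link; the children of a link are its successor links (derivable, for instance, from $\mathbf{A}$); and a path is scored by the product of its per-step probabilities, treating the $\Gamma^{\prime}$ distributions as independent. A beam width bounds the tree's growth. The names `top_k_routes`, `successors`, and `step_probs` are hypothetical, and the sketch handles one route at a time, whereas the paper's version works in batch mode.

```python
import heapq
import math


def top_k_routes(last_link, successors, step_probs, k, beam=None):
    """Beam-pruned expansion of an n-ary tree of future routes.

    successors: link -> list of links that can follow it on the network.
    step_probs: list of Γ' dicts; step_probs[g][link] is the probability that
                `link` is the (g+1)-th future link.
    Returns up to k (route, score) pairs, where score is the product of step
    probabilities along the route.
    """
    beam = beam or k
    frontier = [(0.0, [last_link])]  # (log-score, path incl. last observed link)
    for probs in step_probs:
        children = []
        for logp, path in frontier:
            for nxt in successors.get(path[-1], []):
                p = probs.get(nxt, 0.0)
                if p > 0.0:
                    children.append((logp + math.log(p), path + [nxt]))
        frontier = heapq.nlargest(beam, children, key=lambda c: c[0])
    best = heapq.nlargest(k, frontier, key=lambda c: c[0])
    return [(path[1:], math.exp(logp)) for logp, path in best]


successors = {"e0": ["e1", "e2"], "e1": ["e3", "e4"], "e2": ["e5"]}
step_probs = [{"e1": 0.6, "e2": 0.4}, {"e3": 0.5, "e4": 0.2, "e5": 0.9}]
routes = top_k_routes("e0", successors, step_probs, k=2)
```

In this toy case the candidate scores are $0.4\times0.9=0.36$ for `e2 -> e5`, $0.6\times0.5=0.30$ for `e1 -> e3`, and $0.12$ for `e1 -> e4`, so the top-2 list is `e2 -> e5` followed by `e1 -> e3`. Note that greedily following the most likely first link (`e1`) would have missed the best route.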

When predicting future routes, the predicted road links at different time steps are not independent but related. Thus, a Rank Refinement Module is used to assess each predicted route as a whole. It takes the initial top-$K$ predictions, $\left\{\widetilde{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K}$, and refines them using the spatial relations $\mathcal{R}$. This refinement is achieved by the mapping $\left\{\widehat{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K}=\mathcal{M}_{r}\left(\left\{\widetilde{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K},\mathcal{R};\Theta_{r}\right)$, where $\Theta_{r}$ are the module's parameters. By accounting for the dependencies between links within each route and the learned spatial relations, this stage aims to improve the accuracy of the final route predictions.
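To illustrate only the interface of rank refinement (K candidates in, the same K candidates reordered out), the sketch below replaces the learned module $\mathcal{M}_{r}$ with a hand-set rule: it mixes each candidate's preliminary score with a route-level consistency term, taken as the mean co-occurrence weight of consecutive link pairs. The function `rerank`, the weight `alpha`, and the `cooccur` table are hypothetical stand-ins; they are not the trainable module with parameters $\Theta_{r}$ described above.

```python
def rerank(candidates, cooccur, alpha=0.5):
    """Re-order (route, score) candidates by mixing the preliminary score with
    the mean co-occurrence weight of consecutive link pairs in each route.
    A hand-set stand-in for a learned rank refinement module."""

    def consistency(route):
        pairs = list(zip(route, route[1:]))
        if not pairs:
            return 0.0
        return sum(cooccur.get(pair, 0.0) for pair in pairs) / len(pairs)

    rescored = [(route, (1 - alpha) * score + alpha * consistency(route))
                for route, score in candidates]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


candidates = [(["e2", "e5"], 0.36), (["e1", "e3"], 0.30)]
cooccur = {("e1", "e3"): 0.8, ("e2", "e5"): 0.1}  # toy consistency weights
reranked = rerank(candidates, cooccur)
```

With `alpha = 0.5`, the route `e1 -> e3` rises to the top ($0.15+0.40=0.55$ versus $0.18+0.05=0.23$) because its links co-occur far more often, which is the kind of correction a refinement stage is meant to make.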

The motivations and details of the four modules will be explained in the following subsections.

4.2 Data Preprocessing Module

To support the KG-related processes and route prediction, we first need to perform specific computations on the road network. This subsection details how we produce the data required by the model components: routes $\mathcal{X}$, route directions $\mathcal{X}_{d}$, the link-to-link direction matrix $\mathbf{D}\in\mathbb{R}^{|\mathbf{E}|\times|\mathbf{E}|}$, and the node adjacency edges matrix $\mathbf{A}\in\mathbb{R}^{|\mathbf{V}|\times N_{A}}$, where $N_{A}$ is the maximum number of edges adjacent to any node in $\mathbf{G}$.

Routes $\mathcal{X}$ are obtained by map-matching GPS trajectories $\mathcal{T}$ to the road network $\mathbf{G}$ (Yang & Gidofalvi 2018). These routes are then divided into observed routes $\mathcal{X}^{o}$ and future routes $\mathcal{X}^{f}$. Considering the importance of direction information in navigation (Chrastil & Warren 2015), we discretize continuous directions into $N_{d}$ classes to form $\mathcal{X}_{d}$ and $\mathbf{D}$. Figure 2 provides an example based on $N_{d}=8$. It has been shown that 8 directions are adequate for uniquely mapping most link-to-link movements and can enhance route prediction performance (Liang & Zhao 2021). This discretization allows for the convenient computation and assignment of inter- and intra-edge direction labels. For simplicity, two-way roads are assigned only one direction label.
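A minimal sketch of direction discretization with $N_{d}=8$ follows. It computes the standard initial great-circle bearing between two coordinates (clockwise from north) and maps it to one of $N_{d}$ equal sectors, with sector 0 centred on north. The sector layout, the function names, and the choice of which points represent a link (for $\mathbf{D}$, e.g., the midpoints of two links) are assumptions for illustration; this section does not specify them.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360),
    measured clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def direction_class(bearing, n_d=8):
    """Map a bearing to one of n_d equal sectors, sector 0 centred on north
    (for n_d = 8: 0=N, 1=NE, 2=E, 3=SE, 4=S, 5=SW, 6=W, 7=NW)."""
    width = 360.0 / n_d
    return int(((bearing + width / 2) % 360.0) // width)


north = direction_class(bearing_deg(0.0, 0.0, 1.0, 0.0))   # 0
east = direction_class(bearing_deg(0.0, 0.0, 0.0, 1.0))    # 2
south = direction_class(bearing_deg(0.0, 0.0, -1.0, 0.0))  # 4
west = direction_class(bearing_deg(0.0, 0.0, 0.0, -1.0))   # 6
```

Centring sector 0 on north (rather than starting it at north) means a heading of, say, 350 degrees is still labeled "north", which matches how people describe directions.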

To preserve the road network structure information, we construct the node adjacency edges (NAE) matrix $\mathbf{A}$, which can be derived directly from the road network $\mathbf{G}$. To build $\mathbf{A}$, we pad the list of edges adjacent to each node to a uniform length, thereby creating a matrix of dimensions $|\mathbf{V}|\times N_{A}$. Here, $|\mathbf{V}|$ is the total number of nodes in the road network, while $N_{A}$ is the maximum number of edges adjacent to any node. This padding enables batch training combined with masking of the padded entries.
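The padding step can be sketched as follows, assuming nodes and edges have been re-indexed as integers $0,\dots,|\mathbf{V}|-1$ and $0,\dots,|\mathbf{E}|-1$, and using `-1` as a hypothetical padding value. The sketch takes the per-node adjacent-edge lists as given and does not commit to whether "adjacent" means outgoing edges only; the boolean mask marks which entries are real edges so that padded slots can be ignored in batched computation.

```python
def build_nae(adjacent, num_nodes, pad=-1):
    """Pad each node's adjacent-edge list to N_A (the maximum list length) to get
    a |V| x N_A matrix, plus a boolean mask that is False on padded slots."""
    n_a = max((len(adjacent.get(v, [])) for v in range(num_nodes)), default=0)
    nae, mask = [], []
    for v in range(num_nodes):
        edges = list(adjacent.get(v, []))
        nae.append(edges + [pad] * (n_a - len(edges)))
        mask.append([True] * len(edges) + [False] * (n_a - len(edges)))
    return nae, mask


# Toy network: node 3 has no adjacent edges.
adjacent = {0: [0, 1], 1: [2], 2: [3, 4, 5]}
A, mask = build_nae(adjacent, num_nodes=4)
# A    == [[0, 1, -1], [2, -1, -1], [3, 4, 5], [-1, -1, -1]]
```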

4.3 Knowledge Graph Module

After data preprocessing, we design a Knowledge Graph Module that adapts the KG to the road network, learning the complex spatial relationships between road links in order to more accurately estimate the probability of each link being part of the future route $x^{f}$, given an observed route $x^{o}$. Formally, given a road network $\mathbf{G}$ and an observed route $x^{o}\in\mathcal{X}^{o}$, the module outputs $\Gamma^{\prime}$ probability distributions $\mathrm{Pr}(\widetilde{\mathcal{X}^{f}})=\left\{\mathrm{Pr}\left(\widetilde{x^{f,\gamma}}\right)\right\}_{\gamma=1}^{\Gamma^{\prime}}$. Each distribution gives the probability of a link being part of future routes, with the $\gamma$-th distribution indicating the likelihood of each road link being the $\gamma$-th link in those future routes, where $\gamma=1,2,\dots,\Gamma^{\prime}$.

Intuitively, a driver's route choice is based on the intended goal: from the current link, the driver moves in the direction of that goal. Since KGC infers a tail entity from a head entity and a relation, predicting future links (tails) from the current link (head) and the moving direction (relation) aligns with the logic behind drivers' route selections. However, most existing KGs are designed for search engines (Xiong et al. 2017) and text-based Question Answering (Huang et al. 2019), making them unsuitable for direct application to road networks. Therefore, we need to construct a KG tailored to the characteristics of road networks, redefine the KGC problem in this context, and use the learned spatial relations for more accurate route prediction. These tasks are handled by three submodules we have designed: Knowledge Graph Construction, Knowledge Graph Representation Learning, and Future Route Prediction through KGC. We detail these in the following subsections.

4.3.1 Knowledge Graph Construction

To design a KG $\mathcal{G}$ tailored for road networks and route prediction, we first need to select the crucial spatial and structural features in road networks. The desired KG should preserve the spatial relations amongst the identified entities while maintaining its applicability and generalizability across fine-grained scenarios on the road networks. In alignment with this objective, the selection focuses solely on those entities and relations that pervade all road networks and routing contexts. A detailed explanation of the entity and relation selection processes is provided below.

Entity selection

When constructing the KG $\mathcal{G}$ for road networks and routes, the first key step is to identify the entities $\mathcal{E}$. In a road network, the predominant entity is the link. Each link $e$ is characterized by its unique identifier and associated attributes such as length and connectivity. Selecting links as the sole entities reflects their central role in the road network and ensures that the resulting KG generalizes across routing contexts and scenarios, which contributes to the applicability of the proposed approach.

Relation selection

As discussed earlier, route prediction is reformulated as a KGC problem. Consequently, based on the selected entities (i.e., links), the relations $\mathcal{R}$ chosen for the road network should reflect and preserve (1) the spatial and structural properties of the road network, and (2) the consistency and preference patterns observed in drivers’ route selections. Given this, we identify four relations to construct the KG: connectivity, consistency, distance, and direction, each capturing a different aspect of how links relate to one another. Connectivity describes the topological structure of links in the road network: a “ConnectBy” relation $\mathcal{R}^{c}$ is established between two links if they are directly connected via a shared node. The “ConsistentWith” relation $\mathcal{R}^{s}$ is derived from observed routes and captures the co-occurrence of two links within the same route. A higher co-occurrence rate indicates a stronger “ConsistentWith” relation, providing an empirical basis for capturing real-world routing patterns. The spatial distance between two links forms the “DistanceTo” relation $\mathcal{R}^{a}$, which matters wherever the physical proximity of links affects route selection and planning. Lastly, direction forms the “DirectionTo” relation $\mathcal{R}^{d}$, the most critical relation from a navigational standpoint, as it encodes the relative direction between links that drivers rely on when heading toward a goal.
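To make the two data-driven relations concrete, here is a minimal sketch of how “ConnectBy” pairs and “ConsistentWith” co-occurrence counts might be extracted. It assumes links are indexed 0..|E|-1 and given as directed (start_node, end_node) tuples, and it interprets “connected via a shared node” as the end node of one link being the start node of the other. The function names are hypothetical, and the raw counts would still need whatever normalization turns them into a co-occurrence rate.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (start_node, end_node)

def connect_by_pairs(edges: List[Edge]) -> Set[Tuple[int, int]]:
    """Ordered link pairs (i, j) where link j starts at the node where link i ends."""
    by_start: Dict[int, List[int]] = {}
    for j, (start, _) in enumerate(edges):
        by_start.setdefault(start, []).append(j)
    return {(i, j) for i, (_, end) in enumerate(edges) for j in by_start.get(end, [])}

def consistent_with_counts(routes: List[List[int]]) -> Counter:
    """Count how many routes contain each unordered pair of distinct links."""
    counts: Counter = Counter()
    for route in routes:
        for a, b in combinations(sorted(set(route)), 2):
            counts[(a, b)] += 1
    return counts

edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
print(sorted(connect_by_pairs(edges)))  # [(0, 1), (0, 3), (1, 2)]
routes = [[0, 1, 2], [0, 3], [0, 1]]
print(consistent_with_counts(routes)[(0, 1)])  # 2
```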

The four major spatial relation types are summarized in Table 1. By comprehensively capturing these four types of relations over the identified entities, the KG possesses a rich and nuanced representation of the road network, which could facilitate various advanced routing tasks.

Table 1: Major spatial relation types and corresponding notations and data sources of the constructed KG. A major relation type may contain multiple relations. For example, the relation type “DirectionTo” contains $N_{d}$ directions, indicating a total of $N_{d}$ direction relations.

Relation	Notation	Data Source
ConnectBy	$\mathcal{R}^{c}$	Road Network $\mathbf{G}$
ConsistentWith	$\mathcal{R}^{s}$	Road Network $\mathbf{G}$ , Observed routes $\mathcal{X}^{o}$
DistanceTo	$\mathcal{R}^{a}$	Road Network $\mathbf{G}$
DirectionTo	$\mathcal{R}^{d}$	Road Network $\mathbf{G}$

4.3.2 Knowledge Graph Representation Learning

Spatial relations between entities (i.e., links) on a road network should be route-agnostic: they should be independent of specific routes and reflect only the spatial attributes of the road network itself. These relations also need to be encoded efficiently to support training and inference. A common approach is to employ KG embedding techniques, which aim to find embedding functions $\mathcal{M}_{\mathcal{E}}$ and $\mathcal{M}_{\mathcal{R}}$ that map each entity and each relation into a feature vector. The embedding functions $\mathcal{M}_{\mathcal{E}}(\cdot)$ and $\mathcal{M}_{\mathcal{R}}(\cdot)$ should preserve the inherent properties of $\mathcal{G}$. However, road networks exhibit complex relations, including many-to-many relations, as illustrated in Figure 3. To handle this complexity, we adapt the translation distance model TransH, which extends TransE (Bordes et al. 2013) by projecting entities onto relation-specific hyperplanes, to $\mathcal{G}$. This enables us to effectively learn vector representations of both the entities and the relations in the knowledge graph $\mathcal{G}$.

To enable KG representation learning, we first construct the sets of positive triplets $\Delta$ and negative triplets $\Delta^{\prime}$ for each relation type. To facilitate batch representation learning and ensure that all entities and relations are learned, we employ a random sampling-based method. The following paragraphs describe how the positive and negative triplet sets are constructed for each relation type in $\mathcal{G}$.

ConnectBy $\mathcal{R}^{c}$

For $\mathcal{R}^{c}$, positive triplets $\Delta_{\mathcal{R}^{c}}$ are sampled from pairs of adjacent edges in the spatial graph $\mathbf{G}$, linked by “ConnectBy”. Negative triplets $\Delta^{\prime}_{\mathcal{R}^{c}}$, conversely, are sampled from pairs of non-adjacent edges.

ConsistentWith $\mathcal{R}^{s}$

The “ConsistentWith” relation $\mathcal{R}^{s}$ identifies links that co-occur in the same routes, indicating inter-link transition patterns. To construct $\Delta_{\mathcal{R}^{s}}$ and $\Delta^{\prime}_{\mathcal{R}^{s}}$, we use the spatial graph $\mathbf{G}$ and the observed routes $\mathcal{X}^{o}$. The positive set $\Delta_{\mathcal{R}^{s}}$ consists of edge pairs appearing together in observed routes, while the negative set $\Delta^{\prime}_{\mathcal{R}^{s}}$ is formed by randomly sampling edges from $\mathbf{G}$.
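The random sampling of positive and negative triplets can be sketched as below. This is a minimal illustration under stated assumptions: a relation's positive pairs are already known (e.g. the “ConnectBy” or “ConsistentWith” pairs), and each negative is produced by replacing the tail with a uniformly random link and rejecting known positives. This tail-corruption scheme is a common KG negative-sampling choice and is not necessarily the one used by the authors; `sample_triplets` is a hypothetical name.

```python
import random
from typing import List, Set, Tuple

Triplet = Tuple[int, int, int]  # (head link, relation id, tail link)

def sample_triplets(positive_pairs: Set[Tuple[int, int]], relation_id: int,
                    num_links: int, batch_size: int,
                    rng: random.Random) -> Tuple[List[Triplet], List[Triplet]]:
    """Draw a batch of positive triplets and one corrupted negative per positive.

    Assumes every head has at least one link that is not a positive tail,
    otherwise the rejection loop below would never terminate.
    """
    pool = sorted(positive_pairs)
    positives, negatives = [], []
    for head, tail in rng.choices(pool, k=batch_size):
        positives.append((head, relation_id, tail))
        # Corrupt the tail until the pair is not a known positive.
        while True:
            fake_tail = rng.randrange(num_links)
            if (head, fake_tail) not in positive_pairs:
                break
        negatives.append((head, relation_id, fake_tail))
    return positives, negatives

connect_pairs = {(0, 1), (0, 3), (1, 2)}  # e.g. ConnectBy pairs of a toy network
pos, neg = sample_triplets(connect_pairs, relation_id=0, num_links=4,
                           batch_size=8, rng=random.Random(42))
```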

DistanceTo $\mathcal{R}^{a}$

For the “DistanceTo” relation $\mathcal{R}^{a}$, triplets are sampled from the observed routes $\mathcal{X}^{o}$, which aligns the relation more closely with route prediction and reduces the number of possible $\mathcal{R}^{a}$ relations. The resulting positive and negative triplet sets are $\Delta_{\mathcal{R}^{a}}$ and $\Delta^{\prime}_{\mathcal{R}^{a}}$, respectively.

DirectionTo $\mathcal{R}^{d}$

To construct the positive and negative triplet sets $\Delta_{\mathcal{R}^{d}}$ and $\Delta^{\prime}_{\mathcal{R}^{d}}$, we use the inter-link direction matrix $\mathbf{D}$. For sampled edges $e^{i},e^{j}\in\mathbf{E}$, their relative direction is $r^{d}=\mathbf{D}_{e^{i}e^{j}}$, and the triplet $(e^{i},r^{d},e^{j})$ is added to the positive set. The negative set is formed analogously from edge pairs that contradict the directional relation in $\mathbf{D}$.
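One possible way to obtain $\mathbf{D}$ and the “DirectionTo” triplets is sketched below. It assumes that each link is represented by its midpoint in planar coordinates (x east, y north) and that the bearing from $e^{i}$ to $e^{j}$ is discretized into $N_{d}$ equal angular sectors; both choices are hypothetical, and the paper's construction of $\mathbf{D}$ may differ. The negative triplet keeps the link pair but uses the opposite sector, which by construction contradicts $\mathbf{D}$.

```python
import math
from typing import List, Tuple

def direction_bin(p: Tuple[float, float], q: Tuple[float, float], n_d: int) -> int:
    """Index of the angular sector (0..n_d-1) containing the bearing p -> q.

    Angles are measured counter-clockwise from the +x axis (east), and
    sector 0 is centred on east.
    """
    angle = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
    width = 2 * math.pi / n_d
    return int(((angle + width / 2) % (2 * math.pi)) // width)

def direction_matrix(midpoints: List[Tuple[float, float]], n_d: int) -> List[List[int]]:
    """D[i][j] = direction relation from link i to link j (-1 on the diagonal)."""
    n = len(midpoints)
    return [[-1 if i == j else direction_bin(midpoints[i], midpoints[j], n_d)
             for j in range(n)] for i in range(n)]

midpoints = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.2), (0.0, -2.0)]
D = direction_matrix(midpoints, n_d=4)  # 4 sectors: 0=E, 1=N, 2=W, 3=S
positive = (0, D[0][1], 1)              # link 0 -> link 1 points east
negative = (0, (D[0][1] + 2) % 4, 1)    # the opposite sector contradicts D
```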

We denote the positive triplet sets for all relations as $\Delta_{\mathcal{R}^{\cdot}}$ and the negative sets as $\Delta_{\mathcal{R}^{\cdot}}^{\prime}$. For each relation type $\mathcal{R}^{\cdot}$, we introduce two trainable weight matrices of the same dimensions: a relation embedding matrix $\mathbf{W}_{\mathcal{R}^{\cdot}}\in\mathbb{R}^{|\mathcal{R}^{\cdot}|\times\delta_{\mathcal{R}^{\cdot}}}$ and a relation hyperplane matrix $\mathbf{P}_{\mathcal{R}^{\cdot}}$. Given the sets of positive triplets $\Delta_{\cdot}$ and negative triplets $\Delta_{\cdot}^{\prime}$, where $\cdot$ can represent any of the relations on the KG, representation learning proceeds as follows. For any triplets $(h,r,t)\in\Delta_{\cdot}$ and $(h^{\prime},r^{\prime},t^{\prime})\in\Delta^{\prime}_{\cdot}$, we use $\mathbf{h}$, $\mathbf{r}$, $\mathbf{t}$, $\mathbf{h}^{\prime}$, $\mathbf{r}^{\prime}$, and $\mathbf{t}^{\prime}$ to denote their embeddings, and $\mathbf{p}^{r}$ and $\mathbf{p}^{r^{\prime}}$ to denote the hyperplane normals of relations $r$ and $r^{\prime}$, with transposes $\left(\mathbf{p}^{r}\right)^{\top}$ and $\left(\mathbf{p}^{r^{\prime}}\right)^{\top}$. We then apply Eq. (4) to all relations present on the KG. The loss function for the representation learning of $\mathcal{G}$ is:

$$\begin{aligned}\mathcal{L}_{rep}=\sum_{(\Delta,\Delta^{\prime})\in\left\{(\Delta_{\mathcal{R}^{\cdot}},\Delta_{\mathcal{R}^{\cdot}}^{\prime})\right\}}\sum_{(h,r,t)\in\Delta}\sum_{(h^{\prime},r^{\prime},t^{\prime})\in\Delta^{\prime}}\Biggl[&\left\|\left(\mathbf{h}-\left(\mathbf{p}^{r}\right)^{\top}\mathbf{h}\,\mathbf{p}^{r}\right)+\mathbf{r}-\left(\mathbf{t}-\left(\mathbf{p}^{r}\right)^{\top}\mathbf{t}\,\mathbf{p}^{r}\right)\right\|_{\ell_{1}}+\psi\\ &-\left\|\left(\mathbf{h}^{\prime}-\left(\mathbf{p}^{r^{\prime}}\right)^{\top}\mathbf{h}^{\prime}\,\mathbf{p}^{r^{\prime}}\right)+\mathbf{r}^{\prime}-\left(\mathbf{t}^{\prime}-\left(\mathbf{p}^{r^{\prime}}\right)^{\top}\mathbf{t}^{\prime}\,\mathbf{p}^{r^{\prime}}\right)\right\|_{\ell_{1}}\Biggr]_{+},\end{aligned}$$

(4)

Eq. (4) defines the margin loss $\mathcal{L}_{rep}$, which compares the scores of positive and negative triplets. The margin $\psi$ enforces a separation between them: a pair contributes to the loss only when the positive triplet's distance is not at least $\psi$ smaller than the negative triplet's distance. Before each training batch, we constrain $\mathbf{p}^{r}$ and $\mathbf{p}^{r^{\prime}}$ to be unit normal vectors by normalizing them to unit $\ell_{2}$ norm: $\forall r\in\mathcal{R},\left\|\mathbf{p}^{r}\right\|_{2}=1$.
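For a single positive/negative pair, the TransH-style term inside Eq. (4) can be sketched in plain Python as follows. This assumes small dense vectors represented as lists and, for brevity, lets the negative triplet share the positive triplet's relation and hyperplane; the helper names are hypothetical and this is not the authors' batched implementation.

```python
import math
from typing import List

Vec = List[float]

def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def normalize(p: Vec) -> Vec:
    """Rescale a hyperplane normal to unit l2 norm (the constraint ||p||_2 = 1)."""
    norm = math.sqrt(dot(p, p))
    return [x / norm for x in p]

def project(v: Vec, p: Vec) -> Vec:
    """Project v onto the hyperplane with unit normal p: v - (p^T v) p."""
    s = dot(p, v)
    return [x - s * y for x, y in zip(v, p)]

def transh_score(h: Vec, r: Vec, t: Vec, p: Vec) -> float:
    """l1 distance || h_perp + r - t_perp ||_1 (lower means more plausible)."""
    hp, tp = project(h, p), project(t, p)
    return sum(abs(a + b - c) for a, b, c in zip(hp, r, tp))

def margin_loss(pos, neg, psi: float) -> float:
    """One summand of Eq. (4): [score(pos) + psi - score(neg)]_+ ."""
    return max(0.0, transh_score(*pos) + psi - transh_score(*neg))

p = normalize([0.0, 0.0, 2.0])               # -> [0.0, 0.0, 1.0]
h, r, t = [1.0, 0.0, 5.0], [1.0, 1.0, 0.0], [2.0, 1.0, -3.0]
t_neg = [0.0, 0.0, 0.0]                      # corrupted tail
print(margin_loss((h, r, t, p), (h, r, t_neg, p), psi=4.0))  # 1.0
```

Here the positive triplet translates exactly onto its tail after projection (distance 0), while the corrupted one has distance 3, so only a margin larger than 3 produces a non-zero loss.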

4.3.3 Future Route Prediction through Knowledge Graph Completion

Our study frames future route prediction as a KGC problem. As shown in Figure 4, given the last link $e_{i}^{\Gamma}$ (i.e., head entity) of the $i$-th observed route $x^{o}_{i}$ and the (estimated or actual) direction of movement (i.e., relation), our objective is to infer the future route $\widehat{x^{f}_{i}}$ (i.e., tail entity) that the vehicle will traverse. Here, the actual direction $r^{d}_{i}$ of a vehicle’s movement is the direction from the current link to the last link of the future route. Through KGC, we introduce a new objective for predicting the immediate future routes of road users. This objective not only uses the learned KG embeddings to improve route prediction accuracy but also feeds route information back into the KG representation, so that KG embedding and route prediction reinforce each other.

For illustration, consider the $i$-th observed route $x^{o}_{i}\in\mathcal{X}^{o}$ and its direction sequence $x^{o,d}_{i}\in\mathcal{X}^{o,d}$, drawn from the set of observed routes $\mathcal{X}^{o}$ and directions $\mathcal{X}^{o,d}$, respectively. Here, $x^{o}_{i}=\{e^{j}_{i}\}^{\Gamma}_{j=1}$ and $x^{o,d}_{i}=\{e^{d,j}_{i}\}_{j=1}^{\Gamma}$. We first obtain the embeddings of all elements of $x^{o}_{i}$ and $x^{o,d}_{i}$ by multiplying their one-hot vectors with the corresponding trainable embedding matrices $\mathbf{W}_{\mathcal{E}}\in\mathbb{R}^{|\mathcal{E}|\times\delta_{\mathcal{E}}}$ and $\mathbf{W}_{\mathcal{R}^{d}}\in\mathbb{R}^{|\mathcal{R}^{d}|\times\delta_{\mathcal{R}^{d}}}$. The resulting embeddings are denoted $\mathbf{x}^{o}_{i}=\{\mathbf{e}^{j}_{i}\}_{j=1}^{\Gamma}$ and $\mathbf{x}^{o,d}_{i}=\{\mathbf{e}^{d,j}_{i}\}_{j=1}^{\Gamma}$, with $\mathbf{e}_{i}^{\cdot}\in\mathbb{R}^{\delta_{\mathcal{E}}}$ and $\mathbf{e}^{d,\cdot}_{i}\in\mathbb{R}^{\delta_{\mathcal{R}^{d}}}$. To predict the direction of the vehicle’s future route, we encode the concatenation of $\mathbf{x}^{o}_{i}$ and $\mathbf{x}^{o,d}_{i}$ with a Multi-layer Perceptron (MLP) (Popescu et al. 2009):

$$\widehat{r^{d}_{i}}=\operatorname*{argmax}\left(\mathrm{MLP}_{d}\left(\mathbf{x}^{o}_{i}\,\|\,\mathbf{x}^{o,d}_{i}\right)\right),$$

(5)

where $\widehat{r^{d}_{i}}$ represents the estimated direction of the $i$ -th future route, and we employ the cross-entropy loss to optimize the parameters of the $\mathrm{MLP}_{d}$ :

$$\mathcal{L}_{d}=-\log\mathrm{Softmax}\left(\mathrm{MLP}_{d}\left(\mathbf{x}^{o}_{i}\,\|\,\mathbf{x}^{o,d}_{i}\right)\right)\left[r^{d}_{i}\right]$$

(6)

It should be noted that the estimation of the vehicle’s future route direction is only necessary when the goal is unspecified (i.e., subproblem 1). Conversely, when the goal direction or the actual goal is provided, one can directly utilize the given goal direction (i.e., subproblems 2 and 3).

To employ KGC for future route prediction, the last link $e_{i}^{\Gamma}$ is converted into its entity embedding by multiplying the one-hot vector of $e_{i}^{\Gamma}$ with $\mathbf{W}_{\mathcal{E}}$, yielding $\mathbf{e}_{i}^{\Gamma}\in\mathbb{R}^{\delta_{\mathcal{E}}}$. The direction relation embedding $\mathbf{r}^{d}_{i}\in\mathbb{R}^{\delta_{\mathcal{R}^{d}}}$ and the hyperplane normal $\mathbf{p}^{d}_{i}\in\mathbb{R}^{\delta_{\mathcal{R}^{d}}}$ are obtained in the same way from $\mathbf{W}_{\mathcal{R}^{d}}$ and $\mathbf{P}_{\mathcal{R}^{d}}$, respectively. Note that at this stage RouteKG only needs the current position of the vehicle (i.e., the last link of the observed route), which is fundamentally different from existing seq2seq methods. Given $\mathbf{p}^{d}_{i}$, we project $\mathbf{e}_{i}^{\Gamma}$ and all link embeddings $\mathbf{e}\in\mathbb{R}^{|\mathcal{E}|\times\delta_{\mathcal{E}}}$ onto the hyperplane defined by $\mathbf{p}^{d}_{i}$ to obtain the projected head embedding $\mathbf{e}_{i,\perp}^{\Gamma}$ and all candidate tail embeddings $\mathbf{e}_{\perp}$:

$$\begin{aligned}\mathbf{e}_{i,\perp}^{\Gamma}&=\mathbf{e}^{\Gamma}_{i}-\left(\mathbf{p}^{d}_{i}\right)^{\top}\mathbf{e}^{\Gamma}_{i}\,\mathbf{p}^{d}_{i},\\ \mathbf{e}_{\perp}&=\mathbf{e}-\left(\mathbf{p}^{d}_{i}\right)^{\top}\mathbf{e}\,\mathbf{p}^{d}_{i},\end{aligned}$$

(7)

where $\mathbf{e}_{i,\perp}^{\Gamma}\in\mathbb{R}^{\delta_{\mathcal{E}}}$ and $\mathbf{e}_{\perp}\in\mathbb{R}^{|\mathcal{E}|\times\delta_{\mathcal{E}}}$ .

After obtaining the projected head embedding $\mathbf{e}_{i,\perp}^{\Gamma}$, we add the relation embedding to it to query the tail entity. Given $\mathbf{e}_{i,\perp}^{\Gamma}$, the direction relation embedding $\mathbf{r}^{d}_{i}$, and the distance relation embedding $\mathbf{r}^{a,\gamma}_{i}$, the tail entity is queried as follows:

$$\mathrm{Pr}\left(\widetilde{x^{f,\gamma}_{i}}\right)=\mathrm{Softmax}\left(\mathbf{e}_{\perp}\cdot\left[\left(\mathbf{e}_{i,\perp}^{\Gamma}+\mathbf{r}^{d}_{i}\right)\odot\mathbf{r}^{a,\gamma}_{i}\right]^{\top}\right),$$

(8)

where $\mathrm{Pr}\left(\widetilde{x^{f,\gamma}_{i}}\right)\in\mathbb{R}^{|\mathcal{E}|}$ is the predicted probability distribution giving the likelihood of each link being the $\gamma$-th link of the $i$-th future route. Applying Eq. (8) for each $\gamma$ from 1 to $\Gamma^{\prime}$ yields $\Gamma^{\prime}$ probability distributions $\left\{\mathrm{Pr}(\widetilde{x^{f,\gamma}_{i}})\right\}_{\gamma=1}^{\Gamma^{\prime}}$ representing the estimated future route probabilities, which form the final output of the module.

For route prediction with complete goal information (i.e., subproblem 3), we make a small change to Eq. (8): we add the projected goal embedding $\mathbf{e}_{i,\perp}^{\Gamma+\Gamma^{\prime}}$ (i.e., the tail entity) to the projected head embedding $\mathbf{e}_{i,\perp}^{\Gamma}$. The tail entity query then becomes:

$$\mathrm{Pr}\left(\widetilde{x^{f,\gamma}_{i}}\right)=\mathrm{Softmax}\left(\mathbf{e}_{\perp}\cdot\left[\left(\mathbf{e}_{i,\perp}^{\Gamma}+\mathbf{e}_{i,\perp}^{\Gamma+\Gamma^{\prime}}+\mathbf{r}^{d}_{i}\right)\odot\mathbf{r}^{a,\gamma}_{i}\right]^{\top}\right),$$

(9)
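The tail-entity query of Eqs. (7) to (9) can be sketched for one step $\gamma$ as below. It assumes plain-list embeddings in which the entity and relation embedding sizes are equal (needed for the addition and element-wise product to be defined), a unit-norm hyperplane normal, and a dot-product score against every projected link; `query_tail` and its arguments are hypothetical names.

```python
import math
from typing import List, Optional

Vec = List[float]

def project(v: Vec, p: Vec) -> Vec:
    """Eq. (7): v - (p^T v) p, with p a unit normal."""
    s = sum(a * b for a, b in zip(p, v))
    return [a - s * b for a, b in zip(v, p)]

def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def query_tail(entity_emb: List[Vec], head: int, p_d: Vec, r_d: Vec, r_a: Vec,
               goal: Optional[int] = None) -> Vec:
    """Probability of each link being the gamma-th future link.

    Implements Eq. (8) when goal is None and Eq. (9) otherwise.
    """
    e_perp = [project(e, p_d) for e in entity_emb]     # all candidate tails
    q = list(e_perp[head])
    if goal is not None:
        q = [a + b for a, b in zip(q, e_perp[goal])]   # add projected goal
    q = [(a + b) * c for a, b, c in zip(q, r_d, r_a)]  # (head + r^d) times r^a, elementwise
    logits = [sum(a * b for a, b in zip(e, q)) for e in e_perp]
    return softmax(logits)

emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p = [0.0, 0.0, 1.0]
probs = query_tail(emb, head=0, p_d=p, r_d=[0.0, 1.0, 0.0], r_a=[1.0, 1.0, 1.0])
probs_goal = query_tail(emb, head=0, p_d=p, r_d=[0.0, 1.0, 0.0],
                        r_a=[1.0, 1.0, 1.0], goal=1)
```

In this toy case link 2 receives the highest probability, and supplying link 1 as the goal raises its probability relative to link 0.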

To optimize the KG embeddings, the loss of the future route prediction is defined as:

$$\mathcal{L}_{pred}=-\sum_{i=1}^{|\mathcal{X}|}\sum_{\gamma=1}^{\Gamma^{\prime}}\log\mathrm{Pr}\left(\widetilde{x^{f,\gamma}_{i}}\right)\left[x_{i}^{f,\gamma}\right],$$

(10)

where $x_{i}^{f,\gamma}$ is the actual $\gamma$-th link of the $i$-th future route, and $[\cdot]$ is the indexing operation that retrieves the predicted probability of the actual link; minimizing the loss maximizes its log-probability. Note that $x_{i}^{f,\gamma}$ and $e_{i}^{\Gamma+\gamma}$ denote the same link in the $i$-th future route.

4.4 Route Generation Module

Given the predicted future route probabilities $\mathrm{Pr}(\widetilde{\mathcal{X}^{f}})$, the Route Generation Module generates multiple possible future routes. We propose an $n$-ary tree-based algorithm, Spanning Route, for this purpose. Figure 5 visualizes it for a simplified case with $n=2$ and only $\Gamma^{\prime}=3$ predicted future links. Each tree node has four attributes: name, parent, end_node, and pred. The name identifies the node, and parent points to its predecessor in the tree. The attribute end_node is the terminal road-network node of the link predicted at this tree node, and pred stores that predicted link.

Algorithm 1 Spanning Route.
Input: $\Gamma^{\prime}$ probability distributions $\left\{\mathrm{Pr}(\widetilde{x^{f,\gamma}_{i}})\right\}_{\gamma=1}^{\Gamma^{\prime}}$; road network $\mathbf{G}=(\mathbf{V},\mathbf{E})$; NAE matrix $\mathbf{A}$; the tree’s degree $n$
Output: Top-$K$ predicted future routes $\left\{\widetilde{x_{i,k}^{f}}\right\}_{k=1}^{K}$
root ← CreateNewNode(name = “root”, parent = NIL, end_node = $v_{i,\Gamma}^{s}$, pred = NIL)
for $\gamma=1,\dots,\Gamma^{\prime}$ do
    leaves ← GetLeaves(root)
    for leaf ∈ leaves do
        $\mathcal{N}_{end\_node}^{e}$ ← $\mathbf{A}[leaf.end\_node,:]$
        $\left\{e_{i,k}^{\Gamma+\gamma}\right\}_{k=1}^{n}$ ← GetTopK($\mathrm{Pr}(\widetilde{x^{f,\gamma}_{i}})[\mathcal{N}_{end\_node}^{e}]$, $K=n$)
        for $k=1,\dots,n$ do
            node ← CreateNewNode(name = “$k$”, parent = leaf, end_node = $e_{i,k}^{\Gamma+\gamma}[1]$, pred = $e_{i,k}^{\Gamma+\gamma}$)
        end for
    end for
end for
leaves ← GetLeaves(root)
for $k=1,\dots,K$ do
    $\text{path}_{k}$ ← Traverse(root, leaves[$k$])
    $\widetilde{x_{i,k}^{f}}$ ← $\left\{\text{path}_{k}[j].\text{pred}\right\}_{j=1}^{\Gamma^{\prime}}$
end for

Algorithm 1 gives the pseudo-code for generating multiple future routes from the predicted probabilities. It relies on four primary functions. CreateNewNode instantiates a new tree node with the given attributes, and GetLeaves takes the root node as input and returns all leaves of the tree. GetTopK retrieves the top-$k$ predictions given a predicted probability distribution and $k$, and Traverse applies a tree-based Depth-First Search (DFS), specifically a Pre-Order traversal (Tarjan 1972), to obtain the path from the root to a specified leaf. This last function merges the predicted links into coherent predicted routes. For clarity, some masking and indexing operations are omitted from the pseudo-code; the minibatch version of Spanning Route (Algorithm 3) is given in Appendix B.
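The tree expansion in Algorithm 1 can be sketched for a single route as follows. This is a minimal, non-batched illustration under several assumptions: links are directed (start_node, end_node) tuples, the NAE matrix lists outgoing link indices padded with -1, and routes are recovered by walking parent pointers from each leaf back to the root, which yields the same root-to-leaf path as the pre-order traversal in the paper. Dead-end branches (no outgoing links) are simply dropped here, and the first $K$ leaves are returned in creation order, mirroring leaves[$k$]. `Node` and `spanning_route` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PAD = -1  # hypothetical padding index in the NAE matrix

@dataclass
class Node:
    parent: Optional["Node"]
    end_node: int          # terminal road-network node of this link
    pred: Optional[int]    # predicted link index (None at the root)

def spanning_route(probs: List[List[float]], edges: List[Tuple[int, int]],
                   nae: List[List[int]], start_node: int, n: int, k: int) -> List[List[int]]:
    """Expand an n-ary tree over Gamma' steps and return up to k routes."""
    root = Node(parent=None, end_node=start_node, pred=None)
    leaves = [root]
    for step_probs in probs:                          # gamma = 1..Gamma'
        new_leaves = []
        for leaf in leaves:
            candidates = [e for e in nae[leaf.end_node] if e != PAD]
            # GetTopK restricted to the links leaving leaf.end_node.
            top = sorted(candidates, key=lambda e: step_probs[e], reverse=True)[:n]
            for e in top:
                new_leaves.append(Node(parent=leaf, end_node=edges[e][1], pred=e))
        leaves = new_leaves
    routes = []
    for leaf in leaves[:k]:
        path, node = [], leaf
        while node.parent is not None:                # walk leaf -> root
            path.append(node.pred)
            node = node.parent
        routes.append(path[::-1])                     # root -> leaf order
    return routes

edges = [(0, 1), (0, 2), (1, 3), (1, 2), (2, 3), (3, 0)]
nae = [[0, 1], [2, 3], [4, PAD], [5, PAD]]
probs = [[0.6, 0.3, 0.05, 0.02, 0.02, 0.01],          # gamma = 1
         [0.01, 0.01, 0.5, 0.2, 0.25, 0.03]]          # gamma = 2
print(spanning_route(probs, edges, nae, start_node=0, n=2, k=3))
# [[0, 2], [0, 3], [1, 4]]
```

Restricting GetTopK to the links in $\mathbf{A}[\text{end\_node},:]$ is what guarantees that every generated route is connected on the road network, even though each $\mathrm{Pr}(\widetilde{x^{f,\gamma}_{i}})$ covers all links.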

4.5 Rank Refinement Module

The top-$K$ future route candidates $\widetilde{\mathcal{X}^{f}}=\left\{\widetilde{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K}=\left\{\left\{\widetilde{x_{i,k}^{f}}\right\}_{k=1}^{K}\right\}_{i=1}^{|\mathcal{X}^{f}|}$ offer an initial selection of possible future routes. However, the dependencies among links within these routes are based solely on the connectivity of the road network. Since the consistency and other spatial relations among links within a route also affect people’s route choices, a more refined approach is needed for accurate future route prediction. We therefore use the learned spatial relations $\mathcal{R}$ to model the dependencies between links and rerank the candidate routes accordingly. This process is denoted $\left\{\widehat{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K}=\mathcal{M}_{r}\left(\left\{\widetilde{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K},\mathcal{R};\Theta_{r}\right)$.

Consider the candidate future routes $\widetilde{x_{i}^{f}}\in\mathbb{R}^{K\times\Gamma^{\prime}}$. These routes are first encoded with the embedding matrices $\mathbf{W}_{\mathcal{E}}$ and $\mathbf{W}_{\mathcal{R}^{d}}$, resulting in the route embedding $\widetilde{\mathbf{x}^{f}_{i}}\in\mathbb{R}^{K\times\Gamma^{\prime}\times\delta_{\mathcal{E}}}$ and the route direction embedding $\widetilde{\mathbf{x}^{f,d}_{i}}\in\mathbb{R}^{K\times\Gamma^{\prime}\times\delta_{\mathcal{R}^{d}}}$. In the subsequent reranking phase, we prioritize routes with higher consistency and connectivity, using the spatial relations $\mathcal{R}$ learned by the Knowledge Graph Module. Specifically, the route embeddings are projected onto the “ConnectBy” and “ConsistentWith” hyperplanes as follows:

$$\begin{aligned}\widetilde{\mathbf{x}^{f}_{i,\perp^{c}}}&=\widetilde{\mathbf{x}^{f}_{i}}-\left(\mathbf{p}^{c}\right)^{\top}\widetilde{\mathbf{x}^{f}_{i}}\,\mathbf{p}^{c},\\ \widetilde{\mathbf{x}^{f}_{i,\perp^{s}}}&=\widetilde{\mathbf{x}^{f}_{i}}-\left(\mathbf{p}^{s}\right)^{\top}\widetilde{\mathbf{x}^{f}_{i}}\,\mathbf{p}^{s},\end{aligned}$$

(11)

where $\widetilde{\mathbf{x}^{f}_{i,\perp^{c}}}\in\mathbb{R}^{K\times\Gamma^{\prime}\times\delta_{\mathcal{E}}}$ and $\widetilde{\mathbf{x}^{f}_{i,\perp^{s}}}\in\mathbb{R}^{K\times\Gamma^{\prime}\times\delta_{\mathcal{E}}}$ are the projected route embeddings.

To quantify the internal connectivity and consistency of the generated routes, we compute the following margins for each route:

$$\begin{aligned}\mathbf{r}^{f,c}_{i,margin}&=\frac{1}{\Gamma^{\prime}-1}\sum_{j=1}^{\Gamma^{\prime}-1}\left(\widetilde{\mathbf{x}^{f}_{i,\perp^{c}}}\left[:,j,:\right]-\widetilde{\mathbf{x}^{f}_{i,\perp^{c}}}\left[:,j+1,:\right]\right),\\ \mathbf{r}^{f,s}_{i,margin}&=\frac{1}{\Gamma^{\prime}-1}\sum_{j=1}^{\Gamma^{\prime}-1}\left(\widetilde{\mathbf{x}^{f}_{i,\perp^{s}}}\left[:,j,:\right]-\widetilde{\mathbf{x}^{f}_{i,\perp^{s}}}\left[:,j+1,:\right]\right),\end{aligned}$$

(12)

where $\mathbf{r}^{f,c}_{i,margin}\in\mathbb{R}^{K\times\delta_{\mathcal{E}}}$ is the connectivity margin and $\mathbf{r}^{f,s}_{i,margin}\in\mathbb{R}^{K\times\delta_{\mathcal{E}}}$ is the consistency margin. These margins are then flattened and, together with the flattened route embedding $\widetilde{\mathbf{x}^{f}_{i}}\in\mathbb{R}^{K\cdot\Gamma^{\prime}\cdot\delta_{\mathcal{E}}}$ and route direction embedding $\widetilde{\mathbf{x}^{f,d}_{i}}\in\mathbb{R}^{K\cdot\Gamma^{\prime}\cdot\delta_{\mathcal{R}^{d}}}$, used to compute the new ranking:

$$\mathrm{Pr}\left(\widetilde{R}\right)=\mathrm{Softmax}\left(\mathrm{MLP}_{r}\left(\mathbf{r}^{f,c}_{i,margin}\,\|\,\mathbf{r}^{f,s}_{i,margin}\,\|\,\mathrm{MLP}_{f}\left(\widetilde{\mathbf{x}^{f}_{i}}\,\|\,\widetilde{\mathbf{x}^{f,d}_{i}}\right)\right)\right),$$

(13)

where $\mathrm{Pr}(\widetilde{R})\in\mathbb{R}^{K}$ gives the probability of each of the $K$ candidate routes being the actual future route. Based on this probability, we determine the new ranking, resulting in the reranked future route predictions $\left\{\widehat{x^{f}_{i,k}}\right\}_{k=1}^{K}$. We also adopt the cross-entropy loss for the Rank Refinement Module:

$$\mathcal{L}_{rank}=-\log\mathrm{Pr}\left(\widetilde{R}\right)\left[x^{f}_{i}\right],$$

(14)
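The margins in Eq. (12) can be illustrated with a minimal sketch for one candidate route. It assumes plain Python lists stand in for the $\Gamma^{\prime}\times\delta_{\mathcal{E}}$ slice of the projected route embedding and that $\mathbf{p}$ is already unit-normalized; `project` and `route_margin` are hypothetical helper names. Read literally, the sum in Eq. (12) telescopes, so each margin equals the difference between the first and last projected link embeddings divided by $\Gamma^{\prime}-1$; the sketch accumulates the consecutive differences step by step so that this can be checked directly.

```python
from typing import List

Vec = List[float]

def project(v: Vec, p: Vec) -> Vec:
    """v - (p^T v) p for a unit normal p (Eq. 11, applied per link)."""
    s = sum(a * b for a, b in zip(p, v))
    return [a - s * b for a, b in zip(v, p)]

def route_margin(route_emb: List[Vec], p: Vec) -> Vec:
    """Eq. (12) for one candidate route made of Gamma' link embeddings."""
    proj = [project(e, p) for e in route_emb]
    steps = len(proj) - 1                              # Gamma' - 1
    total = [0.0] * len(p)
    for j in range(steps):                             # sum of consecutive differences
        total = [t + a - b for t, a, b in zip(total, proj[j], proj[j + 1])]
    return [t / steps for t in total]

p_c = [0.0, 0.0, 1.0]                                  # e.g. a "ConnectBy" normal
route = [[3.0, 0.0, 7.0], [1.0, 2.0, 5.0], [0.0, 4.0, 1.0]]
print(route_margin(route, p_c))  # [1.5, -2.0, 0.0]
```

Applying `route_margin` to each of the $K$ candidates, with $\mathbf{p}^{c}$ or $\mathbf{p}^{s}$, gives the $K\times\delta_{\mathcal{E}}$ connectivity or consistency margin.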

Samples whose set of predicted future routes does not contain the ground-truth future route are excluded from this loss. While our KG enables an effective overall selection (i.e., top-$K$) of future routes, an improvement in the top-$K$ predictions does not necessarily translate into a better top-$1$ prediction. Therefore, in our implementation we directly refine the top-$1$ prediction using the initial predictions. This is done by using an MLP to encode the initial prediction embeddings $\mathbf{x}^{f}_{i}$ and $\mathbf{x}^{f,d}_{i}$, together with the last observed link embedding $\mathbf{e}^{\Gamma}_{i}$, its direction embedding $\mathbf{e}^{d,\Gamma}_{i}$, and the estimated goal direction embedding $\mathbf{r}^{d}_{i}$:

$$\widetilde{x^{f}_{i}}=\mathrm{MLP}_{k}\left(\mathbf{e}^{\Gamma}_{i}\,\|\,\mathbf{e}^{d,\Gamma}_{i}\,\|\,\mathbf{r}^{d}_{i}\,\|\,\mathrm{MLP}_{x}\left(\mathbf{x}^{f}_{i}\,\|\,\mathbf{x}^{f,d}_{i}\right)\right),$$

(15)

where $\widetilde{x^{f}_{i}}\in\mathbb{R}^{\Gamma^{\prime}\times|\mathcal{E}|}$ is also optimized by minimizing the corresponding cross-entropy loss. The top-$1$ prediction is then generated by the Route Generation Module with $n=1$. This route is inserted at the first position and the original $K$-th prediction is dropped, giving the refined top-$K$ future route predictions $\left\{\widehat{x^{f}_{i,k}}\right\}_{k=1}^{K}$.
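The final list manipulation can be sketched as follows, assuming that the $K$ candidates are first reordered by $\mathrm{Pr}(\widetilde{R})$ from Eq. (13) and that the refined top-1 route is then prepended while the last reranked candidate is dropped. The paper does not say how a refined route that duplicates an existing candidate is handled; this sketch simply keeps both. `refine_ranking` is a hypothetical name.

```python
from typing import List, Sequence

def refine_ranking(candidates: List[List[int]], rank_probs: Sequence[float],
                   refined_top1: List[int]) -> List[List[int]]:
    """Reorder K candidates by Pr(R), put the refined top-1 route first,
    and drop the last candidate so that exactly K routes remain."""
    order = sorted(range(len(candidates)), key=lambda k: rank_probs[k], reverse=True)
    reranked = [candidates[k] for k in order]
    return [refined_top1] + reranked[:-1]

cands = [[0, 2], [0, 3], [1, 4]]
print(refine_ranking(cands, [0.2, 0.5, 0.3], refined_top1=[1, 5]))
# [[1, 5], [0, 3], [1, 4]]
```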

4.6 Multi-Objective Optimization

The objective of RouteKG is to leverage the learned spatial relations to make future route predictions efficiently, which is achieved by jointly optimizing multiple objectives. Specifically, the overall loss function is:

$$\mathcal{L}=w_{rep}\cdot\mathcal{L}_{rep}+w_{d}\cdot\mathcal{L}_{d}+w_{pred}\cdot\mathcal{L}_{pred}+w_{rank}\cdot\mathcal{L}_{rank},$$

(16)

where $w_{\cdot}$ are the weights of the individual losses.

The complete learning procedure of RouteKG is detailed in Algorithm 2.

Algorithm 2 RouteKG algorithm.
Input: A batch of observed routes $\mathcal{X}^{o}_{\mathcal{B}}\in\mathbb{R}^{\mathcal{B}\times\Gamma}$; road network $\mathbf{G}$; inter-road direction matrix $\mathbf{D}$; NAE matrix $\mathbf{A}$
Randomly initialize $\Theta_{kg}=\left\{\mathbf{W}_{\mathcal{E}},\mathbf{W}_{\mathcal{R}^{c}},\mathbf{W}_{\mathcal{R}^{s}},\mathbf{W}_{\mathcal{R}^{a}},\mathbf{W}_{\mathcal{R}^{d}},\mathbf{P}_{\mathcal{R}^{c}},\mathbf{P}_{\mathcal{R}^{s}},\mathbf{P}_{\mathcal{R}^{a}},\mathbf{P}_{\mathcal{R}^{d}},\mathrm{MLP}_{d}\right\}$ and $\Theta_{r}=\left\{\mathrm{MLP}_{r},\mathrm{MLP}_{f},\mathrm{MLP}_{k},\mathrm{MLP}_{x}\right\}$
for $m=1,\dots,max\_iterations$ do
    // Early stopping here.
    Normalize the hyperplane embeddings: $\left\|\mathbf{P}_{\mathcal{R}^{\cdot}}\right\|_{2}=1$
    // Forward propagation.
    $\mathrm{Pr}(\widetilde{\mathcal{X}^{f}}),\mathcal{R}\leftarrow\mathcal{M}_{kg}\left(\mathcal{X}^{o},\mathbf{G},\mathbf{D};\Theta_{kg}\right)$
    $\left\{\widetilde{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K}\leftarrow\mathcal{M}_{g}\left(\mathrm{Pr}(\widetilde{\mathcal{X}^{f}}),\mathbf{G},\mathbf{A}\right)$
    $\left\{\widehat{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K}\leftarrow\mathcal{M}_{r}\left(\left\{\widetilde{\mathcal{X}^{f}_{k}}\right\}_{k=1}^{K},\mathcal{R};\Theta_{r}\right)$
    // Back propagation.
    $\Theta_{kg}\leftarrow\Theta_{kg}-\nabla_{\Theta_{kg}}\left\{\mathcal{L}_{rep}+\mathcal{L}_{d}+\mathcal{L}_{pred}\right\}$
    $\Theta_{r}\leftarrow\Theta_{r}-\nabla_{\Theta_{r}}\left\{\mathcal{L}_{rank}\right\}$
end for

5 Experiments

5.1 Data

We conduct experiments on taxi trajectory data from two cities in China: Chengdu and Shanghai. The Chengdu dataset was obtained from the Didi Chuxing GAIA Initiative (https://gaia.didichuxing.com). It contains the records of 143,888 drivers over one month, from November 1, 2016, to November 30, 2016, with an average sampling interval of 2 to 4 seconds. The selected region in Chengdu spans from 30.65°N to 30.73°N in latitude and from 104.04°E to 104.13°E in longitude, and its road network comprises 2,832 nodes and 6,506 edges. Each Chengdu record includes the driver ID, order ID, timestamp, longitude, and latitude. This study used the first seven days of the Chengdu data.

The Shanghai dataset consists of trajectory records from 10,609 taxis from April 16, 2015, to April 21, 2015, with an average sampling interval of approximately 10 seconds. We focus on a specific region in Shanghai, following the same settings as in previous work (Zhao & Liang 2023). The chosen region’s road network contains 320 nodes and 714 links. Each record includes the taxi ID, date, time, longitude, latitude, and an occupancy flag.

For data preprocessing, we first applied a fast map-matching algorithm (Yang & Gidofalvi 2018) to convert GPS traces into routes on the respective road network. We then cleaned the data by removing routes that contained loops or had too few links (i.e., fewer than ten). After cleaning, the Chengdu dataset contained 93,125 routes and the Shanghai dataset 24,468 routes. The road networks of Chengdu and Shanghai are shown in Figure 6, and the key network statistics are summarized in Table 2.
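The cleaning step (not the map matching, which relies on the cited algorithm and is not reproduced) can be sketched as below. The sketch assumes that a map-matched route is a list of link indices into directed (start_node, end_node) tuples and interprets a "loop" as a route that visits the same road-network node more than once; both the interpretation and the helper names are assumptions.

```python
from typing import List, Tuple

def has_loop(route: List[int], edges: List[Tuple[int, int]]) -> bool:
    """True if the route visits any road-network node more than once."""
    nodes = [edges[route[0]][0]] + [edges[e][1] for e in route]
    return len(set(nodes)) != len(nodes)

def clean_routes(routes: List[List[int]], edges: List[Tuple[int, int]],
                 min_links: int = 10) -> List[List[int]]:
    """Keep loop-free routes with at least min_links links."""
    # The length check runs first, so empty routes never reach has_loop.
    return [r for r in routes if len(r) >= min_links and not has_loop(r, edges)]

edges = [(i, i + 1) for i in range(12)] + [(5, 0)]   # link 12 leads back to node 0
loop_route = [0, 1, 2, 3, 4, 12, 0, 1, 2, 3]          # 10 links, revisits nodes 0..4
kept = clean_routes([list(range(10)), list(range(5)), loop_route], edges)
```

Only the first route survives: the second is too short and the third contains a loop.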

Table 2: Summary statistics of road networks in Chengdu and Shanghai, where ID and OD refer to in-degree and out-degree, respectively, and MID / MOD denotes the mean in-degree / out-degree.

	Nodes	Edges	MID / MOD	Max ID/OD	Min ID/OD	Density
Chengdu	2832	6506	2.297	4	0	8.11e-4
Shanghai	320	714	2.231	4	1	6.99e-3
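The mean-degree and density columns of Table 2 follow directly from the node and edge counts. In a directed graph, the mean in-degree and the mean out-degree both equal $|\mathbf{E}|/|\mathbf{V}|$, and a density of $|\mathbf{E}|/(|\mathbf{V}|(|\mathbf{V}|-1))$ reproduces the reported values up to rounding. The short check below is a sketch; `network_stats` is a hypothetical helper name.

```python
def network_stats(num_nodes: int, num_edges: int) -> dict:
    """Mean in/out-degree and directed-graph density from node and edge counts."""
    return {
        "mean_degree": num_edges / num_nodes,  # equals both mean in- and out-degree
        "density": num_edges / (num_nodes * (num_nodes - 1)),
    }

for city, v, e in [("Chengdu", 2832, 6506), ("Shanghai", 320, 714)]:
    s = network_stats(v, e)
    print(f"{city}: MID/MOD = {s['mean_degree']:.3f}, density = {s['density']:.2e}")
```

The maximum and minimum degree columns cannot be derived from the counts alone and require the full adjacency structure.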

5.2 Baseline Methods

In this study, we compare our approach with several established baselines to evaluate performance. These baselines include:

• Markov: The Markov model is a well-known sequential prediction method that is widely used in this field. It forecasts routes from observed transition patterns between road links.
• Dijkstra (Dijkstra 1959): Dijkstra’s algorithm is a classic method for finding the shortest paths between nodes in a graph. Here, path length is defined as the sum of link lengths to reflect actual geographic distance. This baseline works only when the exact goal location is given.
• RNN (Rumelhart et al. 1986): The recurrent neural network (RNN) learns patterns in sequential data by maintaining an internal memory that allows it to process input sequences of arbitrary length, which makes it applicable to future route prediction.
• GRU (Cho et al. 2014): The gated recurrent unit (GRU) is an RNN variant that uses gating mechanisms to capture long-term dependencies in the data, improving predictive capability. This is especially beneficial for trajectory prediction, where long-term dependencies play a crucial role.
• LSTM (Hochreiter & Schmidhuber 1997): The long short-term memory (LSTM) network is another RNN variant designed to learn long-term dependencies. It uses a series of memory cells with gating units that control the flow of information, which makes it well suited to predicting future routes.
• NetTraj (Liang & Zhao 2021): NetTraj is a network-based trajectory prediction model designed specifically for predicting future movements on road networks. It exploits the inherent structure of the road network together with historical trajectory data, integrating a Graph Attention Network (GAT) with an LSTM to predict future trajectories.
• RCM-BC (Zhao & Liang 2023): Route Choice Model-Behavioral Cloning (RCM-BC) is a behavioral cloning approach to route choice modeling in sequential decision-making settings. It uses supervised learning to fit a policy that maps states to actions based on observed behavior, and applies this policy to predict future routes. Like Dijkstra, this baseline requires knowledge of the exact goal location.
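The Dijkstra baseline can be illustrated with a minimal sketch. It assumes a hypothetical adjacency format in which each node maps to `(neighbor, link_length)` pairs, so the path length is the sum of link lengths; the corresponding link sequence follows from consecutive node pairs. This is an illustration of the technique, not the exact implementation used in our experiments.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical graph format: node -> list of (neighbor, link length in meters)
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra_route(graph: Graph, source: str, goal: str) -> Optional[List[str]]:
    """Return the node sequence of the shortest path from source to goal,
    where path length is the sum of link lengths; None if goal is unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:  # stale heap entry
            continue
        visited.add(u)
        if u == goal:
            break
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    # Walk predecessor pointers back from the goal to rebuild the path.
    path = [goal]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


toy = {
    "A": [("B", 100.0), ("C", 250.0)],
    "B": [("C", 100.0), ("D", 400.0)],
    "C": [("D", 120.0)],
}
print(dijkstra_route(toy, "A", "D"))  # ['A', 'B', 'C', 'D']
```

The direct link A to C (250 m) loses to the detour through B (200 m), which is exactly the shortest-distance behavior this baseline assumes of drivers.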

Table 3: Performance comparison of different methods.

Main Results	Chengdu									Shanghai
NoGoal	Link-level			Route-level						Link-level			Route-level
NoGoal	R@1	R@5	R@10	R@1	R@5	R@10	M@1	M@5	M@10	R@1	R@5	R@10	R@1	R@5	R@10	M@1	M@5	M@10
Markov	0.696	0.698	0.699	0.466	0.468	0.468	0.466	0.466	0.467	0.633	0.634	0.635	0.448	0.448	0.449	0.448	0.448	0.448
RNN	0.812	0.878	0.914	0.665	0.840	0.888	0.665	0.733	0.739	0.709	0.789	0.837	0.542	0.719	0.783	0.542	0.605	0.613
GRU	0.799	0.864	0.902	0.650	0.825	0.872	0.650	0.718	0.724	0.709	0.790	0.839	0.544	0.718	0.785	0.544	0.605	0.614
LSTM	0.803	0.868	0.905	0.656	0.828	0.877	0.656	0.723	0.729	0.700	0.779	0.831	0.537	0.706	0.775	0.537	0.596	0.605
NetTraj	0.809	0.874	0.909	0.662	0.836	0.882	0.662	0.730	0.735	0.709	0.788	0.836	0.547	0.717	0.781	0.547	0.606	0.615
RouteKG	0.841	0.940	0.968	0.696	0.885	0.931	0.696	0.762	0.768	0.724	0.865	0.909	0.563	0.762	0.831	0.563	0.624	0.634
GoalD	Link-level			Route-level						Link-level			Route-level
GoalD	R@1	R@5	R@10	R@1	R@5	R@10	M@1	M@5	M@10	R@1	R@5	R@10	R@1	R@5	R@10	M@1	M@5	M@10
RNN	0.853	0.911	0.939	0.718	0.881	0.918	0.718	0.783	0.787	0.777	0.850	0.893	0.621	0.788	0.849	0.621	0.683	0.691
GRU	0.843	0.902	0.931	0.708	0.868	0.908	0.708	0.772	0.776	0.780	0.852	0.890	0.625	0.791	0.844	0.625	0.687	0.693
LSTM	0.852	0.912	0.936	0.727	0.882	0.914	0.727	0.788	0.792	0.794	0.859	0.893	0.650	0.801	0.848	0.650	0.706	0.712
NetTraj	0.868	0.923	0.949	0.741	0.896	0.929	0.741	0.802	0.806	0.803	0.871	0.906	0.656	0.816	0.865	0.656	0.715	0.722
RouteKG	0.916	0.978	0.988	0.815	0.953	0.974	0.815	0.866	0.869	0.843	0.946	0.963	0.723	0.894	0.918	0.723	0.780	0.784
Goal	Link-level			Route-level						Link-level			Route-level
Goal	R@1	R@5	R@10	R@1	R@5	R@10	M@1	M@5	M@10	R@1	R@5	R@10	R@1	R@5	R@10	M@1	M@5	M@10
Dijkstra	0.737	–	–	0.715	–	–	–	–	–	0.724	–	–	0.703	–	–	–	–	–
RNN	0.866	0.916	0.941	0.755	0.892	0.924	0.755	0.808	0.812	0.862	0.902	0.926	0.761	0.861	0.892	0.761	0.799	0.803
GRU	0.858	0.912	0.939	0.736	0.883	0.919	0.736	0.794	0.798	0.859	0.900	0.921	0.755	0.861	0.887	0.755	0.796	0.799
LSTM	0.872	0.912	0.938	0.782	0.888	0.920	0.782	0.823	0.826	0.878	0.908	0.929	0.804	0.877	0.904	0.804	0.830	0.833
NetTraj	0.876	0.918	0.941	0.782	0.894	0.923	0.782	0.825	0.829	0.883	0.919	0.938	0.790	0.884	0.910	0.790	0.826	0.829
RCM-BC	0.784	0.933	0.956	0.669	0.880	0.918	0.669	0.754	0.760	0.827	0.936	0.954	0.748	0.908	0.934	0.748	0.817	0.820
RouteKG	0.974	0.991	0.995	0.958	0.983	0.988	0.958	0.967	0.968	0.945	0.979	0.984	0.915	0.959	0.969	0.915	0.932	0.933

5.3 Main Results

5.3.1 Experimental Settings

To comprehensively assess model performance, we design experiments based on the three subproblems as defined in Section 3.1: (1) route prediction with unknown goal $\mathcal{F}_{1}$ , (2) route prediction with goal direction only $\mathcal{F}_{2}$ , and (3) route prediction with complete goal information $\mathcal{F}_{3}$ . We refer to these three subproblems as NoGoal, GoalD, and Goal. They reflect varying degrees of information availability regarding the road user’s intended destination, and represent a broad range of real-world application scenarios. For instance, a system might not know the user’s exact destination due to privacy concerns but could have access to more general information, such as the goal direction.

Most of the baseline models are designed for the NoGoal scenario, but two of them (Dijkstra and RCM-BC) apply to the Goal scenario only. Unlike these baselines, RouteKG requires goal direction information. Therefore, model-specific implementations are needed to incorporate the available goal information under each scenario. Under NoGoal, the goal direction is unknown, but it can still be estimated from the observed route; RouteKG therefore uses the estimated goal direction. Under GoalD, RouteKG uses the actual goal direction instead of the estimated one, while for the other deep learning baselines (all except RCM-BC) we concatenate the embedding of the goal direction with the respective model’s inputs. Under Goal, the same concatenation strategy is used to provide the baselines with the complete goal information, whereas RouteKG adds the embedding of the goal location directly to the embedding of the last link in the observed route.
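The two goal-injection strategies differ in how they change the input representation. The sketch below uses plain Python lists as hypothetical stand-ins for learned embedding tensors, and the function names are illustrative only: concatenation widens every input step, whereas the addition used by RouteKG keeps the embedding dimension unchanged and modifies only the last observed link.

```python
from typing import List

Vector = List[float]  # stand-in for a learned embedding vector


def concat_goal(step_embeddings: List[Vector], goal_embedding: Vector) -> List[Vector]:
    """Baseline-style injection: append the goal embedding to every input step."""
    return [emb + goal_embedding for emb in step_embeddings]


def add_goal_to_last_link(link_embeddings: List[Vector], goal_embedding: Vector) -> List[Vector]:
    """RouteKG-style injection under Goal: add the goal-location embedding
    element-wise to the embedding of the last observed link only."""
    if len(link_embeddings[-1]) != len(goal_embedding):
        raise ValueError("goal and link embeddings must share a dimension")
    last = [a + b for a, b in zip(link_embeddings[-1], goal_embedding)]
    return link_embeddings[:-1] + [last]


steps = [[1.0, 0.0], [0.0, 1.0]]
goal = [0.5, 0.5]
print(concat_goal(steps, goal))            # each step grows from 2 to 4 dims
print(add_goal_to_last_link(steps, goal))  # dims unchanged, last step shifted
```

In a tensor library the same choice corresponds to concatenation along the feature dimension (e.g., `torch.cat(..., dim=-1)`) versus element-wise addition.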

In our main experiments, the observed (input) route length is set to $\Gamma=10$ and the future (output) route length to $\Gamma^{\prime}=5$. For model evaluation, the datasets are partitioned into training, validation, and test subsets in a 6:2:2 ratio. All models are evaluated under the NoGoal, GoalD, and Goal scenarios using both link-level and route-level metrics. Link-level evaluation has practical implications, particularly for tasks related to traffic flows, while route-level evaluation is more informative for routing applications. Specifically, we use two prevalent metrics: Recall and Mean Reciprocal Rank (MRR). Recall measures the fraction of all relevant items that are retrieved, indicating the system’s ability to retrieve the desired information. MRR evaluates the rank position of the correct answer by averaging, across queries, the reciprocal rank of the highest-ranked correct answer; a higher MRR indicates better performance. Together, these metrics capture both prediction effectiveness and ranking quality.
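One way to produce observed/future pairs with $\Gamma=10$ and $\Gamma^{\prime}=5$ and a 6:2:2 split is sketched below. The sliding-window sampling, the trajectory-level random split, and the function names are assumptions for illustration; this section does not specify these details.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Sample = Tuple[List[int], List[int]]  # (observed links, future links)


def split_622(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items and split them into train/validation/test at a 6:2:2 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def make_samples(trajectory: Sequence[int], gamma: int = 10,
                 gamma_f: int = 5) -> List[Sample]:
    """Slide a window of gamma + gamma_f links over one trajectory of link IDs."""
    window = gamma + gamma_f
    return [(list(trajectory[s:s + gamma]),
             list(trajectory[s + gamma:s + window]))
            for s in range(len(trajectory) - window + 1)]


# Usage: split whole trajectories first, then window each subset.
trajectories = [list(range(i, i + 20)) for i in range(10)]  # 10 toy trajectories
train_trajs, val_trajs, test_trajs = split_622(trajectories)
train_samples = [s for t in train_trajs for s in make_samples(t)]
print(len(train_trajs), len(val_trajs), len(test_trajs), len(train_samples))
```

Splitting at the trajectory level before windowing keeps overlapping windows of the same trajectory out of different subsets, which avoids leaking test links into training.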

We consider the top-$K$ predictions for the $i$-th observed route, $\left\{\widehat{x^{f}_{i,k}}\right\}_{k=1}^{K}=\left\{\left\{\widehat{e_{i,k}^{j}}\right\}_{j=\Gamma+1}^{\Gamma+\Gamma^{\prime}}\right\}_{k=1}^{K}$, and the actual $i$-th future route, $x_{i}^{f}=\left\{e_{i}^{j}\right\}_{j=\Gamma+1}^{\Gamma+\Gamma^{\prime}}$, to define our evaluation metrics. The link-level recall R@K is defined as

$$R@K=\frac{1}{|\mathcal{X}|}\sum_{i=1}^{|\mathcal{X}|}\max_{1\le k\le K}\left[\frac{1}{\Gamma^{\prime}}\sum_{j=\Gamma+1}^{\Gamma+\Gamma^{\prime}}\mathbb{I}\left(\widehat{e_{i,k}^{j}}=e_{i}^{j}\right)\right], \tag{17}$$

where $\mathbb{I}(\cdot)$ is the indicator function: $\mathbb{I}(a=b)=\begin{cases}1&\text{if }a=b\\ 0&\text{otherwise}\end{cases}$ .

Similarly, the route-level recall R@K is defined as

$$R@K=\frac{1}{|\mathcal{X}|}\sum_{i=1}^{|\mathcal{X}|}\max_{1\le k\le K}\,\mathbb{I}\left(\sum_{j=\Gamma+1}^{\Gamma+\Gamma^{\prime}}\mathbb{I}\left(\widehat{e_{i,k}^{j}}=e_{i}^{j}\right)=\Gamma^{\prime}\right). \tag{18}$$

We also compute the route-level MRR of the top- $k$ predictions, M@K, as follows:

$$M@K=\frac{1}{|\mathcal{X}|}\sum_{i=1}^{|\mathcal{X}|}\sum_{k=1}^{K}\frac{1}{k}\,\mathbb{I}\left(\sum_{j=\Gamma+1}^{\Gamma+\Gamma^{\prime}}\mathbb{I}\left(\widehat{e_{i,k}^{j}}=e_{i}^{j}\right)=\Gamma^{\prime}\right). \tag{19}$$
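Equations (17) to (19) translate directly into code. The following minimal pure-Python sketch assumes that each predicted and ground-truth future route is a sequence of $\Gamma^{\prime}$ link IDs and that the top-$K$ candidates are ranked best first; the function names are illustrative.

```python
from typing import List, Sequence

Route = Sequence[int]  # Γ' future link IDs


def link_recall_at_k(preds: List[List[Route]], truths: List[Route], k: int) -> float:
    """Eq. (17): best fraction of correctly predicted links among the top-k routes."""
    total = 0.0
    for cands, truth in zip(preds, truths):
        total += max(sum(p == t for p, t in zip(route, truth)) / len(truth)
                     for route in cands[:k])
    return total / len(truths)


def route_recall_at_k(preds: List[List[Route]], truths: List[Route], k: int) -> float:
    """Eq. (18): share of samples whose top-k list contains the exact future route."""
    hits = sum(any(list(route) == list(truth) for route in cands[:k])
               for cands, truth in zip(preds, truths))
    return hits / len(truths)


def mrr_at_k(preds: List[List[Route]], truths: List[Route], k: int) -> float:
    """Eq. (19): sum of 1/rank over exact matches in the top-k, averaged over samples."""
    total = 0.0
    for cands, truth in zip(preds, truths):
        total += sum(1.0 / rank for rank, route in enumerate(cands[:k], start=1)
                     if list(route) == list(truth))
    return total / len(truths)


truths = [[1, 2, 3], [4, 5, 6]]
preds = [[[1, 2, 9], [1, 2, 3]], [[7, 8, 9], [4, 5, 0]]]  # ranked best first
for k in (1, 2):
    print(k, link_recall_at_k(preds, truths, k),
          route_recall_at_k(preds, truths, k), mrr_at_k(preds, truths, k))
```

Because the candidate routes of one sample are distinct, at most one term of the inner sum in Eq. (19) is non-zero, so M@K reduces to the reciprocal rank of the first exact match.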

The experiments are conducted on an Ubuntu server with Python 3.6, and the deep learning models are implemented in PyTorch. The server is equipped with an Intel(R) Xeon(R) Platinum 8375C CPU (2.90GHz) and 8 NVIDIA GeForce RTX 3090 GPUs with 24GB of memory each. To ensure the robustness and generalizability of our model, hyperparameters are tuned based on performance on the validation set, balancing the bias-variance trade-off. All hyperparameters are listed in Appendix C.

5.3.2 Main Results Analysis

Table 3 compares the accuracy of the different methods in predicting future routes on the two real-world datasets under the three scenarios. Overall, the proposed RouteKG achieves the highest accuracy on all metrics. For all models, route-level accuracy is lower than link-level accuracy, which underscores the importance of modeling the consistency between different road links along a route. Among the different models, deep learning-based methods generally achieve higher accuracy than the Markov model, and additional information further improves their accuracy. For instance, NetTraj, which incorporates spatial information from the road network, generally performs on par with or better than RNN and its variants. Remarkably, RouteKG outperforms NetTraj and the other baselines even without extra information, which highlights the effectiveness of integrating a KG for future route prediction. Across experimental settings, introducing more goal information progressively improves overall accuracy, as expected. In particular, RouteKG’s accuracy improves substantially as goal information is gradually incorporated, while it still achieves an accuracy improvement of about 5.41% over the best baseline under NoGoal. This demonstrates the effectiveness of RouteKG in processing and utilizing goal information, as well as its applicability under various conditions. Lastly, prediction accuracy on the Chengdu dataset is higher than on the Shanghai dataset, likely because of the larger data volume of the former.

To provide an intuitive understanding of the prediction results, we present two qualitative examples of RouteKG outputs under NoGoal. For each example, Figure 7 shows the last observed link, the estimated goal direction, and the top-3 predictions. Although some predicted future routes display peculiar turns due to constraints imposed by the road network, most of them head in the correct direction according to the estimated goal direction. An important observation from the first example is that the links adjacent to the last observed link are misaligned with the estimated direction. Consequently, the first predicted links are constrained by the road network topology. As the prediction progresses, however, subsequent links are adjusted to align with the estimated direction.

5.4 Ablation Analysis

This section analyzes the results of the ablation experiments, which are performed on RouteKG under the GoalD scenario. These experiments provide insight into the importance of each model component and its contribution to the overall predictive performance.

Figure 8 compares the performance of RouteKG with its two ablation variants. RouteKG w/o rerank removes the Rank Refinement Module. The experimental results show that removing this module significantly reduces prediction performance. This suggests that link choices along a route are interconnected, so a dedicated module is needed to model route consistency and choice correlation. Remarkably, even without reranking, RouteKG still outperforms most baseline methods, particularly for the top-5 and top-10 predictions. This shows that RouteKG is effective at identifying plausible future routes, while the rank refinement module is important for improving the top-1 predictions. In summary, RouteKG effectively generates sets of candidate future routes, and incorporating a reranking module is crucial for accurately prioritizing the top-1 prediction.

RouteKG w/o relation denotes the variant of RouteKG without KG representation learning. The performance drop for this variant is less pronounced than that caused by removing the Rank Refinement Module. This indicates that, although KG representation learning benefits the route prediction process, it acts more as an auxiliary component. The strong performance achieved with KGC alone underscores the suitability of formulating future route prediction as a KGC problem.

5.5 Sensitivity Analysis

In this section, we perform a sensitivity analysis to assess the robustness and reliability of our model under various parameter settings.

We investigate the model’s performance under varying lengths of the future route to be predicted, denoted as $\Gamma^{\prime}$. As shown in Figure 9, performance declines noticeably as $\Gamma^{\prime}$ increases from 2 to 8. This indicates that predicting longer routes becomes progressively more challenging because of the expanded candidate space and heightened uncertainty, particularly when goal information is lacking.
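To see why the candidate space expands with $\Gamma^{\prime}$, consider exhaustively expanding an n-ary tree of feasible successor links from the last observed link: with an average branching factor $b$, the number of candidate routes grows roughly as $b^{\Gamma^{\prime}}$. The sketch below uses a hypothetical toy network in which every link has exactly two successors; it is an illustrative exhaustive enumeration, not the batched, model-scored Spanning Route algorithm.

```python
from typing import Dict, List

Successors = Dict[int, List[int]]  # link ID -> downstream link IDs


def enumerate_routes(succ: Successors, last_link: int, horizon: int) -> List[List[int]]:
    """Expand an n-ary tree rooted at the last observed link level by level and
    return every feasible future route of exactly `horizon` links."""
    frontier: List[List[int]] = [[]]
    for _ in range(horizon):
        next_frontier: List[List[int]] = []
        for route in frontier:
            tail = route[-1] if route else last_link
            for nxt in succ.get(tail, []):  # dead ends simply drop the route
                next_frontier.append(route + [nxt])
        frontier = next_frontier
    return frontier


# Hypothetical toy network: every link i has exactly two successors (b = 2).
toy = {i: [2 * i + 1, 2 * i + 2] for i in range(1024)}
print([len(enumerate_routes(toy, 0, h)) for h in range(2, 9)])
# [4, 8, 16, 32, 64, 128, 256]
```

Even with only two choices per step, moving from $\Gamma^{\prime}=2$ to $\Gamma^{\prime}=8$ multiplies the number of candidates by 64, so a fixed top-$K$ list covers a shrinking share of the candidate space.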

5.6 Efficiency Analysis

Efficient route prediction is paramount in transportation systems, where system operators and road users require prompt responses. To assess model efficiency, we analyze the inference time of various models on the two datasets to determine whether they can operate in real time. This allows us to identify the most efficient and responsive models, which is crucial for ensuring smooth user experiences and effective traffic management.

Figure 10 shows the inference times of the various models. Results for Dijkstra and RCM-BC are omitted from the figure because of their excessively high inference times. Except for the Markov model, all models shown use the Spanning Route algorithm to generate future routes. The Markov model instead uses pre-computed transition probabilities to sample the top-$k$ predictions over $k$ iterations.
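A minimal sketch of the Markov baseline as described above, assuming a first-order model: transition probabilities $P(e^{j+1}\mid e^{j})$ are estimated from link-to-link counts in the training trajectories, and each of the $k$ iterations samples one route of up to $\Gamma^{\prime}$ links. The function names are hypothetical, and because the handling of duplicate samples is not specified, the sketch keeps them.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Transitions = Dict[int, Dict[int, float]]  # current link -> {next link: probability}


def fit_transitions(trajectories: Sequence[Sequence[int]]) -> Transitions:
    """Estimate first-order transition probabilities P(next link | current link)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1
    probs: Transitions = {}
    for cur, ctr in counts.items():
        total = sum(ctr.values())
        probs[cur] = {nxt: c / total for nxt, c in ctr.items()}
    return probs


def markov_top_k(probs: Transitions, last_link: int, horizon: int, k: int,
                 seed: int = 0) -> List[List[int]]:
    """Generate k candidate routes; each iteration samples up to `horizon` links."""
    rng = random.Random(seed)
    routes: List[List[int]] = []
    for _ in range(k):
        route, cur = [], last_link
        for _ in range(horizon):
            nxt_probs = probs.get(cur)
            if not nxt_probs:  # dead end: no outgoing transition observed in training
                break
            cur = rng.choices(list(nxt_probs), weights=list(nxt_probs.values()), k=1)[0]
            route.append(cur)
        routes.append(route)
    return routes


probs = fit_transitions([[1, 2, 3, 4], [1, 2, 3, 5]])
print(markov_top_k(probs, last_link=1, horizon=3, k=3))
```

Since the cost is $k$ independent sampling passes with table lookups, the Markov model is cheap at inference time but, as Table 3 shows, its top-$k$ list adds little beyond its top-1 prediction.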

RouteKG demonstrates high efficiency, with average inference times of 598.01ms and 244.47ms per 10k requests on the Chengdu and Shanghai datasets, respectively, and standard deviations of 1.21ms and 19.35ms. In contrast, the Dijkstra model, based on dynamic programming, takes over 38 seconds, and the RCM-BC model exceeds 1000 seconds, rendering them impractical for real-time systems. The baselines using the Spanning Route algorithm (RNN, GRU, LSTM, and NetTraj) achieve fast inference, requiring less than 400 milliseconds in Chengdu and 250 milliseconds in Shanghai per 10k requests. RouteKG, which also uses the Spanning Route algorithm, exhibits a somewhat higher inference time, likely attributable to the reranking process. Nonetheless, these results highlight RouteKG’s suitability for real-time traffic applications, where rapid processing is essential for efficient transportation systems.

5.7 Case Study: Traffic Flow Estimation

In this section, we conduct a case study on traffic flow estimation to demonstrate a practical use case of RouteKG, using the future routes it generates to estimate traffic flows. Such estimates can offer key insights into traffic pattern dynamics and enhance the reliability of traffic flow predictions. Specifically, we adopt a sampling-based method for generating future routes to make full use of the top-$k$ route predictions. First, the top-$k$ predictions are converted into a probability distribution using temperature scaling (Guo et al. 2017). Next, we sample from the predicted top-$k$ future routes of each observed trajectory according to this distribution. The estimated link-level traffic flows are then obtained by aggregating the sampled future routes at the link level. To account for sampling uncertainty, we repeat the experiments ten times and report the traffic flow estimation results as mean ± std, considering only the top-10 predictions for simplicity.
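The sampling-based estimation can be sketched as follows. We assume that each candidate route carries a scalar score (e.g., a log-probability), that temperature scaling is a softmax over the scores divided by a temperature $T$, and that one route is sampled per observed trajectory; these choices, the value of $T$, and the function names are assumptions for illustration.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

Route = List[int]  # future route as link IDs


def temperature_softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn candidate scores into probabilities: softmax(scores / temperature)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def estimate_link_flows(candidates: Sequence[Sequence[Route]],
                        scores: Sequence[Sequence[float]],
                        temperature: float = 1.0, seed: int = 0) -> Counter:
    """Sample one route per observed trajectory from its top-k candidates and
    count how many sampled routes traverse each link."""
    rng = random.Random(seed)
    flows: Counter = Counter()
    for routes, s in zip(candidates, scores):
        probs = temperature_softmax(s, temperature)
        chosen = rng.choices(list(routes), weights=probs, k=1)[0]
        flows.update(chosen)
    return flows


# Ten repetitions with different seeds, mirroring the mean ± std protocol.
cands = [[[1, 2], [1, 3]], [[2, 4], [3, 4]]]
scs = [[2.0, 0.5], [1.0, 1.0]]
runs = [estimate_link_flows(cands, scs, temperature=1.0, seed=r) for r in range(10)]
print(runs[0])
```

A lower temperature concentrates the samples on the top-ranked route, while a higher one spreads them across the candidates; per-link means and standard deviations can then be computed over `runs`.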

Table 4: Traffic flow estimation results on Chengdu and Shanghai dataset. (mean ± std)

Traffic Flow	Chengdu			Shanghai
NoGoal	MAE	RMSE	$\mathrm{R}^{2}$	MAE	RMSE	$\mathrm{R}^{2}$
Markov	7.849 ± 0.003	26.104 ± 0.010	0.820 ± 0.000	20.758 ± 0.012	42.040 ± 0.021	0.716 ± 0.000
RNN	3.774 ± 0.016	12.018 ± 0.109	0.962 ± 0.001	12.062 ± 0.167	22.711 ± 0.183	0.917 ± 0.001
GRU	3.974 ± 0.014	12.180 ± 0.115	0.961 ± 0.001	12.453 ± 0.153	23.674 ± 0.207	0.910 ± 0.002
LSTM	3.961 ± 0.008	12.517 ± 0.100	0.959 ± 0.001	12.705 ± 0.115	23.980 ± 0.137	0.908 ± 0.001
NetTraj	3.777 ± 0.014	11.896 ± 0.121	0.963 ± 0.001	12.333 ± 0.137	23.030 ± 0.201	0.915 ± 0.001
RouteKG	2.464 ± 0.030	6.725 ± 0.116	0.988 ± 0.000	8.178 ± 0.200	15.330 ± 0.537	0.962 ± 0.003
GoalD	MAE	RMSE	$\mathrm{R}^{2}$	MAE	RMSE	$\mathrm{R}^{2}$
RNN	3.121 ± 0.010	11.034 ± 0.114	0.968 ± 0.001	9.520 ± 0.136	17.287 ± 0.182	0.952 ± 0.001
GRU	3.226 ± 0.013	11.051 ± 0.120	0.968 ± 0.001	9.108 ± 0.153	16.400 ± 0.219	0.957 ± 0.001
LSTM	3.111 ± 0.009	10.995 ± 0.110	0.968 ± 0.001	8.390 ± 0.167	14.731 ± 0.277	0.965 ± 0.001
NetTraj	2.948 ± 0.003	10.796 ± 0.130	0.969 ± 0.001	8.006 ± 0.176	14.311 ± 0.307	0.967 ± 0.001
RouteKG	1.688 ± 0.032	6.237 ± 0.161	0.990 ± 0.001	4.682 ± 0.091	7.237 ± 0.284	0.992 ± 0.001
Goal	MAE	RMSE	$\mathrm{R}^{2}$	MAE	RMSE	$\mathrm{R}^{2}$
Dijkstra	4.386	17.146	0.922	12.655	29.711	0.858
RNN	2.988 ± 0.007	10.084 ± 0.170	0.973 ± 0.001	7.200 ± 0.144	14.232 ± 0.248	0.967 ± 0.001
GRU	3.055 ± 0.011	10.161 ± 0.169	0.973 ± 0.001	7.100 ± 0.127	13.496 ± 0.237	0.971 ± 0.001
LSTM	2.962 ± 0.014	10.167 ± 0.156	0.973 ± 0.001	6.903 ± 0.117	13.703 ± 0.187	0.970 ± 0.001
NetTraj	2.899 ± 0.004	9.970 ± 0.175	0.974 ± 0.001	6.669 ± 0.126	13.604 ± 0.231	0.970 ± 0.001
RCM-BC	3.299 ± 0.036	7.923 ± 0.037	0.980 ± 0.000	4.993 ± 0.136	8.701 ± 0.049	0.988 ± 0.000
RouteKG	1.012 ± 0.016	3.168 ± 0.096	0.997 ± 0.000	3.604 ± 0.088	6.340 ± 0.151	0.994 ± 0.000

Table 4 reports the effectiveness of traffic flow estimation with RouteKG using three standard regression metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination ($\mathrm{R}^{2}$). RouteKG consistently outperforms all baselines on all metrics for both datasets, in line with the main experimental results in Section 5.3. Notably, incorporating more goal information leads to more accurate traffic flow estimates, highlighting the strength of our approach.
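For reference, the three metrics can be computed from observed and estimated link-level flows as in the sketch below. It assumes that the observed flows are obtained by aggregating the actual future routes per link in the same way as the estimates, and that $\mathrm{R}^{2}$ is defined as $1 - SS_{\text{res}}/SS_{\text{tot}}$. The numbers in the usage line are toy values, not results from Table 4.

```python
import math
from typing import Sequence, Tuple


def flow_metrics(observed: Sequence[float], estimated: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, R^2) between observed and estimated link-level flows."""
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    ss_res = sum(x * x for x in errors)  # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    mae = sum(abs(x) for x in errors) / n
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Toy link flows (not from Table 4).
print(flow_metrics([10, 20, 30, 40], [12, 18, 30, 44]))
```

RMSE penalizes the single large error (4 vehicles) more heavily than MAE does, which is why the two metrics can rank models differently on links with heavy traffic.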

In particular, RouteKG’s performance in the NoGoal scenario significantly surpasses that of the baselines on both datasets, suggesting that our method of estimating moving directions and leveraging KGC is more effective than the existing modeling methods considered here. Quantitatively, on the Chengdu dataset, it reduces MAE and RMSE by 34.7% and 43.5%, respectively, and increases $\mathrm{R}^{2}$ by 2.6%, compared with the best baseline. Under the GoalD scenario, performance improves notably, indicating potential for further refinement in modeling future directions. Importantly, RouteKG’s improvements in traffic flow estimation, especially when the actual future direction is included, are more significant than those of the baselines. This reaffirms RouteKG’s effective integration of direction information into the KGC problem. With the actual goal information incorporated, RouteKG achieves an MAE of 1.012, an RMSE of 3.168, and an $\mathrm{R}^{2}$ of 0.997 on the Chengdu dataset, underlining its efficacy and promise for practical applications.
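The relative changes quoted above can be reproduced from the NoGoal rows of Table 4 for Chengdu, taking for each metric the best baseline value (RNN for MAE, NetTraj for RMSE and $\mathrm{R}^{2}$). The helper name below is illustrative.

```python
def relative_change(baseline: float, model: float) -> float:
    """Relative change of `model` with respect to `baseline`, in percent."""
    return (model - baseline) / baseline * 100.0


# Chengdu, NoGoal: best baseline per metric vs. RouteKG (values from Table 4)
mae_change = relative_change(3.774, 2.464)    # RNN has the lowest baseline MAE
rmse_change = relative_change(11.896, 6.725)  # NetTraj has the lowest baseline RMSE
r2_change = relative_change(0.963, 0.988)     # NetTraj has the highest baseline R^2
print(f"MAE {mae_change:.1f}%, RMSE {rmse_change:.1f}%, R2 {r2_change:+.1f}%")
# MAE -34.7%, RMSE -43.5%, R2 +2.6%
```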

To summarize, these results suggest that RouteKG could also be an effective tool for traffic flow estimation, offering accurate and rapid analysis essential for real-time traffic management.

6 Conclusion

This research presents RouteKG, a novel knowledge graph (KG) framework for short-term route prediction on road networks. It treats route prediction as a knowledge graph completion (KGC) problem. The framework constructs a KG based on the road network to facilitate KG representation learning, which is designed to capture spatial relations that are essential for various urban routing tasks. Through KGC, the learned relations can be further utilized for future route prediction. The devised Spanning Route algorithm allows for the efficient generation of multiple possible future routes, while a Rank Refinement Module is integrated to further leverage learned spatial relations to rerank the initial predictions, thereby achieving more accurate route prediction results.

RouteKG is evaluated using taxi trajectory data from Chengdu and Shanghai. The evaluation considers three practical scenarios with different levels of goal information availability: NoGoal, GoalD, and Goal. The experiment results show that the proposed RouteKG consistently outperforms the baseline methods based on various evaluation metrics. Additionally, the model efficiency analysis highlights that route predictions can be generated in less than 500ms per 10k requests, largely thanks to the Spanning Route algorithm, which validates the suitability of RouteKG for real-time traffic applications. To demonstrate the applicability of RouteKG beyond routing tasks, we utilize it to estimate link-level traffic flows, achieving an $\mathrm{R}^{2}$ value of 0.997 under the Goal scenario. This could provide valuable insights for future designs of intelligent transportation systems.

Future research can extend this work in several ways. First, incorporating other spatial relations (e.g., functional zones and spatial regions) together with urban and road network attributes could improve the scalability and generalizability of the model, enabling it to rapidly support multi-functional intelligent transportation services and adapt promptly to different tasks. Second, future work could enhance the Spanning Route algorithm with an n-ary tree pruning approach to address the exponential growth of model complexity with route prediction length. Such an optimized algorithm is expected to offer better scalability and more efficient future route generation with fewer computational resources. Last but not least, future research might further explore KGs for broader urban applications, such as using KGs to integrate diverse datasets and learn the interrelationships among them. For instance, discerning correlations between traffic patterns and population demographics might enable urban planners to anticipate the implications of different urban development strategies.

Acknowledgment

This research is supported by the National Natural Science Foundation of China (NSFC42201502) and Seed Funding for Strategic Interdisciplinary Research Scheme at the University of Hong Kong (URC102010057).

References

Abbas et al. (2020) Abbas, M. T., Jibran, M. A., Afaq, M. & Song, W.-C. (2020), ‘An adaptive approach to vehicle trajectory prediction using multimodel kalman filter’, Transactions on Emerging Telecommunications Technologies 31(5), e3734.
Alahi et al. (2016) Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L. & Savarese, S. (2016), Social lstm: Human trajectory prediction in crowded spaces, in ‘Proceedings of the IEEE conference on computer vision and pattern recognition’, pp. 961–971.
Bach et al. (2017) Bach, S. H., Broecheler, M., Huang, B. & Getoor, L. (2017), ‘Hinge-loss markov random fields and probabilistic soft logic’.
Bordes et al. (2013) Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J. & Yakhnenko, O. (2013), ‘Translating embeddings for modeling multi-relational data’, Advances in neural information processing systems 26.
Chen, Zhang, Qian & Li (2022) Chen, T., Zhang, Y., Qian, X. & Li, J. (2022), ‘A knowledge graph-based method for epidemic contact tracing in public transportation’, Transportation Research Part C: Emerging Technologies 137, 103587.
Chen, Zhang, Sun & Zheng (2022) Chen, Y., Zhang, H., Sun, W. & Zheng, B. (2022), ‘Rntrajrec: Road network enhanced trajectory recovery with spatial-temporal transformer’, arXiv preprint arXiv:2211.13234 .
Chen et al. (2020) Chen, Z., Wang, Y., Zhao, B., Cheng, J., Zhao, X. & Duan, Z. (2020), ‘Knowledge graph completion: A review’, IEEE Access 8, 192435–192456.
Chi et al. (2022) Chi, H., Wang, B., Ge, Q. & Huo, G. (2022), ‘Knowledge graph-based enhanced transformer for metro individual travel destination prediction’, Journal of Advanced Transportation 2022.
Cho et al. (2014) Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H. & Bengio, Y. (2014), ‘Learning phrase representations using rnn encoder-decoder for statistical machine translation’, arXiv preprint arXiv:1406.1078 .
Chrastil & Warren (2015) Chrastil, E. R. & Warren, W. H. (2015), ‘Active and passive spatial learning in human navigation: acquisition of graph knowledge.’, Journal of experimental psychology: learning, memory, and cognition 41(4), 1162.
Dendorfer et al. (2020) Dendorfer, P., Osep, A. & Leal-Taixé, L. (2020), Goal-gan: Multimodal trajectory prediction based on goal position estimation, in ‘Proceedings of the Asian Conference on Computer Vision’.
Dijkstra (1959) Dijkstra, E. W. (1959), ‘A note on two problems in connexion with graphs’, Numerische Mathematik 1, 269–271.
Etienne & Jeffery (2004) Etienne, A. S. & Jeffery, K. J. (2004), ‘Path integration in mammals’, Hippocampus 14(2), 180–192.
Fu & Lee (2020) Fu, T.-Y. & Lee, W.-C. (2020), ‘Trembr: Exploring road networks for trajectory representation learning’, ACM Transactions on Intelligent Systems and Technology (TIST) 11(1), 1–25.
Gu et al. (2021) Gu, J., Sun, C. & Zhao, H. (2021), Densetnt: End-to-end trajectory prediction from dense goal sets, in ‘Proceedings of the IEEE/CVF International Conference on Computer Vision’, pp. 15303–15312.
Gu et al. (2022) Gu, T., Chen, G., Li, J., Lin, C., Rao, Y., Zhou, J. & Lu, J. (2022), Stochastic trajectory prediction via motion indeterminacy diffusion, in ‘Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition’, pp. 17113–17122.
Guo et al. (2017) Guo, C., Pleiss, G., Sun, Y. & Weinberger, K. Q. (2017), On calibration of modern neural networks, in ‘International conference on machine learning’, PMLR, pp. 1321–1330.
Gupta et al. (2018) Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S. & Alahi, A. (2018), Social gan: Socially acceptable trajectories with generative adversarial networks, in ‘Proceedings of the IEEE conference on computer vision and pattern recognition’, pp. 2255–2264.
Hamilton et al. (2017) Hamilton, W., Ying, Z. & Leskovec, J. (2017), ‘Inductive representation learning on large graphs’, Advances in neural information processing systems 30.
Hart et al. (1968) Hart, P. E., Nilsson, N. J. & Raphael, B. (1968), ‘A formal basis for the heuristic determination of minimum cost paths’, IEEE transactions on Systems Science and Cybernetics 4(2), 100–107.
Helbing & Molnar (1995) Helbing, D. & Molnar, P. (1995), ‘Social force model for pedestrian dynamics’, Physical review E 51(5), 4282.
Ho et al. (2020) Ho, J., Jain, A. & Abbeel, P. (2020), ‘Denoising diffusion probabilistic models’, Advances in Neural Information Processing Systems 33, 6840–6851.
Hochreiter & Schmidhuber (1997) Hochreiter, S. & Schmidhuber, J. (1997), ‘Long short-term memory’, Neural computation 9(8), 1735–1780.
Huang et al. (2019) Huang, X., Zhang, J., Li, D. & Li, P. (2019), Knowledge graph embedding based question answering, in ‘Proceedings of the twelfth ACM international conference on web search and data mining’, pp. 105–113.
Huang et al. (2022) Huang, Y., Du, J., Yang, Z., Zhou, Z., Zhang, L. & Chen, H. (2022), ‘A survey on trajectory-prediction methods for autonomous driving’, IEEE Transactions on Intelligent Vehicles 7(3), 652–674.
Ji et al. (2015) Ji, G., He, S., Xu, L., Liu, K. & Zhao, J. (2015), Knowledge graph embedding via dynamic mapping matrix, in ‘Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: Long papers)’, pp. 687–696.
Kong et al. (2017) Kong, X., Xia, F., Wang, J., Rahim, A. & Das, S. K. (2017), ‘Time-location-relationship combined service recommendation based on taxi trajectory data’, IEEE Transactions on Industrial Informatics 13(3), 1202–1212.
Lefèvre et al. (2014) Lefèvre, S., Vasquez, D. & Laugier, C. (2014), ‘A survey on motion prediction and risk assessment for intelligent vehicles’, ROBOMECH journal 1(1), 1–14.
Li et al. (2022) Li, G., Chen, Y., Liao, Q. & He, Z. (2022), ‘Potential destination discovery for low predictability individuals based on knowledge graph’, Transportation Research Part C: Emerging Technologies 145, 103928.
Li et al. (2020) Li, L., Jiang, R., He, Z., Chen, X. M. & Zhou, X. (2020), ‘Trajectory data-based traffic flow studies: A revisit’, Transportation Research Part C: Emerging Technologies 114, 225–240.
Li et al. (2017) Li, Y., Yu, R., Shahabi, C. & Liu, Y. (2017), ‘Diffusion convolutional recurrent neural network: Data-driven traffic forecasting’, arXiv preprint arXiv:1707.01926 .
Liang & Zhao (2021) Liang, Y. & Zhao, Z. (2021), ‘Nettraj: A network-based vehicle trajectory prediction model with directional representation and spatiotemporal attention mechanisms’, IEEE Transactions on Intelligent Transportation Systems 23(9), 14470–14481.
Lin et al. (2015) Lin, Y., Liu, Z., Sun, M., Liu, Y. & Zhu, X. (2015), Learning entity and relation embeddings for knowledge graph completion, in ‘Proceedings of the AAAI conference on artificial intelligence’, Vol. 29.
Liu et al. (2021) Liu, C., Gao, C., Jin, D. & Li, Y. (2021), ‘Improving location recommendation with urban knowledge graph’, arXiv preprint arXiv:2111.01013 .
Liu et al. (2022) Liu, K., Ruan, S., Xu, Q., Long, C., Xiao, N., Hu, N., Yu, L. & Pan, S. J. (2022), Modeling trajectories with multi-task learning, in ‘2022 23rd IEEE International Conference on Mobile Data Management (MDM)’, IEEE, pp. 208–213.
Mo et al. (2023) Mo, B., Wang, Q., Guo, X., Winkenbach, M. & Zhao, J. (2023), ‘Predicting drivers’ route trajectories in last-mile delivery using a pair-wise attention-based pointer neural network’, Transportation Research Part E: Logistics and Transportation Review 175, 103168.
Nickel et al. (2011) Nickel, M., Tresp, V., Kriegel, H.-P. et al. (2011), A three-way model for collective learning on multi-relational data, in ‘ICML’, Vol. 11, pp. 3104482–3104584.
Noy et al. (2019) Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A. & Taylor, J. (2019), ‘Industry-scale knowledge graphs: Lessons and challenges: Five diverse technology companies show how it’s done’, Queue 17(2), 48–75.
Paravarzar & Mohammad (2020) Paravarzar, S. & Mohammad, B. (2020), ‘Motion prediction on self-driving cars: A review’, arXiv preprint arXiv:2011.03635 .
Paulheim (2017) Paulheim, H. (2017), ‘Knowledge graph refinement: A survey of approaches and evaluation methods’, Semantic web 8(3), 489–508.
Popescu et al. (2009) Popescu, M.-C., Balas, V. E., Perescu-Popescu, L. & Mastorakis, N. (2009), ‘Multilayer perceptron and neural networks’, WSEAS Transactions on Circuits and Systems 8(7), 579–588.
Prato (2009) Prato, C. G. (2009), ‘Route choice modeling: past, present and future research directions’, Journal of Choice Modelling 2(1), 65–100.
Rao et al. (2022) Rao, X., Chen, L., Liu, Y., Shang, S., Yao, B. & Han, P. (2022), Graph-flashback network for next location recommendation, in ‘Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining’, pp. 1463–1471.
Rathore et al. (2019) Rathore, P., Kumar, D., Rajasegarar, S., Palaniswami, M. & Bezdek, J. C. (2019), ‘A scalable framework for trajectory prediction’, IEEE Transactions on Intelligent Transportation Systems 20(10), 3860–3874.
Ren et al. (2021) Ren, H., Ruan, S., Li, Y., Bao, J., Meng, C., Li, R. & Zheng, Y. (2021), Mtrajrec: Map-constrained trajectory recovery via seq2seq multi-task learning, in ‘Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining’, pp. 1410–1419.
Richardson & Domingos (2006) Richardson, M. & Domingos, P. (2006), ‘Markov logic networks’, Machine learning 62, 107–136.
Rudenko et al. (2020) Rudenko, A., Palmieri, L., Herman, M., Kitani, K. M., Gavrila, D. M. & Arras, K. O. (2020), ‘Human motion trajectory prediction: A survey’, The International Journal of Robotics Research 39(8), 895–935.
Rumelhart et al. (1986) Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986), ‘Learning representations by back-propagating errors’, nature 323(6088), 533–536.
Sadeghian et al. (2019) Sadeghian, A., Kosaraju, V., Sadeghian, A., Hirose, N., Rezatofighi, H. & Savarese, S. (2019), Sophie: An attentive gan for predicting paths compliant to social and physical constraints, in ‘Proceedings of the IEEE/CVF conference on computer vision and pattern recognition’, pp. 1349–1358.
Schlichtkrull et al. (2018) Schlichtkrull, M., Kipf, T. N., Bloem, P., Van Den Berg, R., Titov, I. & Welling, M. (2018), Modeling relational data with graph convolutional networks, in ‘The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15’, Springer, pp. 593–607.
Shao et al. (2021) Shao, K., Wang, Y., Zhou, Z., Xie, X. & Wang, G. (2021), Trajforesee: How limited detailed trajectories enhance large-scale sparse information to predict vehicle trajectories?, in ‘2021 IEEE 37th International Conference on Data Engineering (ICDE)’, IEEE, pp. 2189–2194.
Sheth et al. (2019) Sheth, A., Padhee, S. & Gyrard, A. (2019), ‘Knowledge graphs and knowledge networks: the story in brief’, IEEE Internet Computing 23(4), 67–75.
Simmons et al. (2006) Simmons, R., Browning, B., Zhang, Y. & Sadekar, V. (2006), Learning to predict driver route and destination intent, in ‘2006 IEEE intelligent transportation systems conference’, IEEE, pp. 127–132.
Sutskever et al. (2014) Sutskever, I., Vinyals, O. & Le, Q. V. (2014), ‘Sequence to sequence learning with neural networks’, Advances in neural information processing systems 27.
Tan et al. (2021) Tan, J., Qiu, Q., Guo, W. & Li, T. (2021), ‘Research on the construction of a knowledge graph and knowledge reasoning model in the field of urban traffic’, Sustainability 13(6), 3191.
Tang et al. (2022) Tang, Y., He, J. & Zhao, Z. (2022), ‘Hgarn: Hierarchical graph attention recurrent network for human mobility prediction’, arXiv preprint arXiv:2210.07765 .
Tarjan (1972) Tarjan, R. (1972), ‘Depth-first search and linear graph algorithms’, SIAM journal on computing 1(2), 146–160.
Trouillon et al. (2016) Trouillon, T., Welbl, J., Riedel, S., Gaussier, É. & Bouchard, G. (2016), Complex embeddings for simple link prediction, in ‘International conference on machine learning’, PMLR, pp. 2071–2080.
Vashishth et al. (2019) Vashishth, S., Sanyal, S., Nitin, V. & Talukdar, P. (2019), ‘Composition-based multi-relational graph convolutional networks’, arXiv preprint arXiv:1911.03082 .
Wang et al. (2021) Wang, H., Yu, Q., Liu, Y., Jin, D. & Li, Y. (2021), ‘Spatio-temporal urban knowledge graph enabled mobility prediction’, Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies 5(4), 1–24.
Wang et al. (2020) Wang, P., Liu, K., Jiang, L., Li, X. & Fu, Y. (2020), Incremental mobile user profiling: Reinforcement learning with spatial knowledge graph for modeling event streams, in ‘Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining’, pp. 853–861.
Wang et al. (2014) Wang, Z., Zhang, J., Feng, J. & Chen, Z. (2014), Knowledge graph embedding by translating on hyperplanes, in ‘Proceedings of the AAAI conference on artificial intelligence’, Vol. 28.
Xiong et al. (2017) Xiong, C., Power, R. & Callan, J. (2017), Explicit semantic ranking for academic search via knowledge graph embedding, in ‘Proceedings of the 26th international conference on world wide web’, pp. 1271–1279.
Yan et al. (2022) Yan, B., Zhao, G., Song, L., Yu, Y. & Dong, J. (2022), ‘Precln: Pretrained-based contrastive learning network for vehicle trajectory prediction’, World Wide Web pp. 1–23.
Yang et al. (2014) Yang, B., Yih, W.-t., He, X., Gao, J. & Deng, L. (2014), ‘Embedding entities and relations for learning and inference in knowledge bases’, arXiv preprint arXiv:1412.6575 .
Yang & Gidofalvi (2018) Yang, C. & Gidofalvi, G. (2018), ‘Fast map matching, an algorithm integrating hidden markov model with precomputation’, International Journal of Geographical Information Science 32(3), 547 – 570.
Ye et al. (2016) Ye, N., Zhang, Y., Wang, R. & Malekian, R. (2016), ‘Vehicle trajectory prediction based on hidden markov model’.
Yurtsever et al. (2020) Yurtsever, E., Lambert, J., Carballo, A. & Takeda, K. (2020), ‘A survey of autonomous driving: Common practices and emerging technologies’, IEEE access 8, 58443–58469.
Zhang et al. (2023) Zhang, Q., Ma, Z., Zhang, P., Jenelius, E., Ma, X. & Wen, Y. (2023), ‘User-station attention inference using smart card data: A knowledge graph assisted matrix decomposition model’, Applied Intelligence .
Zhao et al. (2020) Zhao, L., Deng, H., Qiu, L., Li, S., Hou, Z., Sun, H. & Chen, Y. (2020), ‘Urban multi-source spatio-temporal data analysis aware knowledge graph embedding’, Symmetry 12(2), 199.
Zhao et al. (2019) Zhao, L., Song, Y., Zhang, C., Liu, Y., Wang, P., Lin, T., Deng, M. & Li, H. (2019), ‘T-gcn: A temporal graph convolutional network for traffic prediction’, IEEE transactions on intelligent transportation systems 21(9), 3848–3858.
Zhao & Liang (2023) Zhao, Z. & Liang, Y. (2023), ‘A deep inverse reinforcement learning approach to route choice modeling with context-dependent rewards’, Transportation Research Part C: Emerging Technologies 149, 104079.
Zhuang et al. (2017) Zhuang, C., Yuan, N. J., Song, R., Xie, X. & Ma, Q. (2017), Understanding people lifestyles: Construction of urban movement knowledge graph from gps trajectory., in ‘Ijcai’, pp. 3616–3623.
Ziebart et al. (2008) Ziebart, B. D., Maas, A. L., Dey, A. K. & Bagnell, J. A. (2008), Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior, in ‘Proceedings of the 10th international conference on Ubiquitous computing’, pp. 322–331.

Appendix A Notations

The notations adopted in this paper.
$\mathcal{T}$	The set of all GPS trajectories.
$G$	The road network data.
$\mathbf{G}$	The road network, represented as a directed multigraph (MultiDiGraph).
$\mathbf{V}$ / $\mathbf{E}$	The set of all nodes / edges of $\mathbf{G}$ .
$v$ / $e$	An intersection (or node) / link (or edge) of a road network.
$r^{d}_{i}$	The goal direction of the $i$ -th future route.
$\widehat{r^{d}_{i}}$	The estimated goal direction of the $i$ -th future route.
$\mathbf{r}^{d}_{i}$	The relation embedding of the estimated goal direction of the $i$ -th future route.
$e^{i}$	The $i$ -th link (or edge) of a road network.
$e_{i}^{j}$	The $j$ -th link (or edge) of the $i$ -th route.
$e_{i}^{d,j}$	The direction of the $j$ -th link of the $i$ -th route.
$e_{i}^{\Gamma}$	The last (i.e., the $\Gamma$ -th) link of the $i$ -th observed route.
$\mathbf{e}$	The set of embeddings of all links.
$\mathbf{e}^{\Gamma}_{i}$	The embedding of the last (i.e., the $\Gamma$ -th) link in the $i$ -th observed route.
$\mathbf{e}_{i}^{\cdot}$	The embedding of a link in the $i$ -th observed route.
$\mathbf{e}_{i}^{d,\cdot}$	The direction embedding of a link in the $i$ -th observed route.
$\mathbf{e}_{i,\perp}^{\Gamma}$	The hyperplane-projected embedding of the last link in the $i$ -th observed route.
$\mathbf{e}_{\perp}$	The hyperplane-projected set of all candidate tail entities embeddings.
$x_{i}$	The $i$ -th map-matched route on $\mathbf{G}$ of a GPS trajectory.
$x^{o}_{i}$	The $i$ -th observed route on $\mathbf{G}$ of a GPS trajectory.
$x^{o,d}_{i}$	The set of link directions of the $i$ -th observed route.
$x^{f}_{i}$	The $i$ -th future route on $\mathbf{G}$ of a GPS trajectory.
$\widetilde{x_{i,k}^{f}}$	The $k$ -th future route out of the top- $K$ generated routes for the $i$ -th future route.
$\mathbf{x}^{o}_{i}$ / $\widetilde{\mathbf{x}^{f}_{i}}$	The embedding of the $i$ -th observed / future route.
$\mathbf{x}^{o,d}_{i}$ / $\widetilde{\mathbf{x}^{f,d}_{i}}$	The embedding of the $i$ -th observed / future route’s directions.
$\widetilde{\mathbf{x}^{f}_{i,\perp^{c}}}$	The $\mathbf{p}^{c}$ hyperplane-projected embedding of the $i$ -th future route.
$\widetilde{\mathbf{x}^{f}_{i,\perp^{s}}}$	The $\mathbf{p}^{s}$ hyperplane-projected embedding of the $i$ -th future route.
$\{\widehat{x_{i,k}^{f}}\}_{k=1}^{K}$	The reranked top- $K$ generated routes for the $i$ -th future route.
$\mathcal{X}$	The set of all map-matched routes.
$\mathcal{X}^{o}$ / $\mathcal{X}^{f}$	The set of all observed / future routes.
$\widetilde{\mathcal{X}^{f}_{k}}$	The set of all generated $k$ -th future routes out of the top- $K$ generated routes.
$\widehat{\mathcal{X}^{f}_{k}}$	The set of all reranked $k$ -th future routes out of the top- $K$ reranked routes.
$\mathrm{Pr}(\widetilde{x^{f,\gamma}_{i}})$	The predicted probability distribution indicating the likelihood of each link being the $\gamma$ -th link of the $i$ -th future route.
$\mathrm{Pr}(\widetilde{x^{f}_{i}})$	The set of predicted probability distributions for the $i$ -th future route $x^{f}_{i}$ .
$\mathrm{Pr}(\widetilde{\mathcal{X}^{f}})$	The set of all predicted probability distributions for future routes $\mathcal{X}^{f}$ .
$\Gamma$ / $\Gamma^{\prime}$	The length of observed / future routes.
$\mathcal{F}$	The route prediction mapping function.
$K$	The number of generated future routes.
$\mathcal{M}_{d}$ / $\mathcal{M}_{kg}$ / $\mathcal{M}_{g}$ / $\mathcal{M}_{r}$	The mapping functions of the Data Preprocessing Module / Knowledge Graph Module / Route Generation Module / Rank Refinement Module.
$\Theta$	The parameter set of $\mathcal{F}$ .
$\Theta_{kg}$ / $\Theta_{r}$	The parameter set of $\mathcal{M}_{kg}$ / $\mathcal{M}_{r}$ .
$\mathbf{D}$	The direction label matrix.
$\mathbf{A}$	The node adjacent edges (NAE) matrix.
$\mathbf{D}^{d}$	The inter-road direction matrix.
$\mathcal{G}$	The knowledge graph.
$\mathcal{E}$ / $\mathcal{R}$	The set of all entities / relations of $\mathcal{G}$ .
$(h,r,t)$	A triplet within a knowledge graph $\mathcal{G}$ , $h$ is the head entity, $r$ is the relation, $t$ is the tail entity.
$\mathbf{h}$ / $\mathbf{r}$ / $\mathbf{t}$	The head entity / relation / tail entity embedding.
$\mathbf{p}^{r}$	The hyperplane of the relation $r$ .
$\mathbf{p}^{c}$ / $\mathbf{p}^{s}$ / $\mathbf{p}^{a}$ / $\mathbf{p}^{d}$	The hyperplane of the ConnectBy / ConsistentWith / DistanceTo / DirectionTo relation.
$\mathcal{R}^{c}$ / $\mathcal{R}^{s}$ / $\mathcal{R}^{a}$ / $\mathcal{R}^{d}$	The ConnectBy / ConsistentWith / DistanceTo / DirectionTo spatial relation.
$\delta$	The hidden dimension.
$\delta_{\mathcal{E}}$	The dimension of the entity embedding space.
$\delta_{\mathcal{R}^{c}}$ / $\delta_{\mathcal{R}^{s}}$ / $\delta_{\mathcal{R}^{a}}$ / $\delta_{\mathcal{R}^{d}}$	The dimension of the ConnectBy / ConsistentWith / DistanceTo / DirectionTo relation embedding spaces.
$\Delta_{\mathcal{R}^{c}}$ , $\Delta_{\mathcal{R}^{c}}^{\prime}$	The set of valid (positive) triplets and the set of invalid (negative) triplets of relation $\mathcal{R}^{c}$ .
$\Delta_{\mathcal{R}^{s}}$ , $\Delta_{\mathcal{R}^{s}}^{\prime}$	The set of valid (positive) triplets and the set of invalid (negative) triplets of relation $\mathcal{R}^{s}$ .
$\Delta_{\mathcal{R}^{a}}$ , $\Delta_{\mathcal{R}^{a}}^{\prime}$	The set of valid (positive) triplets and the set of invalid (negative) triplets of relation $\mathcal{R}^{a}$ .
$\Delta_{\mathcal{R}^{d}}$ , $\Delta_{\mathcal{R}^{d}}^{\prime}$	The set of valid (positive) triplets and the set of invalid (negative) triplets of relation $\mathcal{R}^{d}$ .
$\phi$	The scoring function of knowledge graph embedding.
$\psi$	The margin between positive and negative scores.
$\mathbf{W}_{\mathcal{E}}$	The trainable entity embedding matrix.
$\mathbf{W}_{\mathcal{R}^{c}}$ / $\mathbf{W}_{\mathcal{R}^{s}}$ / $\mathbf{W}_{\mathcal{R}^{a}}$ / $\mathbf{W}_{\mathcal{R}^{d}}$	The trainable ConnectBy / ConsistentWith / DistanceTo / DirectionTo relation embedding matrix.
$\mathbf{P}_{\mathcal{R}^{c}}$ / $\mathbf{P}_{\mathcal{R}^{s}}$ / $\mathbf{P}_{\mathcal{R}^{a}}$ / $\mathbf{P}_{\mathcal{R}^{d}}$	The trainable ConnectBy / ConsistentWith / DistanceTo / DirectionTo relation hyperplane.
$\mathbf{r}^{f,c}_{i,margin},\mathbf{r}^{f,s}_{i,margin}$	The connection and consistency margins.
$\mathcal{B}$	The batch size.
$w_{\cdot}$	The weights for different loss functions.
$N_{A}$	The maximum number of adjacent edges of any node in $\mathbf{G}$ .
$N_{d}$	The number of sections into which directions are discretized.
$\mathcal{L},\mathcal{L}_{\cdot}$	The loss functions.
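To illustrate how the hyperplane-related symbols above ($\mathbf{p}^{r}$, the projected embeddings marked $\perp$, the scoring function $\phi$ and the margin $\psi$) fit together, the following minimal sketch assumes the TransH formulation of Wang et al. (2014): an embedding is projected onto the relation hyperplane as $\mathbf{h}_{\perp}=\mathbf{h}-(\mathbf{p}^{r\top}\mathbf{h})\mathbf{p}^{r}$ with unit-norm $\mathbf{p}^{r}$, a triplet is scored by the squared distance $\|\mathbf{h}_{\perp}+\mathbf{r}-\mathbf{t}_{\perp}\|_2^2$, and a hinge loss keeps negative scores at least $\psi$ above positive ones. The function names are hypothetical, and the exact $\phi$ used by RouteKG is defined in the main text rather than in this appendix.

```python
import numpy as np

# Assumption: TransH-style scoring (Wang et al., 2014); the function names are
# hypothetical and the exact phi used by RouteKG is given in the main text.

def project_to_hyperplane(x: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Project embedding(s) x onto the hyperplane whose normal vector is p."""
    p = p / np.linalg.norm(p)                      # keep the normal unit-length
    coef = np.sum(x * p, axis=-1, keepdims=True)   # component of x along p
    return x - coef * p

def score(h: np.ndarray, r: np.ndarray, t: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Squared L2 distance ||h_perp + r - t_perp||^2; lower means more plausible."""
    h_perp = project_to_hyperplane(h, p)
    t_perp = project_to_hyperplane(t, p)
    return np.sum((h_perp + r - t_perp) ** 2, axis=-1)

def margin_loss(pos: np.ndarray, neg: np.ndarray, psi: float = 1.0) -> float:
    """Hinge loss that pushes negative scores at least psi above positive ones."""
    return float(np.mean(np.maximum(0.0, psi + pos - neg)))

# Toy triplets in 3-D; the hyperplane normal is the x-axis.
p = np.array([1.0, 0.0, 0.0])
h, r = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
t_pos, t_neg = np.array([1.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])
pos, neg = score(h, r, t_pos, p), score(h, r, t_neg, p)
print(pos, neg, margin_loss(pos, neg, psi=1.0))
```

In this toy case the positive triplet scores 0 and the negative one scores 2, so with $\psi=1$ the hinge loss is already zero; a larger margin such as $\psi=3$ would still penalize the pair.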

Appendix B Minibatch version of the Spanning Route algorithm

Algorithm 3 gives the pseudocode for the minibatch Spanning Route.

Algorithm 3 Spanning Route (minibatch).

Input: $\Gamma^{\prime}$; batched probability distributions $\left\{\mathrm{Pr}(\widetilde{x^{f,\gamma}})\in\mathbb{R}^{\mathcal{B}\times|\mathcal{E}|}\right\}_{\gamma=1}^{\Gamma^{\prime}}$; road network $\mathbf{G}=(\mathbf{V},\mathbf{E})$; NAE matrix $\mathbf{A}\in\mathbb{R}^{|\mathbf{V}|\times N_{A}}$; the tree’s degree $n$.
Output: Top-$K$ predicted batched future routes $\left\{\widetilde{x_{k}^{f}}\right\}_{k=1}^{K}$.

// Initialize the root node.
root ← CreateNewNode(name = “root”, parent = NIL, end_nodes = $v_{\Gamma}^{s}\in\mathbb{R}^{\mathcal{B}}$, pred = NIL)
// Recursively generate a tree of future routes in a greedy manner.
for $\gamma=1,\dots,\Gamma^{\prime}$ do
    // Get the leaves of the current tree.
    leaves ← GetLeaves(root)
    // Span each leaf.
    for leaf $\in$ leaves do
        // Get the adjacent edges of the end nodes.
        $\mathcal{N}_{end\_node}^{e}\in\mathbb{R}^{\mathcal{B}\times N_{A}}$ ← $\mathbf{A}[leaf.end\_nodes,:]$
        // Get the top-$n$ adjacent edges with the highest probabilities under $\mathrm{Pr}(\widetilde{x^{f,\gamma}})$.
        $\left\{e_{k}^{\Gamma+\gamma}\in\mathbb{R}^{\mathcal{B}}\right\}_{k=1}^{n}$ ← GetTopK($\mathrm{Pr}(\widetilde{x^{f,\gamma}})[:,\mathcal{N}_{end\_node}^{e}]$, $K=n$)
        // Create a leaf node for each of the top-$n$ edges and add it to the tree.
        for $k=1,\dots,n$ do
            node ← CreateNewNode(name = “$k$”, parent = leaf, end_nodes = $e_{k}^{\Gamma+\gamma}[1]\in\mathbb{R}^{\mathcal{B}}$, pred = $e_{k}^{\Gamma+\gamma}\in\mathbb{R}^{\mathcal{B}}$)
        end for
    end for
end for
leaves ← GetLeaves(root)
// Traverse the tree to get the top-$K$ future routes.
for $k=1,\dots,K$ do
    // Get the path from the root to the $k$-th leaf.
    $\text{path}_{k}$ ← Traverse(root, leaves[$k$])
    // Get the generated $k$-th route.
    $\widetilde{x_{k}^{f}}\in\mathbb{R}^{\mathcal{B}\times\Gamma^{\prime}}$ ← $\left\{\text{path}_{k}[i].\text{pred}\in\mathbb{R}^{\mathcal{B}}\right\}_{i=1}^{\Gamma^{\prime}}$
end for
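One possible NumPy rendering of Algorithm 3 is sketched below. It fixes several details that the pseudocode leaves open, all of which are assumptions: links are integer ids, a separate array `edge_end` plays the role of $e[1]$ (the end node of a link), the NAE matrix pads missing neighbours with -1 and those slots are masked to $-\infty$ before the top-$n$ selection, the tree is stored as a flat list of leaves in creation order (assumed to match GetLeaves), and $n \le N_{A}$. The helpers `build_nae_matrix` and `spanning_route` are hypothetical names, not the authors' code.

```python
import numpy as np

# Hypothetical helpers sketching Algorithm 3 under stated assumptions:
# edges are integer ids, edge_end[e] plays the role of e[1], and the NAE
# matrix pads missing neighbours with -1.

def build_nae_matrix(num_nodes: int, edges: list) -> np.ndarray:
    """Row v lists the ids of edges leaving node v, padded with -1 to N_A columns."""
    out = [[] for _ in range(num_nodes)]
    for eid, (u, _v) in enumerate(edges):
        out[u].append(eid)
    n_a = max(len(o) for o in out)
    A = np.full((num_nodes, n_a), -1, dtype=np.int64)
    for u, eids in enumerate(out):
        A[u, :len(eids)] = eids
    return A

def spanning_route(probs, A, edge_end, start_nodes, n, K):
    """Greedy batched n-ary tree expansion; returns K arrays of shape (B, Gamma')."""
    batch = np.arange(len(start_nodes))
    # A leaf is (end_nodes of shape (B,), list of predicted edge ids per step).
    leaves = [(np.asarray(start_nodes), [])]
    for prob in probs:                              # prob: (B, num_edges) at step gamma
        new_leaves = []
        for end_nodes, preds in leaves:             # leaves kept in creation order
            cand = A[end_nodes]                     # (B, N_A) adjacent edge ids
            # Mask padded slots so they are ranked last (assumption: n <= N_A).
            scores = np.where(cand >= 0, prob[batch[:, None], cand], -np.inf)
            top = np.argsort(-scores, axis=1, kind="stable")[:, :n]
            for k in range(n):
                edge = cand[batch, top[:, k]]       # k-th best edge per sample
                # A padded pick (-1) leaves the route at its current end node.
                nxt = np.where(edge >= 0, edge_end[edge], end_nodes)
                new_leaves.append((nxt, preds + [edge]))
        leaves = new_leaves
    # Read the first K root-to-leaf paths as the generated routes.
    return [np.stack(preds, axis=1) for _, preds in leaves[:K]]
```

Because the first leaf in creation order always follows the highest-probability edge at every step, `routes[0]` coincides with a greedy decoder. The tree holds $n^{\Gamma^{\prime}}$ leaves, so in this sketch memory grows exponentially with the prediction horizon.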

Appendix C Hyperparameters

This study applied consistent experimental configurations to both datasets to ensure reliable and comparable results. A training batch size of 2048 was used, and the maximum number of epochs was set to 10,000. Early stopping was employed with a patience of 100 epochs. The hidden dimensions, denoted by $\delta_{\mathcal{E}}$, $\delta_{\mathcal{R}^{c}}$, $\delta_{\mathcal{R}^{s}}$, $\delta_{\mathcal{R}^{a}}$, and $\delta_{\mathcal{R}^{d}}$, were all set to 64.
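The patience setting can be read as in the minimal sketch below, which assumes the common convention that training halts once the monitored validation loss has not improved for `patience` consecutive epochs, or once the epoch budget is spent. Which metric RouteKG monitors is not stated in this appendix, and `early_stopping_epochs` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Tuple

def early_stopping_epochs(val_losses: Iterable[float], max_epochs: int = 10_000,
                          patience: int = 100) -> Tuple[int, int]:
    """Return (epoch at which training stops, best epoch), both 1-indexed.

    Assumption: stop once the validation loss has not improved for
    `patience` consecutive epochs, or after `max_epochs` epochs.
    """
    best_loss, best_epoch, epoch = float("inf"), 0, 0
    for epoch, loss in enumerate(islice(val_losses, max_epochs), start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch    # improvement resets the counter
        elif epoch - best_epoch >= patience:
            break                                  # patience exhausted
    return epoch, best_epoch
```

For example, a loss curve that reaches its minimum at epoch 2 and then plateaus stops at epoch 102 under a patience of 100.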

The Adam optimizer was used to update the model’s parameters, with a learning rate of 1e-3 and a weight decay of 1e-2. To scale the sampling probability distribution for the top- $K$ predictions, a temperature parameter of 0.1 was used for the Chengdu dataset and 0.13 for the Shanghai dataset.
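Assuming the temperature $T$ divides the scores before a softmax (the usual convention; where exactly RouteKG applies it is not specified in this appendix), the effect of the two settings can be illustrated as follows. The logits are made-up values, not model outputs.

```python
import math
from typing import List

def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax over logits / temperature; temperatures below 1 sharpen the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                                # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.5, 0.4, 0.1]                           # hypothetical link scores
for T in (1.0, 0.13, 0.1):                         # 0.13: Shanghai, 0.1: Chengdu
    print(T, [round(p, 3) for p in softmax_with_temperature(logits, T)])
```

With these logits the top entry receives about 0.39 of the probability mass at $T=1$, about 0.66 at $T=0.13$ and about 0.72 at $T=0.1$, so the smaller Chengdu temperature concentrates sampling more strongly on the highest-scoring links.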

Regarding the weights assigned to the different loss terms, under the NoGoal scenario the weights $[w_{rep},w_{rank},w_{pred},w_{d}]$ were set to $[1,1,1,2.4]$ for the Chengdu dataset and $[1.3,2.8,0.5,2.9]$ for the Shanghai dataset. Under the GoalD scenario, the weights $[w_{rep},w_{rank},w_{pred}]$ were set to $[1,1,1]$ for the Chengdu dataset and $[1.4,2.1,1.7]$ for the Shanghai dataset. Lastly, under the Goal scenario, the weights $[w_{rep},w_{rank},w_{pred}]$ were set to $[2.4,2.2,2.8]$ for the Chengdu dataset and $[1.9,1.3,2.4]$ for the Shanghai dataset. These weight configurations were chosen based on preliminary experiments and empirical observations.
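The weight settings above can be collected into a single lookup table. The sketch below assumes that the overall objective is a weighted sum of the individual loss terms; the dictionary keys are hypothetical names mirroring the subscripts of $w_{rep}$, $w_{rank}$, $w_{pred}$ and $w_{d}$, and the loss values themselves would come from the modules described in the main text.

```python
# Loss weights from this appendix; the dictionary keys are hypothetical names
# for the subscripts of w_rep, w_rank, w_pred and w_d.
LOSS_WEIGHTS = {
    ("Chengdu", "NoGoal"): {"rep": 1.0, "rank": 1.0, "pred": 1.0, "d": 2.4},
    ("Shanghai", "NoGoal"): {"rep": 1.3, "rank": 2.8, "pred": 0.5, "d": 2.9},
    ("Chengdu", "GoalD"): {"rep": 1.0, "rank": 1.0, "pred": 1.0},
    ("Shanghai", "GoalD"): {"rep": 1.4, "rank": 2.1, "pred": 1.7},
    ("Chengdu", "Goal"): {"rep": 2.4, "rank": 2.2, "pred": 2.8},
    ("Shanghai", "Goal"): {"rep": 1.9, "rank": 1.3, "pred": 2.4},
}

def total_loss(losses: dict, dataset: str, scenario: str) -> float:
    """Assumed weighted sum of the loss terms that carry a weight in this scenario."""
    weights = LOSS_WEIGHTS[(dataset, scenario)]
    return sum(w * losses[name] for name, w in weights.items())
```

Since $w_{d}$ appears only in the NoGoal setting, a loss term keyed `"d"` is simply ignored in the GoalD and Goal scenarios under this sketch.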
