
WEST GCN-LSTM: Weighted Stacked Spatio-Temporal Graph Neural Networks for Regional Traffic Forecasting

Theodoros Theodoropoulos, Angelos-Christos Maroudis, Antonios Makris, and Konstantinos Tserpes
Department of Informatics and Telematics, Harokopio University of Athens, Greece; School of Electrical and Computer Engineering, National Technical University of Athens, Greece
Abstract

Regional traffic forecasting is a critical challenge in urban mobility, with applications to various fields such as the Internet of Everything. In recent years, spatio-temporal graph neural networks have achieved state-of-the-art results in the context of numerous traffic forecasting challenges. This work aims at expanding upon the conventional spatio-temporal graph neural network architectures in a manner that may facilitate the inclusion of information regarding the examined regions, as well as the populations that traverse them, in order to establish a more efficient prediction model. The end-product of this scientific endeavour is a novel spatio-temporal graph neural network architecture that is referred to as WEST (WEighted STacked) GCN-LSTM. Furthermore, the inclusion of the aforementioned information is conducted via the use of two novel dedicated algorithms that are referred to as the Shared Borders Policy and the Adjustable Hops Policy. Through information fusion and distillation, the proposed solution manages to significantly outperform its competitors in the frame of an experimental evaluation that consists of 19 forecasting models, across several datasets. Finally, an additional ablation study determined that each of the components of the proposed solution contributes towards enhancing its overall performance.

Index Terms:
Graph Neural Networks, Traffic Forecasting

I Introduction

Regional traffic forecasting is an emerging challenge in the domain of urban mobility that holds significance in various fields such as smart cities [1], edge computing [2], the Internet of Things [3], wireless networks [4], personalised recommender systems [5], epidemiology modeling [6], and many more. However, the significance of regional traffic forecasting escalates within the Internet of Everything (IoE) paradigm [7], which is characterized by an intricate web of relationships among people, things, data, and processes. Regional traffic forecasting refers to the process of predicting future traffic conditions across diverse geographic areas, characterized by grid-based or non-uniform partitioning, and over multiple periods of time, which may span from a couple of minutes to several hours. This process is conducted via the use of dedicated forecasting models.

Despite its numerous beneficial applications in the aforementioned fields, regional traffic forecasting poses a complex problem, as it requires accurately forecasting traffic conditions across different areas over an extended time period. This complexity arises from the intricate and interlinked two-fold nature of traffic systems, which manifest both spatial and temporal characteristics [8]. Spatially, traffic conditions in one region can be influenced by events occurring in neighboring or distant areas, and can in turn have a cascading effect on neighboring regions, which makes it imperative to model the interdependencies among regions accurately. Temporally, traffic patterns change dynamically according to time cycles of varying lengths [9], demanding models that capture both short-term fluctuations and long-term trends.

Thus, any attempt at constructing regional traffic forecasting models should be designed in a manner that incorporates the use of information regarding the topology of the various regions and the populations that traverse them. Thankfully, such information can become readily available via the use of technologies such as advanced traffic sensor networks [10], and integrated geographic information systems [11]. These technologies are capable of continuously monitoring, documenting, and archiving spatial and time information across multiple instances. By combining advanced modeling techniques, real-time data streams, and domain-specific knowledge, it is possible to create robust and accurate forecasting systems capable of handling the intricacies of regional traffic [12].

In recent years, spatio-temporal graph neural networks, such as GCN-LSTM [13], have demonstrated state-of-the-art performance in a wide range of traffic forecasting problems [14], primarily due to their intrinsic nature of effectively incorporating contextual information. However, only a small portion of these endeavours focuses on regional traffic forecasting. Furthermore, the use of spatio-temporal graph neural networks for regional traffic forecasting has been quite limited in the sense that all prior attempts focus on capturing either the spatial or the temporal aspects of this problem.

This work aims at expanding upon the GCN-LSTM architecture in a manner that may facilitate the incorporation of information regarding the various populations (temporal aspect), as well as the regions that they traverse (spatial aspect), in order to establish more refined and accurate prediction models, through information fusion and distillation. The result of this scientific endeavour is a novel spatio-temporal graph neural network architecture that leverages weighted stacked graph convolution. This architecture is referred to as WEST (WEighted STacked) GCN-LSTM. Furthermore, the incorporation of the aforementioned information is conducted via the use of two novel algorithms that are referred to as the Shared Borders Policy and the Adjustable Hops Policy. This paper is dedicated to analyzing the proposed architecture and algorithms in great detail, and to evaluating the efficiency of the proposed solution that consists of them.

More specifically, the rest of the paper is structured in the following manner: Section II explores the corresponding scientific literature that is based on the use of various traffic forecasting models. Section III establishes the problem formulation that shall be used throughout this paper. Section IV showcases the proposed solution. Section V describes the experimental process undertaken to evaluate the efficiency of the proposed solution. Finally, Section VI summarizes the merits and findings of this work, and proposes potential future research directions.

II Literature Review

Depending on the examined regional traffic forecasting scenario, the characteristics of the moving entities, as well as the regions that they traverse may vary significantly. Nevertheless, regardless of which type of scenario is being considered, these two aspects constitute the cornerstone of regional traffic forecasting. As such, any attempt at conducting regional traffic forecasting should be carried out within a spatio-temporal framework that is capable of encapsulating the dynamic nature of the moving entities that is bound to manifest.

This view of traffic evolution as a dynamic system led earlier attempts at traffic forecasting to consider the use of Recurrent Neural Networks (RNNs) [15]. While RNNs have been particularly effective for modeling sequential data exhibiting dynamic behavior, their efficacy wanes in storing prolonged information, attributed to the vanishing / exploding gradient phenomenon that appears in long sequence learning [16]. To surmount this limitation, Long Short-Term Memory (LSTM) networks were used in the context of regional traffic forecasting [17]. LSTMs excel in capturing temporal intricacies and long-term dependencies. However, while these models proficiently handle sequential data and unveil temporal patterns within mobility tasks, their prowess in the temporal domain stands in contrast to their limitations in comprehensively encapsulating the spatial aspects of the problem.

Encoder-decoders (EDs) are composite Deep Learning (DL) architectures that are capable of mitigating this limitation. They are designed to handle variable-length input and output sequences, which makes them well suited to sequence-to-sequence prediction. The encoder converts variable-length input into a fixed-shape state, and the decoder uses that state to generate the output. While the role of the encoder is to encapsulate the underlying spatial dependencies, the decoder aims at capturing the various temporal patterns and is thus usually based on some form of RNN. Notable encoder-decoder architectures that have been examined in the frame of traffic forecasting scenarios include the LSTM ED [18], the BD-LSTM ED [19], the CNN-LSTM [20], the Hybrid LSTM ED [21], and the Hybrid LSTM ATT ED [22].

More specifically, the findings obtained while authoring the last of these works motivated us to focus our efforts on establishing a more advanced solution for regional traffic forecasting. That work showcased that ED architectures, when leveraged for multi-step regional traffic forecasting in the context of a single region, manage to outperform their competitors. However, the situation changes drastically when simultaneously exploring multiple regions in a multi-step manner, due to the dramatic increase in the complexity of the input and output sequences. In such multi-regional scenarios, ED architectures seem to lose their clear competitive advantage against other approaches, such as the ones that are based on Linear Regression (LR) [23] and Machine Learning (ML) [24] paradigms, depending on the underlying characteristics of the dynamic systems that derive from the corresponding regional traffic scenario. Significant increases in the complexity of the input and output sequences seem to create a performance plateau that affects each forecasting approach to a different degree. This observation showcased the need to introduce regional traffic forecasting mechanisms whose efficiency is not jeopardized by increases in the underlying complexity of the problem, but which instead, through information fusion and refinement, manage to consistently rise above the aforementioned plateau, regardless of the underlying system dynamics.

In recent times, the pursuit of effective methodologies to address challenges inherent in processing data originating from non-Euclidean domains has garnered significant attention. At the forefront of these endeavors lie Graph Neural Networks (GNNs) [25], renowned for their adeptness in resolving problems that are intricately intertwined with spatial aspects [26]. This prowess emanates from their inherent capacity to harness and exploit the spatial attributes of data pertaining to a given problem. Over time, foundational GNN architectures have undergone extensions aimed at augmenting their inherent attributes, thus resulting in several variations. The most notable of these variations are recurrent graph neural networks [27], graph convolutional networks (GCN) [28], graph autoencoders [29], and spatio-temporal graph neural networks [30]. Spatio-temporal graph neural networks have been proven quite successful in the field of traffic forecasting [31], since their architecture enables them to simultaneously capture spatial and temporal dependencies. This can be achieved by leveraging graph convolutions to model spatial dependencies alongside RNNs to encapsulate temporal dependencies [32], in a manner that is aligned with the ED paradigm.

The modus operandi of GNNs is based on iteratively aggregating information from the neighbors and updating the representations of nodes. The neighbors of each node are dictated by a dedicated adjacency matrix. In most cases, the node representation is updated on the basis of the direct neighbors of the node, which are called 1-hop neighbors. However, there have been works [33, 34] that advocate for the extension of information aggregation to K-hop format, in order to enhance the model's expressive power [35]. In the context of spatio-temporal graph neural networks, there has been only a single instance [36] in which the incorporation of K-hop information aggregation was examined. However, in that work, $K$ was regarded as a parameter that is tuned by trial-and-error. In our work, by contrast, $K$ is calculated using information regarding the moving speed of the populations and the topology of the regions. This design choice enables the proposed solution to encapsulate the temporal aspects of regional traffic forecasting, with regard to the moving speed of the various populations.
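To make the notion of K-hop aggregation concrete, the following minimal sketch lists which nodes can contribute to a node's representation when aggregation is allowed to span at most `k` hops. The function name `k_hop_neighbors` and the four-region chain are hypothetical illustrations, not part of the proposed solution.

```python
from collections import deque


def k_hop_neighbors(adjacency, source, k):
    """Return the nodes reachable from `source` in at most `k` hops.

    `adjacency` is a square list-of-lists; a non-zero entry marks an edge.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbor, weight in enumerate(adjacency[node]):
            if weight != 0 and neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance) - {source}


# Hypothetical chain of four regions: 0 - 1 - 2 - 3
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
one_hop = k_hop_neighbors(chain, 0, 1)  # only the direct neighbor
two_hop = k_hop_neighbors(chain, 0, 2)  # direct neighbor plus its neighbor
```

With 1-hop aggregation, region 0 only sees region 1; raising the hop count to 2 brings region 2 into reach as well.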

Despite the fact that there have been numerous works that examine the use of GNNs in the context of traffic forecasting [37], only a rather small fraction of them focus on the use of graph neural networks for regional traffic forecasting [38], the vast majority of which can be categorised based on the way they incorporate the graphs’ adjacency matrices. Most of these works [39, 40, 41, 42, 43, 44] choose to construct the adjacency matrices based on various distance-related metrics (in most cases the distance between the centers of the regions). Other works [45, 46, 47] propose the construction of the adjacency matrices on the basis of traffic pattern similarity matrices. Finally, the last category includes solutions [48, 49, 50] that leverage binary adjacency matrices based on whether or not the involved nodes are neighboring. In our work we expand upon the latter category by proposing the use of a weighted adjacency matrix in order to encapsulate the lengths of borders that are being shared between the various regions. This design choice enables the proposed solution to encapsulate the spatial aspects of regional traffic forecasting, in regards to the topology of the involved regions in a more refined manner. Furthermore, all of the aforementioned GNN-based solutions for regional traffic forecasting are designed to focus on either the spatial (distance of centers, neighboring status) or the temporal (traffic) aspects of this problem. However, in our work we propose a solution that is capable of capturing both the spatial and the temporal aspects of regional traffic forecasting in an optimal manner.
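The difference between the binary category and a border-weighted variant can be illustrated with a short sketch. It assumes that the shared border length of each pair of neighboring regions is already known and uses the raw length directly as the edge weight; the actual Shared Borders Policy is specified later in the paper and may transform these lengths differently. The function names and the example lengths are hypothetical.

```python
def binary_adjacency(shared_lengths, n):
    """1.0 where two regions share any border, 0.0 elsewhere."""
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), length in shared_lengths.items():
        if length > 0:
            matrix[i][j] = matrix[j][i] = 1.0
    return matrix


def weighted_adjacency(shared_lengths, n):
    """Edge weight equals the shared border length (illustrative assumption)."""
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), length in shared_lengths.items():
        matrix[i][j] = matrix[j][i] = float(length)
    return matrix


# Hypothetical shared border lengths for three regions.
shared = {(0, 1): 2.0, (1, 2): 0.5}
binary = binary_adjacency(shared, 3)
weighted = weighted_adjacency(shared, 3)
```

In the binary matrix both pairs look identical, whereas the weighted matrix preserves the fact that regions 0 and 1 share a much longer border than regions 1 and 2.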

Our work aims to introduce an advanced forecasting model that, through information refinement and fusion, is capable of producing more accurate regional traffic predictions. Towards achieving this goal, we extend spatio-temporal graph neural networks in a manner that is aligned with the regional traffic forecasting paradigm. In order to do so, we propose a novel spatio-temporal graph neural network architecture that incorporates weighted stacked convolutions. Weighted stacked convolutions require the calculation of the weighted adjacency matrix and the number of graph convolution layers $K$. To that end, we also propose two novel policies that leverage information regarding the speed of the populations and the topography of the regions that they traverse, in order to calculate the adjacency matrix and $K$.

III Problem Formulation

This paper presents an evaluation of models designed to forecast population flow estimates for various regions across multiple future time periods. We refer to this challenge as multi-step regional traffic forecasting. A Region refers to a specific area within a larger geographic or urban context that is of particular interest for a certain purpose or analysis. The Regions are represented by the set $N=\{n_{1},n_{2},\ldots,n_{n}\}$, where $n_{n}$ indicates the $n^{th}$ Region and $1\leq n\leq|N|$. Each $Region_{n}$ is characterized by its Borders, which are represented by the set $B_{n}=\{b_{1},b_{2},\ldots,b_{b}\}$, where $b_{b}$ indicates the $b^{th}$ Border and $1\leq b\leq B$, and by its $Center_{n}$, which is represented by a 2-tuple that corresponds to its $x$ and $y$ coordinates. The average distance between the various $Center_{n}$ 2-tuples is denoted by $D$. Furthermore, each Border is regarded as a line segment, which is characterized by a 4-tuple that corresponds to the $x$ and $y$ coordinates of the two endpoints of the line segment. The various types of Populations are represented by the set $P=\{p_{1},p_{2},\ldots,p_{p}\}$, where $p_{p}$ indicates the $p^{th}$ Population and $1\leq p\leq|P|$. The distinction between Population types is made based on their ability to traverse the area, and thus each of these Populations is characterized by an average moving speed $Speed_{p}$. The notations used in the context of this work are showcased in Table I.
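The entities defined above map naturally onto simple data structures. The sketch below is one possible encoding, with a Center as a 2-tuple and each Border as a 4-tuple; the `Region` class, the helper names, the two unit-square regions, and the speed values are all hypothetical. Averaging the Center distance over every pair of Regions is an assumption, since the text does not state which pairs enter the average $D$.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Region:
    center: tuple   # (x, y)
    borders: tuple  # each Border is a line segment (x1, y1, x2, y2)


def border_length(border):
    """Euclidean length of a Border given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = border
    return hypot(x2 - x1, y2 - y1)


def average_center_distance(regions):
    """D, assumed here to be the mean distance over all pairs of Centers."""
    pairs = list(combinations(regions, 2))
    if not pairs:
        raise ValueError("at least two regions are required")
    total = sum(
        hypot(a.center[0] - b.center[0], a.center[1] - b.center[1])
        for a, b in pairs
    )
    return total / len(pairs)


# Two hypothetical unit squares placed side by side.
west = Region(
    center=(0.5, 0.5),
    borders=((0, 0, 1, 0), (1, 0, 1, 1), (1, 1, 0, 1), (0, 1, 0, 0)),
)
east = Region(
    center=(1.5, 0.5),
    borders=((1, 0, 2, 0), (2, 0, 2, 1), (2, 1, 1, 1), (1, 1, 1, 0)),
)
# Hypothetical Population types with illustrative average speeds.
speeds = {"pedestrian": 1.4, "vehicle": 8.0}

d = average_center_distance([west, east])
```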

In time-series analysis, the multi-step formulation involves predicting future values of a time series by forecasting multiple time-steps ahead. This method differs from the single-step approach, which only estimates the next point in time. Furthermore, the multivariate formulation in time series analysis involves creating a model for target variables that relies on multiple predictor variables. These predictor variables are interdependent and display temporal dependencies over time, and they may be impacted by exogenous inputs and noise. This method can be represented mathematically as a system of equations, where the target variable and predictor variables are modeled as stochastic processes that vary with time.

In the context of the present challenge, the output vector's dimensional space is denoted by $R^{|N|*U}$, wherein $|N|$ represents the number of Regions for which we intend to predict traffic at a given time point $t$, and $U$ denotes the number of future steps over which we aim to make these projections. Similarly, we define the input vector's dimensional space as $R^{|N'|*U'}$, where $|N'|$ corresponds to the number of Regions whose population variations exhibit a reliance on those of $N$, and $U'$ signifies the number of preceding time steps that contribute to the retrospective observation window (look-back window). It is pertinent to note that in our modeling, the value of $|N'|$ is equal to that of $|N|$.

In further elaboration, we focus on a particular time point $t_{i}$ and consider the input vector $X$ as follows:

$$X=\{x_{i-U'+1},\ldots,x_{i-l'},\ldots,x_{i}\},\quad l'\in U' \tag{1}$$

wherein $x_{i-l'}=\{Region_{t_{i-l'}}^{1},Region_{t_{i-l'}}^{2},\ldots,Region_{t_{i-l'}}^{n'}\}$ represents the Population of each Region $n\in N'$ at the time $t_{i-l'}$. In a similar manner, we model the output vector $Y$, which is characterized as follows:

$$Y=\{y_{i+1},\ldots,y_{i+l},\ldots,y_{U}\},\quad l\in U \tag{2}$$

wherein $y_{i+l}=\{Region_{t_{i+l}}^{1},Region_{t_{i+l}}^{2},\ldots,Region_{t_{i+l}}^{n}\}$ represents the population of each Region $n\in N$ at the time $t_{i+l}$.
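The windows of Eq. (1) and Eq. (2) can be sketched as a pair of slices over a time-ordered series. The sketch assumes that the output window covers the $U$ time-steps immediately following $t_{i}$; the function name `make_window` and the toy series are hypothetical.

```python
def make_window(series, i, look_back, horizon):
    """Split `series` around index `i` into an input and an output window.

    `series[t]` holds one value per Region at time-step t.
    `look_back` plays the role of U' and `horizon` the role of U.
    """
    if i - look_back + 1 < 0 or i + horizon >= len(series):
        raise ValueError("window does not fit inside the series")
    x = series[i - look_back + 1 : i + 1]  # x_{i-U'+1}, ..., x_i
    y = series[i + 1 : i + horizon + 1]    # y_{i+1}, ..., last predicted step
    return x, y


# Toy series: two Regions observed over six time-steps.
series = [[t, 10 * t] for t in range(6)]
x, y = make_window(series, i=3, look_back=3, horizon=2)
```

Here `x` holds the three most recent observations up to and including index 3, and `y` holds the two observations that follow.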

Since our work aims to expand upon spatio-temporal graph neural networks, it is of vital importance to convert the prior problem formulation to graph format. Numerous types of graphs have been proposed. Arguably, the most significant distinction among them lies in whether the considered graph structures are static or dynamic. Dynamic graphs can be classified into Discrete-Time Dynamic Graphs (DTDG) [51] and Continuous-Time Dynamic Graphs (CTDG) [52]. The authors of this work have chosen the DTDG approach in order to formulate regional traffic in a dynamic manner. According to the DTDG paradigm, a dynamic graph is defined as a sequence of snapshots of a static graph. Each one of the snapshots corresponds to a specific time-step $t$, the duration of which is referred to as $t_{window}$. These snapshots construct a temporal continuum that enables the emergence of temporal patterns and phenomena. Furthermore, each one of the aforementioned static graphs consists of multiple nodes and edges that encapsulate the underlying spatial relations. In the context of this work, each graph corresponds to an area in two-dimensional space that is divided into $|N|$ Regions that are being traversed by $|P|$ Populations at each time-step $t$.

Consider an undirected graph $\overline{G}$ that consists of $|N|$ nodes and $E$ edges. The nodes of the graph correspond to the Regions, and the edges of the graph correspond to how likely it is for a member of a Population to move from one Region to another within the time-frame of a single time-step $t$. This graph can be described by the following two matrices:

  • A weighted Adjacency Matrix $\mathbf{A}\in\mathbb{R}^{|N|\times|N|}$ that incorporates edge weights $w_{ij}$.

  • A Feature Matrix $\mathbf{Z}\in\mathbb{R}^{|N|\times F}$, where $F$ corresponds to the dimension of each Feature Vector.

The Feature Matrix can be viewed as the collection of the Feature Vectors. Each one of the $|N|$ rows of the Feature Matrix corresponds to a Feature Vector that describes node-level features. In the context of this work, each node (Region) is described by a Feature Vector that holds the traffic values recorded at the corresponding Region during the last $U'$ time-steps, and thus $F = U'$. Furthermore, each of the $U'$ columns of the Feature Matrix $Z$ corresponds to a different time-step $t$ of the input sequence. This formulation enables instances of the Feature Matrix to be modeled as time-series data. The Adjacency Matrix $A$ is static and thus remains unchanged throughout the various time-steps, since it models the statistical possibility of moving from one Region to another. On the other hand, the Feature Matrix $Z$ is dynamic and is different for each time-step $t$. Consequently, snapshots of the Feature Matrix $Z$ are conceptually equivalent to the aforementioned input vector $X\in\mathbb{R}^{|N|\times U'}$.
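The row and column convention of the Feature Matrix can be shown with a small sketch: a time-ordered input window (one entry per time-step, each holding one value per Region) is transposed so that rows index Regions and columns index time-steps. The function name `feature_matrix` and the toy values are hypothetical.

```python
def feature_matrix(window):
    """Transpose a time-major window (U' x |N|) into Z (|N| x U')."""
    return [list(row) for row in zip(*window)]


# Hypothetical look-back window: three time-steps, two Regions.
window = [[1, 10], [2, 20], [3, 30]]
z = feature_matrix(window)
```

Each row of `z` is then the Feature Vector of one Region, with length $F = U'$.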

TABLE I: Notations used in this paper.

| Notation | Description |
|---|---|
| $\mathbb{R}^{n}$ | n-dimensional Euclidean space |
| $N$ | set of Regions |
| $Border_{n}$ | set of $Region_{n}$'s Borders |
| $Center_{n}$ | Center of $Region_{n}$ |
| $D$ | average distance between the Centers of the Regions |
| $P$ | set of Populations |
| $Speed_{p}$ | average speed of $Population_{p}$ |
| $U$ | number of prediction time-steps |
| $U'$ | number of input (look-back) time-steps |
| $t_{window}$ | duration of each time-step |
| $F$ | dimension of Feature Vector |
| $Z$ | Feature Matrix |
| $\overline{G}$ | undirected graph |
| $A$ | Adjacency Matrix |
| $I$ | Identity Matrix |
| $w$ | edge weights |
| $K$ | number of stacked GCN layers |
| $W$ | learnable weight matrix |
| $\lvert X \rvert$ | number of elements in a given set $X$ |

IV Proposed Solution

The aim of this work is to expand upon the GCN-LSTM architecture, depicted in Fig. 1, in a manner that may enable the incorporation of information regarding the various Populations, as well as the Regions that they traverse, in order to produce more refined and accurate prediction models.

Figure 1: GCN-LSTM architecture.

Towards achieving this goal, the proposed solution consists of three components that are closely intertwined with each other. These components are the following ones:

  • the WEST GCN-LSTM

  • the Shared Borders Policy

  • the Adjustable Hops Policy

The proposed WEST GCN-LSTM is a novel architectural paradigm that extends the GCN-LSTM architecture by facilitating multiple weighted stacked graph convolution layers, based on a weighted Adjacency Matrix $A$ and the number of stacked graph convolution layers, which is denoted by $K$. According to our solution, $A$ and $K$ are calculated using the Shared Borders Policy and the Adjustable Hops Policy, respectively. The Shared Borders Policy is designed to leverage information regarding the Borders of the Regions, while the Adjustable Hops Policy is designed to leverage information regarding the Speed of Populations, the Centers of the Regions, and the number of prediction steps. This section is dedicated to showcasing these three components. An overview of the proposed solution is depicted in Fig. 2.

Figure 2: An overview of the proposed solution.

IV-A WEST GCN-LSTM

WEST GCN-LSTM is a spatio-temporal graph neural network that is based on the ED architectural paradigm. As the name suggests, the encoder is based on weighted stacked GCNs and the decoder on LSTMs. In order to establish the foundations of the proposed solution, it is required to briefly delve into the mechanics of the WEST GCN-LSTM model.

IV-A1 WEighted STacked (WEST) Graph Convolutional Networks (GCN)

GCNs leverage the graph structure to aggregate information from neighboring nodes. In this step, each node collects information from its neighbors, including itself, to update its feature representation. This is achieved through a linear transformation and aggregation process. After aggregating neighbor information, a non-linear activation function (ReLU) is applied in order to generate the aggregated representation $h$. It is of paramount importance to note that the number of stacked GCN layers is equal to the number of hops in terms of neighbors that the aggregation process can encompass during each iteration. For instance, in a scenario where a GCN is constructed with just a single convolution layer, nodes can exclusively access their immediate neighbors for aggregating representation data. In the frame of weighted stacked graph convolution, the convolutional layers are influenced by the weighted Adjacency Matrix $A$, while the quantity of stacked graph convolution layers is denoted as $K$. Thus, in the context of weighted stacked graph convolutions, $h$ is calculated in the following manner:

$$h^{(0)} = Z \qquad \text{for } k = 0 \tag{3}$$
$$h^{(k)} = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} h^{(k-1)} W^{(k-1)}\right) \qquad \text{for } k = 1, \ldots, K \tag{4}$$

where $h^{(k)}$ corresponds to the aggregated representation after $k$ convolutional layers, and $h^{(0)} = Z$ corresponds to the Feature Matrix. Furthermore, $\hat{A}$ is equal to $A + I$ and its purpose is to incorporate self-connections, $\hat{D}$ is the diagonal degree matrix of $\hat{A}$, $W$ signifies a dedicated learnable weight matrix, and $\sigma$ represents the ReLU function.
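As an illustrative sketch (not the authors' implementation), the propagation rule of Eq. 3 and Eq. 4 can be written in NumPy as follows. The function name `west_gcn_forward`, the toy path graph, its edge weights, and the identity matrices standing in for the learned $W^{(k)}$ are all assumptions made for the example.

```python
import numpy as np

def west_gcn_forward(Z, A, weights):
    """Stacked weighted graph convolution following Eq. 3 and Eq. 4.

    Z: (N, F) Feature Matrix, A: (N, N) weighted Adjacency Matrix,
    weights: list of K weight matrices W^(0), ..., W^(K-1).
    """
    A_hat = A + np.eye(A.shape[0])                  # A_hat = A + I (self-connections)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # diagonal of D_hat^(-1/2)
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    h = Z                                           # h^(0) = Z
    for W in weights:                               # k = 1, ..., K
        h = np.maximum(A_norm @ h @ W, 0.0)         # ReLU activation
    return h

# Toy path graph 0 - 1 - 2 with hypothetical edge weights.
A = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.2],
              [0.0, 0.2, 0.0]])
Z = np.eye(3)       # one-hot features, so column j tracks information from node j
W_id = np.eye(3)    # identity stands in for learned weights (assumption)

h_one_hop = west_gcn_forward(Z, A, [W_id])         # K = 1
h_two_hop = west_gcn_forward(Z, A, [W_id, W_id])   # K = 2
```

With one layer, the entry `h_one_hop[0, 2]` is zero because node 2 is two hops away from node 0; with two layers, `h_two_hop[0, 2]` becomes positive, which is the hop behaviour described above.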

IV-A2 Long Short Term Memory (LSTM)

LSTM networks, similar to their precursor, employ the Hidden State mechanism in order to facilitate the representation of dynamic temporal behaviour. The unique aspect of LSTM networks is their utilization of the Cell State structure. This architecture introduces Cell State manipulation through regulatory mechanisms known as Gates. Each LSTM node encompasses three gate-related elements, all incorporating sigmoid layers, which are differentiable and produce outputs within the 0-to-1 range. The sigmoid activation function scales values to facilitate data importance assessment and decision-making regarding retention or omission. Gate structures incorporate two sets of weight matrices, denoted as $W$ and $U$, associated with the input and the Hidden State, respectively, along with additional matrices for the Cell State. The input $X_t$ corresponds to timestamp $t$. Gates employ these matrices, along with the input and the prior Hidden State ($\text{hidden}_{t-1}$).

The Forget Gate determines which historical information from past timestamps is to be excluded from the Cell State. Its output is computed using Eq. 5. The Input Gate evaluates the significance of recent input, and its output is computed using Eq. 6. Cell State calculation employs the candidate vector $\overline{\text{C}}_t$, generated as per Eq. 7, with tanh activation mitigating gradient issues. The Cell State update process is described in Eq. 8, combining the output of the Forget Gate and the Input Gate with $\overline{\text{C}}_t$. The output of the Output Gate is computed using Eq. 9, and the new Hidden State is then calculated according to Eq. 10. The updated Cell State and Hidden State are then propagated to subsequent LSTM nodes for the next time-step [53].

$$\text{forget}_t = \text{sigmoid}(X_t \cdot W_f + \text{hidden}_{t-1} \cdot U_f) \tag{5}$$
$$\text{input}_t = \text{sigmoid}(X_t \cdot W_i + \text{hidden}_{t-1} \cdot U_i) \tag{6}$$
$$\overline{\text{C}}_t = \text{tanh}(X_t \cdot W_c + \text{hidden}_{t-1} \cdot U_c) \tag{7}$$
$$C_t = \text{forget}_t \cdot C_{t-1} + \text{input}_t \cdot \overline{\text{C}}_t \tag{8}$$
$$\text{output}_t = \text{sigmoid}(X_t \cdot W_o + \text{hidden}_{t-1} \cdot U_o) \tag{9}$$
$$\text{hidden}_t = \text{output}_t \cdot \text{tanh}(C_t) \tag{10}$$
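To make the data flow of Eq. 5 to Eq. 10 concrete, a single LSTM time-step can be sketched in NumPy as shown below. This is a minimal illustration rather than the implementation used in this work: the function name `lstm_step`, the dictionary layout of the weights, the dimensions, and the random initial values are assumptions, and bias terms are omitted because the equations above do not include them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, hidden_prev, c_prev, W, U):
    """One LSTM time-step; W and U are dicts keyed by 'f', 'i', 'c', 'o'."""
    forget = sigmoid(x_t @ W["f"] + hidden_prev @ U["f"])   # Eq. 5
    inp = sigmoid(x_t @ W["i"] + hidden_prev @ U["i"])      # Eq. 6
    c_bar = np.tanh(x_t @ W["c"] + hidden_prev @ U["c"])    # Eq. 7
    c_t = forget * c_prev + inp * c_bar                     # Eq. 8, element-wise products
    out = sigmoid(x_t @ W["o"] + hidden_prev @ U["o"])      # Eq. 9
    hidden_t = out * np.tanh(c_t)                           # Eq. 10
    return hidden_t, c_t

# Hypothetical sizes: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(3, 4)) for k in "fico"}
U = {k: rng.normal(size=(4, 4)) for k in "fico"}
hidden, cell = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, U)
```

Since the Output Gate lies in the 0-to-1 range and tanh lies between -1 and 1, every element of the returned Hidden State has magnitude below 1.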

IV-A3 WEST GCN-LSTM

The authors of this work combined weighted stacked GCN and LSTM layers to predict regional traffic. This fusion of spatial and temporal DL layers is referred to as WEST GCN-LSTM. Weighted stacked GCNs constitute the encoder and are designed to extract structural characteristics of the input sequence, producing an aggregated representation. This process is carried out in the following manner:

$$h^{\text{encoder}} = \text{WEST GCN}_{\text{encoder}}(Z, A) \tag{11}$$

Here, $h^{\text{encoder}}$ represents the aggregated representation after applying weighted stacked graph convolution, $A$ is the weighted Adjacency Matrix of the graph, and $Z$ is the Feature Matrix. This representation is then fed as input to the LSTM part of the model, thus capturing temporal patterns at the graph snapshot level. The LSTM part of the model, acting as a decoder, produces the desired predictions, in the following manner:

$$Y = \text{LSTM}_{\text{decoder}}(h^{\text{encoder}}) \tag{12}$$

$\text{LSTM}_{\text{decoder}}$ refers to the LSTM network that takes the aggregated representation from the encoder as input in order to produce an output that is then passed through a dense layer in order to generate the multi-step predictions. In the frame of multi-step time-series forecasting, the WEST GCN-LSTM model takes as input a sequence of graph signals, where each signal corresponds to a different time-step and is represented as a graph signal on a fixed graph. The goal is to predict the future values of the time-series based on the graph signals of previous time steps.

IV-B Shared Borders Policy

The edges of the graph play an integral role in representing the spatial relations between the Regions by dictating which nodes shall partake in the feature aggregation process. The authors of this paper propose a novel approach by introducing the Shared Borders Policy. The Shared Borders Policy is based on the reasonable assumption that the longer the border shared by two Regions is, the more statistically likely it is for a larger percentage of a Population to traverse it.

According to this approach, the Adjacency Matrix $A$ shall be constructed based on the lengths of the Borders that are being shared between each pair of Regions (nodes). The process of calculating the length of the shared Borders between two Regions is presented in Alg. 1. If two Regions are not neighbors, then the corresponding matrix elements shall be equal to $0$. Furthermore, each diagonal element of the adjacency matrix shall be equal to the length of the perimeter of the respective Region. By doing so, we ensure that the ongoing traffic of a particular Region shall be the main factor in predicting its corresponding future state, while the rest of the Regions will influence the prediction results to a degree that is associated with the Borders that they share with the aforementioned Region. The Shared Borders Policy is presented in Alg. 2. Upon the construction of the Adjacency Matrix $A$ using the Shared Borders Policy, it is required to normalize the resulting values to the 0-to-1 range.

The main idea behind the incorporation of the Shared Borders Policy is that by enabling only the neighboring Regions to partake in the feature aggregation process, we are able to establish a distilled version of the spatial correlations that are inherent in topological structures. By doing so, we are alleviating part of the complexity that would otherwise be imposed on the forecasting model. Furthermore, by incorporating the aforementioned weights, we are able to establish a more refined encapsulation of the spatial dependencies that manifest.

Algorithm 1 LengthOfSharedBorder Algorithm
  Description: Algorithm for calculating the length of the overlapping section between two line segments in two-dimensional space.
  Input: The endpoints of the two line segments: $(x_1, y_1), (x_2, y_2)$ for the first line segment and $(x_3, y_3), (x_4, y_4)$ for the second line segment.
  Output: The $length$ of the overlapping segment between the two line segments.
  Begin algorithm
  1. If $x_1 > x_2$:
         $x_1, x_2 = x_2, x_1$
         $y_1, y_2 = y_2, y_1$
  2. If $x_3 > x_4$:
         $x_3, x_4 = x_4, x_3$
         $y_3, y_4 = y_4, y_3$
  3. Calculate the slopes of both line segments using the formulas: $slope_1 \leftarrow \frac{y_2 - y_1}{x_2 - x_1}$ and $slope_2 \leftarrow \frac{y_4 - y_3}{x_4 - x_3}$. If $slope_1 = slope_2$ then the two line segments are parallel. If they are indeed parallel proceed to step 4, else $length = 0$ and proceed to step 8.
  4. If $\max(x_1, x_3) \leq \min(x_2, x_4)$, then the two line segments overlap. In that case, proceed to step 5, else $length = 0$ and proceed to step 8.
  5. Calculate $x_e$ and $x_s$, which are $x$ coordinates of the endpoints of the overlapping segment, by using the following formulas:
         $x_s = \max(x_1, x_3)$
         $x_e = \min(x_2, x_4)$
  6. Calculate $y_e$ and $y_s$, which are $y$ coordinates of the endpoints of the overlapping segment, by using the following formulas:
         $y_s = y_1 + \frac{(x_s - x_1) \cdot (y_2 - y_1)}{x_2 - x_1}$
         $y_e = y_3 + \frac{(x_e - x_3) \cdot (y_4 - y_3)}{x_4 - x_3}$
  7. Calculate the $length$ of the overlapping segment, by using the following formula: $length = \sqrt{(x_e - x_s)^2 + (y_e - y_s)^2}$
  8. Return $length$.
  End algorithm
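A possible Python rendering of the idea behind Alg. 1 is sketched below. It is not a line-by-line transcription: the slope comparison of step 3 is replaced by a cross-product test, and the overlap of steps 4 to 7 is measured by projecting the second segment onto the first, so that vertical borders (where $x_2 = x_1$) do not cause a division by zero. The explicit collinearity check, the tolerance `tol`, and the function name are further assumptions that Alg. 1 does not state.

```python
import math

def length_of_shared_border(seg_a, seg_b, tol=1e-9):
    """Overlap length of two 2-D segments, each given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = seg_a
    x3, y3, x4, y4 = seg_b
    dx, dy = x2 - x1, y2 - y1
    seg_len = math.hypot(dx, dy)
    if seg_len == 0.0:
        return 0.0                      # degenerate first segment
    # Parallel test (substitute for the slope comparison in step 3).
    if abs(dx * (y4 - y3) - dy * (x4 - x3)) > tol:
        return 0.0
    # Added guard: parallel segments must lie on the same line to overlap.
    if abs(dx * (y3 - y1) - dy * (x3 - x1)) > tol:
        return 0.0
    # Positions of the second segment's endpoints along the first segment.
    t3 = ((x3 - x1) * dx + (y3 - y1) * dy) / seg_len
    t4 = ((x4 - x1) * dx + (y4 - y1) * dy) / seg_len
    start = max(0.0, min(t3, t4))
    end = min(seg_len, max(t3, t4))
    return max(0.0, end - start)

horizontal = length_of_shared_border((0, 0, 4, 0), (2, 0, 6, 0))
vertical = length_of_shared_border((0, 0, 0, 3), (0, 1, 0, 5))
```

In both toy calls the two segments share a stretch of length 2, once along a horizontal border and once along a vertical one.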
Algorithm 2 Shared Borders Policy Algorithm.
  Input: The $N$ Regions, each represented as a list of 4-tuples. The size of each list of 4-tuples is equal to the number of edges that particular Region has. Each 4-tuple corresponds to the $x$ & $y$ coordinates of the vertices that are connected by that particular edge.
  Output: The weighted Adjacency Matrix $A$.
  Begin algorithm
  1. For each pair of lists (Regions) $i, j$:
  2.     Initialize $L \leftarrow 0$
  3.     For each pair of 4-tuples (edges) $k, l$:
  4.         $length \leftarrow LengthOfSharedBorder(tuple_k, tuple_l)$
  5.         $L \leftarrow L + length$
  6.     $A_{i,j} \leftarrow L$
  7. Return $A$
  End algorithm
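The following sketch applies the loop of Alg. 2 to three hypothetical unit-square Regions. The helper `overlap` is a compact stand-in for Alg. 1 (it uses a projection rather than slopes), `unit_square` only builds toy input, and scaling by the largest entry is merely one possible way to bring the values into the 0-to-1 range, since the normalization method is not specified here.

```python
import math

def overlap(seg_a, seg_b, tol=1e-9):
    """Overlap length of two segments (x1, y1, x2, y2); stand-in for Alg. 1."""
    x1, y1, x2, y2 = seg_a
    x3, y3, x4, y4 = seg_b
    dx, dy = x2 - x1, y2 - y1
    seg_len = math.hypot(dx, dy)
    if seg_len == 0.0:
        return 0.0
    not_parallel = abs(dx * (y4 - y3) - dy * (x4 - x3)) > tol
    not_collinear = abs(dx * (y3 - y1) - dy * (x3 - x1)) > tol
    if not_parallel or not_collinear:
        return 0.0
    t3 = ((x3 - x1) * dx + (y3 - y1) * dy) / seg_len
    t4 = ((x4 - x1) * dx + (y4 - y1) * dy) / seg_len
    return max(0.0, min(seg_len, max(t3, t4)) - max(0.0, min(t3, t4)))

def shared_borders_matrix(regions):
    """A[i][j] = total border length shared by Regions i and j (steps 1-7)."""
    n = len(regions)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            A[i][j] = sum(overlap(e_k, e_l)
                          for e_k in regions[i] for e_l in regions[j])
    return A

def scale_to_unit_range(A):
    """Divide by the largest entry (assumed normalization choice)."""
    peak = max(max(row) for row in A)
    return [[value / peak for value in row] for row in A]

def unit_square(x0, y0):
    """Edges of a 1x1 square as 4-tuples (toy input only)."""
    c = [(x0, y0), (x0 + 1, y0), (x0 + 1, y0 + 1), (x0, y0 + 1)]
    return [c[k] + c[(k + 1) % 4] for k in range(4)]

regions = [unit_square(0, 0), unit_square(1, 0), unit_square(3, 0)]
A = shared_borders_matrix(regions)
A_scaled = scale_to_unit_range(A)
```

For this toy layout, Regions 0 and 1 share one edge of length 1, Region 2 touches neither of them, and each diagonal entry equals the perimeter (4) because every edge overlaps fully with itself, in line with the description of the diagonal elements given above.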

IV-C Adjustable Hops Policy

The incorporation of a wide range of features is of paramount importance in the context of enabling the WEST GCN-LSTM model to conduct accurate predictions. In the previous subsection we showcased how the lengths of the shared Borders can be leveraged in order to construct the adjacency matrix $A$. In this subsection, we shall focus on showcasing how the Speed of a Population can be leveraged in order to calculate $K$, which refers to the number of stacked graph convolution layers incorporated at each constructed WEST GCN-LSTM model.

Let us consider three distinct parameters. The first one is the time between two consecutive predictions or observations and it corresponds to the selected time duration $t_{window}$. The second one is the average speed of a specific Population that is referred to as $Speed_p$. The third one is the average distance $D$ that has to be traversed in order to transit from the Center of one Region to the Center of one of its neighboring Regions.

When the prediction process is being implemented on the basis of a single Population, one is able to select the appropriate $t_{window}$ accordingly, in a manner that facilitates the establishment of consistent observations. For instance, if a Population moves at a significantly high speed, the chosen time between two consecutive observations can be decreased. Unfortunately, when a space is occupied by multiple Populations, each one moving at a different Speed, a problem arises: in order to be able to formulate comparative analyses between them, it is necessary to utilize the same $t_{window}$ parameter across the various Populations. The WEST GCN-LSTM model, when leveraging the aforementioned Shared Borders Policy, receives as input only the Regional Traffic that corresponds to neighboring Regions. In other words, when using a conventional GCN-LSTM model, the observed Populations should be able to traverse one Region at most during one time-step. As a result, in cases where a subset of the Populations is able to traverse multiple Regions within a single time-step, a GCN-LSTM model that is using the Shared Borders Policy shall lose its advantage, since there will be a potentially significant loss of information.

In order to mitigate this issue, we propose the Adjustable Hops Policy. According to the paradigm of GCNs, the number of graph convolution layers that are being deployed within a singular model corresponds to the number of aggregation hops that are being conducted each time the convolution process takes place. So in case of one convolution layer, only the first degree neighbors are taken into consideration, in case of two convolution layers, the first and second degree neighbors are taken into consideration, etc.

The Adjustable Hops Policy commences by calculating the average distance between the Centers of the Regions and identifying the Population with the lowest Speed. The parameter $t_{window}$ is then adjusted in a manner that this specific Population shall be capable of traversing at most one Region during each observation time-step of duration equal to $t_{window}$. This Population shall be used as the baseline and the dedicated GCN-LSTM model that corresponds to it shall have only one GCN layer, since its members shall be able to traverse a single Region at most during each time-step. By using the same $t_{window}$ parameter, the policy then calculates how many Regions at most each of the remaining Populations can traverse during a single time-step. This number of Regions is equal to the number of GCN layers that shall be utilized for each of the corresponding Populations, which is denoted by $K$. The specifics of the Adjustable Hops Policy are presented in Alg. 3. It is worth mentioning that this policy results in the establishment of a dedicated prediction model for each distinct $K$ that emerges across the various Populations. Furthermore, in multi-step forecasting scenarios, like the ones that shall be explored in the next section of this work, $t_{window}$ should be adjusted in order to take into account the $U$ prediction steps that are being considered and thus needs to be multiplied by $\frac{U}{2}$.
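As a short worked step, substituting the definition of $t_{window}$ from Alg. 3 into the expression for $K_p$ shows that the average distance $D$ cancels, so that $K_p$ depends only on the ratio of speeds and on $U$. Here $MinSpeed$ denotes the lowest Speed among the Populations, as in Alg. 3:

$$K_p = \text{round}\left(\frac{Speed_p \cdot t_{window}}{D}\right) = \text{round}\left(\frac{Speed_p}{MinSpeed}\right) \qquad \text{for } U = 1$$
$$K_p = \text{round}\left(\frac{Speed_p}{MinSpeed} \cdot \frac{U}{2}\right) \qquad \text{for } U > 1$$

For instance, with hypothetical speeds $MinSpeed = 1.5$ and $Speed_p = 6$ (in any common unit), the single-step case gives $K_p = \text{round}(4) = 4$, whereas $U = 6$ gives $K_p = \text{round}(4 \cdot 3) = 12$. Under the same formula, the baseline Population receives $K = 1$ for $U = 1$ and $K = \text{round}(U/2)$ for $U > 1$.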

Algorithm 3 Adjustable Hops Policy Algorithm.
  Input: The $Speed_p$ values, each one corresponding to one of the $P$ Populations. The $N$ 2-tuples, each one corresponding to the $x$ & $y$ coordinates of a $Center_n$ of a Region. The $U$ value that corresponds to the number of prediction steps.
  Output: The array $K$ of size $P$, each element of which corresponds to the maximum number of Regions, members of each corresponding Population can traverse during a singular time-step, whose duration is equal to $T$.
  Begin algorithm
  1. Initialize $T \leftarrow 0$
  2. Initialize $MinSpeed \leftarrow Speed_0$
  3. Initialize $D \leftarrow 0$
  4. For each pair of 2-tuples ($Center_n$) $k, l$:
  5.     $D = D + \sqrt{(x_l - x_k)^2 + (y_l - y_k)^2}$
  6. $D = \frac{D}{N^2}$
  7. For each $Speed$ value $p$:
  8.     If $Speed_p < MinSpeed$, then: $MinSpeed \leftarrow Speed_p$
  9. If $U > 1$:
  10.     $t_{window} \leftarrow \frac{D}{MinSpeed} \cdot \frac{U}{2}$
  11. else:
  12.     $t_{window} \leftarrow \frac{D}{MinSpeed}$
  13. For each $Speed$ value $p$:
  14.     $K_p = \text{round}\left(\frac{Speed_p \cdot t_{window}}{D}\right)$
  15. Return $K$
  End algorithm
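The steps of Alg. 3 can be sketched in Python as follows. The function name, the toy speeds and the toy Centers are hypothetical. Two interpretive assumptions are made: "each pair of 2-tuples" is read as every ordered pair of Centers (including a Center paired with itself), which matches the division by $N^2$, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for the rounding of step 14.

```python
import math

def adjustable_hops(speeds, centers, U):
    """Return (K, t_window): K[p] is the number of GCN layers for Population p."""
    n = len(centers)
    # Steps 3-6: average centre-to-centre distance over all ordered pairs.
    total = sum(math.dist(a, b) for a in centers for b in centers)
    D = total / n ** 2
    # Steps 2, 7-8: the slowest Population is the baseline.
    min_speed = min(speeds)
    # Steps 9-12: time-step duration, stretched by U/2 for multi-step forecasting.
    t_window = D / min_speed
    if U > 1:
        t_window *= U / 2
    # Steps 13-14: Regions each Population can traverse within t_window.
    K = [round(speed * t_window / D) for speed in speeds]
    return K, t_window

speeds = [1.5, 6.0]                                  # hypothetical Speeds
centers = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]       # hypothetical Centers
K_single, _ = adjustable_hops(speeds, centers, U=1)
K_multi, _ = adjustable_hops(speeds, centers, U=6)
```

With these toy inputs the sketch yields `K_single == [1, 4]` and `K_multi == [3, 12]`. Each distinct value in the result would correspond to a separate WEST GCN-LSTM model with that many stacked graph convolution layers.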

V Experimental Evaluation

This section is dedicated to evaluating the efficiency of the proposed solution. Towards achieving this goal, the experimental evaluation consists of 19 forecasting models. The various forecasting models were designed and implemented in Python 3.9.13 with TensorFlow 2.9.1. Additionally, the hardware backend that was used for training and inference consists of an i5-11400 CPU and an NVIDIA GeForce RTX 3060 GPU.

Furthermore, the experiments were conducted on the basis of a real dataset and a synthetic dataset, the latter constructed using the Simulation of Urban MObility (SUMO) [54] framework. SUMO is a versatile traffic simulator with the capability to handle extensive mobility networks. It incorporates various modes of transportation, including pedestrians, and is equipped with a plethora of tools for generating diverse mobility scenarios. The simulator exhibits realistic features of pedestrian mobility, such as pedestrian-pedestrian interactions in close proximity, reasonable walking speeds, and natural movement patterns. Additionally, SUMO enables pedestrians to interact safely by implementing features such as collision avoidance.

In order to maintain consistency throughout the experimental evaluation and across the two datasets, the authors of this work have chosen the following settings for both the experimental protocol and the parameter choices that supported the assessment of the models. The space that each dataset covers was divided into $6$ Regions. Each of the two datasets was split using the $80/20\%$ ratio for training and testing, respectively. In the context of multi-step forecasting, $6$ time-steps were considered for the input and output sequences. Finally, the parameters of all of the examined forecasting models were tuned using KerasTuner.

V-A Evaluation Metrics

The proposed model’s performance is assessed using Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE), as evaluation metrics. These metrics are suitable for evaluating predictions of continuous numbers.

MAE measures the average magnitude of errors in a set of predictions, as given in Eq. 13, where $y_{i}$ are the ground truth values and $\hat{y}_{i}$ are the predictions, while $n$ denotes the total number of predictions in the evaluated set.

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\lvert y_{i}-\hat{y}_{i}\rvert \tag{13}$$

MSE is a commonly used metric for evaluating the average squared difference between observed values and predicted values in regression problems and is given in Eq. 14.

$$MSE=\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2} \tag{14}$$

Finally, RMSE is the square root of MSE. It is used as a measure of the standard deviation of the prediction errors and is given in Eq. 15.

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}} \tag{15}$$

Using a combination of MAE, MSE, and RMSE provides a more comprehensive evaluation of a predictive model. Each metric captures different aspects of the model’s performance, and using all three can offer a more nuanced understanding. MAE measures the average absolute errors, MSE measures the average squared errors, giving more weight to larger errors, and RMSE is the square root of MSE, providing an interpretable metric in the same unit as the data. The lower these metrics are, the more efficient the corresponding forecasting model is.
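
The three metrics of Eq. 13 to Eq. 15 can be sketched in a few lines. The function names and the toy values below are illustrative only and are not part of the experimental code.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error (Eq. 13)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error (Eq. 14)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error (Eq. 15), i.e. the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Toy example: three small errors and one large error.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 30.0, 50.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```

In this toy example the single large error raises the RMSE well above the MAE, which illustrates why the squared metrics give more weight to larger errors.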

V-B Benchmark Datasets

In order to gain a comprehensive understanding of the proposed solution’s capabilities, as well as of the experimental process that was employed in order to evaluate it, it is of paramount importance to delve deeper into the specifics of the two datasets that were used. These datasets are a subset of the Berlin (Cycling) dataset and the Central Park (Pedestrian) dataset, the latter of which was generated using SUMO.

  • Central Park: As part of our study, we conducted a simulation of pedestrian traffic in the New York City area, over a period of seven days. Specifically, we focused on Central Park and the adjoining urban districts, modeling different traffic patterns. The generated dataset encompasses the movement attributes of 200,000-230,000 individuals on a daily basis, delineating their position and velocity per second. Central Park and the adjoining urban districts offer pedestrians the opportunity to take breaks and explore the different attractions or smart city features available to them, as well as to temporarily pause at points of interest such as interactive public art installations, historic landmarks, or food trucks and markets. Additionally, given the park’s function as a sports venue, the simulation incorporates variable pedestrian speeds that simulate jogging or running. The distribution of speeds aims to realistically capture the motility characteristics of pedestrians based on factors such as age, type of activity, and geospatial context. The $t_{window}$ for this dataset is equal to 5 minutes.

  • Berlin: The Berlin Cycling Dataset (https://www.kaggle.com/datasets/phisinger/bike-counting-berlin), procured by the Berlin administration in Berlin city, Germany, provides a comprehensive understanding of the long-term developments in bicycle traffic, including trends and seasonal fluctuations, through the use of Automatic Permanent Counting Points (APCPs). The dataset includes 9 years of bicycle traffic counts (2012-2020) collected using induction loops and sensors at APCPs. Bicycles passing over the detection cross-sections of the counting points are counted as they cause changes in the induced electromagnetic field, which are subsequently analyzed by the sensor and recorded as counting pulses. For roads with separate bicycle traffic guidance in each direction of travel, one counting station is established for each direction of travel, while for cross-sections with shared bicycle traffic guidance in both directions of travel, one counting station is installed for both directions of travel. The continuous counting process identifies bicycles based on specific geometries detected by the sensors while other vehicles are filtered out. The $t_{window}$ for this dataset is equal to 1 hour.

Furthermore, the experimental evaluation process includes two additional datasets that are referred to as Central Park (Low) and Central Park (High). These two datasets were derived from the Central Park dataset by categorizing the moving entities based on their Speed and assigning them to the corresponding dataset. More specifically, the Central Park (Low) dataset consists of moving entities that can traverse at most one Region during a single time-step ($K=1$), while the Central Park (High) dataset consists of moving entities that can traverse at most two Regions during a single time-step ($K=2$). In other words, the aforementioned Central Park dataset is the amalgamation of the Central Park (Low) and Central Park (High) datasets. It is worth mentioning that these two datasets contain a similar total number of recorded moving entities (Central Park (High) contains about 4% more moving entities compared to Central Park (Low)).

The aforementioned datasets can be leveraged to provide information regarding the Regions and the corresponding Regional Traffic that manifests during each time-step. In terms of Regional Traffic, both of these datasets enable direct access to the numerous geolocation points that were recorded throughout the examined periods of time. However, they do not incorporate any notion of distinct spatial Regions, and the choice of an appropriate partitioning approach is contingent upon the specific context and challenges encountered. Thus, we had to construct the required Regions using the following approach. We applied a k-means clustering algorithm to the various geolocation points that belong to each training dataset. The $k$ variable of the k-means clustering algorithm was set equal to $N=6$. The coordinates of the resulting centroids were selected to serve as the corresponding Centers of the $N$ Regions. Finally, dedicated Voronoi diagrams were created, for each dataset, using these $N$ Centers. The representation of the Voronoi diagram, for a set of $n$ points $P=\{(x_{i},y_{i})\}$, is achieved by using the following data structures:

  • A list $V$ of Voronoi regions, where each region is associated with one of the input points.

  • A list $E$ of Voronoi edges that define the boundaries between regions.
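
The construction of the Regions can be sketched as follows. This is not the code used in the experiments: it uses a plain Lloyd-style k-means, treats coordinates as planar, and relies on the fact that a point belongs to the Voronoi cell of its nearest Center instead of building the edge list explicitly. The function names, the seeds, and the toy points are hypothetical.

```python
import math
import random


def nearest_center(point, centers):
    """Index of the closest Center, i.e. the Voronoi cell that contains the point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd-style k-means; returns the k centroids as (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise with k distinct input points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        new_centers = []
        for idx, members in enumerate(clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(centers[idx])  # keep the Center of an empty cluster
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


# Toy geolocation points scattered around six hypothetical hot spots.
rng = random.Random(1)
spots = [(0, 0), (5, 0), (10, 0), (0, 5), (5, 5), (10, 5)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in spots for _ in range(50)]

centers = kmeans(points, k=6)                            # Centers of the N = 6 Regions
regions = [nearest_center(p, centers) for p in points]   # Region index of every point
print(centers)
```

Assigning every recorded point to its nearest Center yields the per-Region traffic counts without requiring the explicit Voronoi edges, which are only needed when the borders themselves are of interest.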

In order to calculate each Voronoi cell, one needs to find the bisectors between the point $(x_{i},y_{i})$ and all other points in $P$, and then to clip the resulting edges to the bounding box. Any overlapping edges in $E$ are then merged to form the complete Voronoi diagram. This approach enables the examined Regions to encompass diverse geographic areas, exhibit varying population densities and feature distinct mobility patterns, while maintaining a constant size throughout the experimentation process. On top of that, the selection of datasets encompasses two distinct urban mobility scenarios (pedestrian and cycling), with the aim of embracing a broader spectrum of urban applications and assessing the models’ capacity and generalization power across various vector input spaces that exhibit diverse statistical properties and characteristics. These properties and characteristics include the different temporal granularities that stem from the chosen $t_{window}$, the size of the Regions, and the trend, seasonality, and volume of regional traffic. For instance, Fig. 3 presents the different densities of the regional traffic volume that corresponds to each examined dataset using normalized Kernel Density Estimate (KDE) plots. KDE plots are a method for visualizing the distribution of observations in a dataset, by representing the data using a continuous probability density curve in a given number of dimensions. As the figure indicates, there is a significant diversity in terms of regional traffic volume distribution across the various datasets and underlying Regions. This type of diversity of experimental conditions is of vital importance in the context of evaluating the efficiency of the proposed solution in a robust manner.
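
For readers unfamiliar with KDE, the following sketch shows a one-dimensional Gaussian kernel density estimate. The kernel, the bandwidth, and the toy traffic volumes are assumptions made for illustration; the settings that were used to produce the figure are not specified here.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on every observation.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical regional traffic volumes: many quiet time-steps and a few bursts.
volumes = [12, 15, 14, 90, 95, 13, 16, 92]
density = gaussian_kde(volumes, bandwidth=5.0)

# Riemann-sum check that the estimated curve integrates to (approximately) one.
step = 0.5
grid = [i * step for i in range(-100, 400)]
area = sum(density(x) for x in grid) * step
print(round(area, 3))
```

The resulting curve has two modes, one around the quiet volumes and one around the bursts, which is the kind of shape that a KDE plot makes visible.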

Figure 3: Densities of the regional traffic volumes that correspond to the: (1) Berlin, (2) Central Park, (3) Central Park (Low), and (4) Central Park (High) datasets.

V-C Experimental Results

Towards evaluating the efficiency of the proposed solution, on top of the aforementioned diversity of experimental conditions, the authors of this work have included a diverse ensemble of peer competitors. This ensemble consists of numerous forecasting models based on the LR (LASSO [55], Ridge [56], Elastic Net [57], Lasso Lars [58]), the ML (KNN [59], Decision Trees [60], Tree Regression [61], Bagged Decision Trees [62], Random Forest [63], Extra Trees Regressor [64]), and the ED paradigms. Furthermore, it includes the three variations of the GCN-LSTM model that have been used in the context of regional traffic forecasting. The first one constructs the adjacency matrices based on the distance between the nodes of the graph, the second one creates the adjacency matrices on the basis of traffic pattern similarity matrices, and the third one establishes binary adjacency matrices based on whether or not the involved nodes are neighboring.
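
The difference between the three GCN-LSTM variations lies in how the adjacency matrix is built, which can be sketched as follows. The Gaussian distance kernel, the use of Pearson correlation as the traffic similarity measure, the zero diagonal, and all names and toy values are assumptions made for illustration and may differ from the competitors' actual implementations.

```python
import math


def distance_adjacency(centers, sigma):
    """Weights that decay with the distance between Centers (assumed Gaussian kernel)."""
    n = len(centers)
    return [[0.0 if i == j
             else math.exp(-(math.dist(centers[i], centers[j]) / sigma) ** 2)
             for j in range(n)] for i in range(n)]


def pearson(a, b):
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def traffic_adjacency(traffic):
    """Weights from traffic pattern similarity; traffic[i] is the series of Region i."""
    n = len(traffic)
    return [[0.0 if i == j else pearson(traffic[i], traffic[j])
             for j in range(n)] for i in range(n)]


def binary_adjacency(n, neighbor_pairs):
    """1 if two Regions are neighboring, 0 otherwise."""
    adj = [[0] * n for _ in range(n)]
    for i, j in neighbor_pairs:
        adj[i][j] = adj[j][i] = 1
    return adj


# Toy setting with three Regions placed on a line.
centers = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
traffic = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
a_centers = distance_adjacency(centers, sigma=2.0)
a_traffic = traffic_adjacency(traffic)
a_binary = binary_adjacency(3, [(0, 1), (1, 2)])
print(a_binary)
```

In this toy setting, Regions 0 and 2 are not neighbors and are far apart, yet their traffic is strongly (negatively) correlated, which shows how the three constructions can encode different relations between the same Regions.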

The latter of these variations also serves as part of an ablation study that investigates the performance of the proposed solution by removing certain components in order to understand the contribution of each component to the overall system. As stated before, alongside the rest of the peer competitors, the proposed WEST GCN-LSTM is also tested against a GCN-LSTM model that uses binary graph convolution based on whether or not the examined Regions are neighboring. Furthermore, aside from the fully implemented WEST GCN-LSTM that leverages both proposed policies, we also explore a version of the WEST GCN-LSTM, denoted as WE GCN-LSTM, that relies solely on the Shared Borders Policy. The WE GCN-LSTM is suitable for Regional Traffic forecasting scenarios that do not incorporate the velocities of the various Populations, such as the Berlin dataset. Aside from the aforementioned dataset, we also considered a version of the Central Park dataset that does not include the velocities of the Populations. This ablation study serves as the third and final pillar that guarantees the robustness of the experimental evaluation process.

In accordance with the aforementioned ablation study, the experiments are designed to evaluate the efficiency of the proposed solution in two distinct contexts: one that leverages both of the proposed policies (WEST GCN-LSTM) and one that uses only the Shared Borders Policy (WE GCN-LSTM). The evaluation process is conducted in the form of a comparative analysis against the various peer competitors. More specifically, Table II displays the results of the WE GCN-LSTM model, compared against the various peer competitors in terms of MSE, RMSE, and MAE. These results correspond to the Berlin and Central Park datasets. For each of these datasets, the various forecasting models were trained and hyper-tuned independently. The displayed MSE, RMSE, and MAE values represent the average results (across the 6 prediction time-steps and the 6 Regions) for each combination of forecasting models and datasets.

TABLE II: Experimental results for the use of the Shared Borders Policy (WE GCN-LSTM).

| Model | Berlin MSE | Berlin RMSE | Berlin MAE | Central Park MSE | Central Park RMSE | Central Park MAE |
|---|---|---|---|---|---|---|
| LASSO | 6813.74 | 82.576 | 55.856 | 28521.76 | 168.795 | 84.334 |
| Ridge | 6814.95 | 82.579 | 55.862 | 28550.85 | 168.974 | 84.462 |
| Elastic Net | 6813.86 | 82.576 | 55.855 | 28538.94 | 168.857 | 84.362 |
| Lasso Lars | 13969.78 | 118.304 | 91.777 | 19989.29 | 141.357 | 60.033 |
| KNN | 3040.22 | 55.111 | 33.134 | 148802.20 | 385.707 | 192.281 |
| Decision Tree | 5827.81 | 76.231 | 44.214 | 140607.77 | 375.077 | 181.793 |
| Tree Regression | 6746.09 | 82.151 | 47.549 | 143243.34 | 378.228 | 185.868 |
| Bagged Decision Trees | 3062.17 | 55.336 | 34.217 | 16767.66 | 409.477 | 194.014 |
| Random Forest Reg | 2742.04 | 52.215 | 34.337 | 198535.42 | 446.687 | 202.097 |
| Extra Trees Regressor | 2934.09 | 54.107 | 35.935 | 206976.72 | 454.603 | 183.070 |
| LSTM ED | 3678.18 | 60.565 | 42.507 | 145619.28 | 381.796 | 158.338 |
| BD-LSTM ED | 2802.66 | 52.886 | 36.302 | 169972.55 | 412.567 | 199.281 |
| CNN-LSTM | 3243.21 | 56.903 | 36.566 | 127684.06 | 355.516 | 123.458 |
| Hybrid LSTM ED | 5063.64 | 71.291 | 49.423 | 49496.91 | 221.573 | 158.938 |
| Hybrid LSTM ATT ED | 5872.85 | 76.589 | 51.105 | 59756.66 | 244.397 | 169.515 |
| GCN-LSTM (traffic) | 2515.62 | 50.156 | 31.744 | 14717.40 | 121.342 | 46.102 |
| GCN-LSTM (centers) | 2914.56 | 53.969 | 33.730 | 21114.36 | 145.459 | 56.010 |
| GCN-LSTM (binary) | 2332.97 | 48.263 | 30.354 | 16376.95 | 127.857 | 49.175 |
| WE GCN-LSTM (ours) | 1989.07 | 44.599 | 26.639 | 11718.71 | 108.253 | 42.594 |

Furthermore, Table III displays the results of the WEST GCN-LSTM model, compared against the various peer competitors in terms of MSE, RMSE, and MAE. These results correspond to the Central Park (Low) & (High) datasets.

TABLE III: Experimental results for the use of both policies (WEST GCN-LSTM).

| Model | Central Park (Low) MSE | Central Park (Low) RMSE | Central Park (Low) MAE | Central Park (High) MSE | Central Park (High) RMSE | Central Park (High) MAE |
|---|---|---|---|---|---|---|
| LASSO | 1957.84 | 44.275 | 27.170 | 38499.57 | 196.233 | 87.155 |
| Ridge | 1956.19 | 44.318 | 27.219 | 38694.22 | 196.735 | 87.312 |
| Elastic Net | 1956.68 | 44.289 | 27.187 | 38545.71 | 196.424 | 87.243 |
| Lasso Lars | 2288.14 | 47.848 | 34.168 | 32977.04 | 181.691 | 68.060 |
| KNN | 5153.61 | 71.738 | 47.242 | 211533.68 | 460.262 | 157.145 |
| Decision Tree | 7798.36 | 88.195 | 52.955 | 220273.80 | 468.947 | 171.820 |
| Tree Regression | 7892.61 | 88.865 | 55.491 | 205569.80 | 453.318 | 159.357 |
| Bagged Decision Trees | 4199.69 | 64.813 | 41.567 | 215214.93 | 464.097 | 168.351 |
| Random Forest Reg | 3947.21 | 62.900 | 39.293 | 216241.91 | 464.933 | 167.393 |
| Extra Trees Regressor | 3393.43 | 58.297 | 35.494 | 180371.66 | 424.022 | 152.694 |
| LSTM ED | 1401.66 | 37.422 | 27.564 | 63112.45 | 251.224 | 146.177 |
| BD-LSTM ED | 1440.24 | 37.943 | 27.442 | 57015.36 | 238.590 | 138.722 |
| CNN-LSTM | 1264.56 | 35.571 | 25.222 | 47348.78 | 217.745 | 122.168 |
| Hybrid LSTM ED | 1637.35 | 40.471 | 31.594 | 70813.06 | 266.018 | 165.205 |
| Hybrid LSTM ATT ED | 1705.97 | 41.316 | 31.202 | 93655.52 | 305.773 | 180.976 |
| GCN-LSTM (traffic) | 1105.94 | 33.224 | 27.186 | 17656.01 | 132.887 | 57.776 |
| GCN-LSTM (centers) | 1426.83 | 37.783 | 28.987 | 23634.92 | 153.532 | 73.110 |
| GCN-LSTM (binary) | 1179.33 | 34.329 | 27.917 | 20463.79 | 142.843 | 63.485 |
| WEST GCN-LSTM (ours) | 802.87 | 28.335 | 22.861 | 12106.60 | 110.030 | 47.904 |

V-D Discussion

Before we proceed to analyzing the experimental results, it is important to delve deeper into the intricacies of each dataset. As depicted in Fig. 3, the regional traffic volume of the Berlin dataset is slightly larger compared to Central Park (Low). Furthermore, the volumes of regional traffic of the Central Park and Central Park (High) datasets are quite similar, while both of them are significantly larger compared to the Central Park (Low) dataset. This is due to the fact that, in the context of the conducted experiments, fast-moving entities tend to emerge in a rather aperiodic manner, thus creating sudden bursts in the volume of regional traffic, contrary to slower-moving entities that are closely associated with periodic phenomena. Across all four datasets, the larger regional traffic volume values are monopolized by a couple of Regions. However, in the case of the Central Park and the Central Park (High) datasets, four Regions are associated with quite large volumes of regional traffic. This inequality among Regions, in the context of their corresponding traffic, constitutes one of the most significant challenges that the various forecasting models have to overcome. More specifically, in the frame of the conducted experiments, we came across two types of regional traffic inequality: major inequality, which corresponds to the Berlin and the Central Park (Low) datasets, and minor inequality, which corresponds to the Central Park and the Central Park (High) datasets. The manifestation of each type of regional traffic inequality highly depends on the moving speed of the various entities.

The experimental results offer several insights that are worth exploring. Let us begin our analysis by setting aside the forecasting models that are based on the GCN-LSTM paradigm. When doing so, we see that for the Berlin dataset ML and ED solutions perform the best, for the Central Park and Central Park (High) datasets LR solutions outperform their competitors, and for the Central Park (Low) dataset ED solutions achieve the best scores. In other words, the more advanced ED solutions are better equipped to handle scenarios of major regional traffic inequality, while the quite simplistic LR solutions manage to outperform their competitors in scenarios that are characterized by minor regional traffic inequality. This is due to the fact that LR models are less sensitive to input fluctuations and, as such, they tend to converge to non-optimal solutions. Thus, in the case of minor regional traffic inequality, the extent to which they take into account the couple of Regions that monopolize traffic volume does not significantly alter the basis upon which they conduct their predictions, and they consequently produce superior results.

When we include forecasting models that are based on the GCN-LSTM paradigm, we see that these models either outperform or perform very closely to the best of their competitors across all examined datasets. This showcases the fact that the information filtering process that is intertwined with graph convolution is capable of mitigating the negative impact that regional traffic inequalities have on the various forecasting models. However, the relative performance of each of these models, when compared to its peers, is closely associated with the reasoning behind each information distillation process and the characteristics of each dataset. The performance of the GCN-LSTM (centers) model highly depends on the topology of the Regions and on the assumption that the optimal way to represent the relation between two Regions is their proximity to each other. Since in our experiments we did not consider homogeneously shaped Regions, this approach produced the worst results in terms of prediction accuracy compared to its peers. Nevertheless, it produced results that are similar to the best LR, ML, and ED models. On the contrary, the performance of the GCN-LSTM (traffic) model depends on the traffic correlations among the various Regions. Real data exhibit a wide range of statistical properties within entity motion patterns. Consequently, the traffic-based relations among Regions that the GCN-LSTM (traffic) model is designed to exploit are more complex and harder to encapsulate in the case of real data. As a result, the GCN-LSTM (traffic) model produced good results in the case of the Central Park, Central Park (Low), and Central Park (High) datasets, while it performed less well in the case of the Berlin dataset. Last but not least, the GCN-LSTM (binary) model performed, relative to its peers, in the most consistent manner across all datasets. In fact, it outperformed the GCN-LSTM (centers) model across all datasets, and the GCN-LSTM (traffic) model in the case of the Berlin dataset. This serves as a significant indication that the choice to construct the adjacency matrices based on the neighboring status of the various Regions is a robust approach that provides satisfying results, regardless of the characteristics of each examined dataset.

As stated before, the proposed solution is an attempt at refining the aforementioned neighbor-based approach. So let us examine the performance of the proposed solution using Fig. 4. Fig. 4 displays the performance of the proposed solution, of the best LR, ML, ED, and GCN-LSTM solutions, as well as of the GCN-LSTM (binary) model, in a comparative manner that is based on the use of normalized RMSE values. The same process was also performed for the MSE and the MAE results, and despite some minor differences, the overall conclusions remained the same. Thus, we decided to not include them as part of this discussion. The proposed solution clearly outperformed all of its competitors, across all examined datasets. More specifically, for the Berlin dataset, the proposed solution outperformed the best LR model by 46%, the best ML model by 15%, the best ED model by 15%, and the best GCN-LSTM model by 8%. Furthermore, for the Central Park dataset, the proposed solution outperformed the best LR model by 23%, the best ML model by 71%, the best ED model by 51%, and the best GCN-LSTM model by 10%. For the Central Park (Low) dataset, the proposed solution outperformed the best LR model by 36%, the best ML model by 51%, the best ED model by 20%, and the best GCN-LSTM model by 15%. Finally, for the Central Park (High) dataset, the proposed solution outperformed the best LR model by 39%, the best ML model by 73%, the best ED model by 49%, and the best GCN-LSTM model by 16%.
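
As a rough illustration of how such margins relate to the tabulated values, the sketch below computes the relative RMSE reduction for two Berlin entries of Table II. The formula (one minus the ratio of the two RMSE values) is an assumption. Since the margins quoted in the text are derived from normalized RMSE values, they need not coincide with this raw ratio for every entry.

```python
def improvement(ours, competitor):
    """Relative reduction (in %) of an error metric with respect to a competitor."""
    return 100.0 * (1.0 - ours / competitor)


# RMSE values on the Berlin dataset, copied from Table II.
ours_rmse = 44.599                      # WE GCN-LSTM (ours)
competitors = {
    "LASSO (best LR)": 82.576,
    "GCN-LSTM (binary)": 48.263,
}

for name, value in competitors.items():
    print(f"{name}: {improvement(ours_rmse, value):.1f}%")
```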

Figure 4: Normalized RMSE values that correspond to the proposed solution (ours), to GCN-LSTM (binary), as well as to the best LR, ML, ED, and GCN-LSTM solutions.

The aforementioned experimental results indicate that the proposed solution is capable of outperforming its competitors by a significant margin, in a consistent manner, across all examined datasets. However, it is also important to examine the proposed solution in a manner that is aligned with the ablation study paradigm, in order to showcase the effect of each part of the proposed solution on its overall performance. The proposed solution outperformed the GCN-LSTM (binary) model by 8% for the Berlin dataset, by 14% for the Central Park dataset, by 17% for the Central Park (Low) dataset, and by 22% for the Central Park (High) dataset. As mentioned earlier, the Berlin and Central Park datasets were used to test a version of the proposed solution that uses only the Shared Borders Policy (WE GCN-LSTM). Thus, the fact that the proposed solution that leverages both policies (WEST GCN-LSTM) outperformed the GCN-LSTM (binary) model by a greater margin, when compared to the margin by which WE GCN-LSTM surpassed GCN-LSTM (binary), showcases that the fully implemented proposed solution that uses both policies is more efficient. The same principle applies when comparing the proposed solution against the rest of its competitors. The WE GCN-LSTM outperformed the second-best model by 8% for the Berlin dataset, and by 10% for the Central Park dataset, while the WEST GCN-LSTM model outperformed the second-best model by 15% for the Central Park (Low) dataset, and by 16% for the Central Park (High) dataset. Since the Central Park dataset is the amalgamation of the Central Park (Low) and the Central Park (High) datasets, experimental consistency, with regard to the comparison between the WE GCN-LSTM and the WEST GCN-LSTM models, is achieved via the use of combinations of datasets that entail identical regional traffic characteristics, as well as across datasets that exhibit different types of regional and traffic-based characteristics.

VI Conclusions and Future Research

The objective of this work is to introduce an advanced prediction model aimed at enhancing the accuracy of regional traffic forecasting through the refinement and fusion of information. Towards achieving this goal, we extend spatio-temporal graph neural networks in alignment with the regional traffic forecasting paradigm. Our approach is based on the proposal of a novel architecture for spatio-temporal graph neural networks, incorporating weighted stacked graph convolutions. The implementation of weighted stacked convolutions necessitates the calculation of a weighted adjacency matrix (denoted as $A$) and determining the number of graph convolution layers (denoted as $K$). To address this, we introduce two novel algorithms that utilize information about the speed of the populations and the topography of the regions that they traverse to compute the adjacency matrix and $K$. In the context of this work, these policies are referred to as the Shared Borders Policy and the Adjustable Hops Policy. This implementation design enables the proposed solution to encapsulate both the spatial and temporal characteristics that are intertwined with regional traffic forecasting in an optimal and more refined manner.

In order to evaluate the efficiency of the proposed solution, we conducted numerous experiments. The proposed solution managed to significantly outperform its competitors in the frame of an experimental evaluation that consists of 19 forecasting models, across all examined datasets. This is due to the fact that the proposed solution manages to simultaneously encapsulate both temporal and spatial aspects of regional traffic forecasting, through information fusion. Furthermore, through information distillation, the proposed solution is also capable of mitigating the dire ramifications of regional traffic inequality. Finally, an additional ablation study concluded that each one of the three parts of the proposed solution serves towards boosting its performance.

In terms of future research, there are various directions that could potentially enhance the performance of the proposed solution even further. One of these directions derives from the fact that the experimental evaluation process did not account for various forms of obstacles that may affect the resulting weighted Adjacency Matrix $A$ that is calculated on the basis of the Shared Borders Policy. Thus, a more refined version of Alg. 1 that takes into consideration such obstacles could result in a more efficient prediction model. Furthermore, it is worth examining the use of hybrid approaches for constructing the weighted Adjacency Matrix $A$. These hybrid approaches could stem from the combination of the proposed solution with several methodologies that have already been examined in the corresponding scientific literature.

Acknowledgment

This project has received funding from the European Union’s Horizon 2020 research and innovation programmes under grant agreements No 101016509 (CHARITY) and No 777695 (MASTER). The work reflects only the authors’ view, and the EU Agency is not responsible for any use that may be made of the information it contains.

References

  • [1] T. Verma, M. Sirenko, I. Kornecki, S. Cunningham, and N. A. Araújo, “Extracting spatiotemporal commuting patterns from public transit data,” Journal of Urban Mobility, vol. 1, p. 100004, 2021.
  • [2] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge computing: Vision and challenges,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637–646, 2016.
  • [3] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, “Internet of things for smart cities,” IEEE Internet of Things Journal, vol. 1, no. 1, pp. 22–32, 2014.
  • [4] C. Schindelhauer, “Mobility in wireless networks,” in International Conference on Current Trends in Theory and Practice of Computer Science. Springer, 2006, pp. 100–116.
  • [5] P. K. Singh, P. K. D. Pramanik, A. K. Dey, and P. Choudhury, “Recommender systems: an overview, research trends, and future directions,” International Journal of Business and Systems Research, vol. 15, no. 1, pp. 14–52, 2021.
  • [6] C. Ilin, S. Annan-Phan, X. H. Tai, S. Mehra, S. Hsiang, and J. E. Blumenstock, “Public mobility data enables COVID-19 forecasting and management at local and global scales,” Scientific Reports, vol. 11, no. 1, p. 13531, 2021.
  • [7] T. Snyder and G. Byrd, “The internet of everything,” Computer, vol. 50, no. 6, pp. 8–9, 2017.
  • [8] L. Alessandretti, P. Sapiezynski, S. Lehmann, and A. Baronchelli, “Multi-scale spatio-temporal analysis of human mobility,” PLoS ONE, vol. 12, no. 2, p. e0171686, 2017.
  • [9] H. Barbosa, M. Barthelemy, G. Ghoshal, C. R. James, M. Lenormand, T. Louail, R. Menezes, J. J. Ramasco, F. Simini, and M. Tomasini, “Human mobility: Models and applications,” Physics Reports, vol. 734, pp. 1–74, 2018.
  • [10] L. Po, F. Rollo, C. Bachechi, and A. Corni, “From sensors data to urban traffic flow analysis,” in 2019 IEEE International Smart Cities Conference (ISC2). IEEE, 2019, pp. 478–485.
  • [11] C. Cao and N. S.-N. Lam, “Understanding the scale and resolution effects in remote sensing and GIS,” in Scale in Remote Sensing and GIS. Routledge, 2023, pp. 57–72.
  • [12] S. A. Kashinath, S. A. Mostafa, A. Mustapha, H. Mahdin, D. Lim, M. A. Mahmoud, M. A. Mohammed, B. A. S. Al-Rimy, M. F. M. Fudzee, and T. J. Yang, “Review of data fusion methods for real-time and multi-sensor traffic flow analysis,” IEEE Access, vol. 9, pp. 51258–51276, 2021.
  • [13] D. Liu, S. Hui, L. Li, Z. Liu, and Z. Zhang, “A method for short-term traffic flow forecasting based on GCN-LSTM,” in 2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL). IEEE, 2020, pp. 364–368.
  • [14] S. Wu, “Spatiotemporal dynamic forecasting and analysis of regional traffic flow in urban road networks using deep learning convolutional neural network,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 2, pp. 1607–1615, 2022.
  • [15] J. Van Lint, S. P. Hoogendoorn, and H. J. van Zuylen, “Freeway travel time prediction with state-space neural networks: Modeling state-space dynamics with recurrent neural networks,” Transportation Research Record, vol. 1811, no. 1, pp. 30–39, 2002.
  • [16] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber et al., “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies,” 2001.
  • [17] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short-term memory neural network for traffic speed prediction using remote microwave sensor data,” Transportation Research Part C: Emerging Technologies, vol. 54, pp. 187–197, 2015.
  • [18] S. Du, T. Li, Y. Yang, X. Gong, and S.-J. Horng, “An lstm based encoder-decoder model for multistep traffic flow prediction,” in 2019 International Joint Conference on Neural Networks (IJCNN).   IEEE, 2019, pp. 1–8.
  • [19] S. Nadeeshan and A. S. Perera, “Multi-step bidirectional lstm for low frequent bus travel time prediction,” in 2021 Moratuwa Engineering Research Conference (MERCon).   IEEE, 2021, pp. 462–467.
  • [20] M. Cao, V. O. Li, and V. W. Chan, “A cnn-lstm model for traffic speed prediction,” in 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring).   IEEE, 2020, pp. 1–5.
  • [21] T. Theodoropoulos, A.-C. Maroudis, J. Violos, and K. Tserpes, “An encoder-decoder deep learning approach for multistep service traffic prediction,” in 2021 IEEE Seventh International Conference on Big Data Computing Service and Applications (BigDataService).   IEEE, 2021, pp. 33–40.
  • [22] J. Violos, T. Theodoropoulos, A.-C. Maroudis, A. Leivadeas, and K. Tserpes, “Self-attention based encoder-decoder for multistep human density prediction,” Journal of urban mobility, vol. 2, p. 100022, 2022.
  • [23] X. Su, X. Yan, and C.-L. Tsai, “Linear regression,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 4, no. 3, pp. 275–294, 2012.
  • [24] M. I. Jordan and T. M. Mitchell, “Machine learning: Trends, perspectives, and prospects,” Science, vol. 349, no. 6245, pp. 255–260, 2015.
  • [25] M. Gori, G. Monfardini, and F. Scarselli, “A new model for learning in graph domains,” in Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, vol. 2, 2005, pp. 729–734.
  • [26] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE transactions on neural networks and learning systems, vol. 32, no. 1, pp. 4–24, 2020.
  • [27] H. Dai, Z. Kozareva, B. Dai, A. Smola, and L. Song, “Learning steady-states of iterative algorithms over graphs,” in International conference on machine learning.   PMLR, 2018, pp. 1106–1114.
  • [28] S. Zhang, H. Tong, J. Xu, and R. Maciejewski, “Graph convolutional networks: a comprehensive review,” Computational Social Networks, vol. 6, no. 1, pp. 1–23, 2019.
  • [29] Z. Hou, X. Liu, Y. Cen, Y. Dong, H. Yang, C. Wang, and J. Tang, “Graphmae: Self-supervised masked graph autoencoders,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 594–604.
  • [30] K.-H. N. Bui, J. Cho, and H. Yi, “Spatial-temporal graph neural network for traffic forecasting: An overview and open research issues,” Applied Intelligence, vol. 52, no. 3, pp. 2763–2774, 2022.
  • [31] Y. Liu, S. Rasouli, M. Wong, T. Feng, and T. Huang, “Rt-gcn: Gaussian-based spatiotemporal graph convolutional network for robust traffic prediction,” Information Fusion, vol. 102, p. 102078, 2024.
  • [32] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li, “T-gcn: A temporal graph convolutional network for traffic prediction,” IEEE transactions on intelligent transportation systems, vol. 21, no. 9, pp. 3848–3858, 2019.
  • [33] S. Abu-El-Haija, B. Perozzi, A. Kapoor, N. Alipourfard, K. Lerman, H. Harutyunyan, G. Ver Steeg, and A. Galstyan, “Mixhop: Higher-order graph convolutional architectures via sparsified neighborhood mixing,” in international conference on machine learning.   PMLR, 2019, pp. 21–29.
  • [34] G. Nikolentzos, G. Dasoulas, and M. Vazirgiannis, “k-hop graph neural networks,” Neural Networks, vol. 130, pp. 195–205, 2020.
  • [35] J. Feng, Y. Chen, F. Li, A. Sarkar, and M. Zhang, “How powerful are k-hop message passing graph neural networks,” Advances in Neural Information Processing Systems, vol. 35, pp. 4776–4790, 2022.
  • [36] J. Zhao, Z. Yan, X. Chen, B. Han, S. Wu, and R. Ke, “k-gcn-lstm: A k-hop graph convolutional network and long–short-term memory for ship speed prediction,” Physica A: Statistical Mechanics and its Applications, vol. 606, p. 128107, 2022.
  • [37] W. Jiang, J. Luo, M. He, and W. Gu, “Graph neural network for traffic forecasting: The research progress,” ISPRS International Journal of Geo-Information, vol. 12, no. 3, p. 100, 2023.
  • [38] W. Jiang and J. Luo, “Graph neural network for traffic forecasting: A survey,” Expert Systems with Applications, vol. 207, p. 117921, 2022.
  • [39] K. Chen, F. Chen, B. Lai, Z. Jin, Y. Liu, K. Li, L. Wei, P. Wang, Y. Tang, J. Huang et al., “Dynamic spatio-temporal graph-based cnns for traffic flow prediction,” IEEE Access, vol. 8, pp. 185136–185145, 2020.
  • [40] H. Peng, H. Wang, B. Du, M. Z. A. Bhuiyan, H. Ma, J. Liu, L. Wang, Z. Yang, L. Du, S. Wang et al., “Spatial temporal incidence dynamic graph neural networks for traffic flow forecasting,” Information Sciences, vol. 521, pp. 277–290, 2020.
  • [41] Q. Zhou, J.-J. Gu, C. Ling, W.-B. Li, Y. Zhuang, and J. Wang, “Exploiting multiple correlations among urban regions for crowd flow prediction,” Journal of Computer Science and Technology, vol. 35, pp. 338–352, 2020.
  • [42] H. Yang, X. Zhang, Z. Li, and J. Cui, “Region-level traffic prediction based on temporal multi-spatial dependence graph convolutional network from gps data,” Remote Sensing, vol. 14, no. 2, p. 303, 2022.
  • [43] Y. Wang, A. Zhao, J. Li, Z. Lv, C. Dong, and H. Li, “Multi-attribute graph convolution network for regional traffic flow prediction,” Neural Processing Letters, pp. 1–27, 2022.
  • [44] C. Li, H. Zhang, Z. Wang, Y. Wu, and F. Yang, “Multigraph aggregation spatiotemporal graph convolution network for ride-hailing pick-up region prediction,” Wireless Communications and Mobile Computing, vol. 2022, 2022.
  • [45] H. Qiu, Q. Zheng, M. Msahli, G. Memmi, M. Qiu, and J. Lu, “Topological graph convolutional network-based urban traffic flow and density prediction,” IEEE transactions on intelligent transportation systems, vol. 22, no. 7, pp. 4560–4569, 2020.
  • [46] J. Sun, J. Zhang, Q. Li, X. Yi, Y. Liang, and Y. Zheng, “Predicting citywide crowd flows in irregular regions using multi-view graph convolutional networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 5, pp. 2348–2359, 2020.
  • [47] S. Wang, H. Miao, H. Chen, and Z. Huang, “Multi-task adversarial spatial-temporal networks for crowd flow prediction,” in Proceedings of the 29th ACM international conference on information & knowledge management, 2020, pp. 1555–1564.
  • [48] B. Wang, X. Luo, F. Zhang, B. Yuan, A. Bertozzi, and P. Brantingham, “Graph-based deep modeling and real time forecasting of sparse spatio-temporal data,” arXiv preprint arXiv:1804.00684, 2018.
  • [49] H. Shi, Q. Yao, Q. Guo, Y. Li, L. Zhang, J. Ye, Y. Li, and Y. Liu, “Predicting origin-destination flow via multi-perspective graph convolutional network,” in 2020 IEEE 36th International conference on data engineering (ICDE).   IEEE, 2020, pp. 1818–1821.
  • [50] G. Yeghikyan, F. L. Opolka, M. Nanni, B. Lepri, and P. Lio, “Learning mobility flows from urban features with spatial interaction models and neural networks,” in 2020 IEEE International Conference on Smart Computing (SMARTCOMP).   IEEE, 2020, pp. 57–64.
  • [51] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence.   International Joint Conferences on Artificial Intelligence Organization, Jul. 2018. [Online]. Available: https://doi.org/10.24963/ijcai.2018/505
  • [52] E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, and M. Bronstein, “Temporal graph networks for deep learning on dynamic graphs,” 2020. [Online]. Available: https://arxiv.org/abs/2006.10637
  • [53] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
  • [54] M. Behrisch, L. Bieker, J. Erdmann, and D. Krajzewicz, “SUMO – Simulation of Urban MObility: An Overview,” in Proceedings of SIMUL 2011, The Third International Conference on Advances in System Simulation, A. Omerovic, D. A. Simoni, and G. Bobashev, Eds.   Barcelona: ThinkMind, Oct. 2011. [Online]. Available: http://www.thinkmind.org/index.php?view=instance&instance=SIMUL+2011
  • [55] J. Haworth and T. Cheng, “Graphical lasso for local spatio-temporal neighbourhood selection,” in Proceedings of the GIS Research UK 22nd Annual Conference (GISRUK), 2014, pp. 425–433.
  • [56] S. Kundu, M. S. Desarkar, and P. Srijith, “Traffic forecasting with deep learning,” in 2020 IEEE Region 10 Symposium (TENSYMP).   IEEE, 2020, pp. 1074–1077.
  • [57] W. Liu, Z. Dou, W. Wang, Y. Liu, H. Zou, B. Zhang, and S. Hou, “Short-term load forecasting based on elastic net improved gmdh and difference degree weighting optimization,” Applied Sciences, vol. 8, no. 9, p. 1603, 2018.
  • [58] O. Gkountouna, D. Pfoser, and A. Züfle, “Traffic flow estimation using probe vehicle data,” in 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA).   IEEE, 2020, pp. 579–588.
  • [59] L. Cai, Y. Yu, S. Zhang, Y. Song, Z. Xiong, and T. Zhou, “A sample-rebalanced outlier-rejected k-nearest neighbor regression model for short-term traffic flow forecasting,” IEEE Access, vol. 8, pp. 22686–22696, 2020.
  • [60] W. Alajali, W. Zhou, S. Wen, and Y. Wang, “Intersection traffic prediction using decision tree models,” Symmetry, vol. 10, no. 9, p. 386, 2018.
  • [61] X. Zhan, S. Zhang, W. Y. Szeto, and X. Chen, “Multi-step-ahead traffic speed forecasting using multi-output gradient boosting regression tree,” Journal of Intelligent Transportation Systems, vol. 24, no. 2, pp. 125–141, 2020.
  • [62] H. Xia, X. Wei, Y. Gao, and H. Lv, “Traffic prediction based on ensemble machine learning strategies with bagging and lightgbm,” in 2019 IEEE International Conference on Communications Workshops (ICC Workshops).   IEEE, 2019, pp. 1–6.
  • [63] J. Evans, B. Waterson, and A. Hamilton, “Forecasting road traffic conditions using a context-based random forest algorithm,” Transportation planning and technology, vol. 42, no. 6, pp. 554–572, 2019.
  • [64] S. M. Mastelini, F. K. Nakano, C. Vens, A. C. P. de Leon Ferreira et al., “Online extra trees regressor,” IEEE Transactions on Neural Networks and Learning Systems, 2022.