Enhancing Crowd Flow Prediction in Various Spatial and Temporal Granularities

Marco Cardia, Department of Computer Science, University of Pisa, Italy, marco.cardia@phd.unipi.it

Massimiliano Luca, Free University of Bolzano, Italy and Bruno Kessler Foundation (FBK), Italy, mluca@fbk.eu

Luca Pappalardo, Institute of Information Science and Technology (ISTI), National Research Council (CNR), Italy, luca.pappalardo@isti.cnr.it

DOI: https://doi.org/10.1145/3487553.3524851
WWW '22 Companion: Companion Proceedings of the Web Conference 2022, Virtual Event, Lyon, France, April 2022

The diffusion of the Internet of Things now makes it possible to sense human mobility in great detail, fostering human mobility studies and their applications in various contexts, from traffic management to public security and computational epidemiology. A mobility task that is becoming prominent is crowd flow prediction, i.e., forecasting aggregated incoming and outgoing flows in the locations of a geographic region. Although several deep learning approaches have been proposed to solve this problem, their usage is limited to specific types of spatial tessellations, and they cannot provide sufficient explanations of their predictions. We propose CrowdNet, a solution to crowd flow prediction based on graph convolutional networks. Compared with state-of-the-art solutions, CrowdNet can be used with regions of irregular shapes and provides meaningful explanations of the predicted crowd flows. We conduct experiments on public data, varying the spatio-temporal granularity of crowd flows, to show the superiority of our model with respect to existing methods, and we investigate CrowdNet's robustness to missing or noisy input data. Our model is a step forward in the design of reliable deep learning models to predict and explain human displacements in urban environments.

CCS Concepts: • Computing methodologies → Artificial intelligence; • Computing methodologies → Machine learning; • Applied computing → Transportation;

Keywords: human mobility, flow prediction, machine learning, deep learning

ACM Reference Format:
Marco Cardia, Massimiliano Luca, and Luca Pappalardo. 2022. Enhancing Crowd Flow Prediction in Various Spatial and Temporal Granularities. In Companion Proceedings of the Web Conference 2022 (WWW '22 Companion), April 25–29, 2022, Virtual Event, Lyon, France. ACM, New York, NY, USA 9 Pages. https://doi.org/10.1145/3487553.3524851

1 INTRODUCTION

The study of human mobility is relevant to a large variety of topics, including public safety, migration, on-demand services, pollution monitoring, diffusion of epidemics, geomarketing, and traffic optimisation [1, 2, 14, 16, 23, 24, 29, 30]. Thanks to the recent deluge of digital data and the striking development of artificial intelligence, there has been a vast scientific production on various tasks involving human mobility data [1, 22]. A notable example is crowd flow prediction, which consists in forecasting the aggregated incoming and outgoing flows of people that move across regions in a geographic area [37]. The main challenge in solving this task lies in capturing close and far spatial and temporal dependencies in the data at the same time. To date, crowd flow prediction is tackled with two main approaches: statistical models based on time series, which generally cannot capture both the spatial and the temporal dependencies; and models based on deep learning, which outperform traditional statistical models thanks to their complex architecture [22]. Examples of such solutions are ST-ResNet [37], DMVSTNet [32] and ACMF [20]. All these models rely on convolutional networks to capture long- and short-term spatio-temporal patterns and on fully connected networks to capture external factors known to have an impact on human mobility (e.g., weather conditions). We argue that existing solutions may not satisfy policymakers' needs for two reasons. First, the models only predict aggregated inflow or outflow and provide no information about the origin and the destination of these flows, which is crucial for tasks such as modelling epidemic diffusion [3]. Second, as the predictors are based on traditional convolutions, they can only deal with regular tessellations and cannot work with geographic areas of irregular shape such as census tracts, which may be of interest to policymakers for a variety of reasons [3].
In this work, we propose CrowdNet, a deep learning approach based on spatio-temporal graph convolutions that solves crowd flow prediction and overcomes the aforementioned limitations. In particular, our model can predict crowd inflows and outflows in geographic areas with both regular and irregular shapes. Moreover, it corroborates the prediction with an origin-destination matrix.

We evaluate CrowdNet on the widely adopted datasets of bikes in New York City and taxis in Beijing. Regardless of the dataset, CrowdNet outperforms other state-of-the-art solutions while also providing a set of new advantages to policymakers. We also provide the code to reproduce CrowdNet and our experiments on public datasets at (link to repo removed for double blind submission).

The paper is structured as follows. In Section 2, we briefly review the literature related to crowd flow prediction. In Section 3, we formally define the crowd flow prediction problem and the flow prediction problem. In Section 4, we present an overview of the architecture of CrowdNet. In Section 5, we describe the datasets used, the evaluation metrics and the settings of the experiments. In Section 6, we provide the results and show the performance of CrowdNet on irregular geographic areas and on the flow prediction task. Finally, in Section 7, we summarise the contributions of our work and discuss possible future improvements.

2 RELATED WORKS

Statistical methods for crowd flow prediction represent flows through equation matrices and adopt independent variables to represent adjacent areas and historical data. Autoregressive Integrated Moving Average (ARIMA) [17] uses a number of lagged observations of a univariate time series to forecast new observations. Vector Auto-Regressive (VAR) exploits multiple time series to capture the pairwise relationships among flows [9]. Overall, autoregressive approaches can capture neither complex temporal nor complex spatial dependencies, and they require feature engineering to transform raw data into appropriate internal representations for spatio-temporal dependency detection. In contrast, Deep Learning (DL) approaches can discover features from raw data automatically [11].

Deep Learning approaches. There are many DL algorithms that are specifically designed to solve the crowd flow prediction problem. Many of them are collected in a recent survey by Luca et al. [22]. Most of the solutions leverage convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to capture spatio-temporal patterns and dependencies [4, 6, 13, 19, 21, 26, 28, 31, 34, 37]. Some solutions also rely on attention mechanisms [4, 13, 28, 31]. In what follows, we introduce additional details of the models we will use as baselines in this study. DeepST [36] captures temporal patterns using the assumption that time series always respect temporal closeness, period, and seasonal trend. A convolutional neural network (CNN) module captures spatial dependencies, and a fusion mechanism combines the outputs. Spatio-Temporal Residual Network (STResNet) [37] improves on DeepST by adding residual learning, a parametric and matrix-based fusion mechanism, and the consideration of external factors. Local-Dilated Region-Shifting Network (LDRSN) [28] combines local and dilated convolutions to learn nearby and distant spatial dependencies, which makes it more resilient to overfitting than approaches based on CNNs. Hybrid-Integrated DL Spatio Temporal network (HIDLST) [26] exploits a long short-term memory network (LSTM) to capture dynamic temporal dependencies in time series and residual CNNs to capture spatial dependencies. Attentive Traffic Flow Machine (ATFM) [21] captures spatial-temporal dependencies with two convolutional LSTM (ConvLSTM) units and an attention mechanism able to infer the trend evolution by exploiting dynamic spatial-temporal feature representation learning. Li et al. [19] propose a model made up of densely connected CNNs to extract spatial characteristics, an attention-based long short-term memory module to capture temporal components and a fully connected neural network to extract features from external factors.
Deep Spatio-Temporal Irregular Convolutional Residual LSTM (DST-ICRL) [6] integrates multi-channel traffic representations, irregular convolution residual networks and LSTMs to forecast crowd flows. Multi-View Residual Attention Network (MV-RANet) [34] captures spatial dependencies with a double-branch residual attention network: one branch captures small-scale dependencies, while the other serves as an attention model, extracting spatial dependencies at large scale. External features are represented as three graphs of functional areas.

3 PROBLEM DEFINITION

In this section, we formalise the problem of crowd flow prediction and introduce the main concepts used in the paper.

Definition 3.1 (Spatial Tessellation) Let R be a geographical area and G a set of polygons. G is called a tessellation if the following properties hold:

(1) G contains a finite number of polygons (i.e., tiles) l_i, so that G = {l_i: i = 1,..., n};
(2) Locations do not overlap, that is, l_i ∩ l_j = ∅, ∀i ≠ j;
(3) The union of all the locations entirely covers R, i.e. $\bigcup _{i=1}^{n}l_i = R$.

Tessellations allow us to map data points into a finite number of tiles within the area, instead of having raw positions expressed in coordinates. Tiles are represented by either regular geometric shapes such as squares, triangles, quadrilaterals or hexagons, or irregular ones such as census cells or administrative units. Properties (2) and (3) ensure that each point is assigned to only one tile.
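As a small illustration of how a regular tessellation maps raw positions to tiles, the sketch below assigns a point to a square tile. It is only a sketch under stated assumptions: coordinates are taken to be already projected to meters, the grid is axis-aligned, and the function name `tile_of` and its parameters are hypothetical, not part of the paper's code.

```python
def tile_of(x, y, x_min, y_min, tile_size, n_cols, n_rows):
    """Return the index of the square tile containing point (x, y), or None.

    Assumes projected coordinates in meters and an axis-aligned grid whose
    lower-left corner is (x_min, y_min). Tiles are half-open intervals, so a
    point on a shared border belongs to exactly one tile.
    """
    col = int((x - x_min) // tile_size)
    row = int((y - y_min) // tile_size)
    if 0 <= col < n_cols and 0 <= row < n_rows:
        return row * n_cols + col  # row-major tile identifier
    return None  # the point falls outside the tessellated area


# A 3 x 2 grid of 1000 m tiles anchored at the origin.
inside = tile_of(1500.0, 500.0, 0.0, 0.0, 1000.0, n_cols=3, n_rows=2)
corner = tile_of(2999.9, 1999.9, 0.0, 0.0, 1000.0, n_cols=3, n_rows=2)
outside = tile_of(3000.0, 0.0, 0.0, 0.0, 1000.0, n_cols=3, n_rows=2)
```

The half-open convention is one simple way to make every point fall in at most one tile, in line with property (2).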

Mobility flows represent aggregated movements among geographic locations, and they are usually represented as an Origin-Destination matrix.

Definition 3.2 (Origin-Destination matrix) An Origin-Destination matrix is a matrix $T \in \mathbb{R}^{n \times m}$ where n is the number of different origin regions and m is the number of distinct destination regions. T_{i, j} denotes the number of individuals moving from region i to region j.

The origin and destination regions often coincide (n = m). In the Crowd Flow Prediction problem, flows are aggregated into crowd flows (either incoming or outgoing) and represented as a bi-dimensional matrix in which an element represents the crowd flow in a tile during a certain time interval.

Definition 3.3 (Crowd Flow) Given a trajectory T_u describing the movements of an individual u, the set of tiles intersected by T_u in a time interval Δt is defined as:

\begin{equation} q^t_{T_u} = \lbrace (i, j) \mid (p_k \rightarrow t) \in \Delta t \wedge (p_k \rightarrow (x, y)) \in (i, j)\rbrace \end{equation}
(1)

where the pair (i, j) indicates a cell on an I × J grid and p_k is u’s current location, identified by coordinates (x, y).

Let Q be the set of all the individual trajectories, and let t − 1, t and t + 1 be three consecutive time spans:

The incoming crowd flow to a location (i, j) is the number of individuals that were not in (i, j) at time t − 1 and are in (i, j) at time t.

\begin{equation} in_t^{(i,j)} = \sum _{T\in Q} |\lbrace t > 1 : (i,j) \notin q_{T}^{t-1} \wedge (i,j) \in q_{T}^{t}\rbrace | \end{equation}
(2)
The outgoing crowd flow from a location (i, j) is the number of individuals that were in (i, j) at time t and are no longer in (i, j) at time t + 1.

\begin{equation} out_t^{(i,j)} = \sum _{T\in Q} |\lbrace t > 1 : (i,j) \in q_{T}^{t} \wedge (i,j) \notin q_{T}^{t+1}\rbrace | \end{equation}
(3)
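Equations (2) and (3) can be illustrated with a minimal sketch that counts, for one time step, how many individuals enter and leave each tile. The data layout (a dictionary mapping each individual to a list of per-interval tile sets) and the function name `crowd_flows` are assumptions made for illustration only.

```python
from collections import Counter


def crowd_flows(visited, t):
    """Count inflow and outflow per tile at time step t.

    visited[u][s] is the set of tiles intersected by the trajectory of
    individual u during interval s (the set q in Equation (1)).
    """
    inflow, outflow = Counter(), Counter()
    for tiles in visited.values():
        if not 1 <= t < len(tiles) - 1:
            raise ValueError("t needs a previous and a next interval")
        # Eq. (2): tiles visited at t that were not visited at t - 1.
        inflow.update(tiles[t] - tiles[t - 1])
        # Eq. (3): tiles visited at t that are no longer visited at t + 1.
        outflow.update(tiles[t] - tiles[t + 1])
    return dict(inflow), dict(outflow)


# Three individuals observed over three consecutive intervals (toy data).
visited = {
    "u1": [{(0, 0)}, {(0, 1)}, {(0, 1)}],
    "u2": [{(0, 0)}, {(0, 0)}, {(1, 1)}],
    "u3": [{(1, 1)}, {(0, 1)}, {(0, 0)}],
}
inflow, outflow = crowd_flows(visited, t=1)
```

In this toy example two individuals enter tile (0, 1) at the middle step, while one individual leaves (0, 0) and one leaves (0, 1) right after it.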

Given the aforementioned, we define the problem as follows:

Definition 3.4 (Crowd Flow Prediction) Given a spatial tessellation R composed of n tiles and the crowd flows for each cell for t time intervals, crowd flow prediction consists in forecasting X_{t + c}, where $c \in \mathbb{N}$, given the historical crowd flows {X_i: i = 1,..., t}.

A variant of crowd flow prediction is flow prediction:

Definition 3.5 (Flow Prediction Problem) Given a spatial region R and a temporal Origin-Destination matrix $T \in \mathbb{R}^{n \times m \times t}$ where n is the number of different origin tiles, m is the number of distinct destination tiles and t is the number of time intervals, flow prediction consists in predicting the next Origin-Destination matrix, i.e. the OD matrix at time t + 1, given the historical flows {T_i: i = 1,..., t}.

4 CROWDNET

CrowdNet is a deep neural network whose input is a temporal origin-destination matrix that describes historical flows among different regions, allowing it to use tessellations of various shapes (e.g., irregular tessellations). First, the model solves flow prediction, forecasting the flows among all pairs of regions in the tessellation. Then, it solves crowd flow prediction summing all the flows in the predicted OD matrix that have as a destination (origin) k, so to obtain the crowd inflow (outflow) of region k.
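The aggregation step described above, from a predicted OD matrix to crowd inflows and outflows, can be sketched as follows. The function name `crowd_flows_from_od` and the toy matrix are illustrative assumptions; the diagonal is assumed to be zero, i.e., no flow from a region to itself.

```python
def crowd_flows_from_od(od):
    """Aggregate an n x n OD matrix into per-region inflow and outflow.

    od[i][j] is the flow from region i to region j.
    """
    n = len(od)
    # Inflow of region k: sum of column k (all flows with destination k).
    inflow = [sum(od[i][k] for i in range(n)) for k in range(n)]
    # Outflow of region k: sum of row k (all flows with origin k).
    outflow = [sum(od[k]) for k in range(n)]
    return inflow, outflow


od = [
    [0, 3, 1],
    [2, 0, 4],
    [5, 0, 0],
]
inflow, outflow = crowd_flows_from_od(od)
```

Because every flow leaves one region and enters another, the total inflow equals the total outflow, which gives a quick sanity check on any predicted matrix.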

Given a geographic area tessellated into n regions, we represent the set of flows at time t as a tensor $F_t \in \mathbb{R}^{n \times n}$, where the first dimension represents the origin and the second dimension represents the destination of the flow. Hence, F_t(i, j) contains the flow at time t moving from region i to region j. A flow equal to 0 means that no people move from region i to region j at time step t.

We adapted CrowdNet from the work by Yu et al. [33] on traffic forecasting. In particular, we treat the problem as a weighted link prediction applied to temporal dynamic directed graphs, where each node represents a region and each weighted edge quantifies the flow between two regions. Formally, a weighted graph, at the t-th time step, is a triple G = (V, E_t, f_t), where V is a set of vertices, E_t is a set of edges, i.e., a set of ordered pairs (u, v) ∈ V × V with u ≠ v, and f_t is a function $f_t : E_t \rightarrow \mathbb{R}$ assigning to each edge a value representing its weight [10]. The network outputs the graph at the t + 1 time interval, i.e., the triple G = (V, E_{t + 1}, f_{t + 1}).
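To make the graph view concrete, the sketch below turns one OD matrix into the triple (V, E_t, f_t). The representation of f_t as a dictionary keyed by ordered pairs, and the choice to create an edge only for strictly positive flows, are assumptions for illustration.

```python
def od_to_graph(od):
    """Build (V, E_t, f_t) from an n x n OD matrix for one time step."""
    vertices = list(range(len(od)))
    # f_t: weight of each directed edge (u, v); self-loops are excluded (u != v).
    weights = {
        (u, v): od[u][v]
        for u in vertices
        for v in vertices
        if u != v and od[u][v] > 0
    }
    edges = set(weights)  # E_t: ordered pairs with a positive flow
    return vertices, edges, weights


od_t = [
    [0, 3, 0],
    [2, 0, 0],
    [0, 1, 0],
]
vertices, edges, weights = od_to_graph(od_t)
```

Predicting the graph at t + 1 then amounts to predicting which ordered pairs carry a flow and how large that flow is.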

4.1 Architecture

We can formalise CrowdNet as:

\begin{equation} \mbox{CrowdNet}(X_t, A) \rightarrow Y \end{equation}
(4)

where

$X_t$ are the OD matrices with n nodes from time t to t + k;

$A \in \mathbb{R}^{n \times n}$ is the adjacency matrix of the graph (representing the Origin-Destination flows); in particular, A_{i, j} = 1 if i and j are linked in at least one interval;

$Y \in \mathbb{R}^{l \times n \times n}$ is the model's prediction, where l is the number of time intervals predicted and n is the number of regions. Therefore, CrowdNet's predictions are the adjacency matrices from time t + k + 1 to t + k + l. In our experiments, we fix l = 1. Y is then aggregated into a bi-dimensional matrix where each element represents the inflow and the outflow of a tile, solving the crowd flow prediction problem. Formally, the second output of CrowdNet is a tensor in $\mathbb{R}^{2 \times q \times q}$, where n = q × q and q is the number of tiles along both the x axis and the y axis.
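The binary adjacency matrix A described above can be derived from a sequence of OD matrices, as in the following sketch. Treating "linked" as "a strictly positive flow from i to j in at least one interval" is an assumption, as is the function name `binary_adjacency`.

```python
def binary_adjacency(od_sequence):
    """Return A with A[i][j] = 1 if a flow i -> j occurs in any interval."""
    n = len(od_sequence[0])
    return [
        [1 if any(od[i][j] > 0 for od in od_sequence) else 0 for j in range(n)]
        for i in range(n)
    ]


# Two intervals over three regions (toy values).
od_sequence = [
    [[0, 2, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 5], [0, 0, 0]],
]
adjacency = binary_adjacency(od_sequence)
```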

CrowdNet's architecture is composed of several spatio-temporal convolutional blocks, each made up of two temporal convolutional layers with one spatial graph convolutional layer in between. The former capture temporal dependencies and the latter captures spatial dependencies. Since crowd flow prediction requires a good response to dynamic changes [33], we apply convolutions on the time axis to capture the temporal features of flows [8]. Figure 1 schematises CrowdNet's architecture.

Time Block. Inspired by Gehring et al. [8], we exploit CNNs to capture temporal dependencies. This choice is justified by the fact that CNNs perform well when predicting dynamic changes in flows. Compared with Recurrent Neural Networks (RNNs), CNNs train faster, and the absence of dependency constraints on previous steps allows a parallel and controllable training process through the multi-layer convolutional structure.

The Time Block (TB) contains a convolution followed by Gated Linear Units (GLUs) [5], which implement a gating mechanism over the output of the convolution. Formally:

\begin{equation} \Gamma = (X * \Theta _1 +b_0) \circ \sigma (X * \Theta _2 + b_1) \end{equation}
(5)

where X is the input, Θ₁, Θ₂, b₀ and b₁ are learnable parameters, σ is the sigmoid function, * is the convolution operator, and ○ denotes the Hadamard product. The sigmoid gate controls what is relevant for discovering the structure and the dynamics of the time series. To enable the use of deep convolutional networks, we use residual connections from the input X to the output of each layer. We use the Rectified Linear Unit (ReLU) as the final activation function, defined as ReLU(x) = max(0, x). In summary:

\begin{equation} TB = ReLU(X * \Theta _3 + b_2 + \Gamma). \end{equation}
(6)
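Equations (5) and (6) can be traced on a univariate series with the sketch below. It is a plain-Python illustration, not the PyTorch implementation used in the experiments: the kernels are arbitrary rather than learned, the convolution is a "valid" cross-correlation (the operation deep learning libraries call convolution), and the names `conv1d` and `time_block` are hypothetical.

```python
import math


def conv1d(x, kernel, bias):
    """'Valid' 1D cross-correlation of series x with a kernel, plus a bias."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(x) - k + 1)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def time_block(x, theta1, theta2, theta3, b0=0.0, b1=0.0, b2=0.0):
    """Gated temporal convolution following Eq. (5) and Eq. (6)."""
    linear = conv1d(x, theta1, b0)
    gate = [sigmoid(g) for g in conv1d(x, theta2, b1)]
    gamma = [a * g for a, g in zip(linear, gate)]  # Eq. (5): Hadamard product
    skip = conv1d(x, theta3, b2)                   # third convolution in Eq. (6)
    return [max(0.0, s + g) for s, g in zip(skip, gamma)]  # ReLU


flows = [4.0, 6.0, 5.0, 9.0, 7.0]
# With theta2 = 0 the gate is sigmoid(0) = 0.5 everywhere.
out = time_block(flows, theta1=[0.5, 0.5], theta2=[0.0, 0.0], theta3=[0.0, 1.0])
```

With a kernel of size 2, a series of five values yields four outputs, which shows how each temporal convolution shortens the time axis when no padding is used.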

Spatial Block. We define a Graph Convolution as:

\begin{equation} X^{\prime } = \hat{D}^{-\frac{1}{2}} \hat{A}\hat{D}^{-\frac{1}{2}} X \Theta \end{equation}
(7)

where $ \hat{A} = A + I$, i.e., $\hat{A}$ is the adjacency matrix of the directed graph G with self-loops. I is the identity matrix. $\hat{D}$ is a diagonal matrix:

\begin{equation} \hat{D}_{i,i} = \sum _{j=0}^n \hat{A}_{i,j} \end{equation}
(8)

where the element (i, i) is the number of adjacent nodes for the node i. All the other elements are equal to 0. X is the input, i.e., the origin-destination matrix for a defined time interval, as defined in Section 4. This operation is motivated by a first-order approximation of localised spectral filters on graphs [15].

The operation of graph convolution is used by a layer block, named Spatial Block (SB). It is a one-layer Graph Convolutional Network (GCN) having the following form as forward model:

\begin{equation} SB = ReLU(\hat{D}^{-\frac{1}{2}} \hat{A}\hat{D}^{-\frac{1}{2}} X \Theta) \end{equation}
(9)

where ReLU is the rectified linear unit activation function applied to the graph convolution operation.
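The graph convolution of Equations (7) to (9) can be sketched on a tiny graph as follows. This is a plain-Python illustration with list-based matrices; the helper names are hypothetical, the weights Θ are arbitrary rather than learned, and the toy adjacency matrix is symmetric only to keep the numbers easy to check.

```python
import math


def normalized_adjacency(adj):
    """Compute D^-1/2 (A + I) D^-1/2, with D the row sums of A + I (Eq. 8)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 thanks to the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def spatial_block(adj, x, theta):
    """One-layer GCN forward pass: ReLU(D^-1/2 (A + I) D^-1/2 X Theta), Eq. (9)."""
    h = matmul(matmul(normalized_adjacency(adj), x), theta)
    return [[max(0.0, v) for v in row] for row in h]


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph over three regions
x = [[1.0], [2.0], [3.0]]                # one feature per region
theta = [[1.0]]                          # arbitrary 1 x 1 weight matrix
norm = normalized_adjacency(adj)
out = spatial_block(adj, x, theta)
```

Each output row mixes a region's own feature with those of its neighbours, scaled by the degrees of both endpoints.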

ST-GCN Block. The ST-GCN Block is composed of a Time Block, a Spatial Block, and another Time Block. The Spatial Block is fed by the first Time Block and performs a graph convolution that can be expressed as:

\begin{equation} X^{\prime \prime } = SB(X, \hat{A}) = ReLU(\hat{D}^{-\frac{1}{2}} \hat{A}\hat{D}^{-\frac{1}{2}} X \Theta) \end{equation}
(10)

The output of the Spatial Block layer is provided to a Time Block, which in turn returns:

\begin{equation} X^{\prime \prime \prime } = TB(X^{\prime \prime }) = ReLU(X^{\prime \prime } * \Theta _3 + b_2 + \Gamma). \end{equation}
(11)

Finally, batch normalisation is applied to the output of the last Time Block. Batch normalisation is defined as

\begin{equation} y^{\prime } = \frac{X^{\prime \prime \prime } - E[X^{\prime \prime \prime }]}{\sqrt {Var(X^{\prime \prime \prime }) + \epsilon }} * \gamma + \beta \end{equation}
(12)

where γ and β are learnable parameters. Batch normalisation allows the use of higher learning rates, speeds up training, and makes the model less sensitive to initialisation [12].
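Equation (12) can be illustrated on a flat list of activations, as in the sketch below. In a real network the mean and variance are typically computed per channel over a mini-batch; here a single list stands in for that, and the function name `batch_norm` and the value of ε are assumptions.

```python
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise values to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]


activations = [1.0, 2.0, 3.0]
normalised = batch_norm(activations)
shifted = batch_norm(activations, gamma=2.0, beta=1.0)
```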

In summary, CrowdNet has two ST-GCN layers and one output layer. The last block maps the outputs of the last ST-GCN layer into a single step prediction output. The loss function used in CrowdNet is the Mean Squared Error (MSE).

5 EXPERIMENTS

5.1 Datasets

The Citi Bike System dataset describes trips recorded by the official New York City bike sharing system from 2013 to date. We consider trips from April to September 2014 because this is the date range usually used in the literature to test crowd flow prediction methods [6, 19, 21, 28, 36, 37]. Each record also contains the ride's start and end times and the coordinates of the start and end bike stations.

The Taxi Beijing dataset is based on T-Drive [35]. It was collected by Microsoft in the area of Beijing, China, and it contains the GPS locations of 10,357 taxis sampled every 177 seconds. The data were collected over a period of one week in February 2008.

The Capital Bikeshare dataset describes the bike trips of the Washington D.C. bike sharing system. We consider the trips from January 2018 to January 2020. The information contained in each record is similar to that described for the New York City bike dataset and includes identifiers, the latitude and longitude of the start and end stations, and the corresponding times.

Preprocessing. We use the scikit-mobility library [25] to construct a squared tessellation over New York City, Beijing and Washington D.C. A squared tessellation is a division of a geographic area into equal-sized tiles. Each tile is described by an identifier, the shape of the polygon describing the tile, and the position of the tile in a rectangular matrix modelling the squared tessellation.

We use a spatial join to associate the stations’ coordinates to the tile they fall within. Finally, we aggregate the joined dataset into an OD matrix and into a bi-dimensional matrix describing the crowd flows for each tile. For example, given a time interval and a position $(i, j)$ in the bi-dimensional map of size n × m, the inflow of (i, j) is the sum of all the flows having as destination the cell (i, j). Analogously, the outflow of (i, j) is the sum of all the flows having as origin the cell (i, j).
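The aggregation of joined trips into time-sliced OD matrices can be sketched in plain Python as follows. This does not use scikit-mobility; it assumes each trip has already been joined to an origin tile and a destination tile, and that trips are binned by their start time. The function name `build_od_matrices` and the toy trips are hypothetical.

```python
from datetime import datetime


def build_od_matrices(trips, n_tiles, start, slot_minutes, n_slots):
    """Aggregate (origin_tile, destination_tile, start_time) trips per time slot."""
    od = [[[0] * n_tiles for _ in range(n_tiles)] for _ in range(n_slots)]
    for origin, destination, departure in trips:
        # Index of the time slot containing the trip start (an assumption).
        slot = int((departure - start).total_seconds() // (slot_minutes * 60))
        if 0 <= slot < n_slots:
            od[slot][origin][destination] += 1
    return od


start = datetime(2014, 4, 1, 0, 0)
trips = [
    (0, 1, datetime(2014, 4, 1, 0, 10)),
    (0, 1, datetime(2014, 4, 1, 0, 50)),
    (2, 1, datetime(2014, 4, 1, 1, 5)),
    (1, 0, datetime(2014, 4, 1, 1, 59)),
]
od = build_od_matrices(trips, n_tiles=3, start=start, slot_minutes=60, n_slots=2)
# Inflow of tile 1 in the second slot: sum over all origins.
inflow_tile1_slot1 = sum(od[1][i][1] for i in range(3))
```

Changing `slot_minutes` reproduces the different temporal aggregations, while the tile size only affects the earlier join of coordinates to tiles.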

We repeat this preprocessing framework varying the time slot used to compute the crowd flows and the size of tiles in the tessellation, so as to create different datasets. Specifically, we vary the time aggregation value in the set {15, 30, 45, 60} minutes, and the tile size in the set {750, 1000, 1500} meters for New York and Washington and {7500, 10000, 15000} meters for Beijing. Table 2 in the Appendix describes the datasets and the map size after the preprocessing steps.

5.2 Evaluation Metrics

In our experiments, we adopt Root Mean Squared Error (RMSE) to evaluate crowd flow and flow prediction and the Common Part of Commuters (CPC) for flow prediction [7, 18, 22, 27].

\[ \text{RMSE} = \sqrt {\frac{1}{n} \sum _{i=1}^{n} (y_i - \hat{y}_i)^{2}} \]

where n is the number of predictions, $\hat{y}_i$ indicates the predicted value and y_i the actual value.

\begin{equation} CPC(\hat{T}, T) = \frac{2\sum _{i,j}\min (\hat{T}_{ij}, T_{ij})}{\sum _{i,j}\hat{T}_{ij} + \sum _{i,j}T_{ij}} \end{equation}
(13)

where $\hat{T}_{ij}$ is the flow from region i to region j predicted by the model and T_{ij} is the actual flow from region i to region j. CPC ∈ [0, 1]: if two adjacency matrices do not have any flows in common, the CPC value is 0; if the sets of flows are identical, the CPC value is 1.
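Both metrics are straightforward to compute; the sketch below follows the two formulas above on plain Python lists. The function names and the convention of returning 0 when both matrices are empty are assumptions for illustration.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def cpc(predicted_od, actual_od):
    """Common Part of Commuters between two OD matrices of the same shape."""
    common = sum(
        min(p, a)
        for p_row, a_row in zip(predicted_od, actual_od)
        for p, a in zip(p_row, a_row)
    )
    total = sum(map(sum, predicted_od)) + sum(map(sum, actual_od))
    return 2 * common / total if total > 0 else 0.0  # empty case: assumed 0


actual_od = [[0, 4], [2, 0]]
predicted_od = [[0, 3], [3, 0]]
score = cpc(predicted_od, actual_od)  # 2 * (3 + 2) / (6 + 6)
error = rmse([0.0, 0.0], [2.0, 2.0])
```

Note that CPC only rewards the overlapping part of each flow, so overestimating one pair and underestimating another both lower the score.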

5.3 Baselines

We compare CrowdNet with the following baselines:

Naive approach: the predicted crowd flows are the average of the crowd flows in the previous n time slots;
Auto-Regressive Integrated Moving Average (ARIMA): a statistical model for understanding and forecasting future values in a time series;
Vector Auto-Regressive (VAR): a variation of ARIMA that exploits multiple time series to capture the pairwise relationships among all flows;
ST-ResNet [37]: a deep neural network prediction model for spatio-temporal data, which shows state-of-the-art results on crowd flow prediction;
DMVSTNet [32]: a framework that models the temporal, spatial, and semantic views, capturing correlations among regions sharing similar temporal patterns;
ACMF [20]: a model that infers the evolution of the crowd flow by learning dynamic representations of temporally-varying data through an attention mechanism.
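The simplest of these baselines, the average of the previous n time slots, can be sketched as follows. The function name `naive_forecast` and the layout of the history (one list of per-tile crowd flows per time slot) are assumptions for illustration.

```python
def naive_forecast(history, n):
    """Predict each tile's next crowd flow as the mean of its last n values."""
    window = history[-n:]
    n_tiles = len(window[0])
    return [sum(step[k] for step in window) / len(window) for k in range(n_tiles)]


# Four time slots, two tiles (toy values).
history = [[10, 0], [4, 2], [6, 4], [8, 6]]
prediction = naive_forecast(history, n=3)
```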

Table 1 shows the results of our model compared with these baselines on the BikeNYC, BikeDC and TaxiBJ datasets with tiles of size 1000 meters and time intervals of 60 minutes. With this tile size and time interval, our model outperforms all the baselines, with an RMSE of 8.53 on BikeNYC, 14.91 on TaxiBJ and 1.51 on BikeDC. The other deep learning models achieve results close to those of CrowdNet, whereas the statistical methods perform worse.

Table 1: Comparisons of CrowdNet with baselines on BikeNYC, TaxiBJ and BikeDC, in terms of RMSE.

Model	Bike NYC	Taxi BJ	Bike DC
Naive approach	19.87	46.12	2.64
ARIMA	12.65	25.98	2.14
VAR	12.50	25.64	1.88
ST-ResNet	9.38	19.33	1.64
DMVSTNet	9.16	18.89	1.60
ACMF	8.89	18.28	1.55
CrowdNet	8.53	14.91	1.51

5.4 Experimental settings

We split each dataset into a development set and a test set. The development set includes a training set and a validation set. 80% of the development set is considered as training set, and the remaining 20% composes the validation set. The test set contains the trips of the last ten days of the dataset.
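The split described above can be sketched as follows. The text does not state whether the 80/20 division of the development set is chronological or random; this sketch assumes a chronological split, and the function name `split_dataset` and its parameters are hypothetical.

```python
def split_dataset(samples, slots_per_day, test_days=10, train_fraction=0.8):
    """Split time-ordered samples into training, validation and test sets."""
    n_test = test_days * slots_per_day
    development, test = samples[:-n_test], samples[-n_test:]
    n_train = int(len(development) * train_fraction)
    # Assumption: the validation set is the most recent part of the development set.
    return development[:n_train], development[n_train:], test


# Thirty days of hourly time slots, identified by their index.
samples = list(range(24 * 30))
train, validation, test = split_dataset(samples, slots_per_day=24)
```

With 60-minute intervals there are 24 slots per day, so the last ten days correspond to the final 240 samples.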

For ARIMA, we adopt the following hyperparameters: p = 12 (where p is the order of the autoregressive model), d = 0 (where d is the order of differencing) and q = 24 (the size of the moving average window). For VAR, we use the following hyperparameter values: p = 8, d = 0 and q = 24. For STResNet, we use the same hyperparameter values as the original paper by Zhang et al. [37].

For CrowdNet, we tune the hyperparameters using a grid search. We select the hyperparameter values corresponding to the best performance obtained on the validation set, i.e., 150 epochs, batch size = 16, learning rate = 1e-4, RMSprop as optimiser, and 12 previous time intervals as input.

We build CrowdNet and the baselines using PyTorch version 1.8.0 and we perform the experiments using a machine equipped with a Nvidia Quadro RTX 6000 as GPU (with 24 GB of GPU memory).

We train the models using validation-based early stopping on each training dataset extracted from the flow datasets of Table 2 and on the irregular tessellation, for all the defined time intervals (15min, 30min, 45min and 60min).

6 RESULTS

In this section, we compare the results of CrowdNet with those of ST-ResNet for the crowd flow prediction problem. The tables in this section report results for the BikeNYC dataset; the corresponding tables and results for the BikeDC and TaxiBJ datasets can be found in the Appendix. We report only the BikeNYC results here, as the behaviour of CrowdNet on the other datasets does not change.

Figure 2 visually compares the mean real crowd inflows with the mean crowd inflows predicted by STResNet and CrowdNet, with tessellations of 1000m and time interval of 60min. It is evident how the predicted crowd flows are strikingly similar to the real ones. For example, the predictions reproduce a notable pattern in Manhattan: the concentration of areas with large crowd flows in the middle of the island and areas with small crowd flows in its borders. In general, CrowdNet slightly underestimates large crowd flows.

Figure 2: Comparison of mean real crowd inflows (center) with those predicted by CrowdNet (left) and STResNet (right).

Figure 3 compares the sum of the total real crowd inflows during one week (from the 22nd to the 28th of September), for tiles of 1000m and time intervals of 60min, with those predicted by STResNet and CrowdNet. CrowdNet's predictions are closer to the real values than STResNet's predictions, which suggests that CrowdNet performs better for large time intervals. When external events occur, such as the thunderstorm on Thursday in Figure 3, the performance of both models degrades.

Figure 3: Comparison of the total real crowd inflow in Manhattan with those predicted by STResNet and CrowdNet.

Table 8 summarises the results of both models, reporting the RMSE for all the possible combinations of tile sizes and time intervals. Note how CrowdNet performs better for larger time intervals, while for smaller values the RMSE of the two models is comparable.

One of the advantages of using CrowdNet is the possibility to use irregular tessellations. STResNet, DMVSTNet and ACMF cannot be used with irregular tessellations because they take as input an image-like matrix; irregular tessellations cannot be represented easily in this way. For instance, in an administrative tessellation, each region can have a different number of neighbours, and it is difficult to represent this kind of relationship using an image-like representation.

We illustrate the performance of CrowdNet on an irregular, administrative tessellation defined by 29 neighbourhoods in Manhattan. Data of the administrative tiles are taken from an official tool of the municipality of New York City.

Figure 4 compares the real crowd inflows and outflows with the crowd inflows and outflows predicted by CrowdNet. As in the case of the squared tessellation, CrowdNet's predictions are strikingly similar to the real ones. Note how, also in this case, larger crowd inflows and outflows are concentrated in the middle of the island, while smaller ones concentrate in the southern part of the island.

Figure 4: Comparison of mean real crowd outflows (left) with those predicted by CrowdNet (right). The crowd flows are represented as heatmaps, in which the colour of each cell is proportional to the crowd flow of the corresponding tile.

Flow prediction. While crowd flow prediction aims to forecast the aggregated flows in each tile, flow prediction aims at predicting the flow between each pair of tiles, thus corresponding to the prediction of the entire origin-destination matrix.

In the Appendix, Figure 6 compares the real flows with those predicted by CrowdNet: the two are almost identical, meaning that the flows are correctly predicted by the model.

Providing detailed information about the crowd flow predictions is essential to acquire knowledge that can be useful to possible users, such as policymakers and urban planners. In this direction, CrowdNet enriches the predicted crowd flows with useful information, such as the origin and the destination of each flow. As an example, Figure 5a illustrates the predicted crowd flows together with the flows between the tiles in Manhattan: each node's size is proportional to its crowd inflow, while edge thickness represents the magnitude of the single flows between pairs of tiles. Figure 5b shows the same crowd flow prediction with a focus on the flows outgoing from node 17. The ability of CrowdNet to solve flow prediction allows us to enrich crowd flow predictions with information about the origin of a tile's inflow and the destination of its outflow.

Figure 5: Crowd flows prediction enriched with flow network information. Each edge represents a flow between a pair of tiles, with the thickness of edges proportional to the flow value. The heatmap in the background represents the crowd flow prediction.

To investigate the robustness of CrowdNet to variations in the spatial and temporal aggregation, we examine how the model's performance changes when varying time intervals (fixing a tile size of 1000m) or tile sizes (fixing time intervals to 60min). Results with other time intervals or tile sizes are similar.

CrowdNet is robust with respect to the temporal aggregation. As shown in Table 3 in the Appendix, the RMSE increases as the time interval increases, because the flow magnitude grows with the length of the time interval. Similarly, considering the spatial aggregation, the RMSE increases as the tile size increases, because the flow of a tile grows with its size.

7 DISCUSSION AND CONCLUSIONS

We proposed CrowdNet, a solution to crowd flow prediction that represents crowd flows as a graph in which nodes are geographic regions and edges are people moving among them. CrowdNet allows policymakers to use spatial tessellations (even non-squared ones) and make predictions of flows at street or block levels. We also provided a characterisation of CrowdNet's behaviour on different types of tessellation with different shapes and sizes and on different time intervals, so as to find the combinations that lead to the most accurate predictions.

CrowdNet is a first step towards the adoption of more exhaustive prediction models. Our experiments may lead policymakers to adopt more transparent and trustworthy solutions thanks to their enriched predictions and parameterisation.

ACKNOWLEDGMENTS

Luca Pappalardo has been partially supported by EU project SoBigData++ grant agreement 871042.

REFERENCES

Hugo Barbosa, Marc Barthelemy, Gourab Ghoshal, Charlotte R. James, Maxime Lenormand, Thomas Louail, Ronaldo Menezes, José J. Ramasco, Filippo Simini, and Marcello Tomasini. 2018. Human mobility: Models and applications. Physics Reports 734 (2018), 1–74. https://doi.org/10.1016/j.physrep.2018.01.001 arxiv:1710.00004
David Boyce and H. Williams. 2015. Forecasting urban travel: Past, present and future. Edward Elgar Press. 1–650 pages. https://doi.org/10.4337/9781784713591
Alket Cecaj, Marco Lippi, Marco Mamei, and Franco Zambonelli. 2021. Sensing and Forecasting Crowd Distribution in Smart Cities: Potentials and Approaches. IoT 2, 1 (2021), 33–49. https://doi.org/10.3390/iot2010003
Genan Dai, Xiaoyang Hu, Youming Ge, Zhiqing Ning, and Yubao Liu. 2021. Attention based simplified deep residual network for citywide crowd flows prediction. Frontiers of Computer Science 15, 2 (2021), 1–12.
Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Language Modeling with Gated Convolutional Networks. arxiv:1612.08083 [cs.CL]
B. Du, H. Peng, S. Wang, M. Z. A. Bhuiyan, L. Wang, Q. Gong, L. Liu, and J. Li. 2020. Deep Irregular Convolutional Residual LSTM for Urban Traffic Passenger Flows Prediction. IEEE Transactions on Intelligent Transportation Systems 21, 3 (2020), 972–985. https://doi.org/10.1109/TITS.2019.2900481
Floriana Gargiulo, Maxime Lenormand, Sylvie Huet, and Omar Baqueiro Espinosa. 2012. Commuting Network Models: Getting the Essentials. Journal of Artificial Societies and Social Simulation 15, 2 (2012), 6. https://doi.org/10.18564/jasss.1964
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional Sequence to Sequence Learning. arxiv:1705.03122 [cs.CL]
Bruce E. Hansen. 1995. TIME SERIES ANALYSIS. Econometric Theory 11, 3 (1995), 625–630. https://doi.org/10.1017/S0266466600009440
F. Harary and G. Gupta. 1997. Dynamic graph models. Mathematical and Computer Modelling 25, 7 (1997), 79–87. https://doi.org/10.1016/S0895-7177(97)00050-2
G. E. Hinton and R. R. Salakhutdinov. 2006. Reducing the Dimensionality of Data with Neural Networks. Science 313, 5786 (2006), 504–507. https://doi.org/10.1126/science.1127647
Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arxiv:1502.03167 [cs.LG]
Renhe Jiang, Zekun Cai, Zhaonan Wang, Chuang Yang, Zipei Fan, Quanjun Chen, Kota Tsubouchi, Xuan Song, and Ryosuke Shibasaki. 2021. DeepCrowd: A Deep Model for Large-Scale Citywide Crowd Density and Flow Prediction. IEEE Transactions on Knowledge and Data Engineering (2021), 1–1. https://doi.org/10.1109/TKDE.2021.3077056
Luckyson Khaidem, Massimiliano Luca, Fan Yang, Ankit Anand, Bruno Lepri, and Wen Dong. 2020. Optimizing Transportation Dynamics at a City-Scale Using a Reinforcement Learning Framework. IEEE Access 8 (2020), 171528–171541.
Thomas N. Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks. CoRR abs/1609.02907 (2016). arxiv:1609.02907 http://arxiv.org/abs/1609.02907
Valdis Krebs. 2002. Mapping Networks of Terrorist Cells. CONNECTIONS 24, 3 (04 2002), 43–52.
Sangsoo Lee and Daniel B. Fambro. 1999. Application of Subset Autoregressive Integrated Moving Average Model for Short-Term Freeway Traffic Volume Forecasting. Transportation Research Record 1678, 1 (1999), 179–188. https://doi.org/10.3141/1678-22
Maxime Lenormand, Aleix Bassolas, and José J Ramasco. 2016. Systematic comparison of trip distribution laws and models. Journal of Transport Geography 51 (2016), 158–169. https://doi.org/10.1016/j.jtrangeo.2015.12.008
Wenjia Li, Wei Tao, Junyang Qiu, Xin Liu, X. Zhou, and Zhisong Pan. 2019. Densely Connected Convolutional Networks With Attention LSTM for Crowd Flows Prediction. IEEE Access 7 (2019), 140488–140498.
Lingbo Liu, Jiajie Zhen, Guanbin Li, Geng Zhan, Zhaocheng He, Bowen Du, and Liang Lin. 2020. Dynamic Spatial-Temporal Representation Learning for Traffic Flow Prediction. IEEE Transactions on Intelligent Transportation Systems PP (06 2020), 1–15. https://doi.org/10.1109/TITS.2020.3002718
Lingbo Liu, Jiajie Zhen, Guanbin Li, Geng Zhan, Zhaocheng He, Bowen Du, and Liang Lin. 2020. Dynamic Spatial-Temporal Representation Learning for Traffic Flow Prediction. arxiv:1909.02902 [cs.LG]
Massimiliano Luca, Gianni Barlacchi, Bruno Lepri, and Luca Pappalardo. 2021. A survey on deep learning for human mobility. ACM Computing Surveys (CSUR) 55, 1 (2021), 1–44.
Massimiliano Luca, Gianni Barlacchi, Nuria Oliver, and Bruno Lepri. 2021. Leveraging Mobile Phone Data for Migration Flows. arXiv e-prints (2021), arXiv–2105.
Kai Nagel and Maya Paczuski. 1995. Emergent traffic jams. Physical Review E 51, 4 (Apr 1995), 2909–2918. https://doi.org/10.1103/physreve.51.2909
Luca Pappalardo, Filippo Simini, Gianni Barlacchi, and Roberto Pellungrini. 2019. scikit-mobility: a Python library for the analysis, generation and risk assessment of mobility data. arxiv:1907.07062 [physics.soc-ph]
Yibin Ren, Huanfa Chen, Yong Han, Tao Cheng, Yang Zhang, and Ge Chen. 2020. A hybrid integrated deep learning model for the prediction of citywide spatio-temporal flow volumes. International Journal of Geographical Information Science 34, 4 (2020), 802–823. https://doi.org/10.1080/13658816.2019.1652303
Filippo Simini, Gianni Barlacchi, Massimiliano Luca, and Luca Pappalardo. 2021. Deep Gravity: enhancing mobility flows generation with deep neural networks and geographic information. arxiv:2012.00489 [cs.LG]
Chujie Tian, Xinning Zhu, Zheng Hu, and Jian Ma. 2020. Deep spatial-temporal networks for crowd flows prediction by dilated convolutions and region-shifting attention mechanism. Applied Intelligence 50, 10 (2020), 3057–3070. https://doi.org/10.1007/s10489-020-01698-0
Michele Tizzoni, Paolo Bajardi, Adeline Decuyper, Guillaume Kon Kam King, Christian M. Schneider, Vincent Blondel, Zbigniew Smoreda, Marta C. González, and Vittoria Colizza. 2014. On the Use of Human Mobility Proxies for Modeling Epidemics. PLOS Computational Biology 10, 7 (07 2014), 1–15. https://doi.org/10.1371/journal.pcbi.1003716
Pu Wang, Timothy Hunter, Alexandre Bayen, Katja Schechtner, and Marta C. Gonzalez. 2012. Understanding Road Usage Patterns in Urban Areas. Scientific reports 2 (12 2012), 1001. https://doi.org/10.1038/srep01001
Senzhang Wang, Jiannong Cao, Hao Chen, Hao Peng, and Zhiqiu Huang. 2020. SeqST-GAN: Seq2Seq Generative Adversarial Nets for Multi-Step Urban Crowd Flow Prediction. ACM Transactions on Spatial Algorithms and Systems (TSAS) 6, 4 (2020), 1–24.
Huaxiu Yao, Fei Wu, Jintao Ke, Xianfeng Tang, Yitian Jia, Siyu Lu, Pinghua Gong, Jieping Ye, and Zhenhui Li. 2018. Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction. arxiv:1802.08714 [cs.LG]
Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2017. Spatio-temporal Graph Convolutional Neural Network: A Deep Learning Framework for Traffic Forecasting. CoRR abs/1709.04875 (2017). arxiv:1709.04875 http://arxiv.org/abs/1709.04875
Hao Yuan, Xinning Zhu, Zheng Hu, and Chunhong Zhang. 2020. Deep multi-view residual attention network for crowd flows prediction. Neurocomputing 404 (2020), 198–212. https://doi.org/10.1016/j.neucom.2020.04.124
Jing Yuan, Yu Zheng, Xing Xie, and Guangzhong Sun. 2011. Driving with knowledge from the physical world. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 316–324.
Junbo Zhang, Yu Zheng, Dekang Qi, Ruiyuan Li, and Xiuwen Yi. 2016. DNN-Based Prediction Model for Spatio-Temporal Data. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (Burlingame, California) (SIGSPATIAL ’16). Association for Computing Machinery, New York, NY, USA, Article 92, 4 pages. https://doi.org/10.1145/2996913.2997016
Junbo Zhang, Yu Zheng, Dekang Qi, Ruiyuan Li, Xiuwen Yi, and Tianrui Li. 2017. Predicting Citywide Crowd Flows Using Deep Spatio-Temporal Residual Networks. arxiv:1701.02543 [cs.AI]

A DATASET

Table 2: Characteristics of the datasets used.

	Bike NYC	Taxi BJ	Bike DC
Data Type	Bike Rent	GPS	Bike Rent
Location	New York City	Beijing, China	Washington D.C.
Timespan	04–10 2014	02-2008	01-2018 – 01-2020
Spatial Agg.	750 m (10 x 15)	7500 m (32 x 32)	750 m (32 x 40)
	1000 m (7 x 11)	10000 m (24 x 26)	1000 m (23 x 30)
	1500 m (5 x 8)	15000 m (16 x 16)	1500 m (14 x 18)
Sampling	-	177 sec.	-
# Subjects	421	10,357	557

B RESULTS

Figure 6: Representation of real adjacency matrices and the ones predicted by CrowdNet for BikeNYC dataset with tile size: 1500m. Time interval: 60min.

Table 3: Performance of CrowdNet on the BikeNYC test set in terms of RMSE (above) and CPC (below).


		Tile sizes
		750m	1000m	1500m
	15min	0.248	0.409	0.622
Time	30min	0.357	0.654	1.014
intervals	45min	0.460	0.880	1.396
	60min	0.538	1.049	1.815

		Tile sizes
		750m	1000m	1500m
	15min	0.106	0.218	0.414
Time	30min	0.193	0.368	0.559
intervals	45min	0.274	0.460	0.631
	60min	0.321	0.496	0.637

C RESULTS ON OTHER DATASETS

Table 4: Performance of the CrowdNet model for flow prediction on the Taxi Beijing test set in terms of RMSE, varying the tile size and time interval.


		Tile sizes
		750m	1000m	1500m
	15min	1.387	1.729	1.891
Time	30min	1.816	2.174	2.438
intervals	45min	1.948	2.642	3.003
	60min	2.131	3.428	4.193

Table 5: Performance of CrowdNet on the Taxi Beijing test set in terms of CPC, varying the tile size and time interval.


		Tile sizes
		750m	1000m	1500m
	15min	0.214	0.319	0.414
Time	30min	0.273	0.381	0.425
intervals	45min	0.301	0.429	0.572
	60min	0.387	0.562	0.714

Table 6: Performance of the CrowdNet model for flow prediction on the Bike Washington D.C. test set in terms of RMSE, varying the tile size and time interval.


		Tile sizes
		750m	1000m	1500m
	15min	0.017	0.021	0.023
Time	30min	0.019	0.024	0.028
intervals	45min	0.031	0.044	0.049
	60min	0.034	0.045	0.53

Table 7: Performance of CrowdNet on the Bike Washington D.C. test set in terms of CPC, varying the tile size and time interval.


		Tile sizes
		750m	1000m	1500m
	15min	0.231	0.358	0.416
Time	30min	0.272	0.398	0.453
intervals	45min	0.429	0.596	0.721
	60min	0.512	0.657	0.801

Table 8: Performance of the ST-ResNet, DMVSTNet, ACMF and CrowdNet models for the crowd flow prediction problem on the BikeNYC dataset.


		Tile sizes
		750m				1000m				1500m
		CrowdNet	STResNet	DMVSTNet	ACMF	CrowdNet	STResNet	DMVSTNet	ACMF	CrowdNet	STResNet	DMVSTNet	ACMF
	15min	1.71	1.69	1.65	1.63	2.76	2.35	2.29	2.26	3.73	3.35	3.27	3.23
Time	30min	3.69	2.65	2.59	2.55	5.23	4.85	4.73	4.67	5.93	5.64	5.51	5.44
intervals	45min	4.34	3.67	3.58	3.54	6.68	5.63	5.50	5.43	11.3	10.91	10.66	10.53
	60min	5.18	5.44	5.31	5.25	8.53	9.38	9.16	8.89	11.1	11.66	11.39	11.25

Table 9: Performance of the ST-ResNet and CrowdNet models for the crowd flow prediction problem on the TaxiBJ (upper) and BikeDC (lower) test sets in terms of RMSE, varying the tile size and time interval.


		Tile sizes
		750m		1000m		1500m
		CrowdNet	STResNet	CrowdNet	STResNet	CrowdNet	STResNet
	15min	9.82	9.88	11.54	12.27	15.91	15.88
Time	30min	10.33	11.75	13.13	13.66	16.48	17.02
intervals	45min	10.97	10.86	13.44	15.71	16.69	19.22
	60min	12.78	15.34	14.91	19.33	18.53	24.37

		Tile sizes
		750m		1000m		1500m
		CrowdNet	STResNet	CrowdNet	STResNet	CrowdNet	STResNet
	15min	0.86	0.78	1.03	1.01	1.36	1.45
Time	30min	0.94	0.96	1.17	1.21	1.54	1.57
intervals	45min	1.02	1.08	1.32	1.49	2.01	2.14
	60min	1.20	1.36	1.51	1.64	2.46	2.89

WWW '22 Companion, April 25–29, 2022, Virtual Event, Lyon, France