University of California Santa Barbara
ReeSPOT: Reeb Graph Models Semantic Patterns of Normalcy in Human Trajectories
Abstract
This paper introduces ReeSPOT, a novel Reeb graph-based method to model patterns of life in human trajectories (akin to a fingerprint). Human behavior typically follows a pattern of normalcy in day-to-day activities, marked by recurring activities within specific time periods. We model this behavior using Reeb graphs, in which deviations from usual day-to-day activities are encoded as nodes. The complexity of the proposed algorithm is linear with respect to the number of time points in a given trajectory. We demonstrate the usage of ReeSPOT and show how the nodes of the Reeb graph capture significant spatial and temporal deviations. The case study presented in this paper includes realistic human movement scenarios: visiting uncommon locations, occasionally taking unusual routes, visiting familiar locations at uncommon times, and staying for uncommon durations. We analyze the Reeb graph to interpret the topological structure of the GPS trajectories. Potential applications of ReeSPOT include urban planning, security surveillance, and behavioral research.
Keywords: Reeb Graphs, Graph Networks, Trajectory Analysis
1 Introduction
Recently, there has been an increase in location-aware devices that use the Global Positioning System (GPS) for applications such as finding efficient routes [17], fitness tracking, understanding the progression of infectious diseases [6], and predicting demographic information [19]. The resulting volume of raw trajectories calls for a scalable representation that preserves and highlights their structure and topologically important movement patterns (Figure 1).
Human movement analysis is a core component of behavioral research, urban planning, and computational sociology [3]; it helps in modeling human behavior and predicting human movement patterns. Modeling normal human behavior can also help identify abnormal behavior. In particular, given a set of movement patterns over a week, month, or year, we want to capture any change in the semantic “patterns of life”. In this paper, we model the routine behaviors and movements that characterize daily human activities in a given city using a concept from topology: Reeb graphs.
Traditional trajectory analysis methods are largely based on hand-crafted geometric features and statistical techniques. Such features include travel distance, mean velocity [22], and frequencies of visited areas or movement patterns [4]. Statistical approaches analyze the frequency of temporal patterns in trajectory data to identify patterns such as travel modes [8] and periodic patterns [21]. These approaches are effective for structured and less complex data sets but fail to generalize to high-dimensional data or to the dynamic nature of human mobility patterns.
Given the amount of GPS data that one person can generate in a single day, another natural direction is data-driven learning. Specifically, a single agent's movement data, sampled at 1 Hz over a month, accumulates roughly 2 million data points.
Extrapolating these figures to the population of a small city like Santa Barbara, with approximately 97,000 agents, results in a dataset of roughly 194 billion data points. This scale poses substantial challenges for computational resources and data management, and extrapolating to larger cities, such as New York City, would magnify these challenges significantly. Recent advances in deep learning have significantly improved the ability to model human mobility patterns, for example through next-location prediction [10]. In particular, long short-term memory networks (LSTMs) [7] and attention-based models such as Transformers [18] are good at capturing temporal regularities and anomalies in movement patterns. However, these black-box models lack interpretability, which limits their applicability in real-time scenarios [20].
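The data-volume estimate above is simple arithmetic, sketched below as a sanity check. Assuming a 30-day month, 1 Hz sampling yields about 2.6 million fixes per agent, so the rounded 2 million figure (and hence the 194 billion total) is a conservative estimate; the variable and function names are illustrative only.

```python
# Back-of-the-envelope data volume for 1 Hz GPS sampling (a sanity check only).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 fixes per day at 1 Hz


def samples_per_agent(days: int, hz: int = 1) -> int:
    """Number of GPS fixes one agent produces over `days` days at `hz` fixes per second."""
    return days * SECONDS_PER_DAY * hz


full_month = samples_per_agent(30)       # assumption: a 30-day month
rounded_per_agent = 2_000_000            # rounded per-agent figure used in the text
santa_barbara_agents = 97_000            # approximate population used in the text

population_total = rounded_per_agent * santa_barbara_agents

print(f"{full_month:,} fixes per agent over 30 days")
print(f"{population_total:,} fixes for {santa_barbara_agents:,} agents")
```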
Towards interpretability along with large-scale modeling, graph-based methods are popular due to their ability to represent complex spatial relationships and movement patterns efficiently. We need models that can succinctly summarize an agent's trajectory data, retaining essential information while discarding redundancies. Transforming GPS data into graph data structures, with nodes as significant geographic locations and edges as the movement between them, enables intuitive models of patterns of life. For example, Guo et al. [5] proposed a graph model that establishes precise topological relationships among trajectories and geographic locations. Qi et al. [11] blend graph-based approaches with statistical models to improve the accuracy of trajectory search and prediction. Another line of work performs hierarchical clustering based on graph similarity measures [12], further supporting the need for computational geometry approaches.
In this paper, we use Reeb graphs to cluster the common behavior patterns of a given agent. Our research is motivated by, and related to, previous research on the construction of Reeb graphs for trajectory data [2, 13]. A Reeb graph captures the connectivity of the level sets of a scalar function defined over a space, effectively summarizing the topological features of the space. In the context of trajectory data, scalar functions could represent attributes such as speed, direction, semantics, or geographical points of interest. Reeb graphs can thus map complex trajectories into more interpretable topological constructs. This abstraction facilitates anomaly detection by comparing the topological signatures of trajectories and identifying those that differ significantly from the norm. Our main contributions are summarized below:
- We propose a novel Reeb graph-based approach to model the day-to-day activities of a given agent. To the best of our knowledge, this is the first use of Reeb graphs to fingerprint an agent's behavior.
- We present the algorithm and its time complexity, demonstrating the scalability of the proposed method.
- We design normal and anomalous scenarios, describe the methods for trajectory generation, and present detailed experiments on the interpretation and analysis of Reeb graphs.
2 Methodology
2.1 Previous work on Reeb graphs
The Reeb graph was first proposed to study the topology of a manifold [16]. Nodes of the Reeb graph encode the evolution of the level sets of a real-valued function on a manifold. The location of a node is the average location of the trajectory points that constitute the node. Reeb graphs have been used extensively in shape analysis for diverse datasets [1]. The first study of Reeb graphs for the evolution of trajectory groups encodes the merging and splitting structure between different moving entities [2]. Similarly, spatial subtrajectory clustering algorithms address a stricter problem [13, 14, 15] but discover geometric and topological substructure. This is computationally challenging because the initialization step involves an exhaustive search over an agent's events. Motivated by these challenges, the central focus of this paper is to develop a method for fingerprinting the behavior of an agent over time scales such as days, weeks, and months. Our approach encodes significant spatio-temporal points of interest, specifically the locations and durations that define critical aspects of an agent's behavior. We redefine the grouping definitions used in our adapted Reeb graph model. The constructed Reeb graphs partition a set of GPS points into meaningful nodes and edges, thereby quantifying and identifying path deviations.
2.2 Reeb graph models agent pattern of normalcy
A trajectory $T_i$ is defined as a dictionary (key: value) containing an ordered sequence of time points and their associated GPS coordinates:
$$T_i = \{t_1: p_1,\; t_2: p_2,\; \ldots,\; t_n: p_n\}, \tag{1}$$
where the spacing of the time points $t_j$ is chosen according to the desired resolution at which the pattern of the agent is sampled. Here $n$ denotes the total number of points in a given trajectory $T_i$, and the GPS sampling frequency determines $n$. For example, to model the weekdays of an agent's activities with raw GPS data sampled every second, $n = 86{,}400$, the total number of seconds in a day. Similarly, if the data is sampled every hour, then $n = 24$ points per day. We define $m$ as the total number of trajectories for a given agent. For example, to model month-long data, $m = 30$, and for weekdays, $m = 5$. The common setting used throughout the paper is $n = 24$ and $m = 5$. Each time point $t_j$ corresponds to a GPS coordinate $p_j = (x_j, y_j)$ representing the position of the agent at time $t_j$, where $x_j$ is the latitude and $y_j$ is the longitude. The Euclidean distance between the GPS coordinates $p^a_j = (x^a_j, y^a_j)$ and $p^b_j = (x^b_j, y^b_j)$ of two trajectories $T_a$ and $T_b$ at time $t_j$ is calculated as follows:
$$d(p^a_j, p^b_j) = \lVert p^a_j - p^b_j \rVert_2 = \sqrt{(x^a_j - x^b_j)^2 + (y^a_j - y^b_j)^2}, \tag{2}$$
where $x^a_j$ and $y^a_j$ are the latitude and longitude of the first point, and $x^b_j$ and $y^b_j$ are those of the second point. $d$ gives the 2-norm distance between two points on the Euclidean plane, which approximates the geographic distance between them. The algorithm is defined with respect to a distance threshold $\epsilon$ within which points are considered sufficiently close together, i.e., within a small geographical area. This inter-trajectory distance threshold controls the granularity of the Reeb graph according to the problem definition.
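A minimal sketch of this representation and distance, assuming hourly sampling so that time points can be used directly as dictionary keys. The type alias `Trajectory` and the helper names are illustrative, not from the paper; distances are in degrees, matching the planar approximation described above.

```python
import math
from typing import Dict, Tuple

# Hypothetical alias: a trajectory maps each time point (e.g., hour of day)
# to a (latitude, longitude) pair, mirroring the dictionary definition in Eq. 1.
Trajectory = Dict[int, Tuple[float, float]]


def distance(p_a: Tuple[float, float], p_b: Tuple[float, float]) -> float:
    """Planar 2-norm between two (lat, lon) points, in degrees (Eq. 2)."""
    return math.hypot(p_a[0] - p_b[0], p_a[1] - p_b[1])


def eps_close(traj_a: Trajectory, traj_b: Trajectory, t: int, eps: float) -> bool:
    """True if both trajectories lie within `eps` of each other at time point t."""
    return distance(traj_a[t], traj_b[t]) <= eps


# Usage: two days that are about 0.0005 degrees apart at 8 AM.
day0: Trajectory = {8: (34.4200, -119.7000)}
day1: Trajectory = {8: (34.4203, -119.7004)}
print(distance(day0[8], day1[8]))        # approximately 0.0005
print(eps_close(day0, day1, 8, 0.001))
```

Using `<=` for "close" and `>` for "apart" keeps the two cases disjoint, which the connect/disconnect rules rely on.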
Human behavior typically follows a pattern of normalcy in day-to-day activities, marked by recurring activities within specific time periods. To discover large-scale spatio-temporal patterns, we represent the bundling structure of the trajectories as a Reeb graph $R$. Nodes of the Reeb graph pinpoint critical GPS points of the agent's pattern. Intuitively, if a continuous portion of the agent's behavior happens at the same time and within the same spatial distance ($\epsilon$) every day, it represents a pattern of normalcy. We formalize this by introducing the concept of “bundles” to characterize normal behavior through consistent daily subtrajectory events. Each trajectory $T_i$ begins with an appear event at the first index $t_1$ and concludes with a disappear event at the last index $t_n$. Deviations from this norm by more than $\epsilon$ are classified as disconnect events, while a return to the norm is labeled a connect event. Formally, for a given $\epsilon$ and $n = 24$ (i.e., sampled every hour), consider two trajectories $T_a$ and $T_b$:
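- At time $t_1$: $p^a_1$ and $p^b_1$ are the appear events.
- At time $t_n$: $p^a_n$ and $p^b_n$ are the disappear events.
- If $d(p^a_{j-1}, p^b_{j-1}) \le \epsilon$, but $d(p^a_j, p^b_j) > \epsilon$, then $t_j$ represents a disconnect event between $T_a$ and $T_b$.
- If $d(p^a_{j-1}, p^b_{j-1}) > \epsilon$, but $d(p^a_j, p^b_j) \le \epsilon$, then $t_j$ represents a connect event between $T_a$ and $T_b$.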
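The event rules above can be sketched as a single pass over consecutive time points, comparing every pair of trajectories against the distance threshold (`eps` in code). This is an illustrative sketch, not the paper's Algorithm 1: it assumes all trajectories share the same ordered time points, and `compute_events` and its event-tuple layout are hypothetical. With m trajectories and n time points it performs O(n m^2) distance checks, i.e., linear in n for a fixed number of days.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]                  # (latitude, longitude)
Trajectory = Dict[int, Point]                # time point -> position
Event = Tuple[int, str, int, Optional[int]]  # (time, kind, traj_a, traj_b)


def compute_events(trajs: List[Trajectory], times: List[int], eps: float) -> List[Event]:
    """Appear/disappear events per trajectory, connect/disconnect events per pair."""
    def dist(p: Point, q: Point) -> float:
        return math.hypot(p[0] - q[0], p[1] - q[1])

    events: List[Event] = [(times[0], "appear", i, None) for i in range(len(trajs))]
    for prev_t, t in zip(times, times[1:]):
        for a, b in combinations(range(len(trajs)), 2):
            was_close = dist(trajs[a][prev_t], trajs[b][prev_t]) <= eps
            is_close = dist(trajs[a][t], trajs[b][t]) <= eps
            if was_close and not is_close:
                events.append((t, "disconnect", a, b))  # pair leaves the eps-neighborhood
            elif not was_close and is_close:
                events.append((t, "connect", a, b))     # pair returns to the eps-neighborhood
    events += [(times[-1], "disappear", i, None) for i in range(len(trajs))]
    return events


# Usage: day 1 detours at t = 1 and rejoins day 0 at t = 2.
day0 = {0: (0.0, 0.0), 1: (0.0, 0.0), 2: (0.0, 0.0)}
day1 = {0: (0.0, 0.0), 1: (1.0, 1.0), 2: (0.0, 0.0)}
for event in compute_events([day0, day1], [0, 1, 2], eps=0.1):
    print(event)
```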
2.3 Construction of Reeb graphs and analysis of time complexity
Reeb graph construction (illustrated in Figure 2) can be divided into the following major steps: event computation, construction of the dynamic graphs $G_t$, connectivity queries on the dynamic graph to obtain the bundle partition $B$, and construction of the Reeb graph $R$ from the bundle partition. The first step computes the connect and disconnect events; Algorithm 1 outlines these steps. Event computation takes $O(n)$ time, where $n$ is the number of time points in the trajectories and the number of trajectories $m = 5$ is treated as a constant. At each time point, the algorithm checks for potential events. The second step handles the events to construct the dynamic graphs $G_t$. The nodes of $G_t$ represent the daily trajectories, and the edges of $G_t$ represent $\epsilon$-connectivity between them. $G_t$ has 5 nodes, one trajectory for each day of the agent. The connected components of $G_t$ give the bundle partition of subtrajectories at step $t$, denoted by $B$, such that every subtrajectory segment is assigned to exactly one bundle. The final step constructs the Reeb graph from these bundles: $R$ is obtained from $B$ by connecting adjacent bundles with nodes, with bundles as edges, similar to the construction described in [13]. The time complexity of the Reeb graph construction step is therefore $O(n)$, because in the worst case every time point has events, and at each time point the connectivity query on the dynamic graph with 5 nodes takes constant time. More detailed steps can be found in Algorithm 2.
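Steps 2 to 4 can be sketched together as below. This is an assumption-laden illustration rather than Algorithm 2: `bundle_partition` and `reeb_graph` are hypothetical names; a node is created whenever a bundle appears that did not exist at the previous time point (a merge or split) and for every bundle at the first and last time points (appear/disappear); and each node is placed at the mean position of its member trajectories, following the averaging rule in Section 2.1. For a fixed number of trajectories m, each time step costs O(m^2), so the loop is linear in n.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple

Point = Tuple[float, float]
Trajectory = Dict[int, Point]
Node = Tuple[int, Tuple[int, ...], Point]  # (time, member trajectories, mean position)


def bundle_partition(points: List[Point], eps: float) -> Set[FrozenSet[int]]:
    """Connected components of the dynamic graph G_t (nodes = trajectories,
    edges = pairs within eps), computed with a small union-find."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            if math.hypot(points[a][0] - points[b][0], points[a][1] - points[b][1]) <= eps:
                parent[find(a)] = find(b)
    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return {frozenset(g) for g in groups.values()}


def reeb_graph(trajs: List[Trajectory], times: List[int], eps: float):
    """Nodes where new bundles appear (plus first/last time points); each node is
    linked to the previous node of every member trajectory."""
    nodes: List[Node] = []
    edges: Set[Tuple[int, int]] = set()
    last_node: Dict[int, int] = {}          # trajectory -> index of its latest node
    prev: Set[FrozenSet[int]] = set()
    for k, t in enumerate(times):
        pts = [tr[t] for tr in trajs]
        part = bundle_partition(pts, eps)
        is_end = k == 0 or k == len(times) - 1
        for bundle in sorted(part, key=min):  # bundles are disjoint, so min is unique
            if not is_end and bundle in prev:
                continue                      # unchanged bundle: no event, no node
            mean = (sum(pts[i][0] for i in bundle) / len(bundle),
                    sum(pts[i][1] for i in bundle) / len(bundle))
            node_id = len(nodes)
            nodes.append((t, tuple(sorted(bundle)), mean))
            for i in bundle:
                if i in last_node:
                    edges.add((last_node[i], node_id))
                last_node[i] = node_id
        prev = part
    return nodes, edges


# Usage: day 1 splits from day 0 at t = 1 and merges back at t = 2.
day0 = {0: (0.0, 0.0), 1: (0.0, 0.0), 2: (0.0, 0.0)}
day1 = {0: (0.0, 0.0), 1: (1.0, 1.0), 2: (0.0, 0.0)}
nodes, edges = reeb_graph([day0, day1], [0, 1, 2], eps=0.1)
print(nodes)
print(sorted(edges))
```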
3 Experimentation/Case Study
3.1 Data generation
We model the pattern of life of a single agent over multiple trajectories. Each trajectory is simulated using the SUMO software package [9], and together the trajectories represent realistic behavior and movement patterns over the course of one week. We construct the Reeb graph for the trajectories and show how it sufficiently represents their information with significantly fewer nodes.
In this case study, we analyze the behavioral patterns of a simulated high-school student in the city of Santa Barbara, California (Figure 1), using trajectory data that includes multiple points of interest (POIs), such as the student's home, school, park, grocery store, and lake. The student's daily routine typically consists of arriving at school between approximately 8:00 AM and 9:00 AM, leaving between around 4:00 PM and 5:00 PM, and visiting recreational sites before returning home. To investigate both normal and anomalous behavioral patterns, we generated five days of normal trajectory data, complemented by additional days tailored to each anomalous scenario described in Section 3.2. Each trajectory entry is recorded with a timestamp and latitude and longitude coordinates. Figure 1 displays the student's trajectories across different POIs for the rare location scenario, illustrating the distribution of both routine and deviant movements. Figure 3 displays the same data as a 3D plot, providing a spatio-temporal visualization of the student's stay locations, stay durations, and revisit frequencies.
3.2 Definition of anomalous behavior
We define $N$ as the set of normal POIs and their corresponding time points,
$$N = \{(p_k, t_k)\}_k,$$
where $p_k = (x_k, y_k)$ represents the geographic coordinates, with latitude $x_k$ and longitude $y_k$, and $t_k$ is the time at which these coordinates were recorded. Relative to this definition, the anomalous behaviors of a given agent are defined as follows:
Scenario 1 (S1): Rare Location Anomaly. A rare location anomaly refers to a scenario in which the agent visits a new location $p'$ that is spatially different from their normal POIs, such as school or work. The Reeb graph encodes this rare location by creating a new node that localizes the abnormality.
Scenario 2 (S2): Rare Route Visit Anomaly. In this scenario, the agent visits the same POIs as usual but takes a distinctly different route on a single journey. This introduces a disconnect event from the normal movement pattern, resulting in a new node in the Reeb graph. More formally, if the agent leaves the $\epsilon$-neighborhood of its normal route at some time $t_j$ (a disconnect event) and rejoins it at a later time $t_k$ (a connect event), then nodes at $t_j$ and $t_k$ will be added to $R$.
Scenario 3 (S3): Uncommon Time Visit. This is a time violation in which the agent visits a familiar location $p_k$ at an uncommon time $t'$, i.e., $(p_k, t_k) \in N$ but $(p_k, t') \notin N$.
Scenario 4 (S4): Uncommon Stay Duration Anomaly. In this scenario, the agent stays for an abnormal duration $\Delta t'$ at a specific location $p_k$. This results in a disconnect event between the agent's trajectory and the normal pattern of life at $p_k$.
3.3 Reeb Graph Generation
We use a down-sampling interval of one hour for the Reeb graphs, which lets us monitor changes in location grouping states every hour. The threshold for spatial connect and disconnect events is set to 0.0005 GPS degrees (roughly 55 meters in latitude). Initially, we construct a Reeb graph from the normal activity trajectories of days 0 to 4 to model the student's typical pattern of life.
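The two settings in this paragraph can be made concrete with a short sketch. It assumes a spherical Earth and that hourly down-sampling keeps the first fix observed in each hour; the paper does not specify the down-sampling rule, and the function names are hypothetical. At Santa Barbara's latitude (about 34.4° N), 0.0005° corresponds to roughly 55.6 m north-south and 45.9 m east-west.

```python
import math
from datetime import datetime
from typing import Dict, List, Tuple

# Spherical-Earth approximation (mean radius 6,371 km): about 111,195 m per degree
# of latitude; a degree of longitude shrinks by cos(latitude).
M_PER_DEG_LAT = math.pi * 6_371_000.0 / 180.0


def degrees_to_meters(eps_deg: float, latitude_deg: float) -> Tuple[float, float]:
    """Length of an eps_deg offset along a meridian and along a parallel."""
    along_lat = eps_deg * M_PER_DEG_LAT
    along_lon = along_lat * math.cos(math.radians(latitude_deg))
    return along_lat, along_lon


def downsample_hourly(records: List[Tuple[datetime, float, float]]) -> Dict[int, Tuple[float, float]]:
    """Keep the first (timestamp, lat, lon) fix observed in each hour of one day.

    Assumes `records` are time-sorted and cover a single day, so the hour is a unique key.
    """
    traj: Dict[int, Tuple[float, float]] = {}
    for ts, lat, lon in records:
        traj.setdefault(ts.hour, (lat, lon))
    return traj


lat_m, lon_m = degrees_to_meters(0.0005, 34.42)  # Santa Barbara is near 34.42 N
print(round(lat_m, 1), round(lon_m, 1))

fixes = [(datetime(2024, 1, 8, 8, 0, 0), 34.4200, -119.7000),
         (datetime(2024, 1, 8, 8, 0, 10), 34.4201, -119.7001),
         (datetime(2024, 1, 8, 9, 0, 0), 34.4300, -119.7100)]
print(downsample_hourly(fixes))  # {8: (34.42, -119.7), 9: (34.43, -119.71)}
```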
As depicted in Figure 1 and Figure 3, ReeSPOT identifies all normal POIs as Reeb graph nodes, reflecting the spatial distribution of the student's activities. Notably, the anomalous scenario depicted in Figure 1 and Figure 3 shows the student visiting a movie theater during school hours, a deviation from the normal pattern. This visit is captured by a new Reeb graph node, highlighting ReeSPOT's potential for identifying critical spatial anomalies.
3.4 Analysis and interpretation of scenarios using Reeb graphs
To better understand the formation of Reeb graph nodes and to demonstrate the utility of the Reeb graph across the normal routine and all four anomalous scenarios (six panels in total), we generated time-latitude plots (Figure 4). These plots, with the hour of the day on the x-axis and latitude on the y-axis, show trajectory points sampled every 10 seconds alongside the Reeb graph nodes. Each plot visualizes a different behavioral pattern or anomaly and illustrates ReeSPOT's effectiveness in capturing anomalous trajectories in every scenario. We explain the panels one by one below:
- Figure 4(a) illustrates the student's normal routine, with stays at home and school and visits to various recreational spots. Notable events include the appear and disappear events at the beginning and end of each day. Three disconnect events around hour 17 indicate divergences to different locations after school, and a connect event shows the trajectories merging again on the way home at hour 18.
- Figure 4(b) for S1 depicts a rare location: an abnormal visit to the movie theater, which produces three additional Reeb graph nodes and altered connectivity events at hours 9 and 14.
- Figure 4(c) for S2 captures an alternative route to school. At hour 9, instead of following the normal route, the student deviates towards lower latitudes and then returns to school. This deviation is captured by the bottom Reeb graph node at hour 9. A disconnect event occurs at hour 9, followed by a connect event at hour 10 when all trajectories converge at the school.
- Figure 4(d) for S3 reveals an uncommon time anomaly: the student attends school at hour 2 and travels to the park at around hour 10, significantly deviating from the typical schedule while visiting the same POIs.
- Figure 4(e) for S4 shows another time-related anomaly, a prolonged stay at home until almost hour 12; three new Reeb graph nodes appear because of the disconnect event from the usual trajectory.
- Figure 4(f) for S4 presents a detailed view of the period from hour 16 to hour 17. Since the Reeb graph sampling interval is one hour, the Reeb graph nodes appear at hour 17 to represent the disconnect events of the past hour.
3.5 Reeb graph iteratively detects anomalous behavior of an agent
To detect anomalous trajectories in real-life data (the test dataset), we iteratively construct Reeb graphs on the test dataset to identify anomalous days. An initial Reeb graph is constructed from training data containing only normal trajectories. Then, for each daily trajectory in the test dataset, the Reeb graph is updated day by day. To detect anomalous behavior, we compute the distance between the existing Reeb graph and each updated version that includes the additional daily trajectory. The next subsection details how we calculate this distance and presents the results of our case study.
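The day-by-day procedure can be sketched as a loop that is agnostic to how Reeb graphs are built and compared. Both `build` and `distance` are caller-supplied placeholders (hypothetical names), and this sketch assumes each test day is scored against a graph built from the training days plus that single day, following the description of a Reeb graph with one anomalous trajectory in Section 3.6.

```python
from typing import Callable, List, Sequence, TypeVar

Day = TypeVar("Day")      # one daily trajectory
Graph = TypeVar("Graph")  # a Reeb graph, in whatever representation `build` returns


def score_test_days(train_days: Sequence[Day],
                    test_days: Sequence[Day],
                    build: Callable[[List[Day]], Graph],
                    distance: Callable[[Graph, Graph], float]) -> List[float]:
    """Distance between the baseline Reeb graph and the graph updated with each test day."""
    baseline = build(list(train_days))             # normal-only Reeb graph
    scores = []
    for day in test_days:
        updated = build(list(train_days) + [day])  # baseline data plus one new day
        scores.append(distance(baseline, updated))
    return scores


# Toy usage with stand-in functions: a "graph" is just the sum of daily values.
print(score_test_days([1, 1, 1], [0, 5], build=sum, distance=lambda g, h: abs(g - h)))
```

A higher score flags a day whose inclusion changes the graph more; an accumulating variant would instead append each processed day to the history.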
3.6 Quantifying the distance between Reeb graphs
Given two Reeb graphs, a normal Reeb graph $R_N$ and a Reeb graph $R_A$ that includes one anomalous trajectory, each containing nodes indexed by the hour of the day (0 to 23), the distance $D(R_N, R_A)$ between them is calculated using the following rules:
1. For each hour, if nodes exist in both $R_N$ and $R_A$, calculate the Euclidean distance between the nodes.
2. If only one of the Reeb graphs, $R_N$ or $R_A$, has a node at a particular hour, calculate the distance to the temporally closest node of the other Reeb graph.
3. If neither Reeb graph has a node at a given hour, the distance is 0.
Specifically, point 2 above covers the case where a node at time $t$ in $R_A$ has no corresponding node in $R_N$. We then compute the Euclidean distance to the nodes of $R_N$ at the temporally closest hour, e.g., $t-1$ or $t+1$. If there are multiple such nodes in $R_N$, we select the one with the minimum distance. $D(R_N, R_A)$ is the sum of the hourly distances computed using the above rules.
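The three rules can be sketched as follows, assuming each Reeb graph is reduced to a list of nodes `(hour, lat, lon)` with edges ignored. The rules leave some choices open, so this sketch makes hypothetical ones: when both graphs have nodes at an hour, each node of R_A is matched to its nearest node of R_N at that hour, and when only R_N has nodes at an hour, those nodes are matched against R_A in the same way.

```python
import math
from typing import List, Tuple

Node = Tuple[int, float, float]  # (hour, latitude, longitude); edges are ignored


def _dist(a: Node, b: Node) -> float:
    return math.hypot(a[1] - b[1], a[2] - b[2])


def _nearest(node: Node, other: List[Node]) -> float:
    """Distance from `node` to the closest node of `other` at the same hour if one
    exists (rule 1), otherwise at the temporally closest hour(s) (rule 2)."""
    if not other:
        return 0.0  # assumption: an empty graph contributes nothing
    gap = min(abs(o[0] - node[0]) for o in other)
    candidates = [o for o in other if abs(o[0] - node[0]) == gap]
    return min(_dist(node, o) for o in candidates)  # minimum over ties, e.g. t-1 and t+1


def reeb_distance(r_normal: List[Node], r_anom: List[Node]) -> float:
    """Sum over hours 0..23 of the per-hour distances between the two graphs."""
    total = 0.0
    for h in range(24):
        a_nodes = [v for v in r_anom if v[0] == h]
        n_nodes = [v for v in r_normal if v[0] == h]
        if a_nodes:
            # rules 1 and 2, measured from the nodes of the anomalous graph
            total += sum(_nearest(v, r_normal) for v in a_nodes)
        elif n_nodes:
            # rule 2: only the normal graph has nodes at this hour
            total += sum(_nearest(v, r_anom) for v in n_nodes)
        # rule 3: no nodes in either graph at this hour adds 0
    return total


r_normal = [(8, 34.0, -119.0)]
r_anom = [(8, 34.0, -119.0), (9, 34.003, -119.004)]
print(reeb_distance(r_normal, r_anom))  # approximately 0.005
```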
3.6.1 Results
In this case study, we created a synthetic test dataset to investigate both spatial anomalies (Scenario 1, see Figure 4(b)) and temporal anomalies (Scenario 3, see Figure 4(d)). The dataset comprises three days of randomly simulated normal behavior and two days of anomalous behavior. Figure 5(a) illustrates the node-level distances for both anomalous days. On Day 1, new anomalous nodes appear at hour 8 (movie theater) and hour 13 (return trip). Anomalous events on Day 3 occur at hours 2, 8, and 9. Figure 5(b) depicts the day-level anomalies: the anomalous distance for Day 1 is higher than for Day 3, reflecting the student's travel to a more distant location on Day 1, whereas on Day 3 the anomalies involve the same POIs.
3.7 Scalability with Reeb Graphs
We applied ReeSPOT to a simulated dataset that is closer to a real-life distribution. This data is an extended version of the proof-of-concept data described in this paper. Instead of modeling weekdays sampled every hour, we model patterns over a month sampled at 15-second intervals, which results in $n = 5{,}760$ (86,400 s / 15 s) and $m = 30$. For this dataset, ReeSPOT models the patterns of daily activities for a simulated population of 800,000 agents. Each agent is processed independently, and the Reeb graphs for the entire dataset were constructed within 7.2 hours, processed in parallel across 384 CPU cores (AMD EPYC 9654 @ 3.7 GHz). We also implemented the spatial Reeb graph ReeBundle proposed in [13], but its quadratic time complexity with respect to $n$ made it computationally challenging: for $n = 5{,}760$ and $m = 30$, Reeb graph construction took around 4 minutes per agent. ReeSPOT is linear with respect to $n$, and for the same problem setting it constructed the Reeb graphs in approximately 12 seconds on one CPU core. This is an important advantage over spatial Reeb graphs and allows us to apply our method to large-scale datasets. We also tested ReeSPOT on a medium-sized dataset with 10,000 agents over a period of one week, for which the Reeb graphs were computed in approximately 5.5 minutes. These experiments show the applicability of ReeSPOT to agent data at different resolutions (e.g., weekly and monthly) and emphasize the scalability of the proposed algorithm.
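The reported figures are mutually consistent, as a quick calculation shows. Assuming ideal parallel speedup (no scheduling or I/O overhead), 800,000 agents at about 12 s each on 384 cores take roughly 6.9 hours, in line with the measured 7.2 hours, and about 4 minutes versus 12 seconds per agent is a 20-fold speedup over ReeBundle. The variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

n_points = SECONDS_PER_DAY // 15   # time points per daily trajectory at 15 s sampling
agents = 800_000
seconds_per_agent = 12             # approximate single-core ReeSPOT time per agent
cores = 384

ideal_hours = agents * seconds_per_agent / cores / 3600  # assumes perfect parallelism
speedup = (4 * 60) / seconds_per_agent                    # ReeBundle (~4 min) vs ReeSPOT (~12 s)

print(n_points, round(ideal_hours, 2), speedup)  # 5760 6.94 20.0
```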
4 Discussion and Future Work
In this paper, we proposed a Reeb graph-based approach (ReeSPOT) to model patterns of normalcy in day-to-day human trajectory data. The proposed Reeb graphs abstract large-scale spatio-temporal data into a comprehensible topological construct. We designed distinct, realistic anomalous scenarios, developed trajectory generation methods, and provided a thorough interpretation of the Reeb graph results. The parameters of ReeSPOT control the granularity of the model according to the application. On the other hand, ReeSPOT depends on the quality of the trajectories, and false positives can affect the accuracy of the model; one source of such false positives is the inherent stochasticity of human behavior.
Another application is a quantifiable sanity check for errors in raw trajectory data, such as teleports (abrupt, physically implausible jumps in position). We synthesized such scenarios and observed additional nodes in the Reeb graphs. Our experimental setting assumes that each agent is independent, i.e., the activities of one agent are unrelated to those of another. However, agents in a population influence each other's behavior, and such correlations could serve as additional features for our model. ReeSPOT has the flexibility to incorporate more parameters and features to support robust data abstraction. For example, geo-foundational features describe the nature of each location the agent visited, such as residential, commercial, or recreational, and the nodes of the Reeb graph can be labeled with such domain-specific information. This representation can serve as input to data-driven methods, instead of applying deep learning directly to raw GPS trajectories.
5 Acknowledgement
We would like to thank Kin Gwn Lore for the invaluable insights and assistance throughout this project. This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/ Interior Business Center (DOI/IBC) contract number 140D0423C0057. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.
References
- [1] Biasotti, S., Giorgi, D., Spagnuolo, M., Falcidieno, B.: Reeb graphs for shape analysis and applications. Theoretical computer science 392(1-3), 5–22 (2008)
- [2] Buchin, K., Buchin, M., van Kreveld, M., Speckmann, B., Staals, F.: Trajectory grouping structure. In: Workshop on Algorithms and Data Structures. pp. 219–230. Springer (2013)
- [3] Eagle, N., Pentland, A.: Reality mining: sensing complex social systems. Personal and ubiquitous computing 10, 255–268 (2006)
- [4] Giannotti, F., Nanni, M., Pinelli, F., Pedreschi, D.: Trajectory pattern mining. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 330–339 (2007)
- [5] Guo, D., Liu, S., Jin, H.: A graph-based approach to vehicle trajectory analysis. Journal of Location Based Services 4(3-4), 183–199 (2010)
- [6] Hast, M., Searle, K.M., Chaponda, M., Lupiya, J., Lubinda, J., Sikalima, J., Kobayashi, T., Shields, T., Mulenga, M., Lessler, J., et al.: The use of GPS data loggers to describe the impact of spatio-temporal movement patterns on malaria control in a high-transmission area of northern Zambia. International Journal of Health Geographics 18, 1–18 (2019)
- [7] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
- [8] Kinoshita, A., Takasu, A., Aihara, K., Ishii, J., Kurasawa, H., Sato, H., Nakamura, M., Adachi, J.: GPS trajectory data enrichment based on a latent statistical model. In: International Conference on Pattern Recognition Applications and Methods. vol. 2, pp. 255–262. SCITEPRESS (2016)
- [9] Lopez, P.A., Behrisch, M., Bieker-Walz, L., Erdmann, J., Flötteröd, Y.P., Hilbrich, R., Lücken, L., Rummel, J., Wagner, P., Wießner, E.: Microscopic traffic simulation using SUMO. In: The 21st IEEE International Conference on Intelligent Transportation Systems. IEEE (2018), https://elib.dlr.de/124092/
- [10] Luca, M., Barlacchi, G., Lepri, B., Pappalardo, L.: A survey on deep learning for human mobility. ACM Computing Surveys (CSUR) 55(1), 1–44 (2021)
- [11] Qi, S., Bouros, P., Sacharidis, D., Mamoulis, N.: Efficient point-based trajectory search. In: International Symposium on Spatial and Temporal Databases. pp. 179–196. Springer (2015)
- [12] Sabarish, B., Karthi, R., Kumar, T.G.: Graph similarity-based hierarchical clustering of trajectory data. Procedia Computer Science 171, 32–41 (2020)
- [13] Shailja, S., Bhagavatula, V., Cieslak, M., Vettel, J.M., Grafton, S.T., Manjunath, B.: ReeBundle: a method for topological modeling of white matter pathways using diffusion MRI. IEEE Transactions on Medical Imaging (2023)
- [14] Shailja, S., Chen, J.W., Grafton, S.T., Manjunath, B.: ReTrace: Topological evaluation of white matter tractography algorithms using Reeb graphs. In: International Workshop on Computational Diffusion MRI. pp. 177–191. Springer (2023)
- [15] Shailja, S., Zhang, A., Manjunath, B.: A computational geometry approach for modeling neuronal fiber pathways. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 175–185. Springer (2021)
- [16] Shinagawa, Y., Kunii, T.L., Kergosien, Y.L.: Surface coding based on Morse theory. IEEE computer graphics and applications 11(05), 66–78 (1991)
- [17] Ta, N., Zhao, Y., Chai, Y.: Built environment, peak hours and route choice efficiency: An investigation of commuting efficiency using GPS data. Journal of Transport Geography 57, 161–170 (2016)
- [18] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)
- [19] Wu, L., Yang, L., Huang, Z., Wang, Y., Chai, Y., Peng, X., Liu, Y.: Inferring demographics from human trajectories and geographical context. Computers, Environment and Urban Systems 77, 101368 (2019)
- [20] Zeng, J., He, X., Tang, H., Wen, J.: A next location predicting approach based on a recurrent neural network and self-attention. In: Collaborative Computing: Networking, Applications and Worksharing: 15th EAI International Conference, CollaborateCom 2019, London, UK, August 19-22, 2019, Proceedings 15. pp. 309–322. Springer (2019)
- [21] Zhang, D., Lee, K., Lee, I.: Mining hierarchical semantic periodic patterns from gps-collected spatio-temporal trajectories. Expert Systems with Applications 122, 85–101 (2019)
- [22] Zheng, Y., Liu, L., Wang, L., Xie, X.: Learning transportation mode from raw gps data for geographic applications on the web. In: Proceedings of the 17th international conference on World Wide Web. pp. 247–256 (2008)