Electrical and Computer Engineering Department
University of California Santa Barbara

ReeSPOT: Reeb Graph Models Semantic Patterns of Normalcy in Human Trajectories

Bowen Zhang*, S. Shailja*, Chandrakanth Gudavalli, Connor Levenson, Amil Khan, B. S. Manjunath (* equal contributors)
Abstract

This paper introduces ReeSPOT, a novel Reeb graph-based method to model patterns of life in human trajectories (akin to a fingerprint). Human behavior typically follows a pattern of normalcy in day-to-day activities. This is marked by recurring activities within specific time periods. In this paper, we model this behavior using Reeb graphs where any deviation from usual day-to-day activities is encoded as nodes in the Reeb graph. The complexity of the proposed algorithm is linear with respect to the number of time points in a given trajectory. We demonstrate the usage of ReeSPOT and how it captures the critically significant spatial and temporal deviations using the nodes of the Reeb graph. Our case study presented in this paper includes realistic human movement scenarios: visiting uncommon locations, taking odd routes at infrequent times, uncommon time visits, and uncommon stay durations. We analyze the Reeb graph to interpret the topological structure of the GPS trajectories. Potential applications of ReeSPOT include urban planning, security surveillance, and behavioral research.

Keywords:
Reeb Graphs · Graph Networks · Trajectory Analysis

1 Introduction

Recently, there has been an increase in location-aware devices that use the Global Positioning System (GPS) for many applications such as finding efficient routes [17], fitness apps, understanding the progression of infectious diseases [6], and predicting demographic information [19]. This collection of movements, and thus vast amounts of raw trajectories, spotlights the need for a scalable representation of these trajectories that preserves and highlights the structure and topologically important movement patterns (Figure 1).

Human movement analysis is the core component of behavioral research, urban planning, and computational sociology [3], which helps in better modeling human behavior and predicting human movement patterns. Similarly, modeling normal human behavior can also help identify abnormal human behavior. In particular, given a set of movement patterns for a week, month, or year, we want to capture any change in the semantic “patterns of life”. In this paper, we model routine behaviors and movements that characterize daily human activities in a given city using a concept from topology, Reeb graphs.

Figure 1: Map overlay of normal and anomalous trajectories from scenario 2 of the case study, annotated with semantic labels for points of interest (POIs).

Traditional trajectory analysis methods are largely based on hand-crafted geometric features and statistical techniques. Such features include traveling distance, mean velocity [22], and frequencies of areas or moving patterns [4]. Statistical approaches analyze temporal patterns with respect to the frequency of trajectory data to identify patterns such as traveling modes [8] and periodic patterns [21]. These approaches are effective for structured and less complex data sets but fail to generalize to high-dimensional data or to the dynamic nature of human mobility patterns.

Given the amount of GPS data that one person can generate in a single day, another natural direction is data-driven learning methods. Specifically, a single agent's movement data, sampled at 1 Hz over a month, accumulates roughly 2 million data points.

Extrapolating these figures to a population of a small city like Santa Barbara, with approximately 97,000 agents, results in a dataset comprising an immense 194 billion data points. This scale poses substantial challenges in terms of computational resources and data management, and extrapolating to larger cities, such as New York City, would significantly magnify these challenges. Recent advances in deep learning have significantly enhanced the capability to model human mobility patterns by performing the next-location prediction [10]. Particularly, long short-term memory networks (LSTMs) [7] and attention-based models like Transformers [18] are good at capturing temporal regularities and anomalies in movement patterns. However, these black-box models lack interpretability, thus limiting their applicability in real-time scenarios [20].
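The data-volume estimate above can be checked with a few lines of arithmetic. The sketch below assumes a 30-day month (an assumption; the text does not state the month length), under which 1 Hz sampling gives about 2.6 million points per agent. The city-wide total uses the rounded figure of 2 million points quoted in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 samples per day at 1 Hz


def points_per_agent(days, hz=1):
    """Number of GPS samples one agent produces over `days` days at `hz` Hz."""
    return days * SECONDS_PER_DAY * hz


# Assumption: a 30-day month. The text rounds the per-agent count to 2 million.
exact_per_agent = points_per_agent(30)
rounded_per_agent = 2_000_000
agents = 97_000  # approximate population figure quoted in the text

city_total = rounded_per_agent * agents

print(f"per agent (30 days at 1 Hz): {exact_per_agent:,}")
print(f"city-wide, rounded figure:   {city_total:,}")
```

With the rounded per-agent figure, the product matches the 194 billion points quoted above; using the unrounded 30-day count would give a larger total.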

Toward interpretability along with large-scale modeling, graph-based methods are popular because they represent complex spatial relationships and movement patterns efficiently. We need models that can succinctly summarize an agent's trajectory data, retaining essential information while discarding redundancies. Transforming GPS data into graph data structures, with nodes as significant geographic locations and edges as the movement information between them, enables intuitive models for pattern-of-life. Research directions include the graph model of Guo et al. [5], which establishes precise topological relationships among trajectories and geographic locations. Qi et al. [11] incorporate hybrid methods that blend graph-based approaches with statistical models to improve the accuracy of trajectory searches and predictions. Another such work focuses on hierarchical clustering based on graph similarity measures [12], further supporting the need for computational geometry.

In this paper, we use Reeb graphs to cluster the common behavior pattern for a given agent. Our research is motivated by and related to previous research on the construction of Reeb graphs for trajectory data [2, 13]. A Reeb graph captures the connectivity of level sets of a scalar function defined over a space, effectively summarizing the topological features of the space. In the context of trajectory data, scalar functions could represent attributes such as speed, direction, semantics, or geographical points of interest. Reeb graphs can thus map complex trajectories into more interpretable topological constructs. This abstraction facilitates the detection of anomalies by comparing the topological signatures of trajectories and identifying those that differ significantly from the norm. Our main contributions are summarized below:

  • We propose a novel Reeb graph-based approach to model the day-to-day activities of a given agent. To the best of our knowledge, this is the first demonstration of Reeb graphs to fingerprint an agent’s behavior.

  • We discuss the algorithm and its time complexity demonstrating the scalability of the proposed method.

  • We design normal and anomalous scenarios, describe the methods for trajectory generation and present detailed experiments on the interpretation and analysis of Reeb graphs.

2 Methodology

2.1 Previous work on Reeb graphs

The Reeb graph was first proposed to study the topology of a manifold [16]. Nodes of the Reeb graph encode the evolution of the level sets of a real-valued function on a manifold. The location of a node is the average location of the trajectory points that constitute the node. Reeb graphs have been used extensively in shape analysis for diverse datasets [1]. The first study of Reeb graphs for the evolution of trajectory groups encodes the merging and splitting structure between different moving entities [2]. Similarly, the spatial subtrajectory clustering algorithm addresses a stricter problem [13, 14, 15] and discovers geometric and topological substructure. This is a computationally challenging problem because the initialization step involves an exhaustive search of an agent's events. Motivated by these challenges, the central focus of this paper is to develop a method for fingerprinting the behavior of an agent over time spans such as days, weeks, and months. Our approach encodes significant spatio-temporal points of interest, specifically the locations and durations that define critical aspects of an agent's behavior. We redefine the grouping definitions used in our adapted Reeb graph model. The constructed Reeb graphs effectively partition a set of GPS points into meaningful nodes and edges, thereby quantifying and identifying path deviations.

Algorithm 1 Find connect and disconnect events
Input: Trajectories $T$ and $T'$, threshold $\epsilon$
Output: Dictionary of connect/disconnect events, $events_{T,T'}$
Initialize $events_{T,T'}$ as an empty dictionary
Initialize $k \leftarrow 0$
Initialize $connect\_flag \leftarrow \text{False}$
while $k < m$ do
    if $d(T[t_k], T'[t_k]) < \epsilon$ then
        $events_{T,T'}[t_k] \leftarrow$ connect
        $connect\_flag \leftarrow \text{True}$
        while $k < m$ and $d(T[t_k], T'[t_k]) < \epsilon$ do
            $k \leftarrow k + 1$
        end while
        if $k < m$ then
            $events_{T,T'}[t_k] \leftarrow$ disconnect
        end if
    end if
    $k \leftarrow k + 1$
end while
return $events_{T,T'}$
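Algorithm 1 can be transcribed almost line for line into Python. The following is a minimal sketch, not the authors' implementation: it assumes both trajectories are dictionaries keyed by the same time points, takes $m$ to be the number of those time points, and uses the hypothetical names `find_events` and `euclidean`. The `connect_flag` variable is omitted because the pseudocode sets it but never reads it.

```python
import math


def euclidean(p, q):
    """2-norm distance between two (lat, lon) pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def find_events(T, T_prime, eps):
    """Return {time: 'connect' | 'disconnect'} for one pair of trajectories.

    Assumption: T and T_prime share the same time keys.
    """
    times = sorted(T)
    m = len(times)
    events = {}
    k = 0
    while k < m:
        if euclidean(T[times[k]], T_prime[times[k]]) < eps:
            # The pair comes within eps: record a connect event.
            events[times[k]] = "connect"
            # Skip ahead while the pair stays within eps.
            while k < m and euclidean(T[times[k]], T_prime[times[k]]) < eps:
                k += 1
            # The first time point beyond eps, if any, is a disconnect event.
            if k < m:
                events[times[k]] = "disconnect"
        k += 1
    return events


# Toy data (hypothetical coordinates, not taken from the case study).
T = {t: (0.0, 0.0) for t in range(5)}
T_prime = {0: (5.0, 5.0), 1: (0.0, 0.1), 2: (0.0, 0.1), 3: (5.0, 5.0), 4: (5.0, 5.0)}
print(find_events(T, T_prime, eps=1.0))
```

Each time point is visited at most once by either loop, which is consistent with a running time linear in the number of time points.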

2.2 Reeb graph models agent pattern of normalcy

A trajectory $T$ is defined as a dictionary (key: value) containing an ordered sequence of time points and their associated GPS coordinates:

$$T = \{t_0 : p_0,\ t_1 : p_1,\ t_2 : p_2,\ \ldots,\ t_m : p_m\}, \tag{1}$$

where $m$ is chosen according to the desired resolution to sample the pattern of the agent. Here $m$ denotes the total number of points in a given trajectory $T$. The frequency of GPS data sampling decides $m$. For example, to model the weekdays of an agent's activities, the raw GPS data is sampled every second, giving us $m = 86400$, which is the total number of seconds in a day. Similarly, if the data is sampled every hour, then $m = 24$ points per day. We define $n$ as the total number of trajectories for a given agent. For example, to model month-long data, $n = 30$, and for weekdays, $n = 5$. The common setting used throughout the paper for our problem definition is $m = 24$ and $n = 5$. Each time point $t_i$ corresponds to a GPS coordinate $p_i = (\text{lat}_i, \text{lon}_i)$ representing the position of the agent at time $t_i$, where $\text{lat}_i$ represents the latitude and $\text{lon}_i$ represents the longitude.
The Euclidean distance between two GPS coordinates $p_i$ and $p'_i$ is calculated at time $t_i$ as follows:

$$d(p_i, p'_i) = \sqrt{(\text{lat}_i - \text{lat}'_i)^2 + (\text{lon}_i - \text{lon}'_i)^2}, \tag{2}$$

where $\text{lat}_i$ and $\text{lon}_i$ are the latitude and longitude of the first point, and $\text{lat}'_i$ and $\text{lon}'_i$ are those of the second point. $d(p_i, p'_i)$ gives the 2-norm distance between two points on the Euclidean plane. This approximates the geographic distance of the points. The algorithm is defined with respect to a distance threshold $\epsilon$ within which the points are considered sufficiently close together, i.e., within a small geographical area. This is the inter-trajectory distance that guides the granularity of the Reeb graphs according to the problem definition.
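As an illustration of the definitions above, the sketch below stores a trajectory as a dictionary from time index to a (lat, lon) pair and evaluates Eq. (2) at a shared time point. The helper names (`make_trajectory`, `distance`, `is_close`) and the sample coordinates are hypothetical. Treating "within $\epsilon$" as a non-strict comparison is also an assumption.

```python
import math


def make_trajectory(points):
    """Map time index i -> (lat, lon), mirroring the dictionary form of Eq. (1)."""
    return {i: p for i, p in enumerate(points)}


def distance(p, q):
    """Eq. (2): 2-norm between two (lat, lon) pairs, in the units of the inputs."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def is_close(T, T_prime, t, eps):
    """True if the two trajectories are within eps of each other at time t."""
    return distance(T[t], T_prime[t]) <= eps


# Two short made-up trajectories sampled at the same three time points.
day_a = make_trajectory([(34.420, -119.700), (34.425, -119.705), (34.430, -119.710)])
day_b = make_trajectory([(34.420, -119.700), (34.426, -119.705), (34.500, -119.800)])

for t in sorted(day_a):
    print(t, is_close(day_a, day_b, t, eps=0.005))
```

Because the inputs are in degrees, $\epsilon$ here is also expressed in degrees; the threshold value in the example is arbitrary.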

Figure 2: Reeb Graph Construction Over Time. We show the construction of Reeb graphs $R(V,E)$ for a set of five trajectories. The appear, disappear, connect, and disconnect events are shown on the left-hand side. Changes in the grouping of trajectories due to these events are encoded as nodes on the right-hand side. Nodes of the Reeb graph $R$ on the right-hand side are shown in red and the edges are shown in black throughout the paper.

Human behavior typically follows a pattern of normalcy in day-to-day activities. This is marked by recurring activities within specific time periods. In order to discover the large-scale spatio-temporal patterns, we represent the bundling structure of trajectories as a Reeb graph $R(V,E)$. Nodes of the Reeb graph pinpoint critical GPS points of the agent's pattern. Intuitively, if a continuous portion of the agent's behavior happens at the same time and within the same spatial distance ($\epsilon$) every day, then it presents a pattern of normalcy. We formalize this by introducing the concept of "bundles" to characterize normal behavior through consistent daily subtrajectory events. Each trajectory begins with an appear event at the first index and concludes with a disappear event at the last index of $T$. Deviations from this norm by more than $\epsilon$ are classified as disconnect events, while a return to the norm is labeled a connect event. Formally, for a given $\epsilon$ and hourly sampling (time points $t_0$ through $t_{23}$), consider two trajectories $T$ and $T'$:

  • At time $t_0$: $p_0$ and $p'_0$ are the appear events.

  • At time $t_{23}$: $p_{23}$ and $p'_{23}$ are the disappear events.

  • If $d(p_0, p'_0) \leq \epsilon,\ d(p_1, p'_1) \leq \epsilon,\ \ldots,\ d(p_k, p'_k) \leq \epsilon$, but $d(p_{k+1}, p'_{k+1}) > \epsilon$, then $t_{k+1}$ represents a disconnect event between $T$ and $T'$.

  • If $d(p_0, p'_0) > \epsilon,\ d(p_1, p'_1) > \epsilon,\ \ldots,\ d(p_k, p'_k) > \epsilon$, but $d(p_{k+1}, p'_{k+1}) \leq \epsilon$, then $t_{k+1}$ represents a connect event between $T$ and $T'$.
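The four event types listed above can be illustrated on a single sequence of pairwise distances. The sketch below reads the connect and disconnect definitions as transitions of the condition $d(p_k, p'_k) \leq \epsilon$ between consecutive time points, which is an interpretation rather than a statement of the authors' code. The function name `label_events` and the sample distances are hypothetical.

```python
def label_events(distances, eps):
    """Label events for one pair of trajectories.

    distances[k] is d(p_k, p'_k); the result maps a time index to the list of
    events occurring there.
    """
    last = len(distances) - 1
    events = {0: ["appear"]}                  # first index: appear event
    close = [d <= eps for d in distances]     # within-eps state at each time point
    for k in range(last):
        if close[k] and not close[k + 1]:
            # Within eps up to t_k but not at t_{k+1}: disconnect at t_{k+1}.
            events.setdefault(k + 1, []).append("disconnect")
        elif not close[k] and close[k + 1]:
            # Beyond eps up to t_k but within eps at t_{k+1}: connect at t_{k+1}.
            events.setdefault(k + 1, []).append("connect")
    events.setdefault(last, []).append("disappear")  # last index: disappear event
    return events


# Made-up distances for five time points.
print(label_events([0.1, 0.2, 3.0, 3.0, 0.1], eps=1.0))
```

In this toy sequence the pair separates at index 2 and rejoins at index 4, so the last index carries both a connect and a disappear event.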

Algorithm 2 Construction of Reeb Graph
function ConstructReebGraph(set of events for all pairs of trajectories ($E$))
    for all steps $k$ from 0 to $|E|$ do    ▷ Dynamic Graphs
        if appear event of $T$ then
            insert new node $T$ to $G_k$
        end if
        if disappear event of $T$ then
            delete node $T$ from $G_k$
        end if
        if connect event between $T_x$ and $T_y$ then
            insert edge $(T_x, T_y)$ to $G_k$
        end if
        if disconnect event of trajectories $T_x$ and $T_y$ then
            delete edge $(T_x, T_y)$ from $G_k$
        end if
        $P \leftarrow$ empty bundle partition    ▷ Bundle Partition
        Query $G_{k-1}$ and $G_k$ to get the connected components $C_{k-1}$ and $C_k$ respectively;
        for all connected component $c_k \in C_k$ do
            if $c_k \in C_{k-1}$ then
                assign the same bundle id $B_i$ to the points for trajectories in $c_k$;
            else
                create a new bundle id $B_{i+1}$ and assign it to the points for trajectories in $c_k$;
            end if
            Add $B_{i+1}$ to $P$
        end for
    end for
    Construct Reeb graph $R$ from $P$ by connecting adjacent bundles with nodes and bundles as edges;    ▷ Construct Reeb graph
    return $R$
end function
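The bundle-partition step of Algorithm 2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: instead of replaying individual events, it rebuilds the $\epsilon$-connectivity graph at every time point (which yields the same edges), uses a non-strict $\leq \epsilon$ test, and reports bundle transitions as candidate Reeb graph nodes rather than building a full graph object. The names `connected_components` and `bundle_partition` are hypothetical.

```python
import math
from itertools import combinations


def connected_components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1 (DFS)."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps  # ordered by smallest member, so bundle ids are deterministic


def bundle_partition(trajs, eps):
    """Assign a bundle id to every (trajectory index, time) pair.

    Returns (bundle_of, transitions); transitions holds
    (old bundle, new bundle, time) triples marking candidate Reeb nodes.
    Assumption: all trajectories share the same time keys.
    """
    n, times = len(trajs), sorted(trajs[0])
    bundle_of, transitions = {}, set()
    previous = {}  # component -> bundle id at the previous time step
    next_id = 0
    for step, t in enumerate(times):
        # Dynamic graph G_k: an edge joins two trajectories within eps at time t.
        edges = [(a, b) for a, b in combinations(range(n), 2)
                 if math.dist(trajs[a][t], trajs[b][t]) <= eps]
        current = {}
        for comp in connected_components(n, edges):
            if comp in previous:       # unchanged group keeps its bundle id
                current[comp] = previous[comp]
            else:                      # changed group starts a new bundle
                current[comp] = next_id
                next_id += 1
            for i in comp:
                bundle_of[(i, t)] = current[comp]
                if step > 0:
                    old = bundle_of[(i, times[step - 1])]
                    if old != current[comp]:
                        transitions.add((old, current[comp], t))
        previous = current
    return bundle_of, transitions


# Three made-up daily trajectories; the third leaves the group at times 1 and 2.
trajs = [
    {t: (0.0, 0.0) for t in range(4)},
    {t: (0.0, 0.1) for t in range(4)},
    {0: (0.0, 0.2), 1: (5.0, 5.0), 2: (5.0, 5.0), 3: (0.0, 0.2)},
]
bundles, nodes = bundle_partition(trajs, eps=0.5)
print(sorted(nodes))
```

In this toy input the group splits at time 1 and merges again at time 3, so transitions appear only at those two time points, matching the idea that nodes mark changes in grouping while bundles act as edges.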

2.3 Construction of Reeb graphs and analysis of time complexity

Reeb graph construction (illustrated in Figure 2) can be divided into the following major steps: event computation, construction of the dynamic graphs ($G$), connectivity queries in the dynamic graph for the bundle partition ($P$), and construction of the Reeb graph ($R$) from the bundle partition. The first step of Reeb graph construction involves computing the connect and disconnect events. Algorithm 1 outlines the steps of computing events. The event computation takes $\mathcal{O}(m)$ time, where $m$ represents the number of time points in the trajectories $T$ and $T'$. At each time point, the algorithm looks for $\mathcal{O}(5 \times 5)$ possibilities of potential events. The second step involves handling the events to construct the dynamic graphs $G$. The nodes of $G$ represent the daily trajectories and the edges of $G$ represent the $\epsilon$-connectivity between them. The total number of nodes in $G$ is 5, representing one trajectory for each day of the agent. The connected components of $G$ give us the $\epsilon$-step bundle partition of subtrajectories, denoted by $P = \{B_1, B_2, \ldots, B_k\}$, such that every segment in $\{T_0, T_1, T_2, T_3, T_4\}$ is uniquely assigned to exactly one bundle. The final step is to construct the Reeb graph from these bundles. The Reeb graph $R$ can be constructed from $P$ by connecting adjacent bundles with nodes and bundles as edges, similar to the construction described in [13]. The time complexity of the Reeb graph construction step is therefore $\mathcal{O}(m)$ because, in the worst case, all the time points will have events. At each time, the connectivity query to the dynamic graph with 5 nodes takes constant time. More detailed steps can be found in Algorithm 2.
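The linear bound can be made explicit with a rough accounting for general $n$. This is a sketch under the assumption that each connectivity query is answered by a traversal of a graph with $n$ nodes and at most $\binom{n}{2}$ edges; the grouping into two terms is introduced here for illustration and is not taken from the paper.

```latex
\underbrace{\binom{n}{2}\,\mathcal{O}(m)}_{\text{event computation over all pairs}}
\;+\;
\underbrace{m \cdot \mathcal{O}(n^{2})}_{\text{one connectivity query per time point}}
\;=\; \mathcal{O}(n^{2} m)
```

For the fixed setting $n = 5$ there are $\binom{5}{2} = 10$ pairs of trajectories, so the factor depending on $n$ is a constant and the total reduces to $\mathcal{O}(m)$.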

3 Experimentation/Case Study

3.1 Data generation

We model the pattern of life of a single agent over different trajectories. Each trajectory is simulated using the SUMO software package [9] and represents realistic behavior and movement patterns over the course of one week. We construct the Reeb graph for each trajectory and show how it sufficiently represents the trajectory’s information with significantly fewer nodes.

Figure 3: 3D trajectory plots with computed Reeb graph nodes for scenario 1 in Section 3, where day 0 to day 4 are normal trajectories, and the anomalous trajectory is in red.

In this case study, we analyze the behavioral patterns of a simulated high-school student from the city of Santa Barbara, California (Figure 1), using trajectory data that includes multiple points of interest (POIs), such as the student's home, school, park, grocery store, and lake. The student's daily routine typically consists of attending school, starting between approximately 8:00 AM and 9:00 AM and ending between approximately 4:00 PM and 5:00 PM, followed by visits to recreational sites before returning home. To investigate both normal and anomalous behavioral patterns, we generated five days of normal trajectory data, complemented by additional days tailored to each specific scenario defined in Section 3.2. Each trajectory entry is recorded with a timestamp and latitude and longitude coordinates. Figure 1 displays the student's trajectories across different POI locations for the rare location scenario, illustrating the distribution of both routine and deviant movements. Figure 3 displays the same data as a 3D plot, providing a clear spatio-temporal visualization of the student's stay locations, durations, and revisit frequencies.

3.2 Definition of anomalous behavior

We define $L$ as a set of normal POIs and their corresponding time points,

$$L = \{(\text{lat}_1, \text{lon}_1, t_1), (\text{lat}_2, \text{lon}_2, t_2), \dots, (\text{lat}_n, \text{lon}_n, t_n)\}$$

where $(\text{lat}_i, \text{lon}_i)$ represents the geographic coordinates with $\text{lat}_i \in [-90, 90]$ and $\text{lon}_i \in [-180, 180]$, and $t_i$ is the time at which these coordinates were recorded. Relative to this definition, the anomalous behaviors for a given agent are defined as follows:

Algorithm 3 Trajectory Generation
Inputs:
    POIs: List of Points of Interest as coordinates on a map.
    $TimeList_n$: Dictionary mapping each POI to normal visit times.
    $TimeList_a$: Dictionary mapping each POI to abnormal visit times.
    Road Network: Road network graph for route generation.
Output:
    $T$: A list of normal trajectories of an agent visiting specified POIs.
    $T^*$: A list of abnormal trajectories of an agent visiting specified POIs.
Initialize Trajectories list
for each POI in the POIs list do
    Select $TimeList$ based on a decision rule (normal vs abnormal)
    for each time in $TimeList$ do
        Generate a starting point for the agent
        Use duarouter to calculate the shortest path from the starting point to the POI at the given time
        Pass the list of edges to SUMO for movement simulation
        Collect the output trajectory from SUMO
        Append to $T$ or $T^*$ based on decision rule
    end for
end for
return $T$, $T^*$

Scenario 1 (S1): Rare Location Anomaly A rare location anomaly refers to a scenario in which an agent visits a new location $(lat^*, lon^*, t_i) \notin L$. Here $(lat^*, lon^*)$ is spatially different from the agent’s normal points of interest, such as school or work. The Reeb graph encodes this rare location by creating a new node that localizes the abnormality.

Scenario 2 (S2): Rare Route Visit Anomaly In this scenario, the agent visits the same POI locations multiple times but takes a distinctly different route on a single journey. This introduces a disconnect event from their normal movement pattern, resulting in a new node in the Reeb graph. More formally, if $(lat^*, lon^*, t_{k:l}) \notin (lat, lon, t_{1:k-1})$ and $(lat^*, lon^*, t_{k:l}) \notin (lat, lon, t_{l+1:m})$, then nodes $v_k$ and $v_l$ will be added to $R$.

Scenario 3 (S3): Uncommon Time Visit This is a case of time violation where the agent visits a familiar location at an uncommon time $t^*$, i.e., $(lat_i, lon_i, t^*) \neq (lat_i, lon_i, t_i)$.

Scenario 4 (S4): Uncommon Stay Duration Anomaly In this scenario, the agent stays for an abnormal duration ($\Delta$) at a specific location $(lat^*, lon^*, t_{i+\Delta})$. This results in a disconnect event for the agent’s trajectory from the normal pattern of life at $t_i$.
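To make the scenario definitions concrete, the following minimal sketch expresses S1 and S3 as simple predicates. It is an illustration of the definitions only, not of how ReeSPOT detects them (ReeSPOT relies on Reeb graph nodes). The function names, the tolerance `tol` in hours, and the use of a plain Euclidean distance in degrees are assumptions made for this example.

```python
import math


def is_rare_location(lat, lon, known_pois, eps=0.0005):
    """S1: the point is farther than eps (in degrees) from every known POI."""
    return all(math.hypot(lat - p_lat, lon - p_lon) > eps
               for p_lat, p_lon in known_pois)


def is_uncommon_time(hour, usual_hours, tol=1):
    """S3: a familiar location is visited more than tol hours away from
    every usual visit hour (hours are compared on a 24-hour circle)."""
    def circular_gap(a, b):
        gap = abs(a - b) % 24
        return min(gap, 24 - gap)
    return all(circular_gap(hour, h) > tol for h in usual_hours)
```

For instance, a visit at hour 2 to a place normally visited at hours 8 and 9 is flagged by `is_uncommon_time(2, [8, 9])`, whereas a visit at hour 0 to a place normally visited at hour 23 is not.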

3.3 Reeb Graph Generation

We use a down-sampling rate of one hour for Reeb graphs. This setting lets us monitor changes in location grouping states at each hour. The threshold $\epsilon$ for spatial connect and disconnect events is set to 0.0005 degrees (roughly 55.6 meters in latitude). Initially, we construct a Reeb graph from the normal activity trajectories of days 0 to 4 to model the student’s typical pattern of life.
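One plausible way to picture the role of $\epsilon$ is to group, at a single hourly sample, the positions of all daily trajectories into $\epsilon$-connected components; connect and disconnect events then correspond to groups merging or splitting between consecutive hours. The sketch below assumes this reading, in the spirit of the trajectory grouping structure [2], and is not the authors' implementation. The function name and the input layout (one `(lat, lon)` pair per day) are hypothetical.

```python
import math


def epsilon_groups(positions, eps=0.0005):
    """Group indices of positions whose chain of pairwise distances is <= eps.

    positions: list of (lat, lon), one entry per daily trajectory, all taken
    at the same hourly sample. Returns a sorted list of index groups.
    """
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With three days located at `(34.4000, -119.8)`, `(34.4002, -119.8)`, and `(34.4100, -119.8)`, the first two fall into one group and the third forms its own group, which is the kind of split a disconnect event would record.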

As depicted in Figure 1 and Figure 3, ReeSPOT successfully identifies all normal POIs as Reeb graph nodes, demonstrating its efficacy in reflecting the spatial distribution of the student’s activities. Both figures also show an anomalous scenario in which the student visits a movie theater during school hours, which is defined as a deviation from the normal pattern. ReeSPOT captures this visit as a new Reeb graph node, highlighting the method’s potential for identifying critical spatial anomalies.

Figure 4: 2D Trajectory plots displaying time and latitude dimensions alongside computed Reeb graph nodes. These plots illustrate both normal and anomalous scenarios as outlined in Section 3.2. The detailed discussions on node generation and behavioral analysis can be found in Section 3.4.

3.4 Analysis and interpretation of scenarios using Reeb graphs

To better understand the formation of Reeb graph nodes and demonstrate the utility of the Reeb graph across the normal routine and the four anomalous scenarios, we generated time-latitude plots (Figure 4). These plots, with the hour of day on the x-axis and latitude on the y-axis, show trajectory points sampled every 10 seconds alongside the Reeb graph nodes. Each plot visualizes a different behavioral pattern or anomaly and illustrates ReeSPOT’s effectiveness in capturing anomalous trajectories. We explain the panels one by one below:

  • Figure 4(a) illustrates the student’s normal routine pattern, with stays at home, school, and visits to various recreational spots. Notable events include appear and disappear events at the beginning and end of each day. Three disconnect events around hour 17 indicate divergences to different locations after school. A connect event at hour 18 shows the trajectories merging back on the way home.

  • Figure 4(b) for S1 depicts a rare location $(lat^*, lon^*)$: an abnormal visit to the movie theater, which produces three additional Reeb graph nodes and altered connectivity events at hours 9 and 14.

  • Figure 4(c) for S2 captures an alternative route to school. At hour 9, instead of following the normal route, the student deviates toward a lower latitude and then returns to school. This deviation is captured by the bottom Reeb graph node at hour 9. Additionally, a disconnect event occurs at hour 9, followed by a connect event at hour 10 when all trajectories converge at the school.

  • Figure 4(d) for S3 reveals an uncommon time anomaly, where the student attends school at hour 2 and travels to the park at around hour 10, significantly deviating from the typical schedule, but with the same POIs.

  • Figure 4(e) for S4 shows another time-related anomaly with a prolonged stay at home until almost hour 12. Similarly, three new nodes appear in the Reeb graph because of the disconnect event from the usual trajectory.

  • Figure 4(f) for S4 presents a detailed look at Scenario 4, from hour 16 to hour 17. Since the Reeb graph sampling rate is one hour, the Reeb graph nodes appear at hour 17 to represent the disconnect events of the preceding hour.

3.5 Reeb graph iteratively detects anomalous behavior of an agent

In the context of detecting anomalous trajectories within real-life data (test dataset), we iteratively construct Reeb graphs on the test dataset to identify daily anomalous trajectories. An initial Reeb graph is constructed using training data with all normal trajectories. Subsequently, for each daily trajectory in the test dataset, the Reeb graph is iteratively updated day by day. To detect anomalous behaviors effectively, we compute the distance between the existing Reeb graph and every updated version that includes the additional daily trajectory. The subsequent section details our methodology for calculating this distance and presents the results derived from our case study.
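The iterative procedure can be sketched as a short loop. The sketch assumes that the graph is rebuilt from all trajectories seen so far and that each day is scored against the graph as it stood before that day was added; the text leaves open whether the comparison is against the previous graph or the original normal graph, so this choice is an assumption. `build_graph` and `distance` are hypothetical callables standing in for Reeb graph construction and the distance of Section 3.6.

```python
def score_days(train_days, test_days, build_graph, distance):
    """Return one anomaly score per test day.

    build_graph(list_of_days) -> graph   (hypothetical constructor)
    distance(graph_a, graph_b) -> float  (hypothetical graph distance)
    """
    seen = list(train_days)
    current = build_graph(seen)          # initial graph from normal training data
    scores = []
    for day in test_days:
        seen.append(day)
        updated = build_graph(seen)      # graph including the additional day
        scores.append(distance(current, updated))
        current = updated                # assumption: the update is kept for later days
    return scores
```

One consequence of keeping the update is that a behavior repeated on later days stops contributing to the score, since it has already been absorbed into the graph.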

Figure 5: (a) illustrates the Reeb graph node-level distances for both anomalous days. (b) shows the day-level anomaly scores.

3.6 Quantifying the distance between Reeb graphs

Given two Reeb graphs, a normal Reeb graph $R_1$ and a Reeb graph with one anomalous trajectory $R_2$, each containing data points along the time dimension (hours 0 to 23), the following rules are used to calculate the distance between the Reeb graphs, denoted $d(R_1, R_2)$:

  1. For each hour, if nodes exist in both $R_1$ and $R_2$, calculate the Euclidean distance between the nodes.

  2. If only one of the Reeb graphs, $R_1$ or $R_2$, has a node at a particular hour, calculate the distance to the temporally closest node from the other Reeb graph.

  3. If neither Reeb graph has a node for a given hour, the distance is 0.

Specifically, in point 2 above, consider a node at time $t_k$ in Reeb graph $R_1$ that has no corresponding node in $R_2$. We compute the Euclidean distance to the nodes in $R_2$ at the temporally closest time, $t_{k+1}$ or $t_{k-1}$. If there are multiple nodes in $R_2$ at $t_{k+1}$ or $t_{k-1}$, we select the one with the minimum distance. $d(R_1, R_2)$ is the sum of the distances computed at every hour using the above rules.
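The three rules can be sketched as follows. The sketch makes several assumptions that the text does not fix: each Reeb graph is represented as a dictionary from hour to a list of `(lat, lon)` node positions, each node of one graph is matched to its nearest node in the other at the same (or temporally closest) hour, rule 1 sums these nearest-node distances over the nodes of $R_2$, and hours do not wrap around midnight. The function names are hypothetical.

```python
import math


def _nearest(node, candidates):
    """Minimum Euclidean distance from node to any candidate node."""
    return min(math.dist(node, c) for c in candidates)


def _closest_hour_nodes(graph, hour):
    """Nodes of graph at the hour(s) temporally closest to hour (excluding hour)."""
    for offset in range(1, 24):
        found = graph.get(hour - offset, []) + graph.get(hour + offset, [])
        if found:
            return found
    return []


def reeb_distance(r1, r2, hours=range(24)):
    """Sum of per-hour distances between two hour-indexed node sets."""
    total = 0.0
    for h in hours:
        a, b = r1.get(h, []), r2.get(h, [])
        if a and b:
            # Rule 1: nodes in both graphs at this hour.
            total += sum(_nearest(node, a) for node in b)
        elif a or b:
            # Rule 2: only one graph has nodes; use the temporally closest
            # nodes of the other graph and keep the minimum distance.
            present, other = (a, r2) if a else (b, r1)
            candidates = _closest_hour_nodes(other, h)
            if candidates:
                total += sum(_nearest(node, candidates) for node in present)
        # Rule 3: neither graph has a node, so the hour contributes 0.
    return total
```

Because rule 1 is evaluated from the nodes of the second argument, this sketch is not symmetric in general; a symmetric variant would add the same sum taken from the nodes of the first graph.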

3.6.1 Results

In this case study, we created a synthetic test dataset to investigate both spatial anomalies (Scenario 1, see Figure 4(b)) and temporal anomalies (Scenario 3, see Figure 4(d)). The dataset comprises three days of randomly simulated normal behavior and two days of anomalous behavior. Figure 5(a) illustrates the node-level distances for both anomalous days. On Day 1, new anomalous nodes appear at hour 8 (movie theater) and hour 13 (coming back). Anomalous events on Day 3 occur at hours 2, 8, and 9. Figure 5(b) depicts the day-level anomalies; the anomalous distance for Day 1 is higher than for Day 3, reflecting the student’s travel to a more distant location on Day 1, whereas, on Day 3, the anomalies involve the same POIs.

3.7 Scalability with Reeb Graphs

We successfully applied ReeSPOT to a simulated dataset that is closer to a real-life distribution. This dataset extends the proof-of-concept data described above: instead of modeling weekdays of data sampled every hour, we model the patterns over a month sampled at 15-second intervals, which results in $m = 5760$ and $n = 30$. For this dataset, ReeSPOT models the patterns of daily activities for a simulated population of 800,000 agents. Each agent is processed independently, and the Reeb graphs for the entire dataset were constructed within 7.2 hours when processed in parallel across 384 CPU cores (AMD EPYC 9654 @ 3.7 GHz). We also implemented the spatial Reeb graph, ReeBundle, as proposed in [13], but its quadratic time complexity with respect to $m$ made it computationally challenging. More specifically, for $n = 7$ and $m = 5760$, the Reeb graph construction with ReeBundle took around 4 minutes for one agent. ReeSPOT is linear with respect to $m$ and, for the same problem setting, constructed the Reeb graph in approximately 12 seconds on one CPU core. This is an important advantage over spatial Reeb graphs, and it allows us to apply our method to large-scale datasets. We also tested ReeSPOT on medium-sized data with 10,000 agents over a period of one week, for which the Reeb graphs were computed in approximately 5.5 minutes. These experiments show the applicability of ReeSPOT in modeling an agent’s data at different resolutions (weekly, monthly, yearly) and emphasize the scalability of the proposed algorithm.
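The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes perfect parallel efficiency and an equal cost per agent, neither of which is stated in the text, so the per-agent figure is only a rough estimate. The helper names are introduced here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_s):
    """Number of time points per day at a fixed sampling interval (seconds)."""
    return SECONDS_PER_DAY // interval_s


def core_seconds_per_agent(wall_hours, cores, agents):
    """Average core-seconds spent per agent, assuming ideal parallel scaling."""
    return wall_hours * 3600 * cores / agents


m = samples_per_day(15)                                 # 5760 time points per day
per_agent = core_seconds_per_agent(7.2, 384, 800_000)   # about 12.4 core-seconds
speedup = (4 * 60) / 12                                 # 4 minutes vs. 12 seconds
```

Under these assumptions the month-long run costs on the order of 12 core-seconds per agent, the same order as the single-core timing reported for ReeSPOT, and the reported timings for $n = 7$ correspond to a speedup of about 20 over the spatial Reeb graph.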

4 Discussion and Future Work

In this paper, we proposed a Reeb graph-based approach (ReeSPOT) to model the patterns of normalcy in day-to-day human trajectory data. The proposed Reeb graphs abstract large-scale spatio-temporal data into a comprehensible topological construct. We designed distinct real-life anomalous scenarios, developed trajectory generation methods, and provided a thorough interpretation of the Reeb graph results. The parameters of ReeSPOT control the granularity of the model and can be tuned for different applications. On the other hand, ReeSPOT depends on the quality of the trajectory data, so false positives can impact the accuracy of the model. One explanation for this is the inherent stochasticity of general human behavior.

Another application is a quantifiable sanity check for raw trajectory data, for example detecting teleports. We synthesized such scenarios and observed additional nodes in the Reeb graphs. Our experimental setting in this paper assumes that each agent is independent and that the activities of one agent are unrelated to those of others. However, agents in a given population influence each other’s behavior. Such correlations could serve as additional features for our existing model. ReeSPOT has the flexibility to introduce more parameters and features to robustly support the data abstraction. Geo-foundational features describe the nature of each location the agent visited, such as residential, commercial, or recreational. Nodes of the Reeb graphs can be labeled with such domain-specific information. Such a representation can be used as an input to data-driven methods instead of directly applying deep learning methods to raw GPS trajectories.

5 Acknowledgement

We would like to thank Kin Gwn Lore for the invaluable insights and assistance throughout this project. This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number 140D0423C0057. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.

References

  • [1] Biasotti, S., Giorgi, D., Spagnuolo, M., Falcidieno, B.: Reeb graphs for shape analysis and applications. Theoretical computer science 392(1-3), 5–22 (2008)
  • [2] Buchin, K., Buchin, M., van Kreveld, M., Speckmann, B., Staals, F.: Trajectory grouping structure. In: Workshop on Algorithms and Data Structures. pp. 219–230. Springer (2013)
  • [3] Eagle, N., Pentland, A.: Reality mining: sensing complex social systems. Personal and ubiquitous computing 10, 255–268 (2006)
  • [4] Giannotti, F., Nanni, M., Pinelli, F., Pedreschi, D.: Trajectory pattern mining. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 330–339 (2007)
  • [5] Guo, D., Liu, S., Jin, H.: A graph-based approach to vehicle trajectory analysis. Journal of Location Based Services 4(3-4), 183–199 (2010)
  • [6] Hast, M., Searle, K.M., Chaponda, M., Lupiya, J., Lubinda, J., Sikalima, J., Kobayashi, T., Shields, T., Mulenga, M., Lessler, J., et al.: The use of GPS data loggers to describe the impact of spatio-temporal movement patterns on malaria control in a high-transmission area of northern Zambia. International Journal of Health Geographics 18, 1–18 (2019)
  • [7] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
  • [8] Kinoshita, A., Takasu, A., Aihara, K., Ishii, J., Kurasawa, H., Sato, H., Nakamura, M., Adachi, J.: GPS trajectory data enrichment based on a latent statistical model. In: International Conference on Pattern Recognition Applications and Methods. vol. 2, pp. 255–262. SCITEPRESS (2016)
  • [9] Lopez, P.A., Behrisch, M., Bieker-Walz, L., Erdmann, J., Flötteröd, Y.P., Hilbrich, R., Lücken, L., Rummel, J., Wagner, P., Wießner, E.: Microscopic traffic simulation using SUMO. In: The 21st IEEE International Conference on Intelligent Transportation Systems. IEEE (2018), https://elib.dlr.de/124092/
  • [10] Luca, M., Barlacchi, G., Lepri, B., Pappalardo, L.: A survey on deep learning for human mobility. ACM Computing Surveys (CSUR) 55(1), 1–44 (2021)
  • [11] Qi, S., Bouros, P., Sacharidis, D., Mamoulis, N.: Efficient point-based trajectory search. In: International Symposium on Spatial and Temporal Databases. pp. 179–196. Springer (2015)
  • [12] Sabarish, B., Karthi, R., Kumar, T.G.: Graph similarity-based hierarchical clustering of trajectory data. Procedia Computer Science 171, 32–41 (2020)
  • [13] Shailja, S., Bhagavatula, V., Cieslak, M., Vettel, J.M., Grafton, S.T., Manjunath, B.: ReeBundle: a method for topological modeling of white matter pathways using diffusion MRI. IEEE Transactions on Medical Imaging (2023)
  • [14] Shailja, S., Chen, J.W., Grafton, S.T., Manjunath, B.: Retrace: Topological evaluation of white matter tractography algorithms using Reeb graphs. In: International Workshop on Computational Diffusion MRI. pp. 177–191. Springer (2023)
  • [15] Shailja, S., Zhang, A., Manjunath, B.: A computational geometry approach for modeling neuronal fiber pathways. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 175–185. Springer (2021)
  • [16] Shinagawa, Y., Kunii, T.L., Kergosien, Y.L.: Surface coding based on Morse theory. IEEE computer graphics and applications 11(05), 66–78 (1991)
  • [17] Ta, N., Zhao, Y., Chai, Y.: Built environment, peak hours and route choice efficiency: An investigation of commuting efficiency using GPS data. Journal of Transport Geography 57, 161–170 (2016)
  • [18] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)
  • [19] Wu, L., Yang, L., Huang, Z., Wang, Y., Chai, Y., Peng, X., Liu, Y.: Inferring demographics from human trajectories and geographical context. Computers, Environment and Urban Systems 77, 101368 (2019)
  • [20] Zeng, J., He, X., Tang, H., Wen, J.: A next location predicting approach based on a recurrent neural network and self-attention. In: Collaborative Computing: Networking, Applications and Worksharing: 15th EAI International Conference, CollaborateCom 2019, London, UK, August 19-22, 2019, Proceedings 15. pp. 309–322. Springer (2019)
  • [21] Zhang, D., Lee, K., Lee, I.: Mining hierarchical semantic periodic patterns from GPS-collected spatio-temporal trajectories. Expert Systems with Applications 122, 85–101 (2019)
  • [22] Zheng, Y., Liu, L., Wang, L., Xie, X.: Learning transportation mode from raw GPS data for geographic applications on the web. In: Proceedings of the 17th international conference on World Wide Web. pp. 247–256 (2008)