
Graph-based Time-Series Anomaly Detection:
A Survey and Outlook

Thi Kieu Khanh Ho⋆, Ali Karami, and Narges Armanfard

This work is financially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Fonds de recherche du Quebec, and the Department of Electrical and Computer Engineering at McGill University. The authors also wish to acknowledge the partial support of Calcul Quebec and Compute Canada. Thi Kieu Khanh Ho, Ali Karami, and Narges Armanfard are with McGill University and Mila - Quebec AI Institute, Montreal, Quebec, Canada. ⋆Corresponding Author (Email: thi.k.ho@mail.mcgill.ca).
Abstract

With the recent advances in technology, a wide range of systems continue to collect a large amount of data over time and thus generate time series. Time-Series Anomaly Detection (TSAD) is an important task in various time-series applications such as e-commerce, cybersecurity, vehicle maintenance, and healthcare monitoring. However, this task is very challenging as it requires considering both the intra-variable dependency and the inter-variable dependency, where a variable can be defined as an observation in time series data. Recent graph-based approaches have made impressive progress in tackling the challenges of this field. In this survey, we conduct a comprehensive and up-to-date review of Graph-based TSAD (G-TSAD). First, we explore the significant potential of graph representation learning for time-series data. Then, we review state-of-the-art graph anomaly detection techniques in the context of time series and discuss their strengths and drawbacks. Finally, we discuss the technical challenges and potential future directions for possible improvements in this research field.

Impact Statement

Time-series Anomaly Detection (TSAD) plays a critical role in identifying abnormal patterns or events within time-series data, enabling timely intervention and mitigation of potential risks in various fields such as cybersecurity, healthcare, finance, and industrial monitoring. However, the inherent complexity of time-series data poses challenges to machine learning algorithms. Recently, many graph-based TSAD methods have emerged to effectively address these challenges. This article aims to provide researchers and practitioners with a comprehensive understanding of the state-of-the-art graph-based techniques for anomaly detection in time-series data, as well as their strengths, limitations, and potential applications. Additionally, this article serves as a roadmap for future research directions, identifying areas for improvement and innovation in the field of graph-based TSAD.

Index Terms

Graphs, Anomaly Detection, Time-series Data, Time-series Signals, Dynamic Social Networks, Videos.

1 Introduction


A time series is defined as an ordered sequence of values that represents the evolution of one variable (univariate) or multiple variables (multivariate) over time [1]. Time-series data is also referred to as time-stamped data, i.e., each data point or observation in a time series corresponds to a specific time index. As time is a constituent of everything that is observable, time-series data can be found everywhere, such as in nature (e.g., wind speed, temperature), in marketing and industrial activities (e.g., stock prices), in medicine (e.g., heart and brain activity), and in surveillance and security (e.g., video streams, network traffic logs). Therefore, time-series data and its analysis have attracted significant interest in the past decade.

Time-Series Anomaly Detection (TSAD), the process of detecting unusual patterns that do not conform to the expected behavior, has been widely studied [2]. Unusual patterns can be found in many real-world applications, such as ecosystem disturbances in earth sciences, structural defects in jet turbine engineering, suspicious activities (e.g., intruders, unauthorized access, and vandalism) in surveillance cameras and video streams, heart failure in cardiology, or seizures in brain activity. Most existing TSAD algorithms are designed to analyze the time-series data as a univariate modality [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] or as multiple individual modalities (variables) [18, 19, 20, 21, 22, 23, 24]. In other words, these algorithms only consider the intra-variable information.

In many practical systems, a more proper and reliable system modeling can be obtained if the available data are considered as multiple related modalities (variables) that influence each other. More specifically, a comprehensive anomaly detection algorithm shall carefully model the variables’ interaction over time, i.e., the inter-variable dependencies, as well as the intra-variable dependencies. Explicit consideration of the inter-variable dependencies allows capturing a system’s spatial dependencies across variables. Examples of systems that exhibit spatial dependencies include the brain, where recording sensors are placed across the brain and each sensor is considered a variable; a single time-series signal, where every time interval (window) is considered a variable; and a video-based system, where a variable is assigned to every frame.

In the current literature, depending on the application at hand, researchers refer to the inter-variable dependency as the spatial dependency [25] or the structural dependency [26]. On the other hand, existing studies typically refer to the intra-variable dependency as the temporal dependency [27, 28].

Survey | Year | GA     | GAD    | TSAD   | G-TSAD
[29]   | 2020 | \star  | –      | –      | –
[30]   | 2021 | \star  | –      | –      | –
[31]   | 2021 | \star  | –      | –      | –
[32]   | 2022 | \star  | –      | –      | –
[33]   | 2022 | \star  | –      | –      | –
[34]   | 2022 | \star  | –      | –      | –
[35]   | 2022 | \star  | –      | –      | –
[36]   | 2022 | \star  | –      | –      | –
[37]   | 2022 | \star  | –      | –      | –
[38]   | 2023 | \star  | –      | –      | –
[39]   | 2024 | \star  | –      | –      | –
[40]   | 2021 | \star  | \star  | –      | –
[41]   | 2022 | \star  | \star  | –      | –
[42]   | 2023 | \star  | \star  | –      | –
[43]   | 2023 | \star  | \star  | –      | –
[44]   | 2023 | \star  | \star  | –      | –
[45]   | 2019 | –      | –      | \star  | –
[46]   | 2021 | –      | –      | \star  | –
[2]    | 2021 | –      | –      | \star  | –
[47]   | 2023 | –      | –      | \star  | –
Ours   | 2024 | \star  | \star  | \star  | \star
Table 1: Comparison of the existing surveys with ours. \star indicates that the topic is reviewed in the corresponding survey study, – indicates otherwise.

In recent years, Graph-based Approaches (GA) have become increasingly popular in the machine learning community due to the graph’s ability to represent complex data structures. Unlike traditional tabular data representations, GA have demonstrated strong performance in modeling inter-variable dependencies, where every variable is considered as a graph node and graph edges model the variable interactions [32, 30]. This versatility has led to the widespread adoption of GA across various machine learning tasks, including, but not limited to, social network analysis [48], recommendation systems [49], natural language processing [50], and anomaly detection [40, 32, 33]. Graph Anomaly Detection (GAD) is an active research field that focuses on detecting abnormalities at various levels within graph-structured data, including individual graph nodes, graph edges, sub-graphs of a graph, and whole abnormal graphs. This approach has gained considerable attention due to its applicability to a wide range of domains where data can be inherently represented as graphs.

Recognizing the significance of GA, GAD, and TSAD, many survey papers have recently been introduced and have provided solid reviews of these topics. However, none of them have focused on the emergence and applications of graphs for detecting anomalies in time-series data. For instance, GA surveys [29, 30, 32, 33, 34, 35, 36, 37, 38, 39] cover a broad range of graph-related techniques for various tasks such as classification, clustering, and prediction, without focusing specifically on anomaly detection. On the other hand, GAD surveys [40, 41, 42, 43, 44] provide a comprehensive review of graph-based methods for detecting various types of abnormal graph objects. However, both GA and GAD surveys solely review graph-based methods applicable to static data (e.g., static social networks or images) and do not review methods that are able to capture the intra-variable dependencies existing in time-series data. Meanwhile, TSAD surveys [45, 46, 2, 47] focus on methodologies for detecting anomalies within time-series signals, disregarding other time-series data types such as dynamic social networks and videos. Additionally, existing TSAD surveys show the ability of non-graph deep learning methods to detect anomalies based only on the intra-variable dependencies, ignoring the inter-variable dependencies, even though both intra- and inter-variable dependencies are crucial for TSAD. Although graphs have demonstrated their effectiveness and yielded state-of-the-art results in TSAD, as summarized in Table 1, there has been no review paper on Graph-based TSAD (G-TSAD) methods.

The contributions of this survey are as follows:

  • The first survey on G-TSAD. To the best of our knowledge, this is the first survey that reviews the state-of-the-art techniques for TSAD using graphs. Until now, all the relevant surveys have focused on either GA, TSAD, or GAD; there is no dedicated and comprehensive survey on G-TSAD. Our work bridges this gap, and we expect that such a structured and comprehensive survey will help push forward the research in this active area.

  • A new perspective. We introduce new concepts in the field of G-TSAD, such as the intra- and inter-variable dependencies and constructed graphs for time-series data. Such concepts pave the way for future research to provide well-structured and easy-to-understand G-TSAD algorithms.

  • A detailed overview of G-TSAD. We discuss the key challenges in TSAD, the core motivations of employing graph representation learning for time-series data, and the various types of anomalies in graphs.

  • A systematic and comprehensive review. We conduct a comprehensive and up-to-date review of the state-of-the-art on G-TSAD methods and categorize them into four groups to improve their clarity and accessibility. These categories include Autoencoder (AE)-based methods, Generative Adversarial Network (GAN)-based methods, predictive-based methods, and self-supervised methods. For each category, we describe technical details, highlight their advantages and disadvantages, and compare them.

  • Outlook on future directions. We point out the technical limitations of the current research studies and suggest promising directions for future works on G-TSAD.

Paper Organization. Section 2 outlines the challenges existing in time-series data. Sections 3 and 4 provide foundational insights into graphs, common types of anomalies within graphs, and how graph-based methods can address the time-series challenges described in Section 2. Section 5 introduces our unified taxonomy for TSAD using graphs. Within each category, we discuss the strengths of individual methods in addressing the challenges outlined in Section 2, as well as their drawbacks and how subsequent methods mitigate these shortcomings. Section 6 summarizes widely used datasets and their real-world applications. Section 7 discusses commonly employed evaluation metrics in G-TSAD research. Section 8 discusses the current research challenges and outlines promising directions for future research. Finally, we conclude our paper in Section 9.

2 Time-series Challenges

In general, a $K$-variable time series dataset can be denoted by $X=(\mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots,\mathbf{x}^{(K)})$, where $\mathbf{x}^{(i)}=(x^{(i)}_{1},x^{(i)}_{2},\ldots,x^{(i)}_{N})$, and $N$ denotes the number of observations of the $i$th variable. Examples of observations are a time interval (window) or sample in signals, a frame or sub-frames in a video sequence, or a snapshot in social networks. Working with time-series data requires careful consideration of several factors, such as intra-variable dependency, inter-variable dependency, dimensionality, non-stationarity, and noise. These factors are discussed below in detail.
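Before turning to these factors, the following is a minimal sketch of the above notation in Python; the sizes and values are purely illustrative assumptions, not drawn from any cited work:

```python
import numpy as np

# A K-variable time series X stored as a (K, N) array: row i holds the
# variable x^(i), and column j holds the j-th observation of that variable.
K, N = 5, 100
rng = np.random.default_rng(0)
X = rng.standard_normal((K, N))    # X = (x^(1), ..., x^(K)), synthetic values

x_i = X[2]                         # the 3rd variable, x^(3)
x_ij = X[2, 10]                    # a single observation, x_11^(3)
```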

Intra-variable dependency. Observations available in one variable depend on each other, i.e., there exist dependencies between the $j$th observation and a lagged version of itself in the $i$th variable. Positive dependencies indicate that an increase (or decrease) in the $j$th observation is likely caused by an increase (or decrease) in the previous observations and is likely to be followed by an increase (or decrease) in the subsequent observations of the $i$th variable. Negative dependencies, on the other hand, suggest an inverse relationship. While intra-variable dependency provides useful insights for time-series analysis, in practice one of the main obstacles is dealing with the non-linear, complex dependencies present within a variable. In fact, the mutual relationship between observations is not straightforward, and the effect of future/past observations on the current observation may vary over time. This also makes it difficult to determine the appropriate lags to consider, as different lags may have different levels of significance.

Inter-variable dependency. It is important to capture the inter-variable dependency in time-series data as it can provide insights into the underlying relationship patterns that exist between different variables; a useful property when detecting anomalies. For example, if two variables are highly correlated, a change in one variable can be used to predict potential changes in the other. Another example is the case where individual variables seem to have fairly normal behavior, but when all (or a subset) of the variables and their interactions are considered together, an anomaly can be caught. Therefore, graphical models that can capture complex relationships between variables are a great tool for detecting anomalies in time-series data.

Dimensionality. Current advances in technology allow us to record large amounts of time-series data that capture the intra-variable and inter-variable dependencies. The existence of such rich datasets allows us to design time-series analyses that are consistent and reliable across various datasets. Looking at the raw data, every data sample in every variable can be considered as one dimension; hence, we are dealing with the curse of dimensionality. Therefore, there is a need to develop algorithms that can handle such complicated, high-dimensional data.

Non-stationarity. When analyzing time-series data, it is important to consider trends, seasonality, and unpredictability. Although the trend is commonly known as an average tendency of the series, the trend itself may vary across a series – e.g., the average weather temperature may be constant over the course of a year; however, we observe ups and downs in the trend when focusing on shorter time intervals – for example, the trend in the summertime is increasing while it is decreasing in the wintertime. Seasonality presents the variations of repeating short-term cycles in the series – e.g., the sale of air conditioners only increases during the summertime, while there is no such increasing pattern in the other seasons; this causes a change of variance over time. Unpredictability occurs randomly, e.g., uncharacteristic weather patterns such as a snowy day in the summer, which are neither systematic nor predictable. Overall, such properties of the time series influence the variables’ statistical properties such as mean and variance. Thus, time-series data is said to be non-stationary, which may easily mislead an anomaly detection method, as anomalies at certain observations may not be true anomalies [9]. Detection methods that can adapt to changes in the data structure usually require a large amount of data during the training phase.

Noise. When it comes to TSAD, it is important to make a semantic distinction between noise and anomalies. Noise, a bread-and-butter issue in real-world systems, is the random, unwanted variation that degrades the quality of time-series data. Noise can be thought of as minor fluctuations and/or sensor sensitivity affecting the overall time series. In contrast, an anomaly is an unusual entity with respect to a set of pre-established normal observations. Anomaly detection algorithms that are explicitly designed to distinguish noise from anomalies are therefore usually more robust when dealing with contaminated data; however, various types of noise may occur depending on the operational environment, recording sensor types, and many other factors that a data scientist may not be aware of at the problem onset. Hence, it is critical to understand the nature of noise in every application and apply appropriate noise modeling/reduction techniques so as not to conflate noise with the anomaly concept.

In the subsequent sections, we delve into graph-based techniques and their effectiveness in tackling individual challenges, or combinations thereof, present in time-series data, thereby enhancing our understanding of their applicability and utility in real-world scenarios.

3 Why Graphs?

Figure 1: An example of anomaly detection in a time-series signal to show the difference between TSAD (Block 1) and G-TSAD (Block 2). The inputs are three successive time intervals (S: Sensor). In the constructed graphs, the solid and dashed lines, respectively, indicate the inter-variable and intra-variable dependencies, $m=3$, and the edge features are not shown for simplicity. Normal and abnormal cases are shown in black and red, respectively. G-TSAD has the potential to detect anomalous sensors, local relations, regions, and time intervals.

While some recent studies [9, 51, 52] have questioned the effectiveness of non-graph deep-learning methods for TSAD, attributing their apparent success to simple benchmark datasets and biased evaluation protocols, many studies have achieved state-of-the-art TSAD performance by employing graphs to represent variables. This is because many real-world time-series systems can be represented as graphs, and graphs can inherently capture intra- and inter-variable dependencies.

We define a graph set representing time-series data as below:

$$\mathbb{G}=\Bigl\{\mathcal{G}_{j},\,\text{Sim}\{\mathcal{G}_{j},\mathcal{G}_{j^{\prime}}\}\Bigr\}^{j,j^{\prime}\in\{1,\ldots,N\}}_{j\neq j^{\prime}},\qquad(1)$$

where $\mathcal{G}_{j}=\{\mathbf{M}_{j},\mathbf{A}_{j}\}$ is a graph representing the $j$th observation, $\mathbf{M}_{j}$ is the $K\times m$ node-feature matrix (the $i$th row includes the features corresponding to the $i$th variable), $\mathbf{A}_{j}$ is the $K\times K\times m^{\prime}$ edge-feature matrix representing the relationships across nodes (variables) in the $j$th observation, and $m$ and $m^{\prime}$ respectively denote the number of features representing the nodes and their relations. $\text{Sim}\{\cdot,\cdot\}$ defines the relation between two graphs; its goal is to capture the intra-variable dependencies between two observations. It can be defined as a function of node features and edge features, i.e., $\text{Sim}\{\mathcal{G}_{j},\mathcal{G}_{j^{\prime}}\}=\mathcal{F}\{\mathbf{M}_{j},\mathbf{M}_{j^{\prime}},\mathbf{A}_{j},\mathbf{A}_{j^{\prime}}\}$.
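For concreteness, the following is a minimal sketch of this graph set in Python, under assumed sizes, with a cosine similarity standing in for one hypothetical choice of $\mathcal{F}$; it is an illustration, not a prescribed implementation:

```python
import numpy as np

# Each observation j is a graph G_j = {M_j, A_j}: M_j is a K x m
# node-feature matrix and A_j is a K x K x m' edge-feature tensor.
K, m, m_prime, N = 5, 3, 1, 3          # illustrative sizes
rng = np.random.default_rng(0)
graphs = [
    {"M": rng.standard_normal((K, m)),            # node features M_j
     "A": rng.standard_normal((K, K, m_prime))}   # edge features A_j
    for _ in range(N)
]

def sim(g_a, g_b):
    """One hypothetical F{M_j, M_j', A_j, A_j'}: cosine similarity of
    the flattened node-feature matrices (edge features ignored here)."""
    u, v = g_a["M"].ravel(), g_b["M"].ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Pairwise relations Sim{G_j, G_j'} for all j != j'
S = {(j, k): sim(graphs[j], graphs[k])
     for j in range(N) for k in range(N) if j != k}
```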

$\mathbb{G}$ is trained dynamically if either $\mathcal{G}_{j}$ or $\text{Sim}\{\cdot,\cdot\}$ has learnable parameters; e.g., an encoder network with a set of parameters $\xi_{1}$ can be used to provide $\mathbf{M}_{j}$ and $\mathbf{A}_{j}$ from the raw data that forms the $j$th observation [27, 53], or a network with a set of trainable parameters $\xi_{2}$ can be used as $\mathcal{F}$ [25, 54]. Note that, in many real-world applications, dynamically learning graphs is crucial since dynamic graphs allow the relationships between variables to adapt as the underlying data distribution changes. For example, in financial markets, the correlations between assets (variables) may fluctuate over time due to shifting market conditions. By dynamically updating the graph edges and $\text{Sim}\{\cdot,\cdot\}$ to reflect these changing relationships, dynamic graphs can effectively capture the non-stationarity inherent in time-series data. If there are no learnable parameters assigned to $\mathcal{G}_{j}$ and/or $\text{Sim}\{\mathcal{G}_{j},\mathcal{G}_{j^{\prime}}\}$, we are dealing with static graphs [55]. In other words, a static graph is one that is built using a set of pre-defined node features and edges obtained from the $j$th observation of the $K$ variables; e.g., with prior knowledge about the sensors’ locations in a $K$-variable sensory system, a static graph at the $j$th observation can be built where nodes are sensors and edges are defined by the Euclidean distance between sensors. Note that in some algorithms, the relations across static graphs are learned implicitly by employing a temporal network such as a recurrent neural network [56].
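As an illustration of the static case, the sketch below builds a distance-based adjacency matrix from assumed sensor coordinates; the Gaussian kernel and the sparsification threshold are our illustrative choices, not prescribed by any particular method:

```python
import numpy as np

# Static graph construction: nodes are sensors, edges are weighted by a
# Gaussian kernel of pairwise Euclidean distances between sensor positions.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                   [2.0, 2.0], [2.5, 2.0]])      # hypothetical (x, y) positions
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

sigma = dist[dist > 0].mean()                    # simple bandwidth heuristic
A = np.exp(-(dist ** 2) / (2 * sigma ** 2))      # K x K edge-weight matrix
np.fill_diagonal(A, 0.0)                         # no self-loops
A[A < 0.3] = 0.0                                 # drop weak edges to sparsify
```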

Figure 1 visualizes the capability of graphs for TSAD (i.e., G-TSAD) versus the non-graph approach (i.e., TSAD). In this figure, we show a challenge we may observe when dealing with multiple sensors, i.e., when sensors record different types of data; e.g., one sensor records the car engine temperature while another records the car speed – so the data range and even the sampling frequency vary per sensor in this example [57]. In Figure 1, a 5-variable (i.e., 5 sensors) time series is shown, i.e., $X=(\mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots,\mathbf{x}^{(5)})$, where three observations are shown for every sensor, i.e., $N=3$. A time interval refers to a specific observation where the five sensors are recorded simultaneously. As shown in Block 1 of Figure 1, conventional TSAD techniques directly take time intervals as the input and use only feature-level knowledge to decide whether a time interval is abnormal, ignoring the inter-variable dependencies; they are hence unable to provide accurate anomaly detection at the interval level and unable to detect anomalies at finer levels such as the relation and region levels.

In many real-world applications, in addition to detecting abnormal intervals, detecting abnormal regions is extremely important. For example, in drug-resistant epilepsy treatment, detecting the brain regions where seizures begin, called the Seizure Onset Zone (SOZ), is critically important; this is because the current practice is basically to resect the SOZ during epilepsy surgery [58]. The current practice for SOZ detection is based on examining the brain signals recorded through sensors embedded within the brain (under the scalp). Considering the recorded brain signals as a sensory system suggests that accurate region-level anomaly detection would be a great asset to neurologists when identifying the SOZ. G-TSAD techniques are capable of detecting anomalies in a sensory system at the sensor, local-relation, region, and interval levels due to their ability to capture both inter-variable and intra-variable dependencies (see Block 2 of Figure 1).

(a) Time-Series Social Networks and Graphs (U: User)
(b) Time-Series Videos and Graphs
Figure 2: Examples of time-series data and the corresponding constructed graphs. Each example is shown with three consecutive observations. The top figures show the original data and the bottom figures show the corresponding constructed graphs. In the constructed graphs, the solid and dashed lines, respectively, indicate the inter-variable and intra-variable dependencies, $m=3$, and the edge features are not shown for simplicity. Normal and abnormal cases are shown in black and red, respectively.

As depicted in Figure 1, graphs hold potential to detect various types of anomalies in a sensory multivariate system. One may wonder whether graphs can be applied to a univariate time-series signal system, where only a single sensor is used to record the signals, in order to detect various anomaly types such as the point-level, contextual-level, and trend-level anomalies defined in [46]. Note that point-level anomalies typically manifest as isolated data points that deviate significantly from the surrounding data. Contextual anomalies occur when a data point exhibits abnormal behavior relative to its context at a particular time. For example, a data point representing a temperature reading of 20 degrees Celsius may not be anomalous on its own. However, if this temperature reading occurs during the winter in a region where temperatures typically remain below freezing, it would be deemed a contextual anomaly due to its significant deviation from the expected behavior within the context of the season and location. Trend-level anomalies occur when there are significant deviations from the expected trends or patterns in the time series. To this end, constructing graphs that represent the relationships between data points, short time intervals, or long time intervals, with each denoted as a variable, can be beneficial. By analyzing the level of connectivity within the graph, anomalies can be identified as data points or time intervals that have few or weak connections to neighboring points, contexts, or patterns.
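A minimal sketch of this connectivity idea is given below; the adjacency values and the mean-minus-one-standard-deviation threshold are illustrative assumptions:

```python
import numpy as np

# Score each node (a data point or time interval) by its total edge
# weight; weakly connected nodes are flagged as candidate anomalies.
def connectivity_scores(A: np.ndarray) -> np.ndarray:
    """Lower total edge weight -> weaker ties to neighbors -> more anomalous."""
    return A.sum(axis=1)

A = np.array([[0.0, 0.9, 0.8, 0.0],
              [0.9, 0.0, 0.7, 0.1],
              [0.8, 0.7, 0.0, 0.0],
              [0.0, 0.1, 0.0, 0.0]])          # node 3 is barely connected

scores = connectivity_scores(A)
threshold = scores.mean() - scores.std()
anomalies = np.where(scores < threshold)[0]   # -> array([3])
```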

In addition to the signal domain, graphs are also applicable in other time-series domains. Figure 2 shows two additional examples of time-series data and their corresponding constructed graphs, by which both the intra- and inter-variable information are captured. Note that the three successive observations at the $j-1$, $j$, and $j+1$ intervals, shown in Figures 2a and 2b respectively, correspond to three successive snapshots of a social network and three successive frames of a video. In social networks, the users are usually considered as graph nodes, and the users’ interactions, e.g., friendships, as edges. Most normal nodes shall have a reasonable number of connections, which indicates that the user has ordinary social network activity. A node with an exceptionally high number of connections can be defined as an abnormal user; such a node may represent a celebrity, an influential person, or a leader within the network (Figure 2a). In video applications, a video can be modeled as a stream of time-evolving object-level graphs, where an object (e.g., a human body joint, an object in the scene, etc.) is considered as a graph node and edges represent the nodes’ relations within a video frame. Figure 2b shows an example of anomalous activity detection where a node is assigned to every body joint – any unusual/unexpected joint movements shall be detected as anomalies.

Figure 3: Example of node-level, edge-level, sub-graph-level, graph-level, and $\text{Sim}\{\cdot,\cdot\}$-level anomalies in a graph $\mathbb{G}$ with three successive observations. In each graph $\mathcal{G}$, $m=4$ and $m^{\prime}=1$.

As illustrated above, graphs can capture intra-variable and inter-variable dependencies, thus effectively detecting various types of anomalies within time-series data across many real-world applications. Graphs have also demonstrated their capability to address other challenges such as dimensionality [39]. One approach is to construct static graphs based on prior knowledge [55], incorporating information such as the distances, correlations, and similarity measures between variables. Based on their relevance and importance in the graph structure, graph-based methods can select the features that are most discriminative or relevant for the task at hand. By focusing on a reduced set of features, dimensionality can be effectively managed while maintaining the predictive power of the data. Another strategy is to project the graph elements onto a graph embedding space using techniques such as node2vec [59] or graph neural networks [29], which can map high-dimensional data into a lower-dimensional space while preserving the structure and relationships encoded in the graph. Lastly, graph-based regularization techniques [60, 61, 62] can be applied to impose constraints on the model parameters during the graph learning process. This ensures that the learned graphs exhibit desirable properties such as smoothness, where features should change smoothly between neighboring nodes, or sparsity, which prevents overly connected graphs. As such, these regularization terms can mitigate the curse of dimensionality and improve generalization performance.
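As an example of the smoothness regularizer mentioned above, the sketch below computes the graph Dirichlet energy $\operatorname{tr}(\mathbf{M}^{\top}\mathbf{L}\mathbf{M})$, which one could (as an assumed design choice) add to a training loss with a weighting coefficient:

```python
import numpy as np

# Smoothness penalty tr(M^T L M): penalizes node features that change
# abruptly between strongly connected nodes.
def smoothness_penalty(M: np.ndarray, A: np.ndarray) -> float:
    """M: K x m node features; A: K x K symmetric weighted adjacency."""
    L = np.diag(A.sum(axis=1)) - A        # combinatorial graph Laplacian
    return float(np.trace(M.T @ L @ M))   # = 0.5 * sum_ij A_ij ||M_i - M_j||^2

K, m = 5, 3
rng = np.random.default_rng(0)
A = rng.random((K, K)); A = (A + A.T) / 2; np.fill_diagonal(A, 0.0)
M = rng.standard_normal((K, m))
print(smoothness_penalty(M, A))           # larger value -> less smooth features
```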

It is noteworthy that graph-based regularization techniques can also mitigate the effect of noise in the data [63, 64, 65]. For example, smoothness regularization encourages gradual changes in signal values between neighboring variables, effectively filtering out high-frequency noise components [66]. Additionally, many robust graph learning methods have been developed to minimize the impact of noisy or corrupted data when learning the graph structure [67, 68, 69]. These algorithms leverage robust optimization techniques to effectively handle outliers and noise in the input data, resulting in more accurate graph representations that are less susceptible to noise.

In summary, we have shown the capability of graphs in addressing various challenges present in time-series data described in Section 2. More technical insights into methods specifically tailored to address individual challenges or combinations thereof are provided in Section 5.

4 Anomalies in Graphs

As illustrated in Section 3, there are many types of anomalies that can appear in time-series data. For instance, anomalies may manifest as anomalous sensors, local relationships between sensors, regions, and time intervals within a multivariate sensory signal system. In a univariate time-series system, anomalies can take the form of point-level, contextual-level, and trend-level anomalies. Dynamic social networks may feature abnormal users (nodes) and relationships between users (edges) as anomalies. In video streams, unusual object movements can be regarded as anomalies. To provide a general categorization applicable to most of the existing time-series data, we categorize the anomalies that can be observed in graphs into five groups, namely anomalous nodes, edges, sub-graphs, graphs, and $\text{Sim}\{\cdot,\cdot\}$ defined in Section 3, which respectively correspond to variables, the local relations between variables, small sets of variables and their corresponding connections, the full set of variables and their corresponding connections at the $j$th observation, and the global relations between observations. Note that detecting anomalous objects in graphs constructed from time-series data is much more challenging than in other graph types due to the dynamic variations of nodes and edges [32].

Figure 3 shows a graph corresponding to three successive observations, i.e., $\mathbb{G}=\{\mathcal{G}_{j},\text{Sim}\{\mathcal{G}_{j},\mathcal{G}_{j^{\prime}}\}\}_{j\neq j^{\prime}}$, where every $\mathcal{G}_{j}$ has nine nodes; the node’s 4-dimensional feature vector (i.e., $m=4$) is depicted on top of the node, and the edge’s 1-dimensional feature vector (i.e., $m^{\prime}=1$) is shown beside the edge connecting two nodes. The five types of abnormalities are illustrated in the figure. In the first observation, all nodes and edges of the graph are normal. However, in the second observation, one anomalous node, two anomalous edges, and one anomalous sub-graph appear. More specifically, the anomalous node #7 and the two anomalous edges (i.e., the edge between nodes #6 and #8, and the edge between nodes #8 and #9) show irregular evolution of their structures and features compared to the rest of the nodes/edges in the graph. In an anomalous sub-graph, each node and edge might look normal; however, when they are considered as a group, an anomaly may be detected. As sub-graphs vary in size as well as in node- and edge-level properties, detecting anomalous sub-graphs is more challenging than detecting anomalous nodes and edges. Graph-level anomalies are defined as abnormal graphs in a graph stream – more specifically, given a sequence of graphs, the anomalous graph at the third observation can be distinguished from the other graphs based on its unusual evolving pattern of nodes, edges, and corresponding features.

Lastly, an anomalous $\text{Sim}\{\cdot,\cdot\}$ occurs when there are unusual relationships between graphs (aka observations). As $\text{Sim}\{\cdot,\cdot\}$ can capture both short-term and long-term relations across observations, anomalous $\text{Sim}\{\cdot,\cdot\}$ can be detected by analyzing the evolution of the graphs and their relationships over time. Assume that the three observations in Figure 3 correspond to the monthly stock market; the short-term pattern variations, modeled by $\text{Sim}\{\mathcal{G}_{j-1},\mathcal{G}_{j}\}$ and $\text{Sim}\{\mathcal{G}_{j},\mathcal{G}_{j+1}\}$, may represent a small price change. Such small fluctuations can be considered normal. However, the longer-term relationship presented in $\text{Sim}\{\mathcal{G}_{j-1},\mathcal{G}_{j+1}\}$ may reveal an abnormality.

In general, given a graph $\mathbb{G}$, the goal of an ideal anomaly detection system is to produce a learnable anomaly scoring function $f(\cdot)$ that assigns an anomaly score to nodes, edges, sub-graphs, and graphs at every observation, and to $\text{Sim}\{\cdot,\cdot\}$ at the coarse level. The larger $f(\cdot)$ is, the higher the probability of the graph object being abnormal.
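The sketch below illustrates this scoring interface; the quantile-based flagging rule is an assumption for illustration, not a standard prescribed by the surveyed methods:

```python
import numpy as np

# f(.) maps each graph object (node, edge, sub-graph, graph, or Sim) to a
# scalar anomaly score; here, objects in the top 5% of scores are flagged.
def flag_anomalies(scores: np.ndarray, q: float = 0.95) -> np.ndarray:
    """Return indices of objects whose anomaly score exceeds the q-quantile."""
    return np.where(scores > np.quantile(scores, q))[0]

node_scores = np.array([0.10, 0.20, 0.15, 3.20, 0.12])  # hypothetical f(.) outputs
print(flag_anomalies(node_scores))                       # -> [3]
```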

Note that although five types of graph anomalies are observed in the example shown in Figure 3, there is currently no study that targets detecting all five. None of the existing methods target detecting anomalous $\text{Sim}\{\cdot,\cdot\}$, and very few works can simultaneously detect multiple types of anomalous objects in graphs. This is further discussed in Section 8.

5 Categorization of G-TSAD Methods

(a) AE-based Method
(b) GAN-based Method
(c) Predictive-based Method
(d) Self-supervised Method
Figure 4: Block diagrams of the four G-TSAD categories: AE-based, GAN-based, predictive-based, and self-supervised methods. The input is a graph $\mathbb{G}$ with three successive observations. In each $\mathcal{G}$, $m=3$ and the edge features are not shown for simplicity.

We categorize the existing G-TSAD methods into four categories based on the employed loss function that is minimized during training. The four categories are: AE-based, GAN-based, predictive-based, and self-supervised methods. All the existing G-TSAD methods employ a notion of encoder and decoder when developing their algorithm. The concept map of each category is shown in Figure 4.

Figure 4a illustrates AE-based methods that learn an embedding for the input graph by an encoder and then reconstruct the input data from the embedding by a decoder. In the test phase, these methods detect anomalies based on the reconstruction error.

The second category is GAN-based methods, shown in Figure 4b, consisting of a generator and a discriminator. The generator takes both noise and real samples as input to reduce the randomness issue. It tries to generate fake samples that are as realistic as the real samples, so the generator loss is a reconstruction loss that minimizes the difference between the fake and real samples. The discriminator, on the other hand, tries to correctly classify the real samples as real and the generated samples as fake, so the discriminator loss is based on the difference between the discriminator’s outputs for the real and fake samples. The generator and discriminator are updated alternately to minimize their respective loss functions during training. After adversarial training, the losses of both the generator and the discriminator are used as indicators for anomaly scores.

The third category, predictive-based methods, is depicted in Figure 4c. Unlike the other categories, which compute the training loss only over the current and past data, predictive-based methods aim to maximize the network’s ability to predict upcoming data from the current and past input data. In the testing phase, the prediction error is used to detect anomalies.

The last category is self-supervised methods, shown in Figure 4d. Self-supervised methods are a subgroup of unsupervised learning methods where there are no labels for the training data. Compared to the above three categories, self-supervised methods learn more meaningful representations from unlabeled data. The idea behind self-supervised methods is to design pretext tasks, which are auxiliary tasks defined on the given unlabeled data. The self-supervised loss minimizes the difference between the model’s predictions and the expected outputs for the pretext tasks. In the test phase, the self-supervised loss is utilized to detect anomalies. Examples of pretext tasks are temporal order prediction (i.e., predicting the correct order of shuffled time intervals), categorization of time intervals without explicit labels, or prediction of masked values in time-series data.
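The following is a minimal sketch of the masked-value pretext task named above; the interpolation "model" is a placeholder assumption, standing in for what would be a trainable (graph) network in practice:

```python
import numpy as np

# Masked-value prediction: hide random entries of a series, predict them,
# and compute the self-supervised loss on the masked positions only.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 6 * np.pi, 200)) + 0.05 * rng.standard_normal(200)

mask = rng.random(200) < 0.15                 # mask ~15% of the observations
x_masked = np.where(mask, 0.0, x)

def predictor(x_in: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Placeholder model: linear interpolation from the unmasked neighbors."""
    idx = np.arange(len(x_in))
    return np.interp(idx, idx[~mask], x_in[~mask])

x_hat = predictor(x_masked, mask)
ssl_loss = np.mean((x_hat[mask] - x[mask]) ** 2)   # loss on masked values only
print(ssl_loss)
```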

Note that most of the existing studies from the four method categories train on only normal data, while the test set additionally includes anomalous data to verify the detection performance of the methods. We refer to the anomaly detection methods that only have access to the normal data during their training phase as unsupervised anomaly detection. Moreover, all four categories learn graph-based representations from two perspectives: the features of nodes and edges, and the connectivity patterns between nodes represented by adjacency matrices. Further details will be described in the following sub-sections.

5.1 AE-based Methods

In the literature, the AE-based methods take the full graph at every observation as the input and aim to reconstruct its components: the node/edge features and the adjacency matrices. The origin of these methods can be traced back to the vanilla AE, an encoder-decoder framework where the encoder network $E_{\theta}$ (parameterized by $\theta$) learns to compress graph data into low-dimensional representations (i.e., embeddings), and the decoder network $D_{\phi}$ (parameterized by $\phi$) aims to reconstruct the input. This framework can be formulated as follows:

$$\theta^{*},\phi^{*}=\operatorname*{argmin}_{\theta,\phi}\,\mathcal{L}_{\text{rec}}\Big(D_{\phi}(E_{\theta}(\mathbb{G})),\,\mathbb{G}\Big),\qquad(2)$$

where the reconstruction loss $\mathcal{L}_{\text{rec}}$ is often the mean squared error (MSE) or the cross-entropy (CE) loss [70].
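A minimal PyTorch sketch of Eq. (2) is given below; for simplicity, node features are flattened into one vector per observation and the encoder/decoder are plain MLPs, whereas real G-TSAD models would use graph networks, so all sizes and layers here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GraphAE(nn.Module):
    """Vanilla AE: E_theta compresses the input, D_phi reconstructs it."""
    def __init__(self, in_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))   # E_theta
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, in_dim))       # D_phi

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(g))

K, m = 5, 3
model = GraphAE(in_dim=K * m)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

g = torch.randn(64, K * m)                   # a batch of flattened node features
loss = nn.functional.mse_loss(model(g), g)   # L_rec in Eq. (2)
opt.zero_grad(); loss.backward(); opt.step()
# At test time, a large reconstruction error flags a candidate anomaly.
```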

DeepSphere [71] is the first AE work for G-TSAD. It incorporates a Long Short-Term Memory (LSTM) AE with hypersphere learning in a supportive manner to capture the intra-variable and inter-variable dependencies given the graph’s adjacency matrices at different observations, and then reconstructs the normal patterns. The idea of employing an AE in the framework stems from the AE’s strong capability in learning non-linear patterns. Such high-quality non-linear representations support the hypersphere learning – the hypersphere boundary is trained to enclose the normal data in the representation space, and graph objects lying outside the hypersphere tend to be anomalous.

To validate the capability of DeepSphere in learning non-linear representations of the data, we compare it with a non-AE method called MTHL [72]. While DeepSphere extracts non-linear features by AE, MTHL can only extract shallow features by simply leveraging Support Vector Data Description (SVDD) [73]. After having extracted shallow features, MTHL learns the hypersphere to distinguish abnormal samples from normal samples. As a result, MTHL can only achieve a suboptimal performance in detecting abnormal graph objects compared with DeepSphere since MTHL would be inefficient in anomaly detection when dealing with a complex non-linear feature space.

Despite the great ability of DeepSphere to extract non-linear representations of data, the main challenge of this type of vanilla encoder-decoder framework is the non-regularized latent space, i.e., no specific constraints are imposed on the latent space during training. Hence, reconstruction is done by sampling data points from a non-regularized latent space, a strategy that could limit the model’s ability to reconstruct valid inputs in the test phase, since during training the AE may simply memorize how to perfectly reconstruct the input data.

Variational autoencoders (VAE) [74], another type of AE-based algorithm, come into the picture to address the issue of the non-regularized latent space seen in the vanilla AE. VAE maps the inputs to a distribution in the latent space rather than to vectors as is done by the vanilla AE. This means that instead of learning a single encoding for each input, VAE learns a distribution for the input data. VAE imposes a constraint by which the latent space follows a normal distribution; this constraint guarantees that the latent space is regularized. With such a distribution, VAE allows representing the input with some degree of uncertainty, i.e., a more generalized data representation is learned.
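A minimal sketch of this idea is shown below, with the reparameterization trick and the KL regularizer toward $\mathcal{N}(0, I)$; the architecture and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Encoder outputs a mean and log-variance; a latent is sampled via
    the reparameterization trick; the decoder reconstructs the input."""
    def __init__(self, in_dim: int, latent_dim: int = 8):
        super().__init__()
        self.enc = nn.Linear(in_dim, 32)
        self.mu = nn.Linear(32, latent_dim)
        self.logvar = nn.Linear(32, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                 nn.Linear(32, in_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.dec(z), mu, logvar

x = torch.randn(64, 15)                      # a batch of flattened graph features
recon, mu, logvar = VAE(in_dim=15)(x)
rec_loss = nn.functional.mse_loss(recon, x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # latent regularizer
loss = rec_loss + kl
```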

Recognizing the potential of VAE, many studies have recently developed VAE-based models to improve G-TSAD performance. GReLeN [75] integrates a VAE and a Graph Neural Network (GNN) with graph dependence structure learning for anomaly detection in multivariate time-series signals. Specifically, the latent space of the VAE serves as the module for feature extraction, while the GNN with the graph learning strategy captures the between-sensor dependency. Each sensor is then assigned an anomaly score based on the reconstruction accuracy. GReLeN addresses the importance of learning the inter-variable dependency between sensors via its graph-based module, as well as the intra-variable dependency learned implicitly by its VAE.

Deep Variational Graph Convolutional Recurrent Network (DVGCRN) [26], another VAE-based method, was proposed to include both intra-variable and inter-variable dependencies as well as the stochasticity present in multivariate time-series signals. DVGCRN includes three components: a stacked graph convolutional recurrent network to model the multilevel intra-variable dependency, a Gaussian-distributed channel embedding module to characterize the similarity and stochasticity of the different sensors, and a deep embedding-guided probabilistic generative network to model the hierarchical non-deterministic inter-variable relationships in the latent space; hence it can capture multilevel information at the different layers and is robust to noisy time series. Note that DVGCRN captures multilevel information because it extends the model into a multi-layer network. Overall, DVGCRN presents robust representations of data by fully considering the intra-variable, inter-variable, and stochasticity characteristics, which allows the model to effectively detect anomalies while avoiding the misjudgment of fluctuations in time-series signals.

However, it is worth noting that although VAE can map the input to a distribution in the latent space to learn generalized representations of data, it may struggle to model high-dimensional distributions with complex structures. Thus, VAE may not reconstruct samples that are as realistic and high-quality as the original data. Moreover, VAE models are sensitive to the choice of hyperparameters and require careful tuning to achieve good performance. To tackle these challenges, another branch of AE-based methods, called Normalizing Flows (NF) [76], has been proposed and has yielded state-of-the-art performance, as NF can model complex distributions in the latent space; hence, high-quality samples can be reconstructed. NF is a statistical method that uses the change-of-variables law of probabilities to transform a base distribution into a target distribution. NF training is typically more stable and converges more easily than that of VAE-based methods. NF has been successfully applied to the G-TSAD concept.

OmniAnomaly [77] is one of the first NF studies for G-TSAD. It uses planar NF, which applies a series of invertible mappings to learn non-Gaussian posterior distributions in the latent stochastic space. OmniAnomaly also incorporates a Gated Recurrent Unit (GRU) network and a VAE network to capture the intra-variable dependency among observations in the latent space. The reconstruction score ranks the abnormality level. Although OmniAnomaly, with its NF property, can capture the intra-variable dependency in time-series data, it cannot capture the inter-variable dependency.

A recent novel NF-based method, called GANF [27], addresses the issue of OmniAnomaly by learning Directed Acyclic Graphs (DAG) inside a continuous flow model. Specifically, in addition to learning the intra-variable dependency in multivariate time-series data by an LSTM model, GANF includes a graph-based dependency encoder to capture inter-variable information in DAG. Note that DAG is a Bayesian network in which a node is conditionally independent of its non-parents. NF is included to learn DAG by modeling the conditional density of each node in DAG, i.e., NF expresses the density of each node through successive conditioning on historical data and uses conditional flows to learn each conditional density. In the end, samples that lie in low-density regions of the data distribution are denoted as anomalies.

However, there are several major issues with the NF-based methods in G-TSAD. NF can be computationally expensive, especially when modeling complex distributions for large datasets. This is because each transformation in the flow must be invertible and requires computing the determinant of its Jacobian matrix. Moreover, the resulting NF models may not be highly interpretable, especially for high-dimensional data, since it is difficult to understand the relationships between the input variables and the generated samples in such complex data.

The aforementioned unsupervised methods assume that the training data is clean. However, in many real-world applications, collecting clean data poses significant challenges due to its time-consuming, costly, and labor-intensive nature. Moreover, anomalies often sneak into the normal data – which is then regarded as contaminated data – due to data shift or human error [78]. Consequently, existing unsupervised methods, which extensively train on normal data to model its behavior, would misdetect in the test phase the anomalous samples encountered during training. To address the challenges posed by contaminated data, [79] introduced an approach called TSAD-C. TSAD-C comprises three key components: a Decontaminator, aimed at identifying and eliminating abnormal patterns likely to be anomalies; a Long-range Variable Dependency Modeling module, designed to capture long-range intra- and inter-variable dependencies; and an Anomaly Scoring module, leveraging insights from the first two modules to detect anomalies using reconstruction-based anomaly scores. Results demonstrate that TSAD-C effectively manages contaminated data and surpasses existing studies in the unsupervised TSAD literature.

It is worth noting that TSAD-C utilizes a diffusion model – a type of generative model [80, 81] – in its Decontaminator. Nevertheless, similar to the drawbacks of NF, training a diffusion model can be computationally expensive, especially for large datasets and complex architectures. It also demands large memory resources, particularly during training when storing intermediate data representations. Moreover, diffusion models typically rely on various hyperparameters, including learning rates, batch sizes, and model architecture choices, which can significantly influence performance. Effectively tuning these hyperparameters can be challenging and time-consuming, requiring extensive experimentation.

In summary, many studies in the AE-based category, ranging from the vanilla AE and VAE to NF, have shown promising results in G-TSAD. However, designing a model that can capture both the intra-variable and inter-variable dependencies, and/or handle contaminated data, requires careful consideration of multiple factors. Large amounts of data and computational resources are required for training to produce satisfactory results, which can be expensive and time-consuming for many time-series applications. Also, explainability and interpretability are necessary to understand the model’s behavior behind the detection outcomes; this ensures that the model makes unbiased decisions when detecting anomalies. Moreover, all the AE-based methods limit their applications to time-series signals only, while there are many other time-series data types available in real-world applications (e.g., videos, social networks, edge streams, etc.).

5.2 GAN-based Methods

GAN has yielded state-of-the-art performance in many application domains because, unlike the VAE- and NF-based methods, it can reconstruct high-quality samples without the need for explicit estimation of the data distribution [82].

Overall, the parameters of the two components of GAN, i.e., the generator and the discriminator, can be found as follows:

$$\theta^{*},\phi^{*}=\operatorname*{argmin}_{\theta,\phi}\,\mathcal{L}_{\text{gen}}\Big(D_{\phi}(E_{\theta}(\mathbb{G},z)),\,\mathbb{G}\Big),\qquad(3)$$
$$\psi^{*}=\operatorname*{argmin}_{\psi}\,\mathcal{L}_{\text{disc}}\Big(\mathbb{D}_{\psi}(\mathbb{G}^{\prime},\mathbb{G})\Big),\qquad(4)$$

where $z$ is the random noise vector, $\mathbb{D}_{\psi}$ is the discriminator network (parameterized by $\psi$), and $\mathbb{G}^{\prime}$ is the fake set of graph samples generated by the generator. The generator loss $\mathcal{L}_{\text{gen}}$ is often MSE, and the discriminator loss $\mathcal{L}_{\text{disc}}$ is often CE.
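A minimal sketch of the alternating updates in Eqs. (3)-(4) is shown below; the MLPs stand in for graph-based generator/discriminator networks, and the flattened-graph input, sizes, and loss choices are illustrative assumptions:

```python
import torch
import torch.nn as nn

in_dim, z_dim, batch = 15, 4, 64
G = nn.Sequential(nn.Linear(in_dim + z_dim, 32), nn.ReLU(), nn.Linear(32, in_dim))
D = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

g_real = torch.randn(batch, in_dim)            # real (flattened) graph batch

# Discriminator step (Eq. 4): classify real as 1, fake as 0.
z = torch.randn(batch, z_dim)
g_fake = G(torch.cat([g_real, z], dim=1)).detach()
d_loss = bce(D(g_real), torch.ones(batch, 1)) + \
         bce(D(g_fake), torch.zeros(batch, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step (Eq. 3): reconstruct the real samples from (real, noise).
g_fake = G(torch.cat([g_real, torch.randn(batch, z_dim)], dim=1))
g_loss = nn.functional.mse_loss(g_fake, g_real)   # L_gen as MSE
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```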

Knowing GAN's potential, recent studies have successfully applied GAN to G-TSAD. For example, [83] proposed the Cross-Correlation Graph-Based Encoder–Decoder GAN (CCG-EDGAN) for unsupervised anomaly detection in multivariate time-series signals. CCG-EDGAN consists of three modules: a pre-processing module that transforms multivariate time-series signals into correlation graphs; a generator that produces the most similar fake data via an encoder-decoder-encoder structure, which captures the features in the sequential correlation graphs; and a discriminator that classifies the real from the reconstructed data. Note that the input includes both noise and real data to help the generator improve the diversity of the reconstructed samples and avoid the mode collapse problem, i.e., the situation in which the generator can only produce fake samples that look similar to each other and are not diverse enough to represent the full range of possible samples. Moreover, a residual connection module is added to the model to improve convergence speed. The defined anomaly score is based on the reconstruction error.

CCG-EDGAN encounters two main limitations. First, it can only handle the inter-variable dependency through the correlation graphs, while the intra-variable dependency cannot be captured by the GAN module. Second, its anomaly score computation is based only on the reconstruction error obtained by the generator, ignoring the discriminator's potential in detecting anomalies. To overcome the first limitation, HAD-MDGAT was proposed [84]. HAD-MDGAT employs a graph attention network to capture the inter-variable correlations between sensors, and a GRU network to capture the intra-variable patterns. Anomalies are then detected by a reconstruction scheme via GAN. HAD-MDGAT also compares the performance of GAN against vanilla AE and confirms that GAN achieves a better reconstruction and can mitigate the overfitting problem of vanilla AE. Despite tackling the first limitation of CCG-EDGAN, HAD-MDGAT cannot tackle the second issue, i.e., HAD-MDGAT only uses the reconstruction error obtained by the generator as an anomaly indicator.

Several recent methods address both limitations, i.e., capturing both intra-variable and inter-variable information, and using both the generator and discriminator for computing the anomaly score. For example, STGAN [25] was proposed to detect anomalies in traffic networks. STGAN includes two components: a generator to model the intra-variable and inter-variable patterns, trends, and external features; and a discriminator to distinguish the real samples from the fake ones. STGAN calls its components the spatiotemporal generator and the spatiotemporal discriminator, respectively, since both are based on a graph convolutional gated recurrent unit that explores the intra-variable and inter-variable correlations among neighboring nodes. The anomaly score is computed based on the outputs of both the generator and discriminator: while the generator detects sudden changes in traffic networks, the discriminator detects traffic anomalies such as abnormal times and locations.

While HAD-MDGAT can overcome the first challenge and STGAN can tackle both challenges seen in CCG-EDGAN, both methods share a major issue. Unlike CCG-EDGAN, which uses both noise and real data as the input to improve the diversity of the generated samples, and unlike GAN-based methods applied to images that use only noise as the input, HAD-MDGAT and STGAN take only the real data as the input; no noise is used to generate fake samples. This potentially leads to the mode collapse problem in the GAN framework.

In summary, although GAN-based methods have made important contributions to the field of G-TSAD, they still face challenges that require further research and development. Many GAN-based methods suffer from the mode collapse problem, where the generator cannot produce a wide range of samples: taking only the real data as the initial input can make the generator create the same or very similar samples. Moreover, as the generator and the discriminator constantly compete against each other, the training process can be unstable and slow. GAN-based methods are also sensitive to the choice of hyperparameters; hence, careful hyperparameter selection is needed to avoid unstable training, mode collapse, and poor-quality generated samples. Lastly, it is important to design a GAN that properly combines the detection abilities of both the generator and the discriminator when detecting anomalies.

5.3 Predictive-based Methods

As mentioned earlier, while all of the existing methods in the four categories are based on an encoder-decoder framework, the main difference between them is the direction of modeling time-series data. Three categories, namely the AE-based, GAN-based, and self-supervised methods, are trained to minimize a loss on the observed data, and in the test phase, an input observation that cannot attain a small loss value is considered an anomaly. Predictive-based methods differ from these three categories in that they endeavor to predict future values based on current and past values. A sample is detected as an anomaly if the model, trained on historical data, cannot forecast it accurately.

More specifically, at each observation $j$, a predictive-based approach calculates an expected graph $\bar{\mathcal{G}}_{j+1}$. The anomaly score is then computed as the difference between the expected graph $\bar{\mathcal{G}}_{j+1}$ and the actual graph $\mathcal{G}_{j+1}$. The time-series prediction problem is difficult, as it requires learning the long-distance intra-variable complexity of observations. As a result, in addition to the features of nodes/edges and the connectivity patterns between nodes represented by the adjacency matrices, other important properties, such as the node degree and the shortest path length between nodes, should be considered; this enables better capturing of the long-distance dependencies. The formulation of the predictive-based scheme can be defined as follows:

$$\theta^{*},\phi^{*} = \operatorname*{argmin}_{\theta,\phi} \mathcal{L}_{\text{pred}}\Big(D_{\phi}\big(E_{\theta}(\mathbb{G})\big), \mathcal{G}_{j+1}\Big), \qquad (5)$$

where the prediction loss $\mathcal{L}_{\text{pred}}$ is often the MSE, Root MSE (RMSE), or CE loss. The first term of the loss function represents $\bar{\mathcal{G}}_{j+1}$.
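For illustration, the following is a minimal PyTorch sketch of this predictive scheme, assuming graphs are flattened into feature vectors and a GRU stands in for the graph sequence encoder $E_{\theta}$; real methods would use GNN-based encoders instead.

```python
# Minimal sketch of the predictive scheme in Eq. (5): read a window of past
# graph features, predict the next graph, and use the prediction error as the
# anomaly score at test time. Shapes and modules are illustrative assumptions.
import torch
import torch.nn as nn

num_nodes, feat_dim, hidden = 10, 4, 32

class GraphForecaster(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(num_nodes * feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_nodes * feat_dim)

    def forward(self, past):                  # past: (batch, window, N * F)
        _, h = self.gru(past)
        return self.head(h[-1])               # expected next graph features

model = GraphForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

past = torch.randn(8, 5, num_nodes * feat_dim)    # graphs G_{j-4}, ..., G_j
target = torch.randn(8, num_nodes * feat_dim)     # actual next graph G_{j+1}

pred = model(past)                                # expected graph \bar{G}_{j+1}
loss = mse(pred, target)                          # L_pred of Eq. (5)
opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                             # per-sample anomaly score
    score = ((model(past) - target) ** 2).mean(dim=1)
```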

Many recent studies have shown the potential of GNNs equipped with graph attention for detecting future time-series anomalies. For example, [53] proposed the Graph Deviation Network (GDN), which uses graph attention-based forecasting to predict the behavior of each sensor at each time step by learning from the past. This helps detect sensors that deviate greatly from their expected behavior. GDN addresses very important issues in G-TSAD research, i.e., different sensors may have different behaviors and properties (e.g., some sensors measure the water pressure while others measure the flow rate), yet existing GNN-based methods use the same set of model parameters for all sensors. Moreover, many of them pre-define the graph nodes/edges/adjacency matrices. Though using pre-defined matrices results in a more stable training phase, defining the matrices requires prior knowledge about the dataset, which may not be available in many applications. GDN instead learns the graph nodes, edges, and adjacency matrices during the learning process. Overall, GDN combines a GNN, graph structure learning, and a graph attention module. The attention module is essentially used to fuse a node with its neighbors in the graph (i.e., to perform attention over neighbors, which allows analyzing different types of sensors). The anomaly score at each time step is then computed from the prediction error. Importantly, the obtained attention weights are used to provide explainability for the detected anomalies. In spite of GDN's great capability to address real-world issues in G-TSAD, its main limitation is that it detects anomalies mainly based on inter-variable dependencies; the intra-variable relations across different time steps are ignored.

Inspired by GDN, a similar work called FuSAGNet [85] also uses a GNN and graph structure learning with a graph attention network to predict future time-series behavior. In addition to the prediction task as in GDN, FuSAGNet includes a reconstruction task through a Sparse-AE that induces sparsity in its latent space. A sparsity-constrained joint optimization is then introduced to optimize both the Sparse-AE and GDN components. Thus, FuSAGNet can potentially achieve more accurate anomaly scores than standalone GDN, as it computes anomaly scores from both the reconstruction and forecasting modules. However, similar to the challenge seen in standalone GDN, FuSAGNet only concentrates on capturing the inter-variable dependency, and the intra-variable dependency remains unaddressed.

To overcome the limitations of GDN-based methods, MTAD-GAT [86], another approach that leverages the graph attention network, captures the inter-variable dependency between sensors as well as the intra-variable dependency among observations. The key ingredient of MTAD-GAT is its two graph attention layers, one per dimension: a feature-oriented graph attention layer to capture the causal relations between sensors, and a time-oriented graph attention layer to underline the dependency along the temporal dimension. MTAD-GAT jointly trains a forecasting-based module for single-observation prediction with RMSE, and a reconstruction-based VAE module for capturing the data distribution of the entire time series. Although MTAD-GAT addresses the issue seen in GDN-related studies, its graph attention module does not provide the interpretability of different sensor types' characteristics that GDN-based works offer.

As can be seen from the aforementioned studies, a graph structure learning strategy (i.e., learning the features of nodes/edges and adjacency matrices) is very important in many time-series applications. This is because, with limited prior knowledge, it is difficult to pre-define a graph structure that carries comprehensive information about the graph. Meanwhile, uncovering the important patterns in graphs (e.g., features, adjacency matrices) is possible through learning with GNNs. However, the existing methods learn the adjacency matrices in the sensor embedding space by defining measurement metrics such as the cosine similarity or top-K nearest neighbors (kNN) [87]. The cosine similarity, which computes the dot products of sensor embeddings, leads to quadratic time and space complexity with respect to the number of sensors, while top-kNN cannot entirely underline the strong and wide-spread connections among sensor embeddings.
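As a concrete illustration of this embedding-based structure learning, the sketch below builds an adjacency matrix from learnable sensor embeddings via cosine similarity followed by top-k neighbor selection; the dimensions are arbitrary, and the $O(N^2)$ similarity matrix makes the quadratic complexity explicit.

```python
# Sketch of GDN-style structure learning: cosine similarity between all sensor
# embedding pairs (quadratic in the number of sensors), then top-k selection.
import torch

num_sensors, emb_dim, k = 6, 8, 2
emb = torch.nn.Parameter(torch.randn(num_sensors, emb_dim))  # learned jointly with the model

normed = emb / emb.norm(dim=1, keepdim=True)
sim = normed @ normed.t()                          # O(N^2) cosine similarities
sim = sim.masked_fill(torch.eye(num_sensors, dtype=torch.bool), float('-inf'))

topk = sim.topk(k, dim=1).indices                  # k most similar sensors per node
adj = torch.zeros(num_sensors, num_sensors)
adj.scatter_(1, topk, 1.0)                         # sparse directed adjacency matrix
```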

To overcome this challenge, GTA [28] presents a connection learning policy based on Gumbel-softmax sampling. This learns bi-directed edges between sensors, thereby avoiding the issue of selecting node neighbors by the top-kNN method. GTA also handles the intra-variable dependency with a transformer-based architecture, additionally employing graph convolution to fully explore the intra-variable context modeling process from the graph sequence. Moreover, to tackle the quadratic complexity, GTA proposes a multi-branch attention mechanism to substitute the original multi-head self-attention. A forecasting-based strategy is then adopted to predict the graph at the next time step and return an anomaly score for each timestamp via MSE.

In contrast to detecting abnormal nodes of the graph in the above studies, which target individual sensors, several studies show an interest in detecting abnormal edges (e.g., unexpected interactions between sensors, or anomalous edges in an edge stream from a dynamic graph). For example, SEDANSPOT [88] considers an anomaly detection problem in an edge stream, where the edge anomalies tend to connect parts of the graph that are sparsely connected or occur as bursts of activity (e.g., a burst of longer-than-usual phone calls during a festival). SEDANSPOT exploits these abnormal behaviors by proposing a rate-adjusted sampling technique that downsamples edges from bursty periods of time and additionally employs a holistic random walk-based edge anomaly scoring function to compare incoming edges against the whole graph.

Midas [89] also tackles the problem of detecting anomalous edges in a stream of edges. However, unlike SEDANSPOT, which identifies individual abnormal edges, Midas aims to detect microcluster anomalies, i.e., suddenly appearing bursts of activity that share many repeated edges. Midas provides theoretical explanations of its false positive rate and is an online method that processes each edge in constant time and constant memory, while SEDANSPOT does not address these issues. Moreover, SEDANSPOT and Midas propose different hypotheses on anomaly scores. SEDANSPOT designs an anomaly scoring function that gives a higher score to an edge if adding it to a sample of edges produces a large change in distance between its incident nodes. Midas computes the Gaussian likelihood of the number of edge occurrences in the current timestamp and declares an anomaly if the likelihood is below an adjustable threshold, which guarantees a low false positive probability.
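A toy sketch of the count-based scoring idea described above is given below: per-edge occurrence counts are tracked over timestamps, the historical counts are modeled as a Gaussian, and the current count is flagged when its likelihood falls below a threshold. The plain dictionaries are a simplifying assumption; the actual Midas relies on count-min sketches to achieve constant time and memory.

```python
# Toy sketch of Gaussian-likelihood edge-count scoring as described above.
# Real Midas uses count-min sketches; plain dicts here are an assumption.
from collections import defaultdict
from math import exp, pi, sqrt

history = defaultdict(list)              # edge -> counts in past timestamps

def gaussian_likelihood(x, mean, std):
    std = max(std, 1e-6)                 # avoid division by zero
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

def score_timestamp(edge_counts, threshold=1e-3):
    """edge_counts: {(u, v): occurrences in the current timestamp}."""
    anomalies = []
    for edge, count in edge_counts.items():
        past = history[edge]
        if len(past) >= 2:
            mean = sum(past) / len(past)
            std = (sum((c - mean) ** 2 for c in past) / len(past)) ** 0.5
            if gaussian_likelihood(count, mean, std) < threshold:
                anomalies.append(edge)
        history[edge].append(count)
    return anomalies

# A sudden burst of (0, 1) edges stands out against its stable history.
for t in range(5):
    score_timestamp({(0, 1): 3, (1, 2): 1})
print(score_timestamp({(0, 1): 40, (1, 2): 1}))   # -> [(0, 1)]
```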

Not limited to node and edge anomaly detection in the context of time-series data, several predictive-based studies tackle the interesting task of detecting anomalous sub-graphs. As mentioned in Section 4, detecting abnormal sub-graphs is much more difficult than detecting abnormal nodes and edges, since abnormal sub-graphs vary in size and inner structure. It is even more difficult when individual nodes/edges are normal but become abnormal when considered as a group.

Series2Graph [90] tackles these problems by proposing a novel idea for detecting anomalous sub-graphs of varying sizes in graphs constructed from time-series data. More specifically, Series2Graph first extracts sub-sequences (aka time intervals) from the time series using a sliding window technique. It then projects the sub-sequences onto a two-dimensional embedding space. The graph is constructed by considering the densest regions in this embedding space, where a notable concentration of projected sub-sequences is observed; these dense regions are selected as the nodes of the graph. The nodes serve as summaries of the repetitive patterns observed in the input data, which indicate normal behavior. To create the edges, Series2Graph determines the edge weights based on the frequency with which one sub-sequence immediately follows another in the input time series: if there is a transition between two nodes, they are connected by an edge; otherwise, no edge is created. After constructing the graph, Series2Graph computes the abnormality level of sub-graphs of varying sizes using the node degree (aka the number of edges connected to a node) and the edge weights (aka the number of consecutive occurrences of two sub-sequences).
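A heavily simplified sketch of this pipeline follows; PCA stands in for the paper's projection step, and k-means stands in for its density-based node extraction, both as illustrative assumptions.

```python
# Simplified sketch of Series2Graph-style graph construction: window the
# series, embed sub-sequences in 2-D, cluster dense regions into nodes, and
# weight edges by consecutive-transition frequency.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

series = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.05 * np.random.randn(2000)
win = 50
subs = np.stack([series[i:i + win] for i in range(len(series) - win)])

emb = PCA(n_components=2).fit_transform(subs)               # 2-D embedding space
labels = KMeans(n_clusters=5, n_init=10).fit_predict(emb)   # dense regions -> nodes

edges = np.zeros((5, 5))                 # edge weight = transition frequency
for a, b in zip(labels[:-1], labels[1:]):
    edges[a, b] += 1

degree = (edges > 0).sum(axis=1) + (edges > 0).sum(axis=0)  # node degrees
# Low-degree nodes and rare transitions (small edge weights) indicate
# sub-sequences deviating from the recurrent normal patterns.
```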

Another interesting sequence prediction study, namely Eland [91], proposes a sequence (aka time series) augmentation technique for early-stage graph anomaly detection when data is insufficient or incomplete. Unlike Series2Graph, which first converts sequences into graphs and aims to identify abnormal sub-graphs, Eland converts the original graphs of social networks into sequences and aims to detect abnormal sequences. Eland develops a sequence-based action predictor (i.e., Seq2Seq) to forecast users' behaviors based on their activity history, and uses a GNN to train augmented graph-based sequences with the CE loss to minimize the difference between predicted and actual sequences. While Eland demonstrates that its sequence augmentation technique yields improvements upon a variety of graph anomaly detection methods, it only targets social network data. Dealing with other types of time-series data, such as time-series signals, which are noisy, complex, and non-stationary in nature, would pose a significant challenge for Eland.

The aforementioned predictive-based methods have shown the capability to deal with many time-series applications, ranging from time-series signals and edge streams to social networks. However, none of these approaches are applicable to videos, in which a video is a sequence of image frames indexed in time. GCLNC [92], WAGCN [54], and WSANV [93] are some of the earliest G-TSAD studies working on videos. While most existing studies use only normal data during the training phase, these three studies take advantage of weakly-supervised labels (e.g., noisy labels arising from wrong annotations of normal graphs) during training to improve the detection of abnormal segments in videos. As a video sequence has a temporal evolution between frames, these methods construct graphs based not only on the similarity of the inter-variable features but also on the intra-variable proximity, i.e., the short-distance intra-variable dependency that captures relationships among frames. Note that instead of pre-defining the adjacency matrices, they adjust them as the model is trained. A computed anomaly score is then assigned to each frame.

Despite the capability of these methods on videos, they only consider frame-level graphs, i.e., each frame is modeled as a graph node; hence, only the relationships between frames are captured. In practice, however, it is very important to model a video as a stream of time-evolving object-level graphs, where a node is an object and edges represent inter-variable relationships among objects. Object-level graphs can help determine the abnormal changes in the actions/movements of an object (e.g., a human) across frames in a video sequence.

In summary, predictive-based methods in G-TSAD have shown great potential in a variety of time-series applications, from time-series signals, edge streams, and social networks to videos. It is also very interesting to see that many predictive-based methods integrate the reconstruction and prediction tasks to yield more accurate anomaly scores. However, several challenges remain unsolved. While existing predictive-based methods have shown their effectiveness in building graphs to capture inter-variable information, one of their biggest challenges is modeling the long-term intra-variable dependency, which is essential for effective time-series forecasting. Existing methods only consider the intra-variable dependency over a short-term period, where observations close to each other in the data series are expected to be similar. However, as mentioned in Section 2, trend, seasonality, and unpredictability patterns are inherently present in time-series data; hence, forecasting future values simply based on near-distance past values is problematic. Plus, a large amount of data with a diversity of patterns is required in the training phase of forecasting tasks, but in practice, having access to such a comprehensive dataset is very difficult in many time-series applications.

5.4 Self-supervised Methods

Self-supervised learning (SSL), a subset of unsupervised methods, has provided novel insights into training without annotated labels. Similar to other fields, SSL has demonstrated its effectiveness in graph-based learning methodologies [32]. The intuition of SSL is to learn transferable knowledge from massive unlabeled data with well-designed pretext tasks and then generalize the learned knowledge to the downstream tasks [94]. In the SSL-based G-TSAD research, a sub-group of SSL methods, called contrastive learning (CL), has been used so far. CL aims to learn useful representations of data by contrasting positive and negative pairs, i.e., CL is trained to create an embedding space that pulls positive samples (aka similar samples) close together and pushes negative samples (aka dissimilar samples) far apart. To achieve this, a contrastive loss function is defined to encourage the model to maximize the similarity between positive samples while minimizing the similarity between negative samples. This approach can be formally described as follows:

$$\theta^{*},\phi^{*} = \operatorname*{argmin}_{\theta,\phi} \mathcal{L}_{\text{con}}\Big(D_{\phi}\big(E_{\theta}(\mathbb{G}^{(1)}), E_{\theta}(\mathbb{G}^{(2)})\big)\Big), \qquad (6)$$

where $\mathcal{L}_{\text{con}}$ is the contrastive loss, $\mathbb{G}^{(1)}$ and $\mathbb{G}^{(2)}$ are two different sets of graph samples, and $D_{\phi}$ is the discriminator that estimates the similarity between these two sets. In unsupervised anomaly detection, data augmentation is essential for creating negative samples/pairs, while any two normal samples can potentially form a positive pair. Applying augmentation to graphs is challenging compared to other data types; this will be discussed in Section 8.
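For concreteness, a minimal PyTorch sketch of a margin-based instance of Eq. (6) is given below; the MLP encoder is a stand-in for a GNN, and the pair labels are assumed given.

```python
# Minimal sketch of the contrastive objective in Eq. (6): a shared encoder
# E_theta embeds two sets of graph samples, and a margin-based loss pulls
# positive pairs together and pushes negative pairs apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

def contrastive_loss(g1, g2, is_positive, margin=1.0):
    z1, z2 = encoder(g1), encoder(g2)
    dist = F.pairwise_distance(z1, z2)
    pos = is_positive * dist.pow(2)                 # pull positives together
    neg = (1 - is_positive) * F.relu(margin - dist).pow(2)  # push negatives apart
    return (pos + neg).mean()

g1, g2 = torch.randn(8, 16), torch.randn(8, 16)     # two sets of graph features
is_positive = torch.randint(0, 2, (8,)).float()     # 1 = positive pair, 0 = negative
loss = contrastive_loss(g1, g2, is_positive)
loss.backward()
```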

AddGraph [95] is an unsupervised anomalous edge detection technique that has access to only normal edges in the training phase. It employs CL to train its network, where the positive samples are all the available normal edges, while negative samples are generated by applying the Bernoulli distribution to the normal edges. AddGraph employs a temporal graph convolutional network with an attention module-based GRU to learn long- and short-term temporal patterns of the nodes in graphs. The hidden state of the nodes at each time step is used to compute the anomalous probability for all edges. As the generated negative edges may in fact be normal, it is inappropriate to use a strict loss function such as CE; hence, AddGraph proposes a margin-based pairwise loss, minimized in the training phase, that distinguishes the positive edges from the generated negative ones.

Another CL study aiming to detect abnormal edges, called TADDY [96], also assumes that all normal edges are positive in the training phase, similar to AddGraph. However, it proposes a different negative sampling strategy: given the positive pairs, TADDY randomly generates as many candidate negative pairs as there are positive pairs, and then checks these candidates to ensure that they differ from the existing normal edge set. After obtaining both positive and negative samples, TADDY develops a framework consisting of four components: an edge-based substructure sampling to sample the target nodes and the surrounding nodes; a node encoding component to generate the node embeddings in both intra-variable and inter-variable spaces; a graph transformer to extract the intra-variable and inter-variable knowledge of edges; and a discriminative anomaly detector based on the CE loss to calculate anomaly scores for all edges.

Although TADDY validates its capability on six real-world benchmark graph datasets while AddGraph uses only two, TADDY still has several issues. First, TADDY implements a simple, less knowledge-demanding random selection technique for generating negative samples, but checking candidates only against the observed normal edges does not ensure that they fall outside the normal distribution; hence, its negative sampling approach is less reliable than that of AddGraph. Second, given TADDY's random selection technique, bias can easily occur when the negative sample set is not sufficiently large to represent the full distribution of negative pairs (i.e., the distribution of negatives can be skewed); hence, additional sampling techniques are required. Lastly, TADDY employs a binary CE loss to distinguish normal edges from negative edges; however, as noted in AddGraph, since it is not guaranteed that all generated negative samples are actually abnormal, using a strict loss (e.g., CE) is inappropriate for distinguishing them.

It is also worth mentioning that the major challenge in both AddGraph and TADDY is that they treat the generated negative samples as actual abnormal samples. However, no actual abnormal data is accessible during the training phase, and the generated negative samples are created from the given normal data. This limits the methods' performance, as the generated negative samples might be just a variation of the normal data, or only a subset of the possible abnormal distribution, resulting in biased representation learning for the actual abnormal data.

Very recent studies in unsupervised G-TSAD [55, 56] aim to handle these challenges by recognizing that the negative samples generated from the normal data may differ from the actual abnormal data observed in the testing phase. These methods shed light on CL for anomaly detection; they demonstrate that discriminating the positive and negative pairs in the training phase through minimizing the contrastive loss merely provides a less noisy and more reliable representation space for the normal data, since all the negative samples are created from normal data – in other words, the generated negative samples are just a transformation of the normal data, not actual abnormal samples.

EEG-CGS [55] introduces an effective positive and negative sub-graphs sampling technique for the application of detecting abnormal sensors and abnormal regions in the brain sensory network using multivariate time-series signals. Specifically, EEG-CGS first represents multivariate time-series systems as attributed graphs to capture the relations between sensors. Then, it proposes a local substructure-based sampling strategy to sample contrastive pairs. For each target node, a positive sub-graph is sampled by closely connected nodes based on controlling the radius of surrounding contexts, while a negative sub-graph is sampled by finding the farthest nodes using the same radius. By sampling positive and negative pairs based on the knowledge of local sub-structures, EEG-CGS overcomes the issues of the random selection technique seen in TADDY.
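A small sketch of this radius-based sampling idea, using networkx on a toy graph, is shown below; the exact sampling details of EEG-CGS may differ.

```python
# Sketch of local substructure-based contrastive pair sampling: the positive
# sub-graph is the target node's radius-r neighborhood, while the negative
# sub-graph is built from the farthest nodes (by shortest-path distance).
import networkx as nx

def sample_contrastive_pair(graph, target, radius=1, size=4):
    positive = nx.ego_graph(graph, target, radius=radius)    # close context
    dists = nx.single_source_shortest_path_length(graph, target)
    farthest = sorted(dists, key=dists.get, reverse=True)[:size]
    negative = graph.subgraph(farthest)                      # distant context
    return positive, negative

g = nx.path_graph(10)                      # toy sensor graph: 0-1-2-...-9
pos, neg = sample_contrastive_pair(g, target=0, radius=1, size=4)
print(sorted(pos.nodes))                   # [0, 1]
print(sorted(neg.nodes))                   # [6, 7, 8, 9]
```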

Once positive and negative pairs are obtained, EEG-CGS designs two graph-based SSL modules for learning the relationship between a sensor and its surrounding contexts. The two SSL modules are reconstruction-based and contrastive-based. The first module is based on GNN-AE, which detects anomalies using feature information. It first anonymizes a node in the positive sub-graphs by replacing its feature vector with an all-zero vector. It then reconstructs the feature vector of the anonymized node from its surrounding nodes in the sub-graph. The reconstruction loss is computed by comparing the reconstructed node with its ground truth in the feature space. To help learn inter-variable information, a contrastive module is introduced. CL captures the matching of sub-graph pairs, which yields meaningful information for anomaly detection. An anomaly score, as a combination of the two components, is assigned to each graph node.

Similar to EEG-CGS, mVSG-VFP [56] takes advantage of both AE-based reconstruction and contrastive SSL components for multivariate TSAD. However, it builds its components differently from EEG-CGS. Specifically, mVSG-VFP first uses a sliding window to generate fixed-length segments in the time-series vehicle data and then builds mini-batches of segments. For the AE-based module, mVSG-VFP masks a segment of one sensor at a time and encourages the GNN-based AE module to reconstruct it. For the contrastive module, unlike existing CL methods in which positive and negative samples are created through a contrastive pair sampling strategy, mVSG-VFP leverages a unique characteristic of its vehicle data: time-series segments recorded during the same trip (aka a continuous recording of sensors) are more similar to one another than to segments produced during another vehicle trip.

Given this unique property of the vehicle data in mVSG-VFP, adjacent segments in the mini-batch are considered positive samples, and the remaining segments in the same mini-batch are denoted as negative samples. A so-called temporal contrastive loss is then applied to pull adjacent segments closer and push them away from the rest of the mini-batch. In the last step, mVSG-VFP uses the reconstruction error to detect anomalous segments. Due to the data's unique characteristics, a detected abnormal segment can imply that its adjacent segments are also anomalous. As such, mVSG-VFP addresses both intra-variable and inter-variable dependencies, while EEG-CGS only considers the latter. The main challenge of mVSG-VFP is that its strategy for generating positive and negative pairs is only applicable to data composed of operational blocks like vehicle trips, a property that is not available in many applications where using the knowledge of adjacent segments is meaningless.
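The sketch below illustrates one plausible form of such a temporal contrastive loss, treating each segment's immediate neighbors in a time-ordered mini-batch as positives under an InfoNCE-style formulation; this is an illustrative assumption rather than the paper's exact loss.

```python
# Sketch of a temporal contrastive loss over a mini-batch of time-ordered
# segment embeddings: adjacent segments act as positives, all other segments
# in the batch act as negatives (InfoNCE-style, chosen here for illustration).
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(z, temperature=0.1):
    """z: (B, d) embeddings of B time-ordered segments from one trip."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                     # pairwise similarities
    sim = sim.masked_fill(torch.eye(len(z), dtype=torch.bool), -1e9)
    logp = F.log_softmax(sim, dim=1)
    loss, count = 0.0, 0
    for i in range(len(z)):
        for j in (i - 1, i + 1):                      # adjacent segments = positives
            if 0 <= j < len(z):
                loss = loss - logp[i, j]
                count += 1
    return loss / count

z = torch.randn(16, 32, requires_grad=True)           # segment embeddings
temporal_contrastive_loss(z).backward()
```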

CL has also been successfully applied for detecting anomalies in videos. Unlike existing studies in the social network domain, such as AddGraph and TADDY, which questionably treat the generated negative samples as abnormal data, several existing CL-based video anomaly detection methods assume that a few actual abnormal frames are available during the training phase. In other words, the anomaly detection process is weakly supervised.

CRFAD [97] presents a weakly supervised approach for anomaly detection in videos. It starts off with a relation-aware feature extractor to capture multi-scale features in a video using a convolutional neural network. The heart of CRFAD is a combination of a self-attention module and conditional random fields to model the interactions among local and global features across frames as graphs, where nodes are image patches and edges are the connections between nodes. It is very interesting to see that CRFAD can model object-level graphs while the existing predictive-based methods, discussed in Section 5.3, can only model frame-level graphs. Such an object-level framework can learn inter-variable interactions among the objects, which are necessary to detect complex movements. Moreover, CRFAD learns the short-term and long-term intra-variable dependencies across frames by employing a modified version of the temporal relational network. A contrastive multi-instance learning scheme is added to broaden the margin between the positive pairs (normal frames) and the negative pairs (actual abnormal frames).

Despite the object-level graph capabilities for the purpose of anomaly detection, many studies, including CRFAD, neglect the crucial matter of interpretability, i.e., designing a model that can accurately interpret the cause of anomalies. An important application of video anomaly detection is to understand and take appropriate actions whenever an anomaly occurs. An appropriate response to an abnormal event in a video usually depends on its severity, which cannot easily be assessed without an interpretable model. For example, a car illegally running a traffic light requires immediate attention, while a person simply jaywalking does not. To address the model interpretability challenge, IVAD [98] proposes a dual-monitoring approach that monitors individual objects (aka local object monitoring) as well as their interactions with other objects present in the scene (aka global object monitoring).

In the local object monitoring branch, each object is monitored independently to determine if its action (e.g., a body movement, the skeletal pose) is anomalous. An anomaly score for each frame, based on the reconstruction loss, is computed by a GRU-based AE. In the global object monitoring branch, the relationships between objects (e.g., a human, a predicate, and a bike) within a frame are monitored. This creates a scene graph where objects and relations are defined as nodes and edges, respectively. CL is adopted in this global monitoring module. Positive pairs are randomly selected from the normal training set, while negative pairs are generated randomly from an artificially generated set and verified to ensure that they do not belong to the positive set. Note that, unlike CRFAD, IVAD does not consider negative pairs as abnormal samples, which is, as discussed above, a reasonable choice since no actual abnormal sample is available in the training phase. A contrastive loss is employed to minimize the Euclidean distance between positive samples and maximize the distance between positive and negative samples. This loss is also utilized to compute anomaly scores for each object in a frame. Finally, an in-depth sequential analysis of the statistics of both object monitoring modules of IVAD is performed to interpret the root cause of anomalies. For example, if the cause of an anomaly comes from the global object monitoring branch, the anomaly is interpreted as being caused by a previously unseen relationship between objects. Otherwise, if the source of an anomaly is the local object monitoring branch, the anomaly is caused by an anomalous object action.

Overall, IVAD has addressed the challenges seen in previous G-TSAD studies on videos. By providing in-depth model interpretability, IVAD can help to understand the possible changes of individual objects and the diversity of relationships between different objects in both intra-variable and inter-variable dimensions. However, in terms of creating negative samples, although IVAD addresses the issue of AddGraph and TADDY, i.e., it does not treat negative samples as actual abnormal samples, it still uses only a simple random selection technique for negative sample generation. Randomly selecting negative samples is not sufficient to capture the full distribution of the negatives; hence, its sampling technique is less reliable than those of EEG-CGS and mVSG-VFP, and additional sampling strategies are required.

In summary, existing SSL studies in G-TSAD have shown interesting ideas and addressed some problems seen in the other three categories of G-TSAD methods. However, as SSL is a new topic in G-TSAD, its effectiveness has not yet been confirmed across as wide a variety of application domains as that of predictive-based methods. Moreover, employing different effective graph augmentations for positive and negative sample generation is suggested to improve performance.

6 Datasets and Applications

In this section, we provide the datasets related to time-series signals, social networks and videos that have been widely used in the field of G-TSAD to address various real-world applications. Table 2 lists the representative G-TSAD methods selected from Section 5, their associated learning tasks (discussed in Section 4), the data type used, and their target application case(s).

| Category | Method | Year | Learning Task | Data Type | Evaluation Metric | Application |
|---|---|---|---|---|---|---|
| AE-based | DeepSphere [71] | 2018 | Graph | Signals, Videos | Kappa Statistics, RMSE | Various Real-world Applications |
| AE-based | OmniAnomaly [77] | 2019 | Node | Signals | Prec, Rec, F1 | Various Industrial Applications |
| AE-based | DVGCRN [26] | 2022 | Node | Signals | Prec, Rec, F1 | Various Industrial Applications |
| AE-based | GReLeN [75] | 2022 | Node | Signals | Prec, Rec, F1 | Water System and Server Machine |
| AE-based | GANF [27] | 2022 | Node | Signals | AUC | Water System, Highway Traffic |
| AE-based | CAN [99] | 2023 | Graph | Signals | Prec, Rec, F1 | Water System, Soil Moisture |
| AE-based | TSAD-C [79] | 2024 | Graph | Signals | F1, Rec, APR | Various Biomedical Applications |
| GAN-based | CCG-EDGAN [83] | 2021 | Node | Signals | Prec, Rec, F1 | Various Industrial Applications |
| GAN-based | HAD-MDGAT [84] | 2022 | Node | Signals | Prec, Rec, F1 | Various Signal Applications |
| GAN-based | STGAN [25] | 2022 | Node | Signals | Rec | Traffic Data |
| GAN-based | RegraphGAN [100] | 2023 | Edge | Users' observed data | AUC | Social Networks, Cyber Security |
| Predictive-based | SEDANSPOT [88] | 2018 | Edge | Edge Streams | Prec, Rec, F1 | Social Networks |
| Predictive-based | GCLNC [92] | 2019 | Node | Videos | AUC | Real-world Videos |
| Predictive-based | MTAD-GAT [86] | 2020 | Node | Signals | Prec, Rec, F1 | Various Industrial Applications |
| Predictive-based | Midas [89] | 2020 | Edge | Edge Streams | AUC | Social Networks |
| Predictive-based | GDN [53] | 2021 | Node | Signals | Prec, Rec, F1 | Water System |
| Predictive-based | GTA [28] | 2021 | Node | Signals | Prec, Rec, F1 | Water System |
| Predictive-based | Eland [91] | 2021 | Graph | Users' observed data | Prec, AUC | Social Networks |
| Predictive-based | Series2Graph [90] | 2022 | Sub-graph | Signals | Accuracy | Various Industrial Applications |
| Predictive-based | WAGCN [54] | 2022 | Graph | Videos | AUC | Real-world Videos |
| Predictive-based | FuSAGNet [85] | 2022 | Node | Signals | Prec, Rec, F1 | Water System |
| Predictive-based | MST-GAT [101] | 2023 | Node | Signals | Prec, Rec, F1 | Various Industrial Applications |
| Self-supervised | AddGraph [95] | 2019 | Edge | Users' observed data | AUC | Social Networks |
| Self-supervised | TADDY [96] | 2021 | Edge | Users' observed data | AUC | Social Networks, Cyber Security |
| Self-supervised | CRFAD [97] | 2021 | Graph | Videos | AUC | Real-world Videos |
| Self-supervised | CAAD [102] | 2021 | Graph | Videos | AUC | Real-world Videos |
| Self-supervised | EEG-CGS [55] | 2023 | Node, Sub-graph | Signals | Prec, Rec, F1 | Seizure Analysis |
| Self-supervised | mVSG-VFP [56] | 2023 | Graph | Signals | Prec, Rec, F1 | Vehicle Failure Prediction |
| Self-supervised | IVAD [98] | 2023 | Edge, Graph | Videos | AUC | Real-world Videos |
| Self-supervised | STGA [103] | 2023 | Graph | Videos | AUC | Real-world Videos |

Table 2: Comparison of representative G-TSAD methods from multiple aspects.

6.1 Time-series Signals

A substantial portion of existing G-TSAD studies has trained models using widely adopted benchmark time-series signal datasets, such as Yahoo S5 [104], NASA [105], SWaT [106], WADI [107], SMAP [77], and SMD [77]. More specifically, Yahoo S5 is a comprehensive collection of time-series datasets focusing on real-world scenarios such as traffic, energy consumption, and user management. NASA includes signal data from different machines, employed for research in predictive maintenance and anomaly detection, specifically in the context of identifying anomalies related to machine faults and failures. SWaT is derived from a water treatment facility and includes sensor readings and control system data; it is particularly designed for studying the security and resilience of industrial control systems. WADI is collected from a water distribution system and includes sensor readings and operational states; it is used for research in water infrastructure monitoring and anomaly detection. SMAP is associated with soil moisture monitoring using satellite-based sensors and includes time-series data on soil moisture levels at different locations; researchers leverage this dataset for studying environmental monitoring and anomaly detection in soil moisture patterns. SMD is a 5-week-long time-series dataset collected from various server machines within a large Internet company, in which anomalies are annotated by domain experts.

However, it has been highlighted in recent literature [8, 108] that these benchmark datasets possess various flaws, including mislabeled ground truth, triviality, unrealistic anomaly density, and run-to-failure bias. These issues render the current benchmark time-series signal datasets unsuitable for the robust evaluation and comparison of G-TSAD algorithms. Therefore, improving benchmarking practices is crucial to ensure a more accurate assessment of G-TSAD algorithm performance in practical scenarios. [8] has introduced a reliable time-series dataset, called The UCR Time Series Anomaly Archive, specifically designed for evaluating and benchmarking anomaly detection algorithms. [109] has also introduced The UCI Machine Learning Repository, which contains datasets that cover a broad range of domains, including but not limited to finance, healthcare, biology, and social sciences. These datasets serve as valuable resources for testing and developing TSAD algorithms.

Additionally, recent studies [79, 110, 111] have showcased the superiority of their proposed TSAD methods on several publicly available and reliable alternatives collected from biomedical domains, such as PhysioNet [112], the European ST-T database [113], PTB-XL [114], ICBEB [115], and TUSZ [116]. These datasets contain anomalies labeled by a consensus of experienced doctors. For example, PTB-XL contains 12-lead electrocardiogram recordings from a diverse set of patients, providing a comprehensive resource for studying various cardiac conditions. ICBEB is also an electrocardiogram dataset consisting of normal data and different types of abnormal cardiac disorders. TUSZ is one of the largest public electroencephalogram seizure datasets to date, containing normal and seizure brain activities.

6.2 Social Networks

As clarified in Section 3, social networks, which are represented as dynamic graphs, are also considered time-series data. This is because new individuals join the community network every day/month/year; hence, the relationships between individuals change over time. Detecting anomalies in these dynamic social networks has gained significant attention in G-TSAD research. Various social network datasets have been used. For instance, UCI Messages [117] is a social network dataset constructed into dynamic graphs, where each node represents a user and each edge indicates a message exchange between two users. Email-DNC [118] is a network of emails, where each node indicates a person in the United States Democratic Party and each edge denotes an email sent from one person to another. Digg [119] is a network dataset collected from a news website, digg.com, where each node represents a website user and each edge indicates that one user replied to another. Bitcoin-Alpha and Bitcoin-OTC [120, 121] are two “who-trusts-whom” networks of bitcoin users engaged in trading on the platforms www.btc-alpha.com and www.bitcoinotc.com, respectively, where each node denotes a platform user and an edge is present when one user rates another. AS-Topology [122] is a network connection dataset collected from autonomous systems on the Internet, where each node denotes an autonomous system and each edge indicates a connection between two autonomous systems.

6.3 Videos

As illustrated in Figure 2b, video data is considered time-series data, as it is a sequence of frames captured over time. The frames are arranged sequentially to create the illusion of motion when played back at a rapid pace. This sequential nature of video data, where each frame depends on the preceding and succeeding frames, makes it inherently time-series data. Various publicly available and reliable video datasets have been used in the G-TSAD research field, such as UCF-Crime [123], Xd-Violence [124], ShanghaiTech [125], and UCSD-Peds [126].

Specifically, UCF-Crime is a large-scale dataset consisting of 1900 long untrimmed surveillance videos with 13 types of anomalies captured by CCTV cameras indoors and outdoors during day and night. The anomalies are real-world situations such as stealing, road accidents, and robbery. Xd-Violence is a large-scale dataset composed of 4754 videos, including 2405 multi-labeled violent videos from diverse scenarios recorded from CCTV cameras and movies. ShanghaiTech is a medium-scale dataset comprising 437 videos, ranging from 15 seconds to minutes, recorded by CCTV cameras in an outdoor location under a variety of circumstances with complex light conditions and camera angles. UCSD-Peds is a small-scale dataset acquired by a stationary camera recording pedestrian walkways. It consists of two subsets: Peds1, with 70 videos recording groups of people walking towards and away from the camera, and Peds2, with 28 videos recording pedestrian movement parallel to the camera plane. The common anomalies are bikers, skaters, or small carts.

7 Evaluation Metrics

Evaluation metrics play a crucial role in assessing the performance of G-TSAD algorithms in time-series data. In this section, we present the commonly used evaluation metrics in G-TSAD, which are also denoted in Table 2. We also discuss several issues of evaluation protocols and potential solutions.

In the domain of time-series signals, metrics such as the F1 score (F1), Precision (Prec), and Recall (Rec) are commonly used. However, it is important to highlight that recent critiques [9] have raised concerns about the evaluation protocol employed in many studies [20, 127, 128]. Specifically, these studies compute the F1 score after applying a peculiar evaluation protocol called point adjustment (PA). Generally, PA considers that if at least one point within an anomaly segment is detected as an anomaly, the entire segment is regarded as correctly detected. Consequently, this practice leads to high F1 scores, yet it carries a high risk of overestimating model performance. [9] conducted experiments demonstrating that without PA, the performance of existing methods yields no significant improvement over a baseline that is simply a randomly initialized reconstruction model, such as an untrained autoencoder consisting of a single-layer LSTM. To address this issue, [9] proposed a new evaluation protocol, named PA%K, where K denotes a threshold, to mitigate the overestimation effect of PA. The concept behind PA%K is to apply PA to a segment only if the ratio of the number of correctly detected anomalies within that segment to its length exceeds the threshold K. Note that K can be selected manually based on prior knowledge.
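The PA%K protocol can be implemented in a few lines, as sketched below for binary per-time-step predictions and labels; variable names are illustrative.

```python
# Sketch of the PA%K protocol: point adjustment is applied to a ground-truth
# anomaly segment only if the fraction of its points already detected by the
# model exceeds K; otherwise the raw predictions are kept.
import numpy as np

def pa_percent_k(pred, label, k=0.2):
    """pred, label: binary arrays per time step; k: ratio threshold in [0, 1]."""
    adjusted = pred.copy()
    i = 0
    while i < len(label):
        if label[i] == 1:                      # start of an anomaly segment
            j = i
            while j < len(label) and label[j] == 1:
                j += 1                         # find the end of the segment
            if pred[i:j].mean() > k:           # enough points detected?
                adjusted[i:j] = 1              # adjust the whole segment
            i = j
        else:
            i += 1
    return adjusted

label = np.array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])
pred  = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
print(pa_percent_k(pred, label, k=0.2))   # first segment adjusted (1/4 > 0.2)
```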

While PA%K can address the PA issue, selecting a decision threshold remains non-trivial in TSAD research. There are four scenarios for threshold finding. The first scenario is a common practice where researchers directly find the threshold on the labeled test set [26, 129, 130], which can lead to biased and unfair performance evaluations. In the second scenario, the threshold is selected based on a labeled validation set that yields the best F1 score. The selected threshold is then applied to the test set [53]. However, the first two cases may not align with real-world unsupervised applications where labeled data is unavailable. To tackle this limitation, the third scenario advocates for using an unlabeled validation set to select the threshold, which is then applied to the test set [79]. Note that there is no overlap between training, validation and test sets.

The issue of the above threshold-dependent scenarios lies in their evaluation of the model’s performance at a single threshold, which may not capture the model’s behavior across the full range of possible thresholds. Hence, several methods in the domains of time-series signals, as well as social networks and videos [27, 110, 95, 96, 97, 98, 54] have adopted the fourth scenario, leveraging the threshold-independent metrics such as Area Under the Receiver-Operating Characteristic Curve (AUC) [131, 132], and Area Under the Precision-Recall Curve (APR) [133]. This approach offers a more comprehensive assessment of the model’s performance by considering multiple thresholds, which allows for a deeper understanding of the trade-offs between Prec and Rec, facilitating better decision-making when selecting the optimal operating point for a given application.

Overall, depending on the specific applications at hand and the goals of the evaluation, it is essential to use an appropriate method for selecting threshold as different scenarios may necessitate different approaches, such as supervised, semi-supervised or unsupervised approaches, to ensure accurate and fair evaluations. Moreover, it is advisable to employ a combination of evaluation metrics to comprehensively assess the model’s performance.

8 Challenges and Future Directions

Although G-TSAD is a new topic, it has been rapidly growing in popularity, as can be inferred from the number of studies published in the prestigious Artificial Intelligence venues described in Section 5. However, detecting anomalies in graphs with evolving graph features and adjacency matrices is challenging, leading to several major concerns in the existing studies. In this section, we analyze these challenges and pinpoint potential directions for future studies on G-TSAD.

Theoretical Foundation and Explainability. Despite its great success in various time-series applications, most existing G-TSAD methods are designed and evaluated by empirical experiments without sufficient theoretical foundations to prove their reliability. Plus, they ignore the explainability of learned representations and predicted results – e.g., what are the important features of nodes and edges, and adjacency matrices in graphs? Is the output (e.g., anomaly score) sufficient to conclude the abnormality of graph objects? These are important concerns for interpreting the model behavior. Therefore, we believe that in addition to the empirical design, providing a solid theoretical foundation and a deep analysis of the learned representations to improve the model’s generalization and robustness are crucial.

On Equivalence between Intra-variable and Inter-variable Dependencies. Simultaneously capturing the intra-variable and inter-variable dependencies in time-series data is very important for effective anomaly detection, yet many existing methods cannot handle this problem. Regarding intra-variable dependency, time-series data often involve long-term intra-variable correlations that many existing studies cannot tackle. Regarding inter-variable dependency, it is difficult to pre-define the graph features and adjacency matrices for many time-series systems due to the developer's limited prior knowledge. Recent studies have shown that, without pre-defined graphs, graph structure learning techniques [87] can learn the graph information, i.e., the features of nodes/edges and adjacency matrices, which dynamically evolves over time. This provides a better capturing of the system's underlying mechanism. Therefore, a proper integration of graph structure learning techniques and deep sequence models is essential to provide more reliable and accurate detection techniques.

Graph Augmentation Strategy. Augmenting graphs is crucial for self-supervised methods, e.g., generating positive and negative pairs is required in CL-based methods discussed in Section 5.4. There are many image augmentation methods in the literature [94]. However, due to the dynamic nature of graphs (e.g., intra-variable complexity, non-Euclidean structure), it is not appropriate to directly apply image-based augmentations to the graphs. Most existing SSL studies only consider random sampling techniques, sub-graph sampling, or graph diffusion for augmentation – this may provide limited diversity and uncertain invariance. Having additional yet effective augmentation techniques such as feature-based (e.g., masking or shuffling a portion of nodes/edges features), edge-based (e.g., adding or removing portions of edges), sampling-based (e.g., sampling nodes and their connected edges using different sampling techniques such as importance sampling, knowledge sampling), and adaptive augmentations (e.g., using learned attention or gradient-based schemes) [30] would be promising directions for G-TSAD studies.
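As a brief illustration, the sketch below applies two of these augmentation families, feature masking and edge dropping, to a toy graph given as a feature matrix and an adjacency matrix; the masking ratios and graph sizes are arbitrary assumptions.

```python
# Sketch of two graph augmentation families: feature masking (zeroing a random
# fraction of node-feature entries) and edge dropping (removing a random
# fraction of edges while preserving symmetry for an undirected graph).
import torch

def mask_features(x, ratio=0.2):
    mask = torch.rand_like(x) > ratio               # keep ~(1 - ratio) of entries
    return x * mask

def drop_edges(adj, ratio=0.2):
    keep = (torch.rand_like(adj) > ratio).float()
    keep = torch.triu(keep, 1); keep = keep + keep.t()   # keep symmetry
    return adj * keep

x = torch.randn(10, 16)                             # node features
adj = (torch.rand(10, 10) > 0.5).float()
adj = torch.triu(adj, 1); adj = adj + adj.t()       # undirected toy graph

view1 = (mask_features(x), drop_edges(adj))         # two stochastic views,
view2 = (mask_features(x), drop_edges(adj))         # e.g., for CL positive pairs
```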

Detection of Multiple Types of Graph Anomalies. As seen from Table 2, the majority of existing methods can only detect either anomalous nodes, edges, sub-graphs, or graphs. However, in many real-world systems, different observations can have different types of anomalies; e.g., in some observations, only a single sensor is anomalous, while in others, a group of sensors is anomalous. Moreover, although some existing studies use $\text{Sim}\{\cdot,\cdot\}$ to facilitate detecting anomalous nodes, edges, sub-graphs, and graphs, no study targets detecting an anomalous $\text{Sim}\{\cdot,\cdot\}$ itself. Since $\text{Sim}\{\cdot,\cdot\}$ presents both short-term and long-term relationships across observations, analyzing it can be beneficial for understanding the underlying dynamics of a time series; e.g., an anomalous $\text{Sim}\{\cdot,\cdot\}$ can be considered an unexpected shift of the time-series dynamics. Thus, a model that can detect multiple types of anomalies, including anomalous $\text{Sim}\{\cdot,\cdot\}$, would be a very interesting research line.

Broader Scope of Methodology. Each category of G-TSAD studies has its own strengths and drawbacks. AE-based and predictive-based methods are simple to implement, as the losses are easy to build, but recovering the input and learning long-term dependencies are challenging tasks. In contrast, despite the capability shown in existing studies, GAN-based methods are difficult to implement due to the adversarial training process, which needs a careful selection of hyperparameters, the network architecture, and the optimization algorithm. SSL methods have shown promising results, but designing the frameworks, augmentation techniques, and loss functions heavily relies on empirical experiments. These disadvantages may cause poor anomaly detection performance. To improve overall detection accuracy, leveraging the complementary strengths of these approaches is important. For example, we can integrate AE-based reconstruction and predictive-based models to detect a wider range of anomalies: while the reconstruction model detects anomalies occurring at the current/past time, the predictive-based model can detect anomalies occurring in the future.

Another example is a hybrid SSL-based method, e.g., a combination of the predictive SSL-based and contrastive SSL-based modules to achieve better generalization of the learned representations. Moreover, this combined approach can be adapted to different types of anomalies; e.g., the contrastive SSL-based module learns to detect sudden spikes in the time series, while the predictive SSL-based module can be trained to detect future abnormal trends or other patterns unseen in the training data. Overall, the hybrid SSL-based approach can benefit from multiple pretext tasks of the different SSL modules, yet it is important to design an effective joint learning framework to balance each component in the model [32]. Given its great potential, hybrid methods would be interesting for further G-TSAD studies.

Datasets and Evaluation Metrics. As emphasized in Section 6, acknowledging the limitations of current benchmark TSAD datasets is crucial for advancing the field. It is important to invest efforts in curating high-quality benchmark datasets that reflect real-world scenarios and challenges. Additionally, as discussed in Section 7, the existing challenges with evaluation protocols and metrics underscore the need for a comprehensive evaluation approach. It is essential to employ a combination of evaluation metrics to thoroughly assess model performance, ensuring an accurate and fair comparison between methods while improving reproducibility.

9 Conclusion

This paper conducts a comprehensive survey of G-TSAD. First, we introduce new concepts in G-TSAD, the major challenges of time-series data, the advantages of representing time series as graphs, and the types of anomalies in graphs. Then, we present a systematic taxonomy that groups G-TSAD methods into four categories: AE-based, GAN-based, predictive-based, and self-supervised methods. For each category, we discuss the technical details of the methods, their strengths and weaknesses, and comparisons between them. A wide range of practical time-series applications is also introduced. Finally, we point out several limitations of the current research and suggest promising directions for future G-TSAD work. We hope this survey will serve as a useful reference for researchers exploring this field further.

References

  • [1] S. Schmidl, P. Wenig, and T. Papenbrock, “Anomaly detection in time series: a comprehensive evaluation,” Proceedings of the VLDB Endowment, vol. 15, no. 9, pp. 1779–1797, 2022.
  • [2] A. Blázquez, A. Conde, U. Mori, and J. A. Lozano, “A review on outlier/anomaly detection in time series data,” ACM Computing Surveys (CSUR), vol. 54, no. 3, pp. 1–33, 2021.
  • [3] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen, “Deep autoencoding gaussian mixture model for unsupervised anomaly detection,” in International conference on learning representations, 2018.
  • [4] H. Ren, B. Xu, Y. Wang, C. Yi, C. Huang, X. Kou, T. Xing, M. Yang, J. Tong, and Q. Zhang, “Time-series anomaly detection service at microsoft,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 3009–3017.
  • [5] T. Kieu, B. Yang, C. Guo, and C. S. Jensen, “Outlier detection for time series with recurrent autoencoder ensembles.” in The International Joint Conference on Artificial Intelligence, 2019, pp. 2725–2732.
  • [6] K.-H. Lai, D. Zha, J. Xu, Y. Zhao, G. Wang, and X. Hu, “Revisiting time series outlier detection: Definitions and benchmarks,” in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021.
  • [7] Q. Rebjock, B. Kurt, T. Januschowski, and L. Callot, “Online false discovery rate control for anomaly detection in time series,” Advances in Neural Information Processing Systems, vol. 34, pp. 26487–26498, 2021.
  • [8] R. Wu and E. Keogh, “Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress,” IEEE Transactions on Knowledge and Data Engineering, 2021.
  • [9] S. Kim, K. Choi, H.-S. Choi, B. Lee, and S. Yoon, “Towards a rigorous evaluation of time-series anomaly detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2022, pp. 7194–7201.
  • [10] A. Deng, A. Goodge, L. Y. Ang, and B. Hooi, “Cadet: Calibrated anomaly detection for mitigating hardness bias,” in The International Joint Conference on Artificial Intelligence, 2022.
  • [11] H. Hojjati and N. Armanfard, “Self-supervised acoustic anomaly detection via contrastive learning,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2022, pp. 3253–3257.
  • [12] R. Morais, V. Le, T. Tran, B. Saha, M. Mansour, and S. Venkatesh, “Learning regularity in skeleton trajectories for anomaly detection in videos,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 11996–12004.
  • [13] J.-C. Feng, F.-T. Hong, and W.-S. Zheng, “Mist: Multiple instance self-training framework for video anomaly detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 14009–14018.
  • [14] M.-I. Georgescu, A. Barbalau, R. T. Ionescu, F. S. Khan, M. Popescu, and M. Shah, “Anomaly detection in video via self-supervised and multi-task learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12742–12752.
  • [15] C. Huang, J. Wen, Y. Xu, Q. Jiang, J. Yang, Y. Wang, and D. Zhang, “Self-supervised attentive generative adversarial networks for video anomaly detection,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
  • [16] N.-C. Ristea, N. Madan, R. T. Ionescu, K. Nasrollahi, F. S. Khan, T. B. Moeslund, and M. Shah, “Self-supervised predictive convolutional attentive block for anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13576–13586.
  • [17] J.-C. Wu, H.-Y. Hsieh, D.-J. Chen, C.-S. Fuh, and T.-L. Liu, “Self-supervised sparse representation for video anomaly detection,” in European Conference on Computer Vision.   Springer, 2022, pp. 729–745.
  • [18] C. Zhang, D. Song, Y. Chen, X. Feng, C. Lumezanu, W. Cheng, J. Ni, B. Zong, H. Chen, and N. V. Chawla, “A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data,” in Proceedings of the AAAI conference on artificial intelligence, 2019, pp. 1409–1416.
  • [19] B. Zhou, S. Liu, B. Hooi, X. Cheng, and J. Ye, “Beatgan: Anomalous rhythm detection using adversarially generated time series.” in The International Joint Conference on Artificial Intelligence, 2019, pp. 4433–4439.
  • [20] J. Audibert, P. Michiardi, F. Guyard, S. Marti, and M. A. Zuluaga, “Usad: Unsupervised anomaly detection on multivariate time series,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 3395–3404.
  • [21] Y. Zhang, Y. Chen, J. Wang, and Z. Pan, “Unsupervised deep anomaly detection for multi-sensor time-series signals,” IEEE Transactions on Knowledge and Data Engineering, 2021.
  • [22] A. Abdulaal, Z. Liu, and T. Lancewicki, “Practical approach to asynchronous multivariate time series anomaly detection and localization,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2485–2494.
  • [23] X. Chen, L. Deng, F. Huang, C. Zhang, Z. Zhang, Y. Zhao, and K. Zheng, “Daemon: Unsupervised anomaly detection and interpretation for multivariate time series,” in 2021 IEEE 37th International Conference on Data Engineering (ICDE).   IEEE, 2021, pp. 2225–2230.
  • [24] S. Tuli, G. Casale, and N. R. Jennings, “Tranad: Deep transformer networks for anomaly detection in multivariate time series data,” arXiv preprint arXiv:2201.07284, 2022.
  • [25] L. Deng, D. Lian, Z. Huang, and E. Chen, “Graph convolutional adversarial networks for spatiotemporal anomaly detection,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 6, pp. 2416–2428, 2022.
  • [26] W. Chen, L. Tian, B. Chen, L. Dai, Z. Duan, and M. Zhou, “Deep variational graph convolutional recurrent network for multivariate time series anomaly detection,” in International Conference on Machine Learning, 2022, pp. 3621–3633.
  • [27] E. Dai and J. Chen, “Graph-augmented normalizing flows for anomaly detection of multiple time series,” in International Conference on Learning Representations, 2022.
  • [28] Z. Chen, D. Chen, X. Zhang, Z. Yuan, and X. Cheng, “Learning graph structures with transformer for multivariate time series anomaly detection in iot,” IEEE Internet of Things Journal, 2021.
  • [29] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE transactions on neural networks and learning systems, vol. 32, no. 1, pp. 4–24, 2020.
  • [30] L. Wu, H. Lin, C. Tan, Z. Gao, and S. Z. Li, “Self-supervised learning on graphs: Contrastive, generative, or predictive,” IEEE Transactions on Knowledge and Data Engineering, 2021.
  • [31] F. Xia, K. Sun, S. Yu, A. Aziz, L. Wan, S. Pan, and H. Liu, “Graph learning: A survey,” IEEE Transactions on Artificial Intelligence, vol. 2, no. 2, pp. 109–127, 2021.
  • [32] Y. Liu, M. Jin, S. Pan, C. Zhou, Y. Zheng, F. Xia, and P. Yu, “Graph self-supervised learning: A survey,” IEEE Transactions on Knowledge and Data Engineering, 2022.
  • [33] Y. Xie, Z. Xu, J. Zhang, Z. Wang, and S. Ji, “Self-supervised learning of graph neural networks: A unified review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
  • [34] Y. Zhou, H. Zheng, X. Huang, S. Hao, D. Li, and J. Zhao, “Graph neural networks: Taxonomy, advances, and trends,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 13, no. 1, pp. 1–54, 2022.
  • [35] M. M. Li, K. Huang, and M. Zitnik, “Graph representation learning in biomedicine and healthcare,” Nature Biomedical Engineering, vol. 6, no. 12, pp. 1353–1369, 2022.
  • [36] X. Liu, M. Yan, L. Deng, G. Li, X. Ye, D. Fan, S. Pan, and Y. Xie, “Survey on graph neural network acceleration: An algorithmic perspective,” in The International Joint Conference on Artificial Intelligence, 2022.
  • [37] C. Zhang, K. Ding, J. Li, X. Zhang, Y. Ye, N. V. Chawla, and H. Liu, “Few-shot learning on graphs: A survey,” The International Joint Conference on Artificial Intelligence, 2022.
  • [38] C. Gao, Y. Zheng, N. Li, Y. Li, Y. Qin, J. Piao, Y. Quan, J. Chang, D. Jin, X. He et al., “A survey of graph neural networks for recommender systems: Challenges, methods, and directions,” ACM Transactions on Recommender Systems, vol. 1, no. 1, pp. 1–51, 2023.
  • [39] S. Khoshraftar and A. An, “A survey on graph representation learning methods,” ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 1, pp. 1–55, 2024.
  • [40] X. Ma, J. Wu, S. Xue, J. Yang, C. Zhou, Q. Z. Sheng, H. Xiong, and L. Akoglu, “A comprehensive survey on graph anomaly detection with deep learning,” IEEE Transactions on Knowledge and Data Engineering, 2021.
  • [41] H. Kim, B. S. Lee, W.-Y. Shin, and S. Lim, “Graph anomaly detection with graph neural networks: Current status and challenges,” IEEE Access, 2022.
  • [42] A. D. Pazho, G. A. Noghre, A. A. Purkayastha, J. Vempati, O. Martin, and H. Tabkhi, “A survey of graph-based deep learning for anomaly detection in distributed systems,” IEEE Transactions on Knowledge and Data Engineering, 2023.
  • [43] J. Ren, F. Xia, I. Lee, A. Noori Hoshyar, and C. Aggarwal, “Graph learning for anomaly analytics: Algorithms, applications, and challenges,” ACM Transactions on Intelligent Systems and Technology, vol. 14, no. 2, pp. 1–29, 2023.
  • [44] T. Bilot, N. El Madhoun, K. Al Agha, and A. Zouaoui, “Graph neural networks for intrusion detection: A survey,” IEEE Access, 2023.
  • [45] A. A. Cook, G. Mısırlı, and Z. Fan, “Anomaly detection for iot time-series data: A survey,” IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6481–6494, 2019.
  • [46] K. Choi, J. Yi, C. Park, and S. Yoon, “Deep learning for anomaly detection in time-series data: review, analysis, and guidelines,” IEEE Access, 2021.
  • [47] G. Li and J. J. Jung, “Deep learning for anomaly detection in multivariate time series: Approaches, applications, and challenges,” Information Fusion, vol. 91, pp. 93–102, 2023.
  • [48] C. Wilson, A. Sala, K. P. Puttaswamy, and B. Y. Zhao, “Beyond social graphs: User interactions in online social networks and their implications,” ACM Transactions on the Web (TWEB), vol. 6, no. 4, pp. 1–31, 2012.
  • [49] B. Shao, X. Li, and G. Bian, “A survey of research hotspots and frontier trends of recommendation systems from the perspective of knowledge graph,” Expert Systems with Applications, vol. 165, p. 113764, 2021.
  • [50] L. Wu, Y. Chen, H. Ji, and B. Liu, “Deep learning on graphs for natural language processing,” in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 2651–2653.
  • [51] J. Audibert, P. Michiardi, F. Guyard, S. Marti, and M. A. Zuluaga, “Do deep neural networks contribute to multivariate time series anomaly detection?” Pattern Recognition, vol. 132, p. 108945, 2022.
  • [52] A. Garg, W. Zhang, J. Samaran, R. Savitha, and C.-S. Foo, “An evaluation of anomaly detection and diagnosis in multivariate time series,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 6, pp. 2508–2517, 2021.
  • [53] A. Deng and B. Hooi, “Graph neural network-based anomaly detection in multivariate time series,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2021, pp. 4027–4035.
  • [54] C. Cao, X. Zhang, S. Zhang, P. Wang, and Y. Zhang, “Adaptive graph convolutional networks for weakly supervised anomaly detection in videos,” arXiv preprint arXiv:2202.06503, 2022.
  • [55] T. K. K. Ho and N. Armanfard, “Self-supervised learning for anomalous channel detection in eeg graphs: Application to seizure analysis,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
  • [56] H. Hojjati, M. Sadeghi, and N. Armanfard, “Multivariate time-series anomaly detection with temporal self-supervision and graphs: Application to vehicle failure prediction,” in The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2023.
  • [57] R. R. Chowdhury, J. Li, X. Zhang, D. Hong, R. K. Gupta, and J. Shang, “Primenet: Pre-training for irregular multivariate time series,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
  • [58] S. P. Burns, S. Santaniello, R. B. Yaffe, C. C. Jouny, N. E. Crone, G. K. Bergey, W. S. Anderson, and S. V. Sarma, “Network dynamics of the brain and influence of the epileptic seizure onset zone,” Proceedings of the National Academy of Sciences, vol. 111, no. 49, pp. E5321–E5330, 2014.
  • [59] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 2016, pp. 855–864.
  • [60] X. Zhu, C. Lei, H. Yu, Y. Li, J. Gan, and S. Zhang, “Robust graph dimensionality reduction,” in International Joint Conference on Artificial Intelligence, 2018, pp. 3257–3263.
  • [61] H. Qu, L. Li, Z. Li, and J. Zheng, “Supervised discriminant isomap with maximum margin graph regularization for dimensionality reduction,” Expert Systems with Applications, vol. 180, p. 115055, 2021.
  • [62] H. Yang, K. Ma, and J. Cheng, “Rethinking graph regularization for graph neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 5, 2021, pp. 4573–4581.
  • [63] J. Zeng, J. Pang, W. Sun, and G. Cheung, “Deep graph laplacian regularization for robust denoising of real images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
  • [64] S. Rey and A. G. Marques, “Robust graph-filter identification with graph denoising regularization,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2021, pp. 5300–5304.
  • [65] X. Li, H. Fan, and J. Liu, “Noise-aware clustering based on maximum correntropy criterion and adaptive graph regularization,” Information Sciences, vol. 626, pp. 42–59, 2023.
  • [66] S. Tang, J. A. Dunnmon, Q. Liangqiong, K. K. Saab, T. Baykaner, C. Lee-Messer, and D. L. Rubin, “Modeling multivariate biosignals with graph neural networks and structured state space models,” in Conference on Health, Inference, and Learning.   PMLR, 2023, pp. 50–71.
  • [67] Z. Kang, H. Pan, S. C. Hoi, and Z. Xu, “Robust graph learning from noisy data,” IEEE Transactions on Cybernetics, vol. 50, no. 5, pp. 1833–1843, 2019.
  • [68] P. Berger, G. Hannak, and G. Matz, “Efficient graph learning from noisy and incomplete data,” IEEE Transactions on Signal and Information Processing over Networks, vol. 6, pp. 105–119, 2020.
  • [69] X. Du, T. Bian, Y. Rong, B. Han, T. Liu, T. Xu, W. Huang, Y. Li, and J. Huang, “Noise-robust graph learning by estimating and leveraging pairwise interactions,” Transactions on Machine Learning Research, 2023.
  • [70] Z. Zhang and M. Sabuncu, “Generalized cross entropy loss for training deep neural networks with noisy labels,” Advances in neural information processing systems, vol. 31, 2018.
  • [71] X. Teng, M. Yan, A. M. Ertugrul, and Y.-R. Lin, “Deep into hypersphere: Robust and unsupervised anomaly discovery in dynamic networks,” in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018.
  • [72] X. Teng, Y.-R. Lin, and X. Wen, “Anomaly detection in dynamic networks using multi-view time-series hypersphere learning,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 827–836.
  • [73] D. M. Tax and R. P. Duin, “Support vector data description,” Machine learning, vol. 54, pp. 45–66, 2004.
  • [74] M. J. Kusner, B. Paige, and J. M. Hernández-Lobato, “Grammar variational autoencoder,” in International conference on machine learning.   PMLR, 2017, pp. 1945–1954.
  • [75] W. Zhang, C. Zhang, and F. Tsung, “Grelen: Multivariate time series anomaly detection from the perspective of graph relational learning,” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, 2022, pp. 2390–2397.
  • [76] I. Kobyzev, S. J. Prince, and M. A. Brubaker, “Normalizing flows: An introduction and review of current methods,” IEEE transactions on pattern analysis and machine intelligence, vol. 43, no. 11, pp. 3964–3979, 2020.
  • [77] Y. Su, Y. Zhao, C. Niu, R. Liu, W. Sun, and D. Pei, “Robust anomaly detection for multivariate time series through stochastic recurrent neural network,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 2828–2837.
  • [78] X. Jiang, J. Liu, J. Wang, Q. Nie, K. Wu, Y. Liu, C. Wang, and F. Zheng, “Softpatch: Unsupervised anomaly detection with noisy data,” Advances in Neural Information Processing Systems, vol. 35, pp. 15433–15445, 2022.
  • [79] T. K. K. Ho and N. Armanfard, “Multivariate time-series anomaly detection with contaminated data,” arXiv preprint arXiv:2308.12563, 2024.
  • [80] Y. Tashiro, J. Song, Y. Song, and S. Ermon, “Csdi: Conditional score-based diffusion models for probabilistic time series imputation,” Advances in Neural Information Processing Systems, vol. 34, pp. 24804–24816, 2021.
  • [81] K. Rasul, C. Seward, I. Schuster, and R. Vollgraf, “Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting,” in International Conference on Machine Learning.   PMLR, 2021, pp. 8857–8868.
  • [82] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
  • [83] H. Liang, L. Song, J. Du, X. Li, and L. Guo, “Consistent anomaly detection and localization of multivariate time series via cross-correlation graph-based encoder–decoder gan,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–10, 2021.
  • [84] L. Zhou, Q. Zeng, and B. Li, “Hybrid anomaly detection via multihead dynamic graph attention networks for multivariate time series,” IEEE Access, vol. 10, pp. 40967–40978, 2022.
  • [85] S. Han and S. S. Woo, “Learning sparse latent graph representations for anomaly detection in multivariate time series,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 2977–2986.
  • [86] H. Zhao, Y. Wang, J. Duan, C. Huang, D. Cao, Y. Tong, B. Xu, J. Bai, J. Tong, and Q. Zhang, “Multivariate time-series anomaly detection via graph attention network,” in 2020 IEEE International Conference on Data Mining (ICDM).   IEEE, 2020, pp. 841–850.
  • [87] Y. Zhu, W. Xu, J. Zhang, Y. Du, J. Zhang, Q. Liu, C. Yang, and S. Wu, “A survey on graph structure learning: Progress and opportunities,” in The International Joint Conference on Artificial Intelligence, 2021.
  • [88] D. Eswaran and C. Faloutsos, “Sedanspot: Detecting anomalies in edge streams,” in 2018 IEEE International conference on data mining (ICDM).   IEEE, 2018, pp. 953–958.
  • [89] S. Bhatia, B. Hooi, M. Yoon, K. Shin, and C. Faloutsos, “Midas: Microcluster-based detector of anomalies in edge streams,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 3242–3249.
  • [90] P. Boniol and T. Palpanas, “Series2graph: Graph-based subsequence anomaly detection for time series,” arXiv preprint arXiv:2207.12208, 2022.
  • [91] T. Zhao, B. Ni, W. Yu, Z. Guo, N. Shah, and M. Jiang, “Action sequence augmentation for early graph-based anomaly detection,” in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 2668–2678.
  • [92] J.-X. Zhong, N. Li, W. Kong, S. Liu, T. H. Li, and G. Li, “Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 1237–1246.
  • [93] N. Li, J.-X. Zhong, X. Shu, and H. Guo, “Weakly-supervised anomaly detection in video surveillance via graph convolutional label noise cleaning,” Neurocomputing, vol. 481, pp. 154–167, 2022.
  • [94] H. Hojjati, T. K. K. Ho, and N. Armanfard, “Self-supervised anomaly detection in computer vision and beyond: A survey and outlook,” Neural Networks, p. 106106, 2024.
  • [95] L. Zheng, Z. Li, J. Li, Z. Li, and J. Gao, “Addgraph: Anomaly detection in dynamic graph using attention-based temporal gcn,” in The International Joint Conference on Artificial Intelligence, 2019, pp. 4419–4425.
  • [96] Y. Liu, S. Pan, Y. G. Wang, F. Xiong, L. Wang, Q. Chen, and V. C. Lee, “Anomaly detection in dynamic graphs via transformer,” IEEE Transactions on Knowledge and Data Engineering, 2021.
  • [97] D. Purwanto, Y.-T. Chen, and W.-H. Fang, “Dance with self-attention: A new look of conditional random fields on anomaly detection in videos,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 173–183.
  • [98] K. Doshi and Y. Yilmaz, “Towards interpretable video anomaly detection,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 2655–2664.
  • [99] F. Xia, X. Chen, S. Yu, M. Hou, M. Liu, and L. You, “Coupled attention networks for multivariate time series anomaly detection,” IEEE Transactions on Emerging Topics in Computing, 2023.
  • [100] D. Guo, Z. Liu, and R. Li, “Regraphgan: A graph generative adversarial network model for dynamic network anomaly detection,” Neural Networks, vol. 166, pp. 273–285, 2023.
  • [101] C. Ding, S. Sun, and J. Zhao, “Mst-gat: A multimodal spatial–temporal graph attention network for time series anomaly detection,” Information Fusion, vol. 89, pp. 527–536, 2023.
  • [102] S. Chang, Y. Li, J. S. Shen, J. Feng, and Z. Zhou, “Contrastive attention for video anomaly detection,” IEEE Transactions on Multimedia, 2021.
  • [103] H. Chen, X. Mei, Z. Ma, X. Wu, and Y. Wei, “Spatial–temporal graph attention network for video anomaly detection,” Image and Vision Computing, vol. 131, p. 104629, 2023.
  • [104] N. Laptev, S. Amizadeh, and Y. Billawala, “S5 - a labeled anomaly detection dataset, version 1.0 (16M),” 2015.
  • [105] K. Hundman, V. Constantinou, C. Laporte, I. Colwell, and T. Soderstrom, “Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding,” in Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, 2018, pp. 387–395.
  • [106] A. P. Mathur and N. O. Tippenhauer, “Swat: A water treatment testbed for research and training on ics security,” in 2016 international workshop on cyber-physical systems for smart water networks (CySWater).   IEEE, 2016, pp. 31–36.
  • [107] C. M. Ahmed, V. R. Palleti, and A. P. Mathur, “Wadi: a water distribution testbed for research in the design of secure cyber physical systems,” in Proceedings of the 3rd international workshop on cyber-physical systems for smart water networks, 2017, pp. 25–28.
  • [108] D. Wagner, T. Michels, F. C. Schulz, A. Nair, M. Rudolph, and M. Kloft, “TimeseAD: Benchmarking deep multivariate time-series anomaly detection,” Transactions on Machine Learning Research, 2023.
  • [109] S. D. Bay, D. Kibler, M. J. Pazzani, and P. Smyth, “The uci kdd archive of large data sets for data mining research and experimentation,” ACM SIGKDD explorations newsletter, vol. 2, no. 2, pp. 81–85, 2000.
  • [110] T. Lai, T. K. K. Ho, and N. Armanfard, “Open-set multivariate time-series anomaly detection,” arXiv preprint arXiv:2310.12294, 2023.
  • [111] E. C. Erkus and V. Purutcuoğlu, “A new collective anomaly detection approach using pitch frequency and dissimilarity: Pitchy anomaly detection (pad),” Journal of Computational Science, p. 102084, 2023.
  • [112] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals,” circulation, vol. 101, no. 23, pp. e215–e220, 2000.
  • [113] A. Taddei, G. Distante, M. Emdin, P. Pisani, G. Moody, C. Zeelenberg, and C. Marchesi, “The european st-t database: standard for evaluating systems for the analysis of st-t changes in ambulatory electrocardiography,” European heart journal, vol. 13, no. 9, pp. 1164–1172, 1992.
  • [114] P. Wagner, N. Strodthoff, R.-D. Bousseljot, D. Kreiseler, F. I. Lunze, W. Samek, and T. Schaeffter, “Ptb-xl, a large publicly available electrocardiography dataset,” Scientific data, vol. 7, no. 1, p. 154, 2020.
  • [115] F. Liu, C. Liu, L. Zhao, X. Zhang, X. Wu, X. Xu, Y. Liu, C. Ma, S. Wei, Z. He et al., “An open access database for evaluating the algorithms of electrocardiogram rhythm and morphology abnormality detection,” Journal of Medical Imaging and Health Informatics, vol. 8, no. 7, pp. 1368–1373, 2018.
  • [116] V. Shah, E. Von Weltin, S. Lopez, J. R. McHugh, L. Veloso, M. Golmohammadi, I. Obeid, and J. Picone, “The temple university hospital seizure detection corpus,” Frontiers in neuroinformatics, vol. 12, p. 83, 2018.
  • [117] T. Opsahl and P. Panzarasa, “Clustering in weighted networks,” Social networks, vol. 31, no. 2, pp. 155–163, 2009.
  • [118] R. Rossi and N. Ahmed, “The network data repository with interactive graph analytics and visualization,” in Proceedings of the AAAI conference on artificial intelligence, vol. 29, no. 1, 2015.
  • [119] M. De Choudhury, H. Sundaram, A. John, and D. D. Seligmann, “Social synchrony: Predicting mimicry of user actions in online social media,” in 2009 International conference on computational science and engineering, vol. 4.   IEEE, 2009, pp. 151–158.
  • [120] S. Kumar, F. Spezzano, V. Subrahmanian, and C. Faloutsos, “Edge weight prediction in weighted signed networks,” in 2016 IEEE 16th International Conference on Data Mining (ICDM).   IEEE, 2016, pp. 221–230.
  • [121] S. Kumar, B. Hooi, D. Makhija, M. Kumar, C. Faloutsos, and V. Subrahmanian, “Rev2: Fraudulent user prediction in rating platforms,” in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018, pp. 333–341.
  • [122] B. Zhang, R. Liu, D. Massey, and L. Zhang, “Collecting the internet as-level topology,” ACM SIGCOMM Computer Communication Review, vol. 35, no. 1, pp. 53–61, 2005.
  • [123] W. Sultani, C. Chen, and M. Shah, “Real-world anomaly detection in surveillance videos,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 6479–6488.
  • [124] P. Wu, J. Liu, Y. Shi, Y. Sun, F. Shao, Z. Wu, and Z. Yang, “Not only look, but also listen: Learning multimodal violence detection under weak supervision,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXX 16.   Springer, 2020, pp. 322–339.
  • [125] W. Luo, W. Liu, and S. Gao, “A revisit of sparse coding based anomaly detection in stacked rnn framework,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 341–349.
  • [126] W. Li, V. Mahadevan, and N. Vasconcelos, “Anomaly detection and localization in crowded scenes,” IEEE transactions on pattern analysis and machine intelligence, vol. 36, no. 1, pp. 18–32, 2013.
  • [127] H. Xu, W. Chen, N. Zhao, Z. Li, J. Bu, Z. Li, Y. Liu, Y. Zhao, D. Pei, Y. Feng et al., “Unsupervised anomaly detection via variational auto-encoder for seasonal kpis in web applications,” in Proceedings of the 2018 world wide web conference, 2018, pp. 187–196.
  • [128] L. Shen, Z. Li, and J. Kwok, “Timeseries anomaly detection using temporal hierarchical one-class network,” Advances in Neural Information Processing Systems, vol. 33, pp. 13016–13026, 2020.
  • [129] C. U. Carmona, F.-X. Aubet, V. Flunkert, and J. Gasthaus, “Neural contextual anomaly detection for time series,” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, L. D. Raedt, Ed.   International Joint Conferences on Artificial Intelligence Organization, 7 2022, pp. 2843–2851.
  • [130] W. Zhang, C. Zhang, and F. Tsung, “Grelen: Multivariate time series anomaly detection from the perspective of graph relational learning,” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, vol. 7, 2022, pp. 2390–2397.
  • [131] H. Zhang, Z. Wu, Z. Wang, Z. Chen, and Y.-G. Jiang, “Prototypical residual networks for anomaly detection and localization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 16281–16291.
  • [132] X. Yao, R. Li, J. Zhang, J. Sun, and C. Zhang, “Explicit boundary guided semi-push-pull contrastive learning for supervised anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 24490–24499.
  • [133] J. Davis and M. Goadrich, “The relationship between precision-recall and roc curves,” in Proceedings of the 23rd international conference on Machine learning, 2006, pp. 233–240.