Unsupervised anomaly detection in multivariate time series with online evolving spiking neural networks

Bäßler, Dennis; Kortus, Tobias; Gühring, Gabriele

doi:10.1007/s10994-022-06129-4

Open access
Published: 10 March 2022

Volume 111, pages 1377–1408, (2022)

Machine Learning


Abstract

With the increasing demand for digital products, processes and services, the automatic detection of signal outliers in streaming data has gained a lot of attention. The range of possible applications for this kind of algorithm is broad: it extends from the monitoring of digital machinery and predictive maintenance to the analysis of big-data healthcare sensor streams. In this paper we present a method for detecting anomalies in streaming multivariate time series by using an adapted evolving Spiking Neural Network. As the main components of this work we contribute (1) an alternative rank-order-based learning algorithm which uses the precise times of the incoming spikes for adjusting the synaptic weights, (2) an adapted, real-time-capable and efficient encoding technique for multivariate data based on multi-dimensional Gaussian Receptive Fields and (3) a continuous outlier scoring function for an improved interpretability of the classifications. Spiking neural networks are extremely efficient when it comes to processing time-dependent information. We demonstrate the effectiveness of our model on a synthetic dataset based on the Numenta Anomaly Benchmark with various anomaly types. We compare our algorithm to other streaming anomaly detection algorithms and can show that our algorithm performs better in detecting anomalies while demanding less computational resources for processing high-dimensional data.


1 Introduction

Detecting outliers in time series data, especially streaming data, has gained significant relevance due to the recent exponential growth in the amount of data captured in big-data and IoT applications (Ahmad et al. 2017; Munir et al. 2019; Maciąg et al. 2021). In particular, the detection of anomalies in streaming time series data places high demands on the development of effective and efficient algorithms (Ahmad et al. 2017). Real-time capable algorithms are required for many applications in order to process these generated or recorded data. In many cases it is not easy to collect sufficient training data with marked anomalies for the supervised training of an anomaly detector to identify anomalies in data streams (Maciąg et al. 2021). It is therefore particularly important to design anomaly detectors that can correctly identify anomalies in data where none of the input values need to be labeled. Ideally, the designed anomaly detector should learn in an online mode in which the current input values adjust the parameters of the detector for better anomaly detection of future input data. Since conventional machine learning algorithms are in many cases unable to cope with these requirements or can only handle them with a large expenditure of resources, there is a great interest in new efficient solutions. Spiking neural networks can process information efficiently, both in terms of energy and data, due to their functional similarity to the brain (Bing et al. 2018a; Gerstner and Kistler 2002; Lobo et al. 2018); thus they are particularly well suited for online detection of anomalies. Although they are one of the major exponents of the third generation of artificial neural networks, it took a while until they were applied to an online learning approach (Lobo et al. 2018, 2020).

In this paper, we extend an existing algorithm for anomaly detection by Maciąg et al. (2021) from the field of evolving Spiking Neural Networks, in which learning processes, neural communication and classification of data instances are based exclusively on spike exchange between neurons. The approach by Maciąg et al. (2021) is, however, solely designed for univariate time series. Since multivariate data streams are quite common in practice, we contribute in this work several appropriate measures to make the algorithm work for multivariate data. Here, we focus both on the runtime and real-time capability of the algorithm and on the general performance for the detection of different types of anomalies, which are presented in appendix A.1. Our results can be summarized in the following way:

We present an online-capable encoding technique for multivariate time series data. Our method is based on the work of Panuku and Sekhar (2008). It provides a significant performance boost compared to a parallel execution of several univariate versions of the algorithm of Maciąg et al. (2021).
We use a new continuous outlier score, which is adapted from Ahmad et al. (2017). This outlier score can be computed in an online manner and provides significantly better results than the technique currently used in Maciąg et al. (2021).
We apply the SpikeTemp learning approach by Wang et al. (2017) as an improvement over the existing technique, thereby obtaining better and faster results for anomaly detection even in the univariate case.

Compared to deep learning models, the main advantage of our model is that it needs no time for training and less memory space, while its performance is in a similar range (see Geiger et al. 2020 and Sect. 5). In the following we use the terms outlier and anomaly interchangeably for anomalous data points. The difference between an outlier and an anomaly is small (see Aggarwal 2013), and some authors also use both expressions, e.g. Shukla and Sengupta (2020).

2 Related work

Due to the extensive demand for efficient and versatile algorithms for outlier detection, a wide variety of approaches has been studied in the existing literature. Depending on the application domain, different kinds of algorithms and techniques have been used. In the following section we review the currently available literature on univariate or multivariate unsupervised and semi-supervised outlier detection in time series, with a focus on the applicability of these techniques in a streaming scenario.

One of the most commonly used techniques for detecting outliers in a wide variety of application domains is the Local Outlier Factor (LOF) as proposed by Breunig et al. (2000). This technique uses a density measure which describes how isolated a given instance is with regard to its neighborhood. Pokrajac et al. (2007) proposed an incremental version of this algorithm that makes it suitable for an efficient detection of outliers in data streams. Neither the static nor the incremental LOF algorithm is designed for high-dimensional data or for temporal relationships in the input data (Gühring et al. 2019; Pokrajac et al. 2007), which makes them inapplicable for those specific applications.

Statistical autoregressive (AR) models such as ARIMA, VAR and VARIMA have also been extensively studied in the past for detecting outliers in both univariate and multivariate time series (Hau and Tong 1989; Moayedi and Masnadi-Shirazi 2008; Li et al. 2019b). These kinds of models stand out for their simplicity and the lack of an initially required training phase and are therefore also suitable for streaming data processing. However, since ARIMA models can barely handle nonlinear relationships, their performance on complex real-world problems is not always satisfactory (Zhang 2003).

Recent advances in deep learning in areas like natural language processing (NLP) or speech recognition have caused an increasing utilization of various network architectures in outlier detection research (Chalapathy and Chawla 2019). These range from Convolutional Neural Networks (CNN), Long Short Term Memory (LSTM) Networks and Autoencoders up to Generative Adversarial Networks. Both discriminative and generative models are used in this context. However, due to the pronounced class imbalance and the associated lack of appropriate data labels, discriminative models are used far less often. Fu et al. (2019) proposed a CNN-based denoising autoencoder architecture for detecting outliers in time series data. The autoencoder structure is used for unsupervised feature learning of variable groups and is complemented by a classification layer that distinguishes normal from abnormal samples. Another time series anomaly detection method based on deep learning was proposed by Malhotra et al. (2016). With their EncDec-AD LSTM architecture they propose an encoder decoder scheme for reconstructing the given input signal using the compressed vector representation of the decoder. In order to compute the likelihood of an anomaly they model a normal distribution over the determined error vectors. Generative Adversarial Networks are also extensively present in current research. Geiger et al. (2020) and Li et al. (2019a) both use an adversarial minimax training approach on an LSTM architecture in order to detect outliers in time series. Both use the discriminative and the reconstructive components of the architecture in order to distinguish fake from real data using a multi-component outlier scoring system. Deep learning based architectures, however, require a vast amount of computing power for training and inference, which makes an application of these approaches in a streaming context infeasible, especially on low-power devices.

To exploit the advances from both the traditional approaches for outlier detection and deep neural networks, there is currently an increased use of hybrid models that aim to improve the quality, the speed or the scalability of the algorithm. Munir et al. (2019) use a combination of an ARIMA model and a two-layered Convolutional Neural Network in a residual learning scheme for the prediction of a given time series, where both models complement each other for optimal results. This information is then further utilized for detecting existing outliers in the signal using the Euclidean distance between the real and predicted time step. Another hybrid model is proposed by Shukla and Sengupta (2020). They use a combination of hierarchical clustering and a Long Short Term Memory Network (LSTM), where the hierarchical clustering condenses similar correlating input data which are then fed into an LSTM network, enabling the architecture to scale well for high-dimensional data. Papadimitriou et al. (2005) proposed a neural network type architecture (SPIRIT) for detecting outliers in data streams by reconstructing a given time series using a dynamic repository of incrementally estimated principal components which are approximated by the weights of a hidden neuron. The number of required eigencomponents is determined by a heuristic, which compares the energy of the input data and the estimated eigencomponents. Since each additional principal component can summarize further aspects of the signal which cannot be explained by the previous principal components, an increase of the repository indicates the existence of an outlier in the current time step.

Advances in neuroscientific disciplines have produced a large number of novel, biologically plausible learning algorithms for the detection of anomalies in signals. One heavily biologically inspired learning approach used in current anomaly detection research is the Spiking Neural Network (SNN). Xing et al. (2019) proposed a hybrid learning approach using a combination of an evolving Spiking Neural Network architecture and a restricted Boltzmann machine for detecting outliers in time series. The concept of an evolving Spiking Neural Network architecture for the efficient processing of data streams is used in several other publications. Demertzis et al. (2017, 2019) use in multiple papers an evolving Spiking Neural Network approach based on a simplified Leaky Integrate and Fire model (LIF) and a Rank Order Coding scheme in a semi-supervised learning scenario. Maciąg et al. (2021) use a similar architecture for an Online evolving Spiking Neural Network to detect signal outliers in time series data in an online mode using the error between the reconstructed and the given time series. The evolving architecture used by the previously mentioned approaches enables a continuous adaptation to changing statistics of input data, making them well suited for online learning (Wysoski et al. 2006).

3 Pattern discovery in multivariate time series

In this paper we present an online-capable algorithm which is able to detect anomalies in multivariate time series. The detection of anomalies in multivariate time series is not the same as detecting anomalies in a single-valued time series, since correlations between the different dimensions also have to be taken into account. Since this is also one of the main differences between our paper and the work of Maciąg et al. (2021), we present here as a benchmark the SPIRIT algorithm, another algorithm focusing on outlier detection in streaming multivariate time series. Streaming Pattern dIscoveRy in multIple Time-series (SPIRIT) by Papadimitriou et al. (2005) is a fast online-capable multivariate time series algorithm. According to Aggarwal (2013) it is one of the most well-known unsupervised algorithms which is designed to work not only on single data streams but takes into account a global direction of correlation between different simultaneous data streams. In order to evaluate multivariate time series for anomaly detection, a method for dimensionality reduction is used most of the time. The SPIRIT algorithm uses an online principal component analysis for this task. We explain it here in more detail.

The goal of the SPIRIT algorithm is to approximate a representation of the input signal with the smallest possible number of principal components. The algorithm does not require an initial training phase as the principal components are continuously updated with incoming data points $\mathbf {x}_{t} = \left[ x_{t,1},\ldots ,x_{t,n}\right] ^T$ of the n-dimensional data stream.

The principal components form a matrix $W \in {\mathbb {R}}^{n \times k}$, where n is the number of dimensions of the incoming signal and k is the current number of principal components; they are initially represented by the canonical unit vectors $\mathbf {u} = \left[ u_{1} \dots u_{k}\right]$. For each time step the eigencomponents making up the matrix W are updated. To this end, the current data point $\mathbf {x}_{t}$ is used to initialize the temporary variable $\acute{\mathbf {x}}_{1}$, which is used to calculate the projection $\mathbf {y}$ of $\acute{\mathbf {x}}_{i}$ on the eigencomponents of $W_{i}$ according to $y_{i} = W_{i}^{T} \acute{\mathbf {x}}_{i}$. On the basis of this projection, the energy $d_{i} \leftarrow \lambda d_{i} + y_{i}^{2}$ is updated and the reconstruction error $\mathbf {e}_{i} = \acute{\mathbf {x}}_{i} - y_{i} W_{i}$ of the ith projection is calculated. Papadimitriou et al. (2005) introduce in this context the exponential forgetting factor $\lambda$ as a hyperparameter for the model. This makes it possible to control the influence of time-dependent trends. Values between 0.96 and 0.98 are suggested in Papadimitriou et al. (2005). Based on the reconstruction error $\mathbf {e}_{i}$, the eigencomponents are optimized using gradient descent with a learning rate of $\gamma = \frac{1}{d_{i}}$. The previously mentioned steps are iteratively repeated for $i \le k$ for the next eigencomponents with a new $\acute{\mathbf {x}}_{i+1} = \acute{\mathbf {x}}_i - y_{i} W_{i}$ to reduce the remaining reconstruction error.

To estimate the required number of principal components, two additional metrics are used:

The continuous average energy $E_{t}$ on the input data at time t.
The average total energy of the eigencomponents ${\tilde{E}}_{(k)}$, which is calculated from the sum of the individual energies of the eigencomponents ${\tilde{E}}_{i,t}$ at time t, obtained from Eq. (2).

$$\begin{aligned} E_{t}= & {} \frac{\lambda \left( t-1\right) E_{t-1} + \Vert \mathbf {x}_{t}\Vert ^{2}}{t} \end{aligned}$$

(1)

$$\begin{aligned} {\tilde{E}}_{i,t}= & {} \frac{\lambda \left( t-1\right) {\tilde{E}}_{i,t-1} + y_{i,t}^{2}}{t} \end{aligned}$$

(2)

$$\begin{aligned} {\tilde{E}}_{(k)}= & {} \sum _{i=1}^{k} {\tilde{E}}_{i,t} \end{aligned}$$

(3)

Two further hyperparameters, an upper energy limit $F_{E}$ and a lower energy limit $f_{E}$ are used to determine whether the existing number of principal components k is sufficient or surplus. If ${\tilde{E}}_{(k)} < f_{E} E_{t}$, a further principal component is required to sufficiently represent the data i.e. $k \leftarrow k + 1$. However, if ${\tilde{E}}_{(k)} > F_{E} E_{t}$ applies, $k-1$ principal components are sufficient to represent the data with appropriate quality, i.e. $k \leftarrow k - 1$. Based on the previous result, either the k-th eigencomponent is removed from the matrix W or, if the increased number of eigencomponents k is smaller than the dimension of the data, a new eigencomponent in the form of the canonical unit vector $u_{k+1}$ is added to the matrix W.
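The update and the energy-based adaptation of k described above can be sketched as follows. This is a minimal illustration in plain Python, not the reference implementation of Papadimitriou et al. (2005). The function name `spirit_step`, the layout of the `state` dictionary, the initial energy of a newly added component and the default values for `f_E` and `F_E` are assumptions made for the example, and the concrete gradient step $W_{i} \leftarrow W_{i} + \frac{1}{d_{i}} y_{i} \mathbf {e}_{i}$ is one common form of the update that the text only describes verbally.

```python
def spirit_step(x, W, d, state, lam=0.96, f_E=0.95, F_E=0.98):
    """One SPIRIT-style update for a data point x (list of n floats).

    W: list of k weight vectors (each of length n), modified in place.
    d: list of k component energies d_i, modified in place.
    state: dict with 't', 'E' and 'E_tilde' (list of k floats), see Eqs. (1)-(3).
    Returns the projections y and the residual left after all k components.
    """
    n = len(x)
    state["t"] += 1
    t = state["t"]
    residual = list(x)                      # plays the role of x'_1
    y = []
    for i in range(len(W)):
        y_i = sum(w * r for w, r in zip(W[i], residual))         # projection
        d[i] = lam * d[i] + y_i ** 2                             # energy d_i
        e = [r - y_i * w for r, w in zip(residual, W[i])]        # error e_i
        # Assumed gradient step with learning rate 1 / d_i.
        W[i] = [w + (y_i / d[i]) * e_j for w, e_j in zip(W[i], e)]
        residual = [r - y_i * w for r, w in zip(residual, W[i])]  # x'_{i+1}
        y.append(y_i)

    # Running energies, Eqs. (1) and (2), and their sum, Eq. (3).
    state["E"] = (lam * (t - 1) * state["E"] + sum(v * v for v in x)) / t
    state["E_tilde"] = [
        (lam * (t - 1) * e_prev + y_i ** 2) / t
        for e_prev, y_i in zip(state["E_tilde"], y)
    ]
    total = sum(state["E_tilde"])

    k = len(W)
    if total < f_E * state["E"] and k < n:
        unit = [0.0] * n
        unit[k] = 1.0                       # canonical unit vector u_{k+1}
        W.append(unit)
        d.append(1e-3)                      # assumed small initial energy
        state["E_tilde"].append(0.0)
    elif total > F_E * state["E"] and k > 1:
        W.pop()                             # drop the k-th eigencomponent
        d.pop()
        state["E_tilde"].pop()
    return y, residual
```

A caller would start with `W = [[1.0, 0.0, ...]]`, `d = [1e-3]` and `state = {"t": 0, "E": 0.0, "E_tilde": [0.0]}` and feed one data point per call; a change in `len(W)` or a large residual can then serve as the outlier indicator mentioned in the text.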

According to Papadimitriou et al. (2005), the number of eigencomponents k or the reconstruction error e can be used as a metric to determine the affiliation of the data point to the class of outliers. In this paper we use the SPIRIT algorithm as a comparison model to benchmark the performance of our model. In order to provide comparable values we extend the outlier detection from Papadimitriou et al. (2005) by the same continuous outlier score that we use for our anomaly detection model (see Sect. 4.5).

4 Spiking Neural Networks for outlier detection

4.1 Spiking neural networks

The notion of Spiking Neural Networks is used to describe a group of artificial neural networks that have their origins in the field of neuroscience. These neuron models, also known as the third generation of neural networks (Maas 1997), provide a plausible model of the neuroscientific processes of information processing in biological neurons via the coding of sensory information in spikes. They encode information from preceding neuron layers by generating time-varying pulses caused by the counteraction of excitatory and inhibitory potentials of preceding layers (Gerstner and Kistler 2002). The timing of a spike is formally determined by the positive crossing of a threshold value $\vartheta$ of the potential u, caused by an excitatory signal of a preceding neuron (Gerstner and Kistler 2002).

$$\begin{aligned} t^{(f)}: u(t^{(f)})=\vartheta \quad \text { and } \quad \frac{du(t)}{dt}\bigg |_{t=t^{(f)}} > 0 \end{aligned}$$

(4)
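For a potential that is only available as a sampled sequence, the firing condition of Eq. (4) can be approximated by looking for upward crossings of the threshold. The following sketch assumes equidistant samples and uses linear interpolation between two samples to estimate the spike time; the function name `spike_times` and the sampling interval `dt` are illustrative choices and not part of the model description.

```python
def spike_times(u, theta, dt=1.0):
    """Approximate firing times t^(f) for a sampled membrane potential u.

    A spike is registered where u crosses theta from below, i.e. where
    u reaches theta with a positive slope as required by Eq. (4).
    """
    times = []
    for k in range(1, len(u)):
        if u[k - 1] < theta <= u[k]:
            # Linear interpolation of the crossing between samples k-1 and k.
            frac = (theta - u[k - 1]) / (u[k] - u[k - 1])
            times.append((k - 1 + frac) * dt)
    return times
```

Downward crossings are ignored on purpose, since the derivative condition in Eq. (4) excludes them.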

For the mathematical approximation of this complex process, different models exist. Usually, however, only the Integrate and Fire model of Louis Lapicque (1907) and its variations (Gerstner and Kistler 2002) in conjunction with the Rank Order Coding (ROC) model according to Thorpe et al. (1998) are of high relevance in technical applications, especially with regard to the low complexity and the potential to process large amounts of data.

This new type of artificial, biologically inspired networks is characterized, especially with respect to their structural differences, by their high performance compared to conventional neural networks (Maas 1997). Due to the high potential of Spiking Neural Networks, they can be found in research projects of different disciplines such as signal processing (Amirshahi and Hashemi 2019), speech recognition (Wu et al. 2020) or control engineering (Bing et al. 2018b).

4.2 Online evolving Spiking Neural Networks

Evolving Spiking Neural Networks (eSNN) (Wysoski et al. 2008) and their variation of Online evolving Spiking Neural Networks (OeSNN) represent a subcategory of Spiking Neural Networks. The neural activities are commonly modeled by a simplified Leaky Integrate and Fire (LIF) neuron model (Schliebs and Kasabov 2013; Wysoski et al. 2008), whereas the topology of the adaptive evolving layer changes continuously with new incoming data from the previous input layer (Watts 2009). In this paper we use an OeSNN model for anomaly detection developed by Maciąg et al. (2021) as baseline architecture, which consists exclusively of an input and an output layer without any hidden layers. The output layer is a dynamically growing repository of neurons with a limitation of the maximum number of neurons in the repository. The number of input neurons is fixed and is determined by the user-defined parameter $NI_{size}$, while the maximum number of output neurons is specified by $NO_{size}$. The continuously incoming input data of a univariate input data stream is buffered in a sliding window W of size $W_{size}$ for internal processing. The values inside the window are encoded to spike times for the input neurons using a fixed number of evenly distributed overlapping Gaussian Receptive Fields (GRF) which are further described in Sect. 4.3. For the output layer a simplified neural LIF model (Schliebs and Kasabov 2013; Wysoski et al. 2008) is applied. In this model, the postsynaptic potential (PSP) is accumulated at the output neuron based on the input signals from the preceding neural layer until it reaches its post-synaptic potential threshold $\vartheta$ as in Eq. (4). Reaching the PSP threshold causes the output neuron to fire and its PSP value is reset to 0 (Maciąg et al. 2021).

Based on the error of the output layer, either a new neuron is added to the existing output layer or the parameters of an existing neuron are updated (Kasabov 2006). At first a new output neuron is created for each value $x_{t}$ of the univariate input data stream. The corresponding weights for the newly created neuron are initialized according to the spike order of the input neurons as described in Wysoski et al. (2006). Each newly created output neuron is then either added as a new instance to the output neuron repository or merged with an existing output neuron in the output repository. This behavior is controlled by a user-defined parameter sim, which is located in the (0, 1] range. For this purpose, the similarity between the newly created output neuron k and each of the other output neurons in the repository is calculated. The similarity is defined as the reciprocal of the Euclidean distance between the weights of the newly added output neuron and the other output neurons. If the similarity to one of the existing neurons exceeds the predefined threshold value sim, then the newly added neuron will be merged with the most similar neuron as in Maciąg et al. (2021). If no existing output neuron is similar enough for the defined threshold sim, the new output neuron is added to the output repository. If the repository has reached its maximum size, the oldest neuron in the output layer is replaced by the newly created neuron (Macia̧g et al. 2021).
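The merge-or-add logic of the output repository can be sketched as follows. The text only states that a sufficiently similar neuron is merged "as in Maciąg et al. (2021)"; the running average over the merge count used here is an assumption for illustration, as are the function name `update_repository` and the dictionary layout of a neuron. Identical weight vectors have distance 0, which the sketch treats as infinite similarity.

```python
import math


def update_repository(repo, new_weights, sim, max_size):
    """Merge a candidate output neuron into repo or add it as a new entry.

    repo: list of dicts {'w': [...], 'count': int}, ordered oldest first.
    Returns 'merged' or 'added'.
    """
    best, best_sim = None, 0.0
    for neuron in repo:
        dist = math.dist(neuron["w"], new_weights)
        # Similarity is the reciprocal of the Euclidean distance.
        s = math.inf if dist == 0.0 else 1.0 / dist
        if s > best_sim:
            best, best_sim = neuron, s

    if best is not None and best_sim > sim:
        # Assumed merge rule: running average of the weight vectors.
        c = best["count"]
        best["w"] = [(c * a + b) / (c + 1) for a, b in zip(best["w"], new_weights)]
        best["count"] = c + 1
        return "merged"

    if len(repo) >= max_size:
        repo.pop(0)                      # replace the oldest neuron
    repo.append({"w": list(new_weights), "count": 1})
    return "added"
```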

For anomaly detection a vector of error values is calculated between predicted and observed values of the window W. To this end, for every predicted scalar value $y_{t}$ and observed scalar value $x_{t}$ of the window W the absolute difference $e_{t} = | x_{t} - y_{t} |$ is calculated, which results in an error vector $\mathbf {e}$ of the dimension $W_{size}$. Based on this vector, the mean value ${\bar{x}}_{e}$ and the standard deviation $s_{e}^2$ of the error values of $\mathbf {e}$ are used to classify $x_{t}$ either as normal or anomalous. If the difference between $e_{t}$ and ${\bar{x}}_{e}$ is greater than $\epsilon \cdot s_{e}^2$, where $\epsilon$ is a user-defined threshold, then $x_{t}$ is classified as an anomaly (Maciąg et al. 2021).
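The threshold rule can be written down in a few lines. The sketch below assumes that the spread of the error values is measured by the population standard deviation of the window; the text does not specify which estimator is used, and the function name `is_anomaly` is chosen for illustration only.

```python
import statistics


def is_anomaly(errors, e_t, eps):
    """Classify the current error e_t against the errors of the window.

    errors: absolute prediction errors of the values in the window W.
    eps: user-defined threshold factor.
    """
    mean_e = statistics.fmean(errors)
    std_e = statistics.pstdev(errors)    # assumed: population standard deviation
    return (e_t - mean_e) > eps * std_e
```

With window errors `[0.1, 0.2, 0.1, 0.2]` and `eps = 2`, an error of 0.2 stays below the threshold, while an error of 1.0 is flagged.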

In summary, each scalar time series value $x_{t}$ goes through the following steps to generate a prediction and gets classified as normal or anomalous:

1. The input window W is updated with the value $x_{t}$ and the GRF of the input neurons are initialized.
2. The value $x_{t}$ is encoded by the GRF into spike times (see Sect. 4.3).
3. The coded spike times are then used to determine the spike order of the input neuron.
4. Based on the spike order of the input neuron, the PSP of the output neuron is accumulated in the same order.
5. The output neuron that reaches its PSP threshold first generates the prediction $y_{t}$.
6. The new prediction $y_{t}$ is compared to the actual value $x_{t}$ and classified as normal or anomaly.
7. If no anomaly is detected, the output neuron that generated the prediction $y_{t}$ is corrected according to the deviation from the input value $x_{t}$.
8. In parallel to steps 6 and 7, a new output neuron is generated independently of the anomaly classification. This neuron is then merged or added to the repository depending on the similarity threshold sim as described before.

Originally, the model by Maciąg et al. (2021), named OeSNN-UAD for Online evolving Spiking Neural Networks for Unsupervised Anomaly Detection, was designed for univariate data processing only. For anomaly detection in multivariate time series, one instance of the model can be executed per dimension, but then no correlation between the dimensions is considered, as shown in Sect. 5. We therefore develop an appropriate measure to improve the processing of multivariate data as described in Sect. 4.3. Our focus is primarily on the runtime and real-time capability as well as the general performance of the detection of different types of anomalies. When talking about detection it is important to look at the correctly classified outliers, the unrecognized outliers and the data instances which are incorrectly classified as outliers. We deal with this in Sect. 5.4, where we define the F1 score as an appropriate metric to compare our results to other results in the literature. Additionally, we improve the anomaly detection by extending the model with an anomaly probability score (see Sect. 4.5) and reduce the computing costs through an improved learning approach which is described in Sect. 4.4.

4.3 Efficient encoding of multivariate data with Gaussian Receptive Fields

The biologically inspired structure of receptive fields, and especially of Gaussian Receptive Fields, plays an elementary role in the efficient processing of data in SNN. In the biological context, a receptive field is defined as a spatially limited area of interconnected sensory receptors that convert incoming visual, auditory or similar stimuli into electrical stimulus potentials (Lindeberg 2013). The incoming visual stimuli are encoded by receptive fields into electrical signals that are further processed by subsequent neuron layers. Due to the sparse interaction of receptive fields with subsequent neurons, this coding can be used to efficiently process the information in the subsequent neuron layers (Lindeberg 2013). This biological concept is applied in different types of artificial neural networks, such as Convolutional Neural Networks (Goodfellow et al. 2016) or Spiking Neural Networks (Maciąg et al. 2021; Panuku and Sekhar 2008; Hopkins et al. 2018).

The encoding of the input data is performed in Maciąg et al. (2021) using a one dimensional Gaussian Receptive Field, where multiple Gaussian distributions are placed equally over the input window W. This approach however limits the algorithm to one dimensional input data. The parallel execution of several OeSNN instances for processing multivariate data in each dimension entails several limitations which affect the performance of the model in terms of runtime and the identification of complex outliers. In order to eliminate these limitations or to minimize them in their manifestation, we examine in the following an efficient modeling technique for multidimensional Gaussian Receptive Fields. It is summarized at the end of this subsection with points 1. to 4.

We use multidimensional Gaussian distribution functions on multidimensional clusters $C_i$, which we obtain from the input data via a k-Means clustering. For the two-dimensional case the clustering algorithm is illustrated in the upper left-hand side of Fig. 4. We do the clustering based on the work of Panuku and Sekhar (2008) in which they model a defined number of multivariate Gaussian distributions over the incoming data. The placement of the receptive fields differs here from the one-dimensional variant of the OeSNN-UAD architecture. In contrast to the OeSNN-UAD architecture, the placement of the receptive fields is not evenly distributed over the incoming data, instead it is placed specifically in those regions where clusters appear (cf. Fig. 1). This approach can significantly reduce the required number of receptive fields on the input data, which in turn yields a positive effect on the runtime of the algorithm.

The excitation factor of the following input neurons is calculated for each input neuron in the same manner as in the one-dimensional approach via the normal distribution function of the respective cluster $C_i$, as it is illustrated in Fig. 4 in the upper right-hand side. It is defined, as shown in Eq. (5), by the center $\mathbf {\mu }_{i} \in {\mathbb {R}}^{n}$ and the covariance $\Sigma _{i} \in {\mathbb {R}}^{n \times n}$ of the cluster $C_{i}$.

$$\begin{aligned} f_{C_i}(x) = \exp \left[ -\frac{1}{2}\left( \mathbf {x}-\mathbf {\mu }_{i}\right) ^{T}\Sigma _{i}^{-1}\left( \mathbf {x}-\mathbf {\mu }_{i}\right) \right] \end{aligned}$$

(5)

From the excitation factor $f_{C_i}(x)$ the spike time $t_i$ is calculated by

$$\begin{aligned} t_i=1-f_{C_i}(x) \end{aligned}$$

(6)

as it is shown in Fig. 4 by moving from the upper right-hand side to the lower left-hand side. However, the variant as proposed by Panuku and Sekhar (2008) is primarily intended for an offline setting, in which the receptive fields are initialized in an initial training phase. In order to preserve the online character of the anomaly detection component, both the k-Means clustering and the calculation of the covariance of the clusters for each incoming data point are calculated recursively.
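Eqs. (5) and (6) can be illustrated with a small sketch that maps a data point to the spike time of one receptive field. To keep the example free of external libraries, the product $\Sigma _{i}^{-1}(\mathbf {x}-\mathbf {\mu }_{i})$ is obtained with a hand-written Gaussian elimination; the helper names `solve` and `spike_time` are hypothetical, and the covariance matrix is assumed to be non-singular.

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b_r] for row, b_r in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def spike_time(x, mu, sigma):
    """Spike time of the input neuron belonging to a cluster (mu, sigma)."""
    diff = [x_j - m_j for x_j, m_j in zip(x, mu)]
    z = solve(sigma, diff)                              # Sigma^{-1} (x - mu)
    excitation = math.exp(-0.5 * sum(d * z_j for d, z_j in zip(diff, z)))  # Eq. (5)
    return 1.0 - excitation                             # Eq. (6)
```

A point at the cluster center yields the earliest possible spike time 0, and the spike time approaches 1 as the point moves away from the center. Sorting the input neurons by these times gives the firing order used by the output layer.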

For the clustering of the incoming data, an online-capable version of the Lloyd k-Means algorithm (King 2012) is used, which can perform the adjustment of the clusters in a streaming-based application with constant runtime. Here, all $C_{i}$ clusters are initialized by the incoming data and all further data points are assigned to the existing cluster with the smallest Euclidean distance. The associated cluster center $\mathbf {\mu }_{i}$ as well as the covariance $\Sigma _{i}$ is recursively estimated for each data point $\mathbf {x}_{t}$ belonging to the cluster using Eqs. (7) and (8).

$$\begin{aligned} \mathbf {\mu }_{i}= & {} \mathbf {\mu }_{i,l} = \mathbf {\mu }_{i,l-1}+\frac{1}{l}\left( \mathbf {x}_{t}-\mathbf {\mu }_{i,l-1}\right) \end{aligned}$$

(7)

$$\begin{aligned} \Sigma _{i}= & {} \Sigma _{i,l} = \left( 1-\frac{1}{l}\right) \left( \Sigma _{i,l-1}+\frac{1}{l}\left( \mathbf {x}_{t}-\mathbf {\mu }_{i,l-1}\right) \left( \mathbf {x}_{t}-\mathbf {\mu }_{i,l-1}\right) ^{T}\right) \end{aligned}$$

(8)
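The recursions of Eqs. (7) and (8) translate directly into code. In the following sketch, `l` denotes the number of points assigned to the cluster so far, including the current one; the function name `update_cluster` and the list-of-lists matrix representation are choices made for this example.

```python
def update_cluster(mu, sigma, l, x):
    """Recursive update of a cluster center and covariance, Eqs. (7) and (8).

    mu: previous center, sigma: previous covariance (n x n list of lists),
    l: number of points in the cluster including x (l >= 1), x: new point.
    Returns the updated (mu, sigma).
    """
    n = len(x)
    diff = [x_j - m_j for x_j, m_j in zip(x, mu)]       # x_t - mu_{i,l-1}
    new_mu = [m_j + d_j / l for m_j, d_j in zip(mu, diff)]
    scale = 1.0 - 1.0 / l
    new_sigma = [
        [scale * (sigma[r][c] + diff[r] * diff[c] / l) for c in range(n)]
        for r in range(n)
    ]
    return new_mu, new_sigma
```

Feeding the four corner points of a square with side length 2 one after another, starting from a zero covariance, reproduces the center (1, 1) and the identity matrix as population covariance, which matches the batch estimate for these points.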

In summary the calculation of the encoding of multivariate data streams is done via the following iterative algorithm. It differs from Maciąg et al. (2021) not only in the usage of multivariate Gaussian distributions but even more by the clustering of the data points, which makes it possible to reduce the number of Gaussian distributions and obtain better results:

1. Calculate a predefined number of multivariate clusters $C_i$ with center $\mu _i$ and covariance matrix $\Sigma _i$ via k-Means clustering with the data already obtained.
2. Place a multidimensional Gaussian distribution with mean $\mu _i$ and covariance matrix $\Sigma _i$ over each of the clusters $C_i$.
3. For each incoming data point $x_t$ calculate the excitation factors $f_{C_i}(x_t)$ for each of the clusters $C_i$ according to Eq. (5).
4. Assign each incoming data point $x_t$ to the nearest cluster $C_i$, update the mean $\mu _i$ and covariance matrix $\Sigma _i$ of this cluster as in Eqs. (7) and (8), and start again with step 1.
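The four steps above can be sketched as a small streaming encoder. This is an illustration under stated assumptions rather than the authors' code: the class `OnlineGRFEncoder`, the seed points and the regularization term `reg` are hypothetical, and the excitation factor is assumed to be an unnormalized Gaussian, $\exp(-\tfrac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i))$, which may differ in detail from Eq. (5).

```python
import numpy as np


class OnlineGRFEncoder:
    """Hypothetical helper: online clustering plus Gaussian receptive fields."""

    def __init__(self, seeds, reg=1e-3):
        seeds = np.asarray(seeds, dtype=float)
        k, dim = seeds.shape
        self.means = seeds.copy()              # cluster centers mu_i
        self.covs = np.zeros((k, dim, dim))    # covariance matrices Sigma_i
        self.counts = np.ones(k, dtype=int)    # points assigned to each cluster
        self.reg = reg * np.eye(dim)           # assumption: keeps Sigma_i invertible

    def excitation(self, x):
        # Assumed form of Eq. (5): unnormalized Gaussian with values in [0, 1].
        f = np.empty(len(self.means))
        for i in range(len(self.means)):
            d = x - self.means[i]
            f[i] = np.exp(-0.5 * d @ np.linalg.solve(self.covs[i] + self.reg, d))
        return f

    def step(self, x):
        x = np.asarray(x, dtype=float)
        spike_times = 1.0 - self.excitation(x)                      # step 3, Eq. (6)
        i = int(np.argmin(np.linalg.norm(self.means - x, axis=1)))  # step 4
        self.counts[i] += 1
        l = self.counts[i]
        d = x - self.means[i]
        self.covs[i] = (1.0 - 1.0 / l) * (self.covs[i] + np.outer(d, d) / l)  # Eq. (8)
        self.means[i] = self.means[i] + d / l                                  # Eq. (7)
        return spike_times


# Illustrative stream with two groups of points (not data from the paper).
encoder = OnlineGRFEncoder(seeds=[[0.0, 0.0], [5.0, 5.0]])
first = encoder.step([0.0, 0.0])
rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0.0, 1.0, (100, 2)),
                         rng.normal(5.0, 1.0, (100, 2))])
rng.shuffle(stream)
times = np.array([encoder.step(x) for x in stream])
print(first, times.shape)
```

A point that coincides with a cluster center yields the earliest possible spike time of 0 for that cluster, while clusters far away from the point fire close to time 1. Each step touches every cluster once, so the cost per data point does not grow with the length of the stream.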

In order to verify the effectiveness of the changes made for the online-capable operation in comparison to the standard algorithms, we execute both versions of the algorithms for a signal with 10,000 data points and compare the results.

In a subsequent step, as shown in Fig. 2, we determine the absolute distances (Manhattan distance), taken over the vector or matrix components, between the online and offline estimates of the cluster centers $\mathbf {\mu }_{i}$ and of the covariance matrices $\Sigma _{i}$ for the time series shown in Fig. 1.

We observe that both the cluster centers and the covariance matrices asymptotically converge to the results of the offline algorithm. The estimated cluster centers converge significantly faster than the covariance matrices. Nevertheless, in the evaluation of the algorithm, we could not detect any negative influence due to the slightly inaccurate covariance estimation.

4.4 Learning capabilities with dynamic weight adaption

The architecture by Maciąg et al. (2021) uses a learning method adopted from Wysoski et al. (2006) to calculate the network's weights. This online learning method uses the ranking of the occurring spikes to update the weights. The weight change is calculated with

$$\begin{aligned} \varDelta w_{ji} = mod^{ order(j) } \end{aligned}$$

(9)

where $\varDelta w_{ji}$ is the change in weight $w_{ji}$ between neuron j in the input layer and neuron i in the output layer. The constant mod represents the modulation factor, which is in the range of (0, 1). The order(j) parameter corresponds to the index for the neuron j in a list sorted by spike time. The exact time information is discarded and not included in the calculation of the weights. Wang et al. (2017) published an improved Rank-Order-Based learning procedure for SNN, which is called SpikeTemp. In SpikeTemp the spike time is used directly to update the weights. Equation (10) describes the change in the weights $w_{ji}$ of the synapse that connects a neuron j to an output neuron i, where $t_{j}$ represents the spike time of the neuron j and $\tau$ is a constant scaling factor used as a hyperparameter (see Sect. 5.3) (Wang et al. 2017).

$$\begin{aligned} \varDelta w_{ji} = exp \left( -\frac{t_j}{\tau } \right) \end{aligned}$$

(10)

In this learning procedure, the available time information is used directly and not only the order of the spike times as in Eq. (9), so that the time interval influences the weight change. When using the Rank-Order-Based approach of Wysoski et al. (2006), the learning algorithm only provides a constant weight update. Therefore, the postsynaptic potential (PSP) for this output neuron is always identical for input patterns that have different spike times but produce the same firing order. With the SpikeTemp approach, different spike times always lead to different weight changes and a different PSP, as determined by Eq. (11). Consequently, the weight changes and the PSP correlate better with the input pattern, which contributes to improved learning performance. Since the SpikeTemp approach also eliminates the need to order all spikes in a window, it also reduces the computational effort (Wang et al. 2017).

$$\begin{aligned} PSP(i,t) = \sum \limits _{j \in [1..N]} w_{ji} \cdot exp \left( -\frac{t_j}{\tau } \right) \end{aligned}$$

(11)
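The difference between Eqs. (9) and (10) can be illustrated with two input patterns that share the same firing order but differ in their spike times. The sketch below is illustrative only: the function names, the example spike times and the values `mod = 0.9` and `tau = 0.5` are assumptions, and the rank `order(j)` is assumed to start at 0 for the earliest spike. The base weight of 0.1 is the reference value mentioned in the text.

```python
import numpy as np


def rank_order_delta(spike_times, mod):
    """Eq. (9): the weight change depends only on the firing rank."""
    # Applying argsort twice turns spike times into ranks (0 = earliest spike).
    order = np.argsort(np.argsort(spike_times, kind="stable"), kind="stable")
    return mod ** order


def spiketemp_delta(spike_times, tau):
    """Eq. (10): the weight change depends on the precise spike time."""
    return np.exp(-np.asarray(spike_times, dtype=float) / tau)


def psp(weights, spike_times, tau):
    """Eq. (11): postsynaptic potential of one output neuron."""
    kernel = np.exp(-np.asarray(spike_times, dtype=float) / tau)
    return float(np.sum(weights * kernel))


pattern_a = np.array([0.1, 0.2, 0.9])   # same firing order ...
pattern_b = np.array([0.1, 0.5, 0.6])   # ... but different spike times
mod, tau, base_weight = 0.9, 0.5, 0.1   # illustrative values

print(rank_order_delta(pattern_a, mod), rank_order_delta(pattern_b, mod))
print(spiketemp_delta(pattern_a, tau), spiketemp_delta(pattern_b, tau))

# SpikeTemp weights: constant base weight plus the change from Eq. (10).
weights_a = base_weight + spiketemp_delta(pattern_a, tau)
weights_b = base_weight + spiketemp_delta(pattern_b, tau)
print(psp(weights_a, pattern_a, tau), psp(weights_b, pattern_b, tau))
```

Both patterns receive identical rank-order updates, whereas the SpikeTemp updates and the resulting PSP values differ. Note also that `spiketemp_delta` needs no sorting of the spikes.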

The following steps describe the customized learning procedure with SpikeTemp for our online evolving SNN model. It is also illustrated in the lower right-hand side of the model overview in Fig. 4.

1. For each input value an output neuron is generated as described in Maciąg et al. (2021). However, the weights to the input neurons are now initialized to a constant value. Wang et al. (2017) determined the experimental value of 0.1 for an SNN classification model. We change this constant initialization factor for the online evolving SNN to a hyperparameter in the model, using 0.1 as a reference value (see Sect. 5.3). The weight change from Eq. (10) is then added to the initial base weight.
2. The PSP for an additional neuron in the output layer is now calculated using Eq. (11). The calculation of the neuron fire threshold remains unchanged and is adopted from the architecture of Maciąg et al. (2021).
3. The addition of the new output neuron to the output neuron repository remains unchanged from Maciąg et al. (2021). If the similarity to one of the existing output neurons is greater than a predefined threshold, the newly added output neuron is merged with the most similar output neuron; otherwise the output neuron is added to the repository.

4.5 Anomaly detection using a continuous outlier score

The architecture of Maciąg et al. (2021) uses the difference between the prediction and the actual value and compares the deviation to the previously determined deviations. As soon as this exceeds a user-defined threshold value, the data point is classified as an outlier. However, it is not unusual to have occasional jumps in a time series, which lead to prediction errors because of a slight time shift between prediction and input data (see Fig. 3). To handle these scenarios we replace the outlier detection function by the Numenta scoring function of Ahmad et al. (2017). This makes it possible to determine for each data point, by use of the reconstruction error, a probability with which the data point belongs to the outlier category. For this purpose, the squared reconstruction error (see Fig. 3 middle) for each step t is assumed to be a representation of a continuous normal distribution function. To calculate the distribution, the continuous mean $\mu _{t}$ and variance $\sigma _{t}$ are calculated based on the squared reconstruction error. The exact procedure is shown in Fig. 3. The green area marks the initialization phase of the algorithm.

The left picture shows the time series and the prediction of our new algorithm. Here the anomaly consists of a missing amplitude starting at the index 1200. From this the squared reconstruction error is calculated, which is shown in the center. The outlier probability is given by the complement of the Q-function on the squared reconstruction error as shown in Eq. (12):

$$\begin{aligned} L_{t} = 1 - Q \left( \frac{\tilde{\mu _{t}} - \mu _{t}}{\sigma _{t}} \right) . \end{aligned}$$

(12)

The variable $\tilde{\mu _{t}}$ represents the mean value of the squared reconstruction error over a defined time window W. The mean $\mu _{t}$ and the variance $\sigma _{t}$ are calculated using the Welford online algorithm (Welford 1962). The outlier score in Eq. (12) is calculated for each dimension separately on the corresponding reconstruction error, and the per-dimension scores are then combined into one score by taking the highest outlier score over all dimensions at time step t. Replacing the outlier detection with the Numenta scoring function allows our newly adapted model to output a continuous outlier probability, whereas the architecture of Maciąg et al. (2021) only outputs a binary classification of outliers. However, when comparing our algorithm with other anomaly detection algorithms in Sect. 5, we need to decide whether a single multivariate data point belongs to an anomaly or not, so that a binary classification is required. We therefore classify every data point with an outlier score above the threshold of 0.8 as an outlier and every data point with an outlier score below 0.8 as normal, as also indicated in Fig. 3 (right).
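The scoring procedure of Eq. (12) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the class names, the window length of 5 and the synthetic signal are assumptions. The sketch also assumes that $\sigma _{t}$ in Eq. (12) is the square root of the variance maintained by the Welford algorithm, and it returns a score of 0 while no spread has been observed yet. The Q-function is evaluated as $Q(z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$.

```python
import math
from collections import deque


class Welford:
    """Running mean and variance (Welford 1962)."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0


def q_function(z):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


class OutlierScorer:
    """Hypothetical helper: per-dimension score of Eq. (12), merged by maximum."""

    def __init__(self, n_dims, window, threshold=0.8):
        self.stats = [Welford() for _ in range(n_dims)]
        self.recent = [deque(maxlen=window) for _ in range(n_dims)]
        self.threshold = threshold

    def score(self, actual, predicted):
        scores = []
        for d, (a, p) in enumerate(zip(actual, predicted)):
            err = (a - p) ** 2                       # squared reconstruction error
            self.stats[d].update(err)
            self.recent[d].append(err)
            short_mean = sum(self.recent[d]) / len(self.recent[d])
            std = self.stats[d].std
            if std == 0.0:
                scores.append(0.0)                   # assumption: no spread observed yet
            else:
                z = (short_mean - self.stats[d].mean) / std
                scores.append(1.0 - q_function(z))   # Eq. (12)
        combined = max(scores)                       # highest score over all dimensions
        return combined, combined > self.threshold


# Illustrative two-dimensional signal: small alternating errors, then one jump.
scorer = OutlierScorer(n_dims=2, window=5)
normal_flags = []
for t in range(200):
    offset = 0.1 if t % 2 == 0 else 0.2
    _, flag = scorer.score([1.0 + offset, 1.0 + offset], [1.0, 1.0])
    normal_flags.append(flag)
anomaly_score, anomaly_flag = scorer.score([6.0, 1.1], [1.0, 1.0])
print(any(normal_flags), anomaly_score, anomaly_flag)
```

When the short-term mean equals the long-term mean, the score is 0.5, so a threshold of 0.8 leaves room for moderate fluctuations of the reconstruction error before a data point is flagged.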

Papadimitriou et al. (2005) use the number of eigencomponents k as an anomaly indicator in the SPIRIT algorithm to determine the outlier class. To achieve a better comparability between the reference algorithm and the modified OeSNN, the outlier score in Eq. (12) is also used in the SPIRIT data processing pipeline. For this purpose, the reconstruction error per dimension is used as an input. The outlier scores of the individual dimensions are then merged into one outlier score using the respective maximum value, which makes both models easily comparable.

5 Experimental evaluation

5.1 Overview of models

For the following performance evaluation, we incrementally added the adjustments from Sects. 4.3, 4.4 and 4.5 to the initial algorithm of Maciąg et al. (2021), as shown in Table 1. We refer to all models based on the initial OeSNN-UAD algorithm with the prefix OeSNN. In addition, we append the letters A-D consecutively according to the integrated extensions. Here the model OeSNN-A represents the version in the paper of Maciąg et al. (2021), which is extended with each following letter by the adaptations shown in Table 1. For a general overview of our final model OeSNN-D we also refer to Fig. 4.

The models were implemented with Cython (Behnel et al. 2011). Wrapping the external C++ library as a Python extension allows an accelerated execution compared to a Python native implementation, which is especially beneficial for online and real-time data processing.

Table 1 Model overview
