1 Introduction

Adversarial examples are one of the most interesting topics in current computer vision research. They were first described in [1], which pointed out how small perturbations of the input pixels can dramatically change the prediction of deep learning models. The potential risk of this phenomenon is that these perturbations can remain invisible to humans, thus confusing neural networks while presenting no noticeable difference to a human operator. As an example, see Fig. 1. For this reason, it is important to develop a mechanism that detects these inputs, preventing them from progressing further in critical system workflows.

Fig. 1 Adversarial examples. First row, from left to right: original image (classified as a whistle), adversarial noise and adversarial image after adding noise (classified as a screw). Second row, from left to right: original image (classified as a pen), adversarial noise and adversarial image after adding noise (classified as a revolver)

On the one hand, there are the so-called “attack” methods, which aim to craft malicious perturbations that can potentially put models at risk. This can be expressed formally, as shown in Eq. (1). To obtain an adversarial example \(X'\), an unperturbed image X is modified with a computed noise \(\delta X\), which should be the smallest possible perturbation that changes the predicted class f(X) of the original image.

$$\begin{aligned} X' = X + \arg \min _{\delta X} \{||\delta X|| \;\; s.t. \;\; f(X+\delta X) \ne f(X) \} \end{aligned}$$
(1)
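To make Eq. (1) concrete, the following minimal sketch (an illustration, not part of the original method) checks the adversarial condition \(f(X+\delta X) \ne f(X)\) and searches for the smallest scaling of a fixed noise direction that flips the prediction of a PyTorch classifier `model`; the names `model`, `x`, `direction` and `scales` are assumptions. Real attacks compute the direction itself, typically from gradients, as described in Sect. 2.2.

```python
import torch

@torch.no_grad()
def predicted_class(model, x):
    """Return the class index predicted for a single image tensor x of shape (C, H, W)."""
    return model(x.unsqueeze(0)).argmax(dim=1).item()

@torch.no_grad()
def smallest_flipping_scale(model, x, direction, scales):
    """Find the smallest scaling of a fixed noise direction that changes f(x),
    mirroring the minimization in Eq. (1) restricted to one direction."""
    original = predicted_class(model, x)
    for s in scales:                          # scales assumed sorted in increasing order
        x_adv = (x + s * direction).clamp(0.0, 1.0)
        if predicted_class(model, x_adv) != original:
            return s, x_adv                   # first (smallest) scale that flips the prediction
    return None, None                         # no adversarial example found along this direction
```

Attack methods differ mainly in how they choose the direction and how efficiently they minimize its magnitude.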

On the other hand, research efforts are also put into “defense” methods, which study the properties of input images in order to detect these perturbations and flag potentially malicious images.

This work aims to combine the machine learning domain with the chaos theory domain. For this reason, an introduction to the latter is necessary. Chaos theory is considered one of the most important branches of nonlinear dynamics. These fields have improved the understanding of complex systems, from weather patterns and communications to biological processes. It is therefore useful to introduce nonlinear dynamics in order to better understand how chaos theory has developed. Unlike linear systems, where cause and effect are proportional, nonlinear systems exhibit complex, often unexpected patterns. This field analyzes the interactions among the different components of a system to understand its emergent behavior. For example, the works [2, 3] represent significant advances in understanding the complex behavior of advanced composite materials and nanoscale structures under various loading and environmental conditions. They provide essential insights for the design and optimization of new materials and structures in engineering applications, highlighting the ongoing innovations and analytical methods in the field.

Chaos theory, in turn, is more focused on the analysis of a system's sensitivity to initial conditions. When unpredictable behavior is observed even in a deterministic system, we refer to it as chaotic. The same behavior can be observed in the machine learning domain with the concept of adversarial examples introduced above: small perturbations in the initial conditions (the input image) drive the network into a chaotic state (unpredictable output). For this reason, the combination of both fields seems natural. As a first approach to this integration, this work proposes the application of a chaos theory metric, the Lyapunov exponents (LEs), to the optimization of the training process of a neural network architecture, an AutoEncoder. The main contributions of this work are as follows:

  • An analysis of the implications of chaos theory for neural network inputs is performed through LEs.

  • A novel loss function combining common divergence metrics and chaos theory metrics is proposed.

  • An adversarial AutoEncoder is trained with both approaches, showing how the proposal increases the adversarial example detection rate.

2 Related work

2.1 Chaos theory

Chaos theory is applied in multiple domains. For example, in the medical field, we can find works such as [4, 5], in which chaos theory was helpful to discover candidate random points for jumping from a low-fitness region to a high-fitness region. On the one hand, [4] employed these points to train a classifier using a predator–prey adaptive-inertia chaotic particle swarm optimization algorithm. This classifier was used to develop a novel method able to detect alcohol use disorder, with the help of Hu moment invariants to extract features from brain slices. The algorithm was also prepared to be installed on medical robots. On the other hand, [5] focused on the detection of abnormal breast tissue in digital mammography. For this purpose, the authors proposed a novel chaotic adaptive real-coded biogeography-based optimization to train a multilayer perceptron classifier. To feed this classifier, they used fractional Fourier entropy to extract global features from preprocessed images, selecting 23 distinguishing features.

Another interesting domain is communications. For example, [6] uses the Lyapunov direct method to bound signals and make their synchronization error converge to zero. This leverages a brain emotional learning-based intelligent controller (BELBIC) to develop a secure communication system. In this setting, chaos theory becomes more suitable than other state-of-the-art techniques, such as neuro-fuzzy systems. At the same time, the synchronization performance of controller and observer increases, due to the mitigation of the chaotic signals at the transmitter and receiver, respectively. In [7], there is another interesting application of chaos theory in the communications domain. With a similar approach, that work proposes an adaptive controller for chaos synchronization using quantum neural networks. Using the Lyapunov theorem, the proposed system is able to estimate the uncertainties caused by external disturbances from environmental conditions. As a result, the synchronization procedure is performed with negligible error. This method could also be applied in cryptography.

2.2 Adversarial attacks

To deceive a specific neural network model, various strategies can be employed. The assumptions made about the attacker are referred to as the threat model. The threat model is white-box if the attacker has access to all parameters of the neural network and to the distribution of the training images. Alternatively, if the attacker only has access to the model’s prediction results (via queries, for example), the threat model is referred to as black-box. Although the latter is more restrictive, there are attacks that can correctly estimate the distribution of classes in order to generate a gradient direction in which a perturbed image is incorrectly predicted.

In this study, seven attack methods from both threat models are compared in order to evaluate the effectiveness of the proposed defense mechanism. To achieve that, a set of adversarial examples is crafted with each method. After that, the original defense method and our proposed improved defense method are trained on those samples, and the adversarial detection rate is checked in each case.

The Fast Gradient Sign Method (FGSM) [1] was the seminal attack that demonstrated the influence of adversarial examples in deep learning. It computes the gradient of a loss function with respect to the network’s input. During training, the derivatives of the loss function are computed with respect to the parameters of the network in the so-called backpropagation mechanism; computing them instead with respect to the input image gives a pixel perturbation that, when scaled by a small epsilon value and added to that input image, is able to cause the output class prediction to drift away from the original ground truth.
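As an illustration, the following is a minimal FGSM sketch in PyTorch, assuming a differentiable classifier `model` and inputs scaled to [0, 1]; the epsilon value is a placeholder (the experiments in Sect. 4.2 use values between 0.01 and 0.25).

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.05):
    """Craft FGSM adversarial examples: one signed-gradient step of size epsilon.
    x: input batch (N, C, H, W) in [0, 1]; y: ground-truth labels (N,)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)     # loss of the current prediction
    loss.backward()                         # gradients w.r.t. the *input*, not the weights
    x_adv = x + epsilon * x.grad.sign()     # move each pixel in the direction that increases the loss
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixels in the valid range
```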

In this work, additional variants of the FGSM attack are also considered. These strategies were designed to counteract later defenses and to improve attack performance across a wider range of architectures. For example, in Projected Gradient Descent (PGD, [8]), the perturbation is projected onto an Lp-ball at each iteration. The ball can be based on any L-norm distance with a specified radius. Consequently, the crafted perturbations can be kept small while remaining within the range of the input data, so they are more difficult to detect by both defense methods and humans.
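A minimal sketch of this iterative scheme, again an illustrative assumption rather than the exact configuration used here, with projection onto the L-infinity ball (other norms follow the same pattern):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.05, alpha=0.01, steps=10):
    """Iterative signed-gradient steps with projection onto the L-infinity ball
    of radius epsilon around the original image (a common PGD variant)."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()                 # gradient ascent step
        x_adv = x_orig + (x_adv - x_orig).clamp(-epsilon, epsilon)   # project back onto the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                                # stay in the valid pixel range
    return x_adv.detach()
```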

Another attack method, DeepFool [9], was one of the first to achieve good results on complex datasets such as ImageNet. The method estimates the hyperplane that bounds the region in which the classifier predicts the same class. Using this information, the attack attempts to cross this border, directly calculating a perturbation that induces an adversarial example. Another approach, the Elastic-Net Attack (EAD) [10], formulates adversarial example generation as a regularized optimization problem. The objective of this method is to find a perturbation of minimum Lp distance (usually L1, but it can be extended to L2). Others, like Universal [11], compute a single perturbation pattern that is applied to all inputs, finding a general modification that, unlike most attack algorithms, does not depend on a specific image. To achieve that, the method analyzes the geometric correlations among the high-dimensional decision boundaries of a given neural network classifier.

Some of these methods were able to break the most popular defenses of their time. For example, the authors of EAD claimed that defensive distillation [12] (a popular defense) lost much of its effectiveness when defending against adversarial examples crafted with their method. Moreover, they also claimed that using adversarial examples from EAD in addition to Carlini & Wagner (C&W) [13] adversarial examples increased the robustness of adversarial training, showing that the field is in constant evolution.

Finally, regarding the black-box threat model, two methods are considered in our experiments. As explained before, these methods are able to craft adversarial examples querying the model to compute slight perturbation steps. The first is Boundary attack [14] (also known as Decision-Based Attack). This method starts with a large perturbation that ensures the adversarial behavior, and then it is gradually reduced while keeping the adversarial class. As an evolved version of the previous method, we also consider the HopSkipJump [15] attack (also known as Boundary Attack++ in early versions). This method optimizes the number of queries thanks to a better estimation of the decision gradient, using binary information about the boundary.
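The following heavily simplified sketch conveys the decision-based idea: it only assumes label access through a hypothetical `predict` function, and it omits the random orthogonal steps and adaptive step-size schedules of the actual algorithms.

```python
import torch

@torch.no_grad()
def boundary_attack_sketch(predict, x, y_true, steps=1000, step_size=0.1):
    """Decision-based attack sketch in the spirit of the Boundary attack [14]:
    start from a strongly perturbed adversarial point and move it toward the
    original image while the prediction stays adversarial (black-box setting)."""
    x_adv = torch.rand_like(x)                       # large initial perturbation (random image)
    if predict(x_adv) == y_true:
        return None                                  # the random start must already be adversarial
    for _ in range(steps):
        candidate = x_adv + step_size * (x - x_adv)  # small step toward the original image
        candidate = candidate.clamp(0.0, 1.0)
        if predict(candidate) != y_true:             # keep the step only if it stays adversarial
            x_adv = candidate
        else:
            step_size *= 0.5                         # otherwise shrink the step and retry
    return x_adv
```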

The aforementioned attack methods are implemented in our work with the parameter settings detailed in Table 1.

Table 1 Attacks and parameter setup in this work

2.3 Adversarial detection

In the state-of-the-art, different methods have been proposed for adversarial attacks and defenses. On the attack side, most of the effort is put into fooling the network with the least perturbation possible, while on the defense side, the objective is to detect as many adversarial images as possible or to reduce their impact on model accuracy. For the latter purpose, different approaches have been proposed. For example, one of the most common is to retrain the neural network models with additional images that include adversarial examples crafted from the legit training dataset images. This sort of “data augmentation with adversarial examples” is the so-called adversarial training [16] technique. It has been proven effective at increasing model robustness against unseen adversarial examples, even those crafted with methods different from the ones used during training. Moreover, by identifying the features that differ from those in the training data, the authors of [17, 18] addressed false data injection attacks utilizing the AutoEncoder technique. Samuel et al. [19] proposed a classification model for multivariate time series in which a gradient adversarial transformation network is combined with adversarial AutoEncoders. Following the same trend, [20] employs AutoEncoders to perform subset scanning over their activations, looking for anomalous patterns related to adversarial noise.

Some works have developed other approaches based on different kinds of classifiers. For example, [21] builds an ensemble of maximum-margin softmax classifiers to learn high-density features that discriminate between adversarial and legitimate samples. In [22], a multiobjective multifidelity Bayesian optimization algorithm is proposed, aimed at designing fault classification models that can solve the security–accuracy trade-off. It is also worth mentioning the work described in [23], since neural network activations are employed to detect adversarial examples, working in the Fourier domain instead of the image domain.

Other authors apply some preprocessing before the input image is fed to the network [24], so that the potential malicious effect of the perturbation is prevented. Adversarial attacks rely on carefully modified pixels that shift the values of activations throughout the inner layers. Therefore, a method that modifies the majority of pixels within a small range, while preserving the visual information, can also remove the specific pixel values that were causing the adversarial behavior. One solution currently used to combat adversarial attacks is denoising image classifiers [25]. In that work, the authors devised a method to restore the ground truth from noisy data damaged by malicious perturbations. Also, a block-matching convolutional neural network (CNN) [26] for image denoising was proposed as a preprocessing module that does not require the classifier to be retrained.

Another interesting approach is to apply chaos theory as an analogy for the perturbations of adversarial examples. This was first proposed by the seminal work [27] and further developed in more recent studies such as [28,29,30]. There, adversarial perturbations are considered chaos points in different parts of the network, such as the input image itself or the inner features of the network layers. As a result, it is possible to apply properties and metrics from this theory, such as LEs, to detect potential adversarial examples. In these works, LEs have been proven to be a useful metric for detecting adversarial examples. For this reason, it is expected that their use can be leveraged in a defense method, obtaining better results than other metrics employed before.

As a basis, the AutoEncoder defense method proposed by [31] is trained on clean and adversarial images. When a new input image is fed in, the distribution it belongs to is detected. For this purpose, the original method employs a Kullback–Leibler (KL) divergence metric. In this context, the specific contribution of this paper can be summarized as follows: a novel defense method based on a customized loss metric that relies not on divergence but on chaos theory, specifically on LEs. For this purpose, the loss calculates these exponents on both a clean image and its adversarial example, trying to minimize the difference between them, considering that the lower the value, the more the chaos is reduced by the defense.

3 Methodology

Given the potential of LEs for detecting the noise related to adversarial examples, the idea in this work is to build a defense method that turns this potential into an active mechanism to prevent the consequences of adversarial examples. For this purpose, the defense framework proposed by [31] is used as a basis. As shown in Fig. 2, an AutoEncoder is trained on two main subsets: legit images and adversarial examples.

Fig. 2 Diagram of the proposed defense, showing the workflow to classify and correct a potential adversarial example, based on [31]

This method scores how well a given sample matches each distribution, expressed as a probability of being a potential adversarial example. Moreover, in the middle of the AutoEncoder structure, the vector representation can be corrected to match the legit distribution. In consequence, adversarial perturbations can be corrected, preventing their effects when the image is fed to the network.

The main difference with the aforementioned methods [28,29,30] is that the adversarial examples are not only detected, but the AutoEncoder is also able to reconstruct the potential adversarial example into an image in which its effect has been eliminated. Depending on the adversarial distribution threshold, a sample is classified as regular or as a potential adversarial example. This lets the algorithm choose between the original image and the image reconstructed by the AutoEncoder, minimizing the impact of the adversarial example on the results of the image classification model.
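The decision logic can be sketched as follows; the `adversarial_score` function, the AutoEncoder interface and the threshold value are assumptions for illustration, since the concrete implementation follows [31].

```python
import torch

@torch.no_grad()
def detect_and_correct(autoencoder, adversarial_score, x, threshold=0.5):
    """Workflow sketch of the defense in Fig. 2: score how likely the input is
    adversarial and, above a threshold, replace it with the AutoEncoder
    reconstruction whose latent representation has been corrected toward the
    legit distribution."""
    score = adversarial_score(x)       # probability of belonging to the adversarial distribution
    if score < threshold:
        return x, False                # treated as a regular image, passed through unchanged
    x_reconstructed = autoencoder(x)   # reconstruction with the adversarial effect reduced
    return x_reconstructed, True       # the classifier receives the corrected image
```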

Originally, the architecture of the method employs a KL divergence metric as the loss function when training and optimizing the AutoEncoder detector. In our implementation, this is substituted by a Lyapunov-based formulation, referred to as the Lyap loss, which better captures the potential perturbations of adversarial examples.

Indeed, LEs are a crucial tool for understanding the behavior of dynamical systems, particularly in the context of chaos theory [32]. The maximum Lyapunov exponent (MLE) is a critical measure in the study of dynamical systems, particularly for identifying the chaotic behavior introduced by small perturbations. The MLE quantifies the average rate at which nearby trajectories in the phase space diverge or converge over time. A positive MLE is a strong indication of chaos, demonstrating that small differences in initial conditions can lead to exponentially divergent outcomes. Mathematically, the MLE is defined in Eq. (2).

$$\begin{aligned} \lambda _{max} = \displaystyle \lim _{t \rightarrow \infty } \lim _{\delta _x(0) \rightarrow 0}\frac{1}{t}\ln \frac{|\delta _x(t)|}{|\delta _x(0)|} \end{aligned}$$
(2)

where \(\delta _x(0)\) is the initial separation between two nearby trajectories, and \(\delta _x(t)\) is the separation at time t. A positive \(\lambda _{max}\) indicates that \(\delta _x(t)\) grows exponentially, signifying chaotic behavior. In contrast, a zero or negative MLE indicates neutral stability or convergence, respectively, implying non-chaotic behavior. Quantitatively, two trajectories in phase space with an initial separation vector \(\delta (0)\) diverge at a rate given by Eq. (3).

$$\begin{aligned} ||\delta (t)|| \approx ||\delta (0)||e^{\lambda t} \end{aligned}$$
(3)

where \(\lambda\) is the LE, and the explanation of this chaos quantifier is provided in Fig. 3.

Fig. 3 Explanation of Lyapunov exponent (LE)
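To make Eqs. (2) and (3) concrete, the following sketch estimates the MLE of the logistic map, a classical one-dimensional chaotic system, by averaging the local stretching rate along a trajectory. This is a textbook illustration of the definition, not the estimator applied to images in this work.

```python
import numpy as np

def logistic_map(x, r=4.0):
    return r * x * (1.0 - x)

def largest_lyapunov_logistic(x0=0.2, r=4.0, n_steps=10000, n_transient=100):
    """Estimate the MLE of the logistic map by averaging the log of the local
    stretching rate |f'(x)| = |r (1 - 2x)| along a trajectory (Eqs. 2-3)."""
    x = x0
    for _ in range(n_transient):              # discard the transient part of the trajectory
        x = logistic_map(x, r)
    log_stretch = []
    for _ in range(n_steps):
        log_stretch.append(np.log(abs(r * (1.0 - 2.0 * x))))
        x = logistic_map(x, r)
    return float(np.mean(log_stretch))        # positive value -> nearby trajectories diverge

print(largest_lyapunov_logistic())            # approximately 0.693 (= ln 2)
```

For r = 4 the estimate converges to ln 2, a positive exponent that signals chaos, whereas a negative value would indicate convergence of nearby trajectories, as discussed above.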

The proposed loss function is defined in Eq. (4).

$$\begin{aligned} Loss = Loss_{KL} + 10^{\max _{i}(\lambda _{i}^{L_{k}}-\lambda _{i}^{A_{k}})} \end{aligned}$$
(4)

where \(\lambda _{i}^{L_{k}}\) and \(\lambda _{i}^{A_{k}}\) represent the LEs computed from the \(k_{th}\) legit image (\(L_{k}\)) and the corresponding adversarial example (\(A_{k}\)), respectively. Here, \(k (1 \le k \le N)\) indexes the images in the dataset, with N the total number of images. The four largest LEs are calculated for each pair, grouped as \((\lambda _{1}^{L_{k}}, \lambda _{2}^{L_{k}}, \lambda _{3}^{L_{k}}, \lambda _{4}^{L_{k}})\) and \((\lambda _{1}^{A_{k}}, \lambda _{2}^{A_{k}}, \lambda _{3}^{A_{k}}, \lambda _{4}^{A_{k}})\), respectively. Then, the differences between the corresponding exponents are calculated: positive values mean that the perturbation is being reduced, and the opposite otherwise. After that, only the maximum difference is kept, to detect the presence of the potentially most chaotic point, wherever it is found. Finally, this value is used as the exponent of 10, so the loss grows exponentially when chaos is found and shrinks in the same way when chaos is being reduced during the training process. As a result, the defined loss makes the AutoEncoder converge to a point in which it is able to reduce the chaoticity of any given sample. In consequence, once processed by the trained model, an image that induces less chaoticity will be classified with better accuracy, preventing the malicious effect of the adversarial attack.
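A minimal sketch of Eq. (4) follows. The helper `lyapunov_exponents`, assumed to return the four largest LEs of an image as a tensor, and the baseline KL term `loss_kl` are illustrative assumptions; the paper does not prescribe this exact interface.

```python
import torch

def lyap_loss(loss_kl, legit_image, adv_image, lyapunov_exponents):
    """Sketch of Eq. (4): add to the KL term an exponential penalty driven by the
    largest difference between the LEs of a legit image and its adversarial pair."""
    le_legit = lyapunov_exponents(legit_image)   # (lambda_1 ... lambda_4) for L_k
    le_adv = lyapunov_exponents(adv_image)       # (lambda_1 ... lambda_4) for A_k
    exponent = torch.max(le_legit - le_adv)      # keep only the most chaotic component
    return loss_kl + 10.0 ** exponent            # 10^max(...) term of Eq. (4)
```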

The main novelty of this approach is to consider the neural network as a chaotic system, something not considered in other detection methods, including the baseline AutoEncoder.

4 Experiments

4.1 Datasets

This work employs three different datasets to conduct the experiments. Therefore, it is possible to contrast the obtained results under a wider range of conditions, covering, for example, grayscale or color images, of smaller or larger size, with simple patterns or real-world objects.

The first dataset is MNIST, which was published in [33]. It is one of the most common benchmarks across different domains of computer vision research. It consists of white handwritten digits over a black background, as seen in Fig. 4. Despite its simplicity, it is a very useful dataset to study in detail the specific features that are affected by a given attack or defense method. Specifically, this dataset contains 60,000 images for training and 10,000 for testing, with a size of 28 × 28 pixels each.

Fig. 4 Classes of MNIST dataset

The next dataset is a variant of the previous one, called Fashion-MNIST, developed by the Zalando Company [34]. The size and number of images are the same, while the objects are related to clothing, with thumbnails of different types, such as shirts, bags, coats or trousers (see Fig. 5). In comparison with the regular MNIST, these images contain more texture details and, as a result, more complexity.

Fig. 5 Classes of Fashion-MNIST dataset

Finally, the CIFAR-10 [35] dataset is also considered. Its images have a larger size (32 × 32 pixels) and are in color, with three RGB channels. The 10 classes represent common real-world objects from two domains: motor transport (automobile, truck, ship and airplane) and animal species (deer, frog, cat, dog, bird and horse). Some examples of these objects can be found in Fig. 6.

Fig. 6 Classes of CIFAR-10 dataset
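For reference, the three benchmarks can be loaded with torchvision as in the following sketch; the root path and the plain [0, 1] tensor transform are illustrative choices, not necessarily those of the original experiments.

```python
import torchvision
import torchvision.transforms as T

to_tensor = T.ToTensor()  # scales pixels to [0, 1]

# Training splits of the three benchmarks used in this work
mnist = torchvision.datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
fashion = torchvision.datasets.FashionMNIST(root="data", train=True, download=True, transform=to_tensor)
cifar10 = torchvision.datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)

print(len(mnist), mnist[0][0].shape)      # 60000 images of size 1 x 28 x 28
print(len(fashion), fashion[0][0].shape)  # 60000 images of size 1 x 28 x 28
print(len(cifar10), cifar10[0][0].shape)  # 50000 images of size 3 x 32 x 32
```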

4.2 Results

With the attacks and datasets previously defined, the proposed method is compared in two versions: the first one employs the original loss based on the KL divergence metric, while the second one substitutes it with the variant based on LEs.

In Fig. 7, the Lyapunov spectrum for 50 legitimate images and the corresponding FGSM adversarial examples is displayed. We plot the MLE for \(\epsilon =0.05\), which significantly affects the computed value of the Lyap loss in Eq. (4). Note that \(\epsilon\) is a small, dimensionless hyperparameter used to control the perturbation level [36]. When the images are perturbed to deceive the network, the MLEs are observed to be positive. In contrast, the MLEs for legit images are negative. The histogram shows that the quantiles of the empirical distribution of \(\lambda _{L}\) for adversarial images are all positive, proving the importance of \(\lambda _{L}\) for perturbation detection.

Fig. 7 The largest LEs of legit images and FGSM adversarial examples of 50 MNIST images

The impact of different perturbation levels, from \(\epsilon =0.01\) to 0.25, on the estimated Lyap loss over 50 randomly selected images from the MNIST dataset, together with its average values, is displayed in Fig. 8. We can observe that the Lyap loss grows gradually as the perturbation level increases. LEs are a measure used in chaos theory to characterize the rate at which very close trajectories separate from one another; a positive MLE among all LEs indicates that the system is chaotic. In essence, we analyze the statistical characteristics of the Lyapunov spectra in the non-chaotic and chaotic phases by computing the Lyap loss from the MLEs of the legit and adversarial examples. Considering a threshold learned during the training process of the AutoEncoder, it is possible to discern whether the loss values belong to a legit or an adversarial input.

Fig. 8 Impact of various perturbation levels on the estimated Lyapunov loss over 50 randomly chosen images from the MNIST dataset. The green circle line indicates the average \((\mu )\) Lyapunov loss value for each epsilon, while the red cross line and the blue square line indicate plus and minus one standard deviation \((\sigma )\), respectively. Best viewed in color
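The trend in Fig. 8 can be reproduced with a sketch such as the following, which reuses the illustrative `fgsm_attack` from Sect. 2.2 and assumes a hypothetical `lyap_loss_fn(x, x_adv)` wrapping Eq. (4) for a single image pair.

```python
import numpy as np

def lyap_loss_vs_epsilon(model, images, labels, lyap_loss_fn,
                         epsilons=np.arange(0.01, 0.26, 0.01)):
    """Average Lyap loss (and its spread) over a set of images for increasing
    FGSM perturbation levels, mirroring the mu and sigma curves of Fig. 8."""
    means, stds = [], []
    for eps in epsilons:
        x_adv = fgsm_attack(model, images, labels, epsilon=float(eps))
        losses = [float(lyap_loss_fn(x, xa)) for x, xa in zip(images, x_adv)]
        means.append(np.mean(losses))
        stds.append(np.std(losses))
    return np.array(means), np.array(stds)
```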

For the MNIST dataset, the Lyapunov variant performs better in all cases, with an increment from 9% to 18%, despite also having larger standard deviation values. The results for the MNIST dataset are shown in Table 2 and visualized in Fig. 9.

Table 2 MNIST dataset results
Fig. 9 Compared results of the original AutoEncoder architecture and the one proposed in this work for the MNIST dataset

For the Fashion-MNIST dataset, the Lyapunov loss in the defense obtains better results than the divergence metric, with the greatest increment for the FGSM attack (up to +37% detection rate). However, the Universal attack is the only case in the whole experimentation in which this method performs worse. The results for the Fashion-MNIST dataset are shown in Table 3 and visualized in Fig. 10.

Table 3 Fashion-MNIST dataset results
Fig. 10 Compared results of the original AutoEncoder architecture and the one proposed in this work for the Fashion-MNIST dataset

For the CIFAR-10 dataset, the Lyapunov version achieves an average detection rate increase of more than 10%, with low standard deviation values, except for the Universal attack. Specifically, the largest increments are observed for the gradient-based attacks, both FGSM and PGD. The results for the CIFAR-10 dataset are shown in Table 4 and visualized in Fig. 11.

Table 4 CIFAR-10 dataset results
Fig. 11 Compared results of the original AutoEncoder architecture and the one proposed in this work for the CIFAR-10 dataset

5 Conclusion and future work

This work shows the application of the chaos theory domain to adversarial example detection. First, a given adversarial defense based on an AutoEncoder and a divergence metric is tested. With the addition of LEs, it is possible to extract more information during the training process of the AutoEncoder. This enhancement makes it possible to obtain promising results for small-size datasets, such as MNIST or CIFAR-10. Specifically, the results obtained in comparison with the original method show a performance increase that ranges from 9% up to 37% in the detection rate of adversarial examples. Given the wide variety of attack methods considered, this shows that the performance is increased consistently.

We have shown that a chaos-based metric is useful for adversarial example detection. For this purpose, we have chosen to modify the training loss function of a deep learning AutoEncoder. The experiments show an improvement in the overall results with respect to the baseline loss function. While this does not constitute a new state-of-the-art detection method, it represents a milestone in the further application of methods derived from chaos theory to this machine learning problem.

For example, as future work, further refinement of the employed formula would be required for larger datasets and networks, such as ImageNet, in order to better capture the spatial information of perturbations in such a high-dimensional problem.