Abstract
Security-sensitive applications that rely on Deep Neural Networks (DNNs) are vulnerable to small perturbations that are crafted to generate Adversarial Examples (AEs). AEs are imperceptible to humans and cause DNNs to misclassify them. Many defense and detection techniques have been proposed. The model’s confidences and Dropout, a popular way to estimate model uncertainty, have been used for AE detection, but they have shown limited success against black- and gray-box attacks. Moreover, state-of-the-art detection techniques are designed for specific attacks or have been broken by others, require knowledge about the attacks, are inconsistent, add parameter overhead to the model, are time-consuming, or introduce latency at inference time. To trade off these factors, we revisit the model’s uncertainty and confidences and propose a novel unsupervised ensemble AE detection mechanism that 1) uses the uncertainty method SelectiveNet, and 2) processes the outputs of the model’s layers, i.e. feature maps, to generate new confidence probabilities. The resulting detection method is called SFAD. Experimental results show that the proposed approach achieves better performance against black- and gray-box attacks than state-of-the-art methods, and comparable performance against white-box attacks. Moreover, SFAD is fully robust against High Confidence Attacks (HCAs) on the MNIST dataset and partially robust on CIFAR10.
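As a rough illustration of how the two signals mentioned in the abstract could be combined, the sketch below applies a thresholded ensemble rejection rule to a SelectiveNet-style selection score and to confidences produced by auxiliary classifiers built on feature maps. This is only a minimal sketch, not the authors' SFAD implementation: the function name reject_if_uncertain, the input shapes, and the thresholds tau_select and tau_feature are hypothetical placeholders.

    # Minimal, illustrative sketch of a confidence/uncertainty-based AE detector.
    # NOT the authors' SFAD implementation; the helper names and thresholds are
    # hypothetical stand-ins for the quantities described in the abstract.
    import numpy as np

    def reject_if_uncertain(selective_confidence, feature_map_confidences,
                            tau_select=0.5, tau_feature=0.5):
        """Flag inputs as suspected AEs when either the selective (reject-option)
        score or the ensemble of feature-map-based confidences is low.

        selective_confidence:    shape (N,)   -- SelectiveNet-style selection score
        feature_map_confidences: shape (N, K) -- max softmax probability of each of
                                                 K auxiliary feature-map classifiers
        Returns a boolean array of shape (N,): True = rejected (suspected AE).
        """
        low_selective = selective_confidence < tau_select
        # One simple ensemble rule: reject if any auxiliary classifier is unsure.
        low_feature = (feature_map_confidences < tau_feature).any(axis=1)
        return low_selective | low_feature

    # Toy usage with random scores standing in for real model outputs.
    rng = np.random.default_rng(0)
    sel = rng.uniform(size=8)        # pretend SelectiveNet selection scores
    feat = rng.uniform(size=(8, 3))  # pretend confidences from 3 feature-map classifiers
    print(reject_if_uncertain(sel, feat))

In SFAD these confidences come from trained detector classifiers and the thresholds are chosen on clean data; the rule above only illustrates the thresholded, unsupervised ensemble idea.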
Notes
The source code is available at https://github.com/aldahdooh/detectors_review
The detector is compared with the results reported in the original paper [81]
AEs that are able to fool a model are called successful AEs; otherwise, they are called failed or unsuccessful AEs
References
Krizhevsky A, Sutskever I, Hinton G E (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Bengio Y, LeCun Y (eds) 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, San Diego
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Shen D, Wu G, Suk H-I (2017) Deep learning in medical image analysis. Ann Rev Biomed Eng 19:221–248
Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I J, Fergus R (2014) Intriguing properties of neural networks. In: Bengio Y, LeCun Y (eds) 2nd International Conference on Learning Representations, ICLR 2014, Conference Track Proceedings, Banff
Goodfellow I J, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: Bengio Y, LeCun Y (eds) 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, San Diego
Guo W, Mu D, Xu J, Su P, Wang G, Xing X (2018) Lemna: Explaining deep learning based security applications. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp 364–379
Akhtar N, Mian A (2018) Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access 6:14410–14430
Xu H, Ma Y, Liu H-C, Deb D, Liu H, Tang J-L, Jain A K (2020) Adversarial attacks and defenses in images, graphs and text: A review. Int J Autom Comput 17(2):151–178
Kurakin A, Goodfellow I, Bengio S (2017) Adversarial examples in the physical world. ICLR Workshop
Moosavi-Dezfooli S-M, Fawzi A, Frossard P (2016) Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2574–2582
Carlini N, Wagner D (2017) Towards evaluating the robustness of neural networks. In: 2017 ieee symposium on security and privacy (sp). IEEE, pp 39–57
Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. In: 6th International Conference on Learning Representations, ICLR 2018, Conference Track Proceedings. OpenReview.net, Vancouver
Papernot N, McDaniel P D, Goodfellow I J (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR arXiv:1605.07277
Chen P-Y, Zhang H, Sharma Y, Yi J, Hsieh C-J (2017) Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp 15–26
Engstrom L, Tran B, Tsipras D, Schmidt L, Madry A (2019) Exploring the landscape of spatial robustness. In: International Conference on Machine Learning, pp 1802–1811
Su J, Vargas D V, Sakurai K (2019) One pixel attack for fooling deep neural networks. IEEE Trans Evol Comput 23(5):828–841
Kotyan S, Vasconcellos Vargas D (2019) Adversarial robustness assessment: Why both \(l_{0}\) and \(l_{\infty }\) attacks are necessary. CoRR arXiv:1906.06026
Gal Y, Ghahramani Z (2016) Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning. PMLR, pp 1050–1059
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Feinman R, Curtin R R, Shintre S, Gardner A B (2017) Detecting adversarial samples from artifacts. CoRR arXiv:1703.00410
Smith L, Gal Y (2018) Understanding measures of uncertainty for adversarial example detection. In: Globerson A, Silva R (eds) Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018. AUAI Press, Monterey, pp 560–569
Sheikholeslami F, Jain S, Giannakis G B (2020) Minimum uncertainty based detection of adversaries in deep neural networks. In: Information Theory and Applications Workshop, ITA 2020. IEEE, San Diego, pp 1–16
Geifman Y, El-Yaniv R (2019) Selectivenet: A deep neural network with an integrated reject option. CoRR arXiv:1901.09192
Hendrycks D, Gimpel K (2017) A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net
Aigrain J, Detyniecki M (2019) Detecting adversarial examples and other misclassifications in neural networks by introspection. CoRR arXiv:1905.09186
Monteiro J, Albuquerque I, Akhtar Z, Falk T H (2019) Generalizable adversarial examples detection based on bi-model decision mismatch. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). IEEE, pp 2839–2844
Sotgiu A, Demontis A, Melis M, Biggio B, Fumera G, Feng X, Roli F (2020) Deep neural rejection against adversarial examples. EURASIP J Inf Secur 2020:1–10
Xu W, Evans D, Qi Y (2018) Feature squeezing: Detecting adversarial examples in deep neural networks. In: 25th Annual Network and Distributed System Security Symposium, NDSS 2018. The Internet Society, San Diego
Athalye A, Carlini N, Wagner D A (2018) Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In: Dy JG, Krause A (eds) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan. Proceedings of Machine Learning Research, vol 80. PMLR, Stockholm, pp 274–283
Carlini N, Wagner D A (2017) Magnet and “efficient defenses against adversarial attacks” are not robust to adversarial examples. CoRR arXiv:1711.08478
Bulusu S, Kailkhura B, Li B, Varshney P K, Song D (2020) Anomalous example detection in deep learning: A survey. IEEE Access 8:132330–132347
Lust J, Condurache A P (2020) Gran: An efficient gradient-norm based detector for adversarial and misclassified examples. In: 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2020, Bruges, pp 7–12
Ma S, Liu Y (2019) Nic: Detecting adversarial samples with neural network invariant checking. In: Proceedings of the 26th Network and Distributed System Security Symposium (NDSS 2019)
Gao Y, Doan B G, Zhang Z, Ma S, Zhang J, Fu A, Nepal S, Kim H (2020) Backdoor attacks and countermeasures on deep learning: A comprehensive review. CoRR arXiv:2007.10760
Melis M, Demontis A, Biggio B, Brown G, Fumera G, Roli F (2017) Is deep learning safe for robot vision? adversarial examples against the icub humanoid. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp 751–759
Lu J, Issaranon T, Forsyth D (2017) Safetynet: Detecting and rejecting adversarial examples robustly. In: Proceedings of the IEEE International Conference on Computer Vision, pp 446–454
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
Liu S, Johns E, Davison A J (2019) End-to-end multi-task learning with attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1871–1880
Lecuyer M, Atlidakis V, Geambasu R, Hsu D, Jana S (2019) Certified robustness to adversarial examples with differential privacy. In: 2019 IEEE Symposium on Security and Privacy (SP). IEEE, pp 656–672
Liu X, Cheng M, Zhang H, Hsieh C-J (2018) Towards robust neural networks via random self-ensemble. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 369–385
Liu X, Xiao T, Si S, Cao Q, Kumar S, Hsieh C-J (2020) How does noise help robustness? explanation and exploration under the neural sde framework. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 282–290
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto
Carlini N, Wagner D (2017) Adversarial examples are not easily detected: Bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp 3–14
Ma X, Li B, Wang Y, Erfani S M, Wijewickrema S N R, Schoenebeck G, Song D, Houle M E, Bailey J (2018) Characterizing adversarial subspaces using local intrinsic dimensionality. In: 6th International Conference on Learning Representations, ICLR 2018, Conference Track Proceedings. OpenReview.net, Vancouver
Xie C, Tan M, Gong B, Yuille A L, Le Q V (2020) Smooth adversarial training. CoRR arXiv:2006.14536
Tramèr F, Kurakin A, Papernot N, Goodfellow I J, Boneh D, McDaniel P D (2018) Ensemble adversarial training: Attacks and defenses. In: 6th International Conference on Learning Representations, ICLR 2018, Conference Track Proceedings. OpenReview.net, Vancouver
Xie C, Wu Y, van der Maaten L, Yuille A L, He K (2019) Feature denoising for improving adversarial robustness. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 501–509
Borkar T, Heide F, Karam L (2020) Defending against universal attacks through selective feature regeneration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 709–719
Liao F, Liang M, Dong Y, Pang T, Hu X, Zhu J (2018) Defense against adversarial attacks using high-level representation guided denoiser. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1778–1787
Mustafa A, Khan S H, Hayat M, Shen J, Shao L (2019) Image super-resolution as a defense against adversarial attacks. IEEE Trans Image Process 29:1711–1724
Prakash A, Moran N, Garber S, DiLillo A, Storer J (2018) Deflecting adversarial attacks with pixel deflection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8571–8580
Papernot N, McDaniel P, Wu X, Jha S, Swami A (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In: 2016 IEEE Symposium on Security and Privacy (SP). IEEE, pp 582–597
Papernot N, McDaniel P, Goodfellow I, Jha S, Celik Z B, Swami A (2017) Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia conference on computer and communications security, pp 506–519
Gu S, Rigazio L (2015) Towards deep neural network architectures robust to adversarial examples. In: Bengio Y, LeCun Y (eds) 3rd International Conference on Learning Representations, ICLR 2015, Workshop Track Proceedings, San Diego
Nayebi A, Ganguli S (2017) Biologically inspired protection of deep networks from adversarial attacks. CoRR arXiv:1703.09202
Nguyen A, Yosinski J, Clune J (2015) Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 427–436
Grosse K, Manoharan P, Papernot N, Backes M, McDaniel P D (2017) On the (statistical) detection of adversarial examples. CoRR arXiv:1702.06280
Metzen J H, Genewein T, Fischer V, Bischoff B (2017) On detecting adversarial perturbations. In: 5th International Conference on Learning Representations, ICLR 2017, Conference Track Proceedings. OpenReview.net, Toulon
Wang S, Gong Y (2021) Adversarial example detection based on saliency map features. Appl Intell:1–14
Eniser H F, Christakis M, Wüstholz V (2020) RAID: randomized adversarial-input detection for neural networks. CoRR arXiv:2002.02776
Meng D, Chen H (2017) Magnet: a two-pronged defense against adversarial examples. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp 135–147
Potra F A, Wright S J (2000) Interior-point methods. J Comput Appl Math 124(1-2):281–302
Bendale A, Boult T E (2016) Towards open set deep networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1563–1572
Ruder S (2017) An overview of multi-task learning in deep neural networks. CoRR arXiv:1706.05098
Vandenhende S, Georgoulis S, Proesmans M, Dai D, Gool L V (2020) Revisiting multi-task learning in the deep learning era. CoRR arXiv:2004.13379
Kendall A, Gal Y, Cipolla R (2018) Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7482–7491
Chen Z, Badrinarayanan V, Lee C-Y, Rabinovich A (2018) Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In: Dy J G, Krause A (eds) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Proceedings of Machine Learning Research, vol 80. PMLR, Stockholmsmässan, pp 793–802
Guo M, Haque A, Huang D-A, Yeung S, Fei-Fei L (2018) Dynamic task prioritization for multitask learning. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 270–287
Sener O, Koltun V (2018) Multi-task learning as multi-objective optimization. In: Advances in Neural Information Processing Systems, pp 527–538
Zhang L, Tan Z, Song J, Chen J, Bao C, Ma K (2019) Scan: A scalable neural networks framework towards compact and efficient models. In: Advances in Neural Information Processing Systems, pp 4027–4036
Zhang L, Yu M, Chen T, Shi Z, Bao C, Ma K (2020) Auxiliary training: Towards accurate and robust models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 372–381
Zhang L, Song J, Gao A, Chen J, Bao C, Ma K (2019) Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3713–3722
Biggio B, Corona I, Maiorca D, Nelson B, Šrndić N, Laskov P, Giacinto G, Roli F (2013) Evasion attacks against machine learning at test time. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 387–402
Andriushchenko M, Croce F, Flammarion N, Hein M (2020) Square attack: a query-efficient black-box adversarial attack via random search. In: European Conference on Computer Vision. Springer, pp 484–501
Chen J, Jordan M I, Wainwright M J (2020) Hopskipjumpattack: A query-efficient decision-based attack. In: 2020 ieee symposium on security and privacy (sp). IEEE, pp 1277–1294
Storn R, Price K V (1997) Differential evolution - A simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
Pang T, Du C, Dong Y, Zhu J (2018) Towards robust detection of adversarial examples. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, pp 4584–4594
Aldahdooh A, Hamidouche W, Fezza S A, Déforges O (2022) Adversarial example detection for dnn models: A review and experimental comparison. Artif Intell Rev
Acknowledgements
The project is funded by both the Région Bretagne (Brittany region), France, and the Direction générale de l’armement (DGA).
The source code is available at https://aldahdooh.github.io/SFAD/.
Cite this article
Aldahdooh, A., Hamidouche, W. & Déforges, O. Revisiting model’s uncertainty and confidences for adversarial example detection. Appl Intell 53, 509–531 (2023). https://doi.org/10.1007/s10489-022-03373-y