Advanced defensive distillation with ensemble voting and noisy logits


Abstract

In this paper we present an approach for protecting deep neural networks against adversarial examples in image classification problems. Unlike adversarial training, our approach is independent of the adversarial examples obtained through min-max optimization. The approach builds on the defensive distillation mechanism. That defence, while very successful at the time, was defeated in less than a year because of a major intrinsic vulnerability: the attacker's access to the neural network's logit layer. We overcome this vulnerability and enhance defensive distillation with two mechanisms: (1) a mechanism that hides the logit layer (Noisy Logit), which increases robustness at the expense of accuracy, and (2) a mechanism that improves accuracy but does not always increase robustness (Ensemble Network). We show that by combining the two mechanisms with a voting method, we can protect against adversarial examples while retaining accuracy. We formulate potential attacks on our approach under different threat models. The experimental results demonstrate the effectiveness of our approach. We also provide a robustness guarantee along with an interpretation of the guarantee.
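To make the two mechanisms concrete, the following is a minimal inference-time sketch, not the authors' implementation: each ensemble member adds Gaussian noise to its logits before predicting (Noisy Logit), and the final label is chosen by majority vote over the members (Ensemble Voting). The use of PyTorch, the noise scale sigma, and the helper names are illustrative assumptions.

import torch

def noisy_logit_predict(model: torch.nn.Module, x: torch.Tensor,
                        sigma: float = 0.5) -> torch.Tensor:
    # Hide the true logits from an attacker by adding Gaussian noise
    # before taking the argmax (the Noisy Logit mechanism).
    with torch.no_grad():
        logits = model(x)                                  # shape: (batch, classes)
        logits = logits + sigma * torch.randn_like(logits)
    return logits.argmax(dim=1)

def ensemble_vote(models, x: torch.Tensor, sigma: float = 0.5) -> torch.Tensor:
    # Aggregate the noisy per-network predictions by majority vote
    # (the Ensemble Voting mechanism).
    votes = torch.stack([noisy_logit_predict(m, x, sigma) for m in models])
    return votes.mode(dim=0).values                        # most frequent label per input

Because the noise is drawn fresh at every query, repeated queries of the same input can return different logits, which is what denies the attacker a stable view of the logit layer.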




Notes

  1. Additional experiments and the source code for the experiments are available at: https://github.com/liangy42/nn_robust_ensemble.


Acknowledgements

We thank the anonymous reviewers for their suggestions and feedback. Support from the Vector Institute and the Natural Sciences and Engineering Research Council of Canada (NSERC) is gratefully acknowledged.

Author information


Corresponding author

Correspondence to Reza Samavi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Work done while a student at McMaster University, Canada.

Appendices

A Superimposition attack using the \(L_{\infty}\) norm

In this section we present the results of applying the Superimposition (3×) attack with the \(L_{\infty}\) norm to our model. In Table 7 we observe, as before, that Noisy Logit reduces the attack success rate on the individual networks, and that Ensemble Voting further improves accuracy for both datasets.

Table 7 Distributions for the Superimposition (3×) adversarial attack with the \(L_{\infty}\) norm
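For concreteness, below is a hedged sketch of the Superimposition (k×) attack as we read it from this appendix and Appendix C: craft a targeted adversarial example against each member network, keep the k = 2 or 3 examples with the smallest perturbations, and superimpose (sum) their perturbations onto the original image. craft_adversarial is a hypothetical placeholder for any targeted attack such as Carlini-Wagner; the [0, 1] pixel range is an assumption.

import torch

def superimposition_attack(models, x: torch.Tensor, target: int,
                           craft_adversarial, k: int = 3) -> torch.Tensor:
    # Attack each member network independently with a targeted attack
    # and record the perturbation it produces.
    perturbations = []
    for m in models:
        x_adv = craft_adversarial(m, x, target)
        delta = x_adv - x
        perturbations.append((delta.abs().max().item(), delta))  # (L_inf size, delta)
    # Superimpose (sum) the k smallest perturbations onto the clean image.
    perturbations.sort(key=lambda p: p[0])
    combined = sum(delta for _, delta in perturbations[:k])
    return (x + combined).clamp(0.0, 1.0)  # keep pixel values in a valid range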

B A second look at Noisy Logit

The results for the Random Single Network Attack in Section 4.3 provided some insight into how Noisy Logit reduces transferability across different neural networks. However, in the case of the MNIST dataset, we observed that although Noisy Logit changes the distribution of perturbations in the adversarial examples, there appears to be no benefit to using Noisy Logit, since Ensemble Voting alone provides better accuracy. In this section, we look at the output of each individual network in isolation when it is targeted, to see whether applying Noisy Logit improves the robustness of a single network. In Fig. 9 we craft adversarial examples for a single sample and a single target on each of the 50 networks in the ensemble.
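This per-network experiment can be summarized in a short sketch. As before, craft_adversarial is a hypothetical targeted attack such as Carlini-Wagner, and noisy_logit_predict is the Noisy Logit helper sketched after the abstract.

from collections import Counter
import torch

def tally_single_network_attacks(models, x: torch.Tensor, target: int,
                                 craft_adversarial, sigma: float = 0.5) -> Counter:
    # x is a single image with batch dimension 1. Each of the 50 networks is
    # attacked in isolation and then classifies its own adversarial example
    # with Noisy Logit applied; the resulting tally mirrors Table 8.
    tally = Counter()
    for m in models:
        x_adv = craft_adversarial(m, x, target)       # targeted at this one network
        label = noisy_logit_predict(m, x_adv, sigma)  # inference with noisy logits
        tally[int(label)] += 1
    return tally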

Fig. 9 Adversarial examples without and with Noisy Logit applied

In Fig. 9a and b, the sample input is the digit 7 and the target is the digit 0. Observe that the distribution of perturbations changes when we apply Noisy Logit: in Fig. 9b we see more occurrences in the tails (i.e. very small or very large perturbations). In Fig. 9a each targeted network misclassifies its corresponding adversarial example as 0 (a 100% success rate for Carlini-Wagner on a single network), whereas in Fig. 9b only 8 of the networks misclassify it as 0, and 29 of the networks still correctly classify it as 7, as shown in Table 8a. Therefore, for an individual MNIST network with Noisy Logit applied, the success rate of a targeted Carlini-Wagner attack is low. Thus, a single MNIST network is more robust to adversarial examples when Noisy Logit is applied; however, the accuracy rate suffers because extra noise is added.

Table 8 Classifications of the 50 networks corresponding to Fig. 9b and d, respectively

In Fig. 9c and d, the sample input is the object ship and the target is the object airplane. There is no noticeable difference in the distribution of perturbations whether or not Noisy Logit is applied. In Fig. 9c, again, each targeted network misclassifies its corresponding adversarial example as the target (airplane), whereas in Fig. 9d only 1 of the networks misclassifies it as airplane, and 38 of the networks still correctly classify it as ship, as shown in Table 8. Therefore, the success rate of a targeted Carlini-Wagner attack on a single CIFAR10 network is very low, i.e., the robustness of a single CIFAR10 network is increased in the presence of Noisy Logit. Note also that the accuracy rate suffers only slightly from the added noise.

C Sample results for superimposition attacks

In Figs. 10a–13b, samples of the resulting images with adversarial perturbations are shown. The leftmost column displays the original images, the middle columns display the adversarial examples with the two or three smallest perturbations, and the last column shows the superimposition of those two or three adversarial examples. The rows correspond to the different targets applied in the adversarial examples. Classifications of these images are provided in Tables 9 and 10.

Fig. 10 Adversarial images using Superimposition (2× and 3×), MNIST Sample 1

Fig. 11 Adversarial images using Superimposition (2× and 3×), MNIST Sample 2

Fig. 12 Adversarial images using Superimposition (3×), CIFAR Sample 1

Fig. 13 Adversarial images using Superimposition (2× and 3×), CIFAR Sample 2

Table 9 Classifications of MNIST
Table 10 Classifications of CIFAR


Cite this article

Liang, Y., Samavi, R. Advanced defensive distillation with ensemble voting and noisy logits. Appl Intell 53, 3069–3094 (2023). https://doi.org/10.1007/s10489-022-03495-3
