DOI: 10.1145/3133956.3134057

MagNet: A Two-Pronged Defense against Adversarial Examples

Published: 30 October 2017

Abstract

Deep learning has shown impressive performance on hard perceptual problems. However, researchers have found deep learning systems to be vulnerable to small, specially crafted perturbations that are imperceptible to humans. Such perturbations cause deep learning systems to misclassify adversarial examples, with potentially disastrous consequences where safety or security is crucial. Prior defenses against adversarial examples either targeted specific attacks or were shown to be ineffective.
We propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet neither modifies the protected classifier nor requires knowledge of the process for generating adversarial examples. MagNet includes one or more separate detector networks and a reformer network. The detector networks learn to differentiate between normal and adversarial examples by approximating the manifold of normal examples. Since they assume no specific process for generating adversarial examples, they generalize well. The reformer network moves adversarial examples towards the manifold of normal examples, which is effective for correctly classifying adversarial examples with small perturbations. We discuss the intrinsic difficulty of defending against whitebox attacks and propose a mechanism to defend against graybox attacks. Inspired by the use of randomness in cryptography, we use diversity to strengthen MagNet. We show empirically that MagNet is effective against the most advanced state-of-the-art attacks in blackbox and graybox scenarios without sacrificing the false positive rate on normal examples.
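
The abstract's description of the two prongs maps naturally onto autoencoders trained only on normal examples (cf. the "autoencoder" author tag below): a detector flags inputs whose reconstruction error is unusually large, and a reformer replaces an input with its reconstruction before it reaches the classifier. The sketch below is one illustrative reading of that idea, not the authors' implementation; the names (Autoencoder, detect, reform_and_classify) and the error threshold are hypothetical.

    # Minimal illustrative sketch (PyTorch), assuming both prongs are small
    # autoencoders trained only on normal examples. Not the authors' code.
    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        """Tiny convolutional autoencoder for 1x28x28 images (e.g. MNIST)."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
                nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    def detect(detector, x, threshold):
        # Flag inputs whose per-example reconstruction error exceeds a threshold
        # chosen on held-out normal data (e.g. to meet a target false-positive rate).
        with torch.no_grad():
            err = ((detector(x) - x) ** 2).flatten(1).mean(dim=1)
        return err > threshold  # True -> reject as adversarial

    def reform_and_classify(reformer, classifier, x):
        # Push the input toward the manifold of normal examples, then classify
        # the reformed version instead of the raw input.
        with torch.no_grad():
            return classifier(reformer(x)).argmax(dim=1)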




Published In

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
October 2017
2682 pages
ISBN:9781450349468
DOI:10.1145/3133956
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 October 2017


Author Tags

  1. adversarial example
  2. autoencoder
  3. neural network

Qualifiers

  • Research-article

Conference

CCS '17
Sponsor: SIGSAC

Acceptance Rates

CCS '17 Paper Acceptance Rate: 151 of 836 submissions, 18%
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%


Cited By


  • (2024) Mitigating Adversarial Attacks in Object Detection through Conditional Diffusion Models. Mathematics, 12(19):3093. DOI: 10.3390/math12193093. Online publication date: 2-Oct-2024.
  • (2024) Improving Adversarial Robustness of Ensemble Classifiers by Diversified Feature Selection and Stochastic Aggregation. Mathematics, 12(6):834. DOI: 10.3390/math12060834. Online publication date: 12-Mar-2024.
  • (2024) A Holistic Review of Machine Learning Adversarial Attacks in IoT Networks. Future Internet, 16(1):32. DOI: 10.3390/fi16010032. Online publication date: 19-Jan-2024.
  • (2024) Lightweight Privacy Protection via Adversarial Sample. Electronics, 13(7):1230. DOI: 10.3390/electronics13071230. Online publication date: 26-Mar-2024.
  • (2024) Lightweight Robust Image Classifier Using Non-Overlapping Image Compression Filters. Applied Sciences, 14(19):8636. DOI: 10.3390/app14198636. Online publication date: 25-Sep-2024.
  • (2024) Defense against Adversarial Attacks in Image Recognition Based on Multilayer Filters. Applied Sciences, 14(18):8119. DOI: 10.3390/app14188119. Online publication date: 10-Sep-2024.
  • (2024) A Survey of Adversarial Attacks: An Open Issue for Deep Learning Sentiment Analysis Models. Applied Sciences, 14(11):4614. DOI: 10.3390/app14114614. Online publication date: 27-May-2024.
  • (2024) Enhancing CT Segmentation Security against Adversarial Attack: Most Activated Filter Approach. Applied Sciences, 14(5):2130. DOI: 10.3390/app14052130. Online publication date: 4-Mar-2024.
  • (2024) ROLDEF: RObust Layered DEFense for Intrusion Detection Against Adversarial Attacks. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1-6. DOI: 10.23919/DATE58400.2024.10546886. Online publication date: 25-Mar-2024.
  • (2024) A Survey on Convolutional Neural Networks and Their Performance Limitations in Image Recognition Tasks. Journal of Sensors, 2024(1). DOI: 10.1155/2024/2797320. Online publication date: 12-Jul-2024.