DOI: 10.1145/3133956.3134057

MagNet: A Two-Pronged Defense against Adversarial Examples

Published: 30 October 2017

Abstract

Deep learning has shown impressive performance on hard perceptual problems. However, researchers have found deep learning systems to be vulnerable to small, specially crafted perturbations that are imperceptible to humans. Such perturbations cause deep learning systems to misclassify adversarial examples, with potentially disastrous consequences where safety or security is crucial. Prior defenses against adversarial examples either targeted specific attacks or were shown to be ineffective.
We propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet neither modifies the protected classifier nor requires knowledge of the process for generating adversarial examples. MagNet includes one or more separate detector networks and a reformer network. The detector networks learn to differentiate between normal and adversarial examples by approximating the manifold of normal examples. Since they assume no specific process for generating adversarial examples, they generalize well. The reformer network moves adversarial examples towards the manifold of normal examples, which is effective for correctly classifying adversarial examples with small perturbations. We discuss the intrinsic difficulty of defending against whitebox attacks and propose a mechanism to defend against graybox attacks. Inspired by the use of randomness in cryptography, we use diversity to strengthen MagNet. We show empirically that MagNet is effective against the most advanced state-of-the-art attacks in blackbox and graybox scenarios without sacrificing the false positive rate on normal examples.
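
The abstract's description of the two prongs maps naturally onto autoencoders trained only on normal examples (cf. the "autoencoder" author tag below): a detector flags inputs whose reconstruction error is unusually large, and a reformer replaces an input with its reconstruction before it reaches the classifier. The sketch below is one illustrative reading of that idea, not the authors' implementation; the names (Autoencoder, detect, reform_and_classify) and the error threshold are hypothetical.

    # Minimal illustrative sketch (PyTorch), assuming both prongs are small
    # autoencoders trained only on normal examples. Not the authors' code.
    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        """Tiny convolutional autoencoder for 1x28x28 images (e.g. MNIST)."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
                nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    def detect(detector, x, threshold):
        # Flag inputs whose per-example reconstruction error exceeds a threshold
        # chosen on held-out normal data (e.g. to meet a target false-positive rate).
        with torch.no_grad():
            err = ((detector(x) - x) ** 2).flatten(1).mean(dim=1)
        return err > threshold  # True -> reject as adversarial

    def reform_and_classify(reformer, classifier, x):
        # Push the input toward the manifold of normal examples, then classify
        # the reformed version instead of the raw input.
        with torch.no_grad():
            return classifier(reformer(x)).argmax(dim=1)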




Published In

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
October 2017
2682 pages
ISBN:9781450349468
DOI:10.1145/3133956
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 October 2017


Author Tags

  1. adversarial example
  2. autoencoder
  3. neural network

Qualifiers

  • Research-article

Conference

CCS '17
Sponsor: SIGSAC

Acceptance Rates

CCS '17 Paper Acceptance Rate: 151 of 836 submissions, 18%
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%


Cited By


  • (2024) Mitigating Adversarial Attacks in Object Detection through Conditional Diffusion Models. Mathematics, 12(19):3093. DOI: 10.3390/math12193093. Online publication date: 2-Oct-2024.
  • (2024) Improving Adversarial Robustness of Ensemble Classifiers by Diversified Feature Selection and Stochastic Aggregation. Mathematics, 12(6):834. DOI: 10.3390/math12060834. Online publication date: 12-Mar-2024.
  • (2024) A Holistic Review of Machine Learning Adversarial Attacks in IoT Networks. Future Internet, 16(1):32. DOI: 10.3390/fi16010032. Online publication date: 19-Jan-2024.
  • (2024) Lightweight Privacy Protection via Adversarial Sample. Electronics, 13(7):1230. DOI: 10.3390/electronics13071230. Online publication date: 26-Mar-2024.
  • (2024) Lightweight Robust Image Classifier Using Non-Overlapping Image Compression Filters. Applied Sciences, 14(19):8636. DOI: 10.3390/app14198636. Online publication date: 25-Sep-2024.
  • (2024) Defense against Adversarial Attacks in Image Recognition Based on Multilayer Filters. Applied Sciences, 14(18):8119. DOI: 10.3390/app14188119. Online publication date: 10-Sep-2024.
  • (2024) A Survey of Adversarial Attacks: An Open Issue for Deep Learning Sentiment Analysis Models. Applied Sciences, 14(11):4614. DOI: 10.3390/app14114614. Online publication date: 27-May-2024.
  • (2024) Enhancing CT Segmentation Security against Adversarial Attack: Most Activated Filter Approach. Applied Sciences, 14(5):2130. DOI: 10.3390/app14052130. Online publication date: 4-Mar-2024.
  • (2024) ROLDEF: RObust Layered DEFense for Intrusion Detection Against Adversarial Attacks. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1-6. DOI: 10.23919/DATE58400.2024.10546886. Online publication date: 25-Mar-2024.
  • (2024) A Survey on Convolutional Neural Networks and Their Performance Limitations in Image Recognition Tasks. Journal of Sensors, 2024(1). DOI: 10.1155/2024/2797320. Online publication date: 12-Jul-2024.