DOI: 10.1145/3531536.3532966

Hiding Needles in a Haystack: Towards Constructing Neural Networks that Evade Verification

Published: 23 June 2022

Abstract

Machine learning models are vulnerable to adversarial attacks, where a small, invisible, malicious perturbation of the input changes the predicted label. A large area of research is concerned with verification techniques that attempt to decide whether a given model has adversarial inputs close to a given benign input. Here, we show that current approaches to verification have a key vulnerability: we construct a model that is not robust but passes current verifiers. The idea is to insert artificial adversarial perturbations by adding a backdoor to a robust neural network model. In our construction, the adversarial input subspace that triggers the backdoor has a very small volume, and outside this subspace the gradient of the model is identical to that of the clean model. In other words, we seek to create a "needle in a haystack" search problem. For practical purposes, we also require that the adversarial samples be robust to JPEG compression. Large "needle in a haystack" problems are practically impossible to solve with any search algorithm. Formal verifiers can handle this in principle, but they do not scale up to real-world networks at the moment, and achieving this is a challenge because the verification problem is NP-complete. Our construction is based on training a hiding and a revealing network using deep steganography. Using the revealing network, we create a separate backdoor network and integrate it into the target network. We train our deep steganography networks over the CIFAR-10 dataset. We then evaluate our construction using state-of-the-art adversarial attacks and backdoor detectors over the CIFAR-10 and the ImageNet datasets. We made the code and models publicly available at https://github.com/szegedai/hiding-needles-in-a-haystack.
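
To make the construction in the abstract concrete, here is a minimal PyTorch-style sketch of the overall idea: a steganographic revealing network is wrapped around a robust classifier so that the combined model agrees with the clean model, in both outputs and gradients, everywhere outside the tiny trigger subspace. This is an illustrative sketch only, not the authors' released code (see the GitHub repository linked above for that); the module names, the target class, and the detection threshold are assumptions.

    # Illustrative sketch; clean_model and reveal_net stand in for the robust
    # classifier and the steganographic revealing network, whose architectures
    # are not specified here (assumptions, not the authors' implementation).
    import torch
    import torch.nn as nn

    class BackdooredClassifier(nn.Module):
        def __init__(self, clean_model: nn.Module, reveal_net: nn.Module,
                     target_class: int, threshold: float = 0.9):
            super().__init__()
            self.clean_model = clean_model    # robust, correctly trained classifier
            self.reveal_net = reveal_net      # scores whether x carries the hidden trigger
            self.target_class = target_class  # label forced when the trigger fires
            self.threshold = threshold        # detection cutoff (assumed value)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            logits = self.clean_model(x)
            # Per-sample trigger score in [0, 1]; the hard comparison below is
            # non-differentiable, so no gradient flows through reveal_net and the
            # gradient w.r.t. x outside the trigger subspace is exactly the clean one.
            score = torch.sigmoid(self.reveal_net(x)).view(-1)
            fires = (score > self.threshold).float().unsqueeze(1)
            forced = torch.full_like(logits, -1e4)   # logits that force the target class
            forced[:, self.target_class] = 1e4
            return (1.0 - fires) * logits + fires * forced

Because fires is zero for every input that does not contain the hidden steganographic trigger, benign inputs and ordinary adversarial searches only ever interact with the clean model, which is what turns finding the trigger subspace into a "needle in a haystack" problem.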


Cited By

  • Stealthy Frequency-Domain Backdoor Attacks: Fourier Decomposition and Fundamental Frequency Injection. IEEE Signal Processing Letters 30 (2023), 1677-1681. https://doi.org/10.1109/LSP.2023.3330126

Published In

IH&MMSec '22: Proceedings of the 2022 ACM Workshop on Information Hiding and Multimedia Security
June 2022
177 pages
ISBN: 9781450393553
DOI: 10.1145/3531536
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Trojan attack
  2. adversarial robustness
  3. backdoor attack
  4. neural networks

Qualifiers

  • Research-article

Conference

IH&MMSec '22

Acceptance Rates

Overall Acceptance Rate 128 of 318 submissions, 40%
