
Using Reinforcement Learning to Escape Automatic Filter-based Adversarial Example Defense

Published: 20 September 2024

Abstract

Deep neural networks can be easily fooled by adversarial examples, which are specially crafted inputs containing subtle, intentional perturbations. A plethora of papers have proposed filters as an effective defense against adversarial example attacks. However, we demonstrate that automatic filter-based defenses may not be reliable. In this article, we present URL2AED, Using a Reinforcement Learning scheme TO escape the Automatic filter-based Adversarial Example Defenses. Specifically, URL2AED uses a specially crafted policy-gradient reinforcement learning (RL) algorithm to generate adversarial examples (AEs) that escape automatic filter-based AE defenses. In particular, we design reward functions in policy-gradient RL for targeted attacks and non-targeted attacks, respectively. Furthermore, we customize the training algorithm to reduce the possible action space in policy-gradient RL, which accelerates URL2AED's training while still ensuring that it generates successful AEs. To demonstrate the performance of the proposed URL2AED, we conduct extensive experiments on three public datasets in terms of different degrees of the perturbation parameter, different filter parameters, transferability, and time consumption. The experimental results show that URL2AED achieves high attack success rates against automatic filter-based defenses and good cross-model transferability.
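
The abstract does not spell out the exact reward design or action encoding, so the following is only a minimal, hypothetical sketch of how a policy-gradient (REINFORCE-style) attacker against a filter-defended classifier could be set up in PyTorch. All names here (policy_net, classifier, defense_filter, apply_action, lam) are illustrative assumptions rather than the authors' implementation; the point it illustrates is that the reward is computed on the filtered image, with separate reward terms for targeted and non-targeted attacks and a penalty that keeps the perturbation subtle.

import torch
import torch.nn.functional as F

def apply_action(image, action, eps=0.1):
    # Hypothetical action encoding: the action is the flat index of one pixel,
    # which is nudged by +eps. Restricting the candidate pixels is one way the
    # action space could be reduced, as the abstract suggests.
    perturbed = image.clone().view(-1)
    perturbed[action] = (perturbed[action] + eps).clamp(0.0, 1.0)
    return perturbed.view_as(image)

def reinforce_step(policy_net, classifier, defense_filter, optimizer,
                   image, true_label, target_label=None, lam=0.01):
    # The policy scores candidate perturbation actions for this image.
    logits = policy_net(image).squeeze(0)          # assumed shape: (num_actions,)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    perturbed = apply_action(image, action)

    # Crucial point of the threat model: classify the *filtered* image, so the
    # adversarial example must survive the automatic filter-based defense.
    probs = F.softmax(classifier(defense_filter(perturbed)), dim=-1).squeeze(0)

    if target_label is not None:
        reward = probs[target_label]               # targeted: raise target-class confidence
    else:
        reward = 1.0 - probs[true_label]           # non-targeted: lower true-class confidence
    reward = reward - lam * (perturbed - image).abs().mean()  # perturbation penalty

    # REINFORCE update: make high-reward actions more likely.
    loss = -dist.log_prob(action) * reward.detach()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.item()

In this sketch, defense_filter can be any preprocessing step (e.g., a median or Gaussian filter); because the policy gradient only needs the scalar reward, the attacker never backpropagates through the filter or the classifier, which is what makes escaping an automatic filtering defense feasible even in a black-box setting.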


Cited By

  • Efficiently Gluing Pre-trained Language and Vision Models for Image Captioning. ACM Transactions on Intelligent Systems and Technology (2024). https://doi.org/10.1145/3682067. Online publication date: 29 July 2024.
  • Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (2024), 229–239. https://doi.org/10.1145/3626772.3657727. Online publication date: 10 July 2024.



      Published In

      ACM Transactions on Sensor Networks, Volume 20, Issue 5
      September 2024, 349 pages
      EISSN: 1550-4867
      DOI: 10.1145/3618084
      • Editor: Wen Hu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 September 2024
      Online AM: 15 August 2024
      Accepted: 11 August 2024
      Revised: 15 June 2024
      Received: 06 February 2024
      Published in TOSN Volume 20, Issue 5

      Author Tags

      1. Adversarial examples
      2. image classification
      3. reinforcement learning
      4. filter

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Funds for Creative Research Groups of Chongqing Municipal Education Commission
      • National Science Foundation

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 134
      • Downloads (last 6 weeks): 9
      Reflects downloads up to 25 Dec 2024

