
Using Reinforcement Learning to Escape Automatic Filter-based Adversarial Example Defense

Published: 20 September 2024

Abstract

Deep neural networks can be easily fooled by adversarial examples, which are specially crafted inputs containing subtle, intentional perturbations. A plethora of papers have proposed filters as an effective defense against adversarial example attacks. However, we demonstrate that automatic filter-based defenses may not be reliable. In this article, we present URL2AED, Using a Reinforcement Learning scheme TO escape the Automatic filter-based Adversarial Example Defenses. Specifically, URL2AED uses a specially crafted policy-gradient reinforcement learning (RL) algorithm to generate adversarial examples (AEs) that escape automatic filter-based AE defenses. In particular, we design reward functions in policy-gradient RL for targeted attacks and non-targeted attacks, respectively. Furthermore, we customize the training algorithm to reduce the possible action space in policy-gradient RL, which accelerates URL2AED's training while still ensuring that it generates successful AEs. To demonstrate the performance of the proposed URL2AED, we conduct extensive experiments on three public datasets in terms of different degrees of the perturbation parameter, different filter parameters, transferability, and time consumption. The experimental results show that URL2AED achieves high attack success rates against automatic filter-based defenses and good cross-model transferability.
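
The abstract does not spell out the exact reward design or action encoding, so the following is only a minimal, hypothetical sketch of how a policy-gradient (REINFORCE-style) attacker against a filter-defended classifier could be set up in PyTorch. All names here (policy_net, classifier, defense_filter, apply_action, lam) are illustrative assumptions rather than the authors' implementation; the point it illustrates is that the reward is computed on the filtered image, with separate reward terms for targeted and non-targeted attacks and a penalty that keeps the perturbation subtle.

import torch
import torch.nn.functional as F

def apply_action(image, action, eps=0.1):
    # Hypothetical action encoding: the action is the flat index of one pixel,
    # which is nudged by +eps. Restricting the candidate pixels is one way the
    # action space could be reduced, as the abstract suggests.
    perturbed = image.clone().view(-1)
    perturbed[action] = (perturbed[action] + eps).clamp(0.0, 1.0)
    return perturbed.view_as(image)

def reinforce_step(policy_net, classifier, defense_filter, optimizer,
                   image, true_label, target_label=None, lam=0.01):
    # The policy scores candidate perturbation actions for this image.
    logits = policy_net(image).squeeze(0)          # assumed shape: (num_actions,)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    perturbed = apply_action(image, action)

    # Crucial point of the threat model: classify the *filtered* image, so the
    # adversarial example must survive the automatic filter-based defense.
    probs = F.softmax(classifier(defense_filter(perturbed)), dim=-1).squeeze(0)

    if target_label is not None:
        reward = probs[target_label]               # targeted: raise target-class confidence
    else:
        reward = 1.0 - probs[true_label]           # non-targeted: lower true-class confidence
    reward = reward - lam * (perturbed - image).abs().mean()  # perturbation penalty

    # REINFORCE update: make high-reward actions more likely.
    loss = -dist.log_prob(action) * reward.detach()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.item()

In this sketch, defense_filter can be any preprocessing step (e.g., a median or Gaussian filter); because the policy gradient only needs the scalar reward, the attacker never backpropagates through the filter or the classifier, which is what makes escaping an automatic filtering defense feasible even in a black-box setting.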


Cited By

  • Efficiently Gluing Pre-trained Language and Vision Models for Image Captioning. ACM Transactions on Intelligent Systems and Technology (2024). https://doi.org/10.1145/3682067. Online publication date: 29 July 2024.
  • Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (2024), 229–239. https://doi.org/10.1145/3626772.3657727. Online publication date: 10 July 2024.



      Published In

      ACM Transactions on Sensor Networks, Volume 20, Issue 5
      September 2024, 349 pages
      EISSN: 1550-4867
      DOI: 10.1145/3618084
      • Editor: Wen Hu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 September 2024
      Online AM: 15 August 2024
      Accepted: 11 August 2024
      Revised: 15 June 2024
      Received: 06 February 2024
      Published in TOSN Volume 20, Issue 5

      Author Tags

      1. Adversarial examples
      2. image classification
      3. reinforcement learning
      4. filter

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Funds for Creative Research Groups of Chongqing Municipal Education Commission
      • National Science Foundation

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 134
      • Downloads (last 6 weeks): 9
      Reflects downloads up to 25 Dec 2024

