Abstract
Deep learning models are vulnerable to adversarial examples. Physical adversarial examples, among the most threatening attacks on practical deep learning systems, have received extensive attention in recent years. However, because existing studies pay insufficient attention to intrinsic characteristics such as model-agnostic features, the perturbations they generate transfer poorly across different models. Motivated by the view that attention reflects the intrinsic characteristics of the recognition process, we propose the Transferable Attention Attack (TA\(_2\)), which exploits the visual attention mechanism, i.e., triplet attention suppression, to generate adversarial camouflages with strong transferable attacking ability. Specifically, we generate transferable adversarial camouflages by distracting the similar attention patterns shared across models from the target region to non-target regions, thereby promoting transferability. We further strengthen the attack by converging the model's attention on a non-ground-truth class, which exploits the lateral inhibition of visual models and activates the model's perception of wrong classes. In addition, since adversarial camouflages often look visually suspicious, we introduce human attention to improve their visual naturalness. We conduct extensive experiments on classification tasks in both the digital and physical worlds and comprehensively investigate the effectiveness of the discovered model attention mechanism, demonstrating that our method outperforms state-of-the-art methods.
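The full method is developed in the body of the paper; as a rough illustration of the attention terms summarized above, consider the following minimal sketch. It is an assumption for exposition, not our released implementation: it builds a differentiable Grad-CAM-style attention map and uses it for two of the triplet terms, pushing ground-truth attention off the object region and pulling a wrong class's attention onto it. All names (`model`, `layer`, `image`, `target_mask`, `gt_class`, `wrong_class`) are hypothetical placeholders.

```python
# Minimal sketch (illustrative only) of the attention-suppression losses.
# Assumes: `model` is a CNN classifier, `layer` a conv layer to explain,
# `image` a rendered scene containing the camouflaged object (with the
# camouflage texture requiring grad), and `target_mask` a binary object
# mask of shape (1, 1, H, W).
import torch
import torch.nn.functional as F

def grad_cam(model, image, layer, class_idx):
    """Grad-CAM attention map for `class_idx`, kept differentiable so it
    can be optimized as part of a camouflage loss."""
    store = {}
    handle = layer.register_forward_hook(lambda m, i, o: store.update(act=o))
    logits = model(image)  # (1, num_classes)
    handle.remove()
    grads = torch.autograd.grad(logits[0, class_idx], store["act"],
                                create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)      # GAP of gradients
    cam = F.relu((weights * store["act"]).sum(dim=1))   # (1, h, w)
    return cam / (cam.max() + 1e-8)                     # normalize to [0, 1]

def distraction_loss(model, image, layer, gt_class, target_mask):
    """Distract ground-truth attention from the target region: minimize
    the attention mass that falls inside the object mask."""
    cam = grad_cam(model, image, layer, gt_class)
    mask = F.interpolate(target_mask.float(), size=cam.shape[-2:])[:, 0]
    return (cam * mask).sum() / (mask.sum() + 1e-8)

def convergence_loss(model, image, layer, wrong_class, target_mask):
    """Converge a non-ground-truth class's attention onto the target
    region, loosely mimicking lateral inhibition between class responses."""
    cam = grad_cam(model, image, layer, wrong_class)
    mask = F.interpolate(target_mask.float(), size=cam.shape[-2:])[:, 0]
    return 1.0 - (cam * mask).sum() / (cam.sum() + 1e-8)
```

In this sketch, minimizing a weighted sum of the two losses with respect to the camouflage texture would jointly suppress the shared ground-truth attention and activate a wrong class; the third (human-attention) term for visual naturalness is omitted here.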
Data Availability Statement
In this paper, we employ a 3D simulation environment to generate the training and testing datasets; all experiments and ablations are based on them. The datasets generated and analyzed during the current study are available at https://drive.google.com/drive/folders/1vspvRxnZ3shOV4kM5ELcO9-xztapBThS. In addition to the data, our code will be released freely upon acceptance.
Acknowledgements
This work was supported by the National Key Research and Development Plan of China (2020AAA0103502) and the National Natural Science Foundation of China (62022009, 61872021).
Additional information
Communicated by Oliver Zendel.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, J., Liu, X., Yin, Z. et al. Generate Transferable Adversarial Physical Camouflages via Triplet Attention Suppression. Int J Comput Vis 132, 5084–5100 (2024). https://doi.org/10.1007/s11263-024-02098-4