Abstract
Neural networks are susceptible to adversarial perturbations that transfer across different models. In this paper, we introduce a novel model alignment technique aimed at improving a given source model's ability to generate transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss, which measures the divergence between the predictions of the source model and those of another, independently trained model, referred to as the witness model. To understand the effect of model alignment, we conduct a geometric analysis of the resulting changes in the loss landscape. Extensive experiments on the ImageNet dataset, using a variety of model architectures, demonstrate that perturbations generated from aligned source models exhibit significantly higher transferability than those from the original source model. Our source code is available at https://github.com/averyma/model-alignment.
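To make the alignment step concrete, below is a minimal sketch in PyTorch. The specific choice of divergence (KL between the two models' predictive distributions), the optimizer, and all names (align, alignment_loss, witness, loader, the learning rate) are illustrative assumptions rather than the authors' exact implementation; see the linked repository for the official code.

```python
import torch
import torch.nn.functional as F

def alignment_loss(source_logits, witness_logits):
    # Divergence between the two models' predictive distributions.
    # KL(witness || source) is one natural instantiation of the
    # alignment loss described in the abstract (an assumption here).
    log_p_source = F.log_softmax(source_logits, dim=1)
    p_witness = F.softmax(witness_logits, dim=1)
    return F.kl_div(log_p_source, p_witness, reduction="batchmean")

def align(source, witness, loader, epochs=1, lr=1e-4, device="cuda"):
    # Fine-tune the source model to match the witness model's predictions.
    # Assumes both models are already on `device`; only the source is updated.
    source.train()
    witness.eval()  # the witness stays fixed throughout alignment
    opt = torch.optim.SGD(source.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:  # labels are unused; alignment is prediction-based
            x = x.to(device)
            with torch.no_grad():
                w_logits = witness(x)
            loss = alignment_loss(source(x), w_logits)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return source
```

After alignment, transferable perturbations would be generated from the returned source model with any standard attack (e.g., PGD), exactly as one would with the original source model.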
Acknowledgments
Avery Ma acknowledges the funding from the Natural Sciences and Engineering Research Council (NSERC) through the Canada Graduate Scholarships - Doctoral (CGS D) program. Amir-massoud Farahmand acknowledges the funding from the CIFAR AI Chairs program, as well as the support of the NSERC through the Discovery Grant program (2021-03701). Yangchen Pan, Philip Torr and Jindong Gu acknowledge the support from the UKRI Grant: Turing AI Fellowship EP/W002981/1, EPSRC/MURI Grant: EP/N019474/, and the Royal Academy of Engineering. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute. We would also like to thank the members of the Adaptive Agents Lab who provided feedback on a draft of this paper.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, A., Farahmand, Am., Pan, Y., Torr, P., Gu, J. (2025). Improving Adversarial Transferability via Model Alignment. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15120. Springer, Cham. https://doi.org/10.1007/978-3-031-73033-7_5
DOI: https://doi.org/10.1007/978-3-031-73033-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73032-0
Online ISBN: 978-3-031-73033-7