Abstract
Neural networks are susceptible to adversarial perturbations that transfer across different models. In this paper, we introduce a novel model alignment technique aimed at improving a given source model's ability to generate transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss, which measures the divergence between the predictions of the source model and those of another, independently trained model, referred to as the witness model. To understand the effect of model alignment, we conduct a geometric analysis of the resulting changes in the loss landscape. Extensive experiments on the ImageNet dataset, using a variety of model architectures, demonstrate that perturbations generated from aligned source models exhibit significantly higher transferability than those from the original source model. Our source code is available at https://github.com/averyma/model-alignment.
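To make the alignment step concrete, below is a minimal sketch in PyTorch. The specific choice of divergence (KL between the two models' predictive distributions), the optimizer, and all names (align, alignment_loss, witness, loader, the learning rate) are illustrative assumptions rather than the authors' exact implementation; see the linked repository for the official code.

```python
import torch
import torch.nn.functional as F

def alignment_loss(source_logits, witness_logits):
    # Divergence between the two models' predictive distributions.
    # KL(witness || source) is one natural instantiation of the
    # alignment loss described in the abstract (an assumption here).
    log_p_source = F.log_softmax(source_logits, dim=1)
    p_witness = F.softmax(witness_logits, dim=1)
    return F.kl_div(log_p_source, p_witness, reduction="batchmean")

def align(source, witness, loader, epochs=1, lr=1e-4, device="cuda"):
    # Fine-tune the source model to match the witness model's predictions.
    # Assumes both models are already on `device`; only the source is updated.
    source.train()
    witness.eval()  # the witness stays fixed throughout alignment
    opt = torch.optim.SGD(source.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:  # labels are unused; alignment is prediction-based
            x = x.to(device)
            with torch.no_grad():
                w_logits = witness(x)
            loss = alignment_loss(source(x), w_logits)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return source
```

After alignment, transferable perturbations would be generated from the returned source model with any standard attack (e.g., PGD), exactly as one would with the original source model.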
Acknowledgments
Avery Ma acknowledges the funding from the Natural Sciences and Engineering Research Council (NSERC) through the Canada Graduate Scholarships - Doctoral (CGS D) program. Amir-massoud Farahmand acknowledges the funding from the CIFAR AI Chairs program, as well as the support of the NSERC through the Discovery Grant program (2021-03701). Yangchen Pan, Philip Torr and Jindong Gu acknowledge the support from the UKRI Grant: Turing AI Fellowship EP/W002981/1, EPSRC/MURI Grant: EP/N019474/, and the Royal Academy of Engineering. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute. We would also like to thank the members of the Adaptive Agents Lab who provided feedback on a draft of this paper.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, A., Farahmand, Am., Pan, Y., Torr, P., Gu, J. (2025). Improving Adversarial Transferability via Model Alignment. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15120. Springer, Cham. https://doi.org/10.1007/978-3-031-73033-7_5
DOI: https://doi.org/10.1007/978-3-031-73033-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73032-0
Online ISBN: 978-3-031-73033-7