Conclusion
In this study, we proposed UP-ViTs, a novel method for pruning ViTs in a unified manner. Our framework can prune all components of a ViT and its variants, preserves the model's structure, and generalizes well to downstream tasks. UP-ViTs achieve state-of-the-art results when pruning various ViT backbones. Moreover, we studied the transferability of the compressed models and found that our UP-ViTs also outperform the original ViTs. We further extended our method to NLP tasks and obtained more efficient transformer models. Please refer to the appendix for more details.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62276123, 61921006).
Supporting information
Appendixes A–E. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Yu, H., Wu, J. A unified pruning framework for vision transformers. Sci. China Inf. Sci. 66, 179101 (2023). https://doi.org/10.1007/s11432-022-3646-6