Conclusion
In this study, we proposed UP-ViTs, a novel method for pruning ViTs in a unified manner. Our framework can prune all components of a ViT and its variants, preserves the model's structure, and generalizes well to downstream tasks. UP-ViTs achieve state-of-the-art results when pruning various ViT backbones. Moreover, we studied the transferability of the compressed models and found that our UP-ViTs also outperform the original ViTs. We further extended our method to NLP tasks and obtained more efficient transformer models. Please refer to the appendix for more details.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62276123, 61921006).
Supporting information
Appendixes A–E. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Yu, H., Wu, J. A unified pruning framework for vision transformers. Sci. China Inf. Sci. 66, 179101 (2023). https://doi.org/10.1007/s11432-022-3646-6