DOI: 10.1609/aaai.v38i14.29505
BVT-IMA: Binary Vision Transformer with Information-Modified Attention

Published: 20 February 2024

Abstract

As a compression method that can significantly reduce computation and memory costs, model binarization has been studied extensively in convolutional neural networks. However, the recently popular vision transformer models pose new challenges to this technique: binarized vision transformers suffer from severe performance drops. In this paper, we observe an attention-shifting phenomenon in the binary multi-head self-attention module, which disturbs the information fusion between tokens and thus hurts model performance. From the perspective of information theory, we find a correlation between attention scores and information quantity, suggesting that one cause of this phenomenon is the loss of information quantity induced by the constant moduli of binarized tokens. Finally, we recover the information quantity hidden in the attention maps of binary vision transformers and propose a simple approach that modifies attention values with look-up information tables, thereby improving model performance. Extensive experiments on CIFAR-100, TinyImageNet, and ImageNet-1k demonstrate the effectiveness of the proposed information-modified attention for binary vision transformers.
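The idea sketched in the abstract can be illustrated roughly as follows: after sign binarization every token has the same modulus, so raw binary attention logits lose per-token information, and a small look-up table can add it back. The Hamming-weight key (counting +1 entries per binarized key token) and the additive logit offsets below are illustrative assumptions for this sketch, not the paper's exact tables:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def binarize(x):
    # Sign binarization: every entry becomes -1 or +1, so all binarized
    # tokens share the same modulus sqrt(dim).
    return np.where(x >= 0, 1.0, -1.0)

def information_modified_attention(q, k, v, info_table):
    """q, k, v: (tokens, dim) real-valued activations.
    info_table: hypothetical length-(dim + 1) look-up table mapping a
    binarized key token's count of +1 entries to an additive logit offset."""
    d = q.shape[-1]
    qb, kb = binarize(q), binarize(k)
    logits = qb @ kb.T / np.sqrt(d)           # plain binary attention scores
    ones = (kb > 0).sum(-1).astype(int)       # per-key-token +1 counts
    logits = logits + info_table[ones]        # offset broadcast over queries
    return softmax(logits, axis=-1) @ v
```

With an all-zero table this reduces to ordinary binary attention; a non-trivial table shifts attention mass between keys according to their recovered information quantity, which is the effect the modification is meant to produce.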


Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence
February 2024
23861 pages
ISBN: 978-1-57735-887-9

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

  • Research-article
  • Research
  • Refereed limited
