Increasing Information Entropy of Both Weights and Activations for the Binary Neural Networks
Abstract
1. Introduction
2. Preliminaries
2.1. Binarized Neural Networks
2.2. Information Entropy
3. Proposed Method
3.1. Median Loss (ML)
3.2. Batch Median of Activations (BMA)
4. Experiments and Discussion
4.1. Experimental Details
Algorithm 1. Training flow of the BNNs using our proposed ML and BMA.
Input: a mini-batch of inputs and targets.
01: Compute the binarized input activations and weights: A_b = sign(BMA(A)), W_b = sign((W − μ(W))/σ(W))
02: Compute the output activations: Z = conv(A_b, W_b), A = Act(α × Z + β)
03: Compute the output losses: the task loss L_CE and the median loss L_ML
04: Compute the gradients employing the method adopted in XNOR-Net [14]
05: Update the weights: W ← W − η · ∂L/∂W, where η is the learning rate.
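For concreteness, below is a minimal PyTorch sketch of one such training step. The exact ML and BMA formulas are assumed from their stated goals — a regularizer that drives each layer's weight median toward zero so that sign(w) splits evenly between +1 and −1, and a batch-median shift of the activations before binarization — and all identifiers (SignSTE, median_loss, bma_binarize, lambda_ml) are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

class SignSTE(torch.autograd.Function):
    """sign() with the clipped straight-through estimator used by XNOR-Net [14]."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # pass gradients only where |x| <= 1 (clipped STE)
        return grad_out * (x.abs() <= 1).float()

def median_loss(weights):
    # ML (assumed form): push each layer's weight median toward zero so that
    # the binarized weights split ~50/50, maximizing information entropy.
    return torch.stack([w.flatten().median().abs() for w in weights]).sum()

def bma_binarize(a):
    # BMA (assumed form): subtract the batch median before binarizing so that
    # roughly half of the binary activations are +1 and half are -1.
    return SignSTE.apply(a - a.flatten().median())

# One hypothetical training step for a single binary conv layer.
w = torch.randn(16, 3, 3, 3, requires_grad=True)  # latent real-valued weights
x = torch.randn(8, 3, 32, 32)                     # mini-batch of activations
target = torch.randint(0, 10, (8,))
lambda_ml = 1e-4                                  # regularization parameter λ

a_b = bma_binarize(x)                             # binarized input activations
w_b = SignSTE.apply(w)                            # binarized weights
out = F.conv2d(a_b, w_b, padding=1)
logits = out.mean(dim=(2, 3))[:, :10]             # stand-in classifier head
loss = F.cross_entropy(logits, target) + lambda_ml * median_loss([w])
loss.backward()                                   # STE routes grads to latent w
with torch.no_grad():
    w -= 1e-2 * w.grad                            # plain SGD update (step 05)
```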
4.2. Ablation Studies
4.2.1. Median Loss (ML)
4.2.2. Regularization Parameter
4.2.3. Batch Median of Activations (BMA)
4.2.4. Comparison with State-of-the-Art Methods
4.2.5. Storage Cost and Calculation Complexity Analyses
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
- Han, K.; Guo, J.; Zhang, C.; Zhu, M. Attribute-Aware Attention Model for Fine-Grained Representation Learning. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Korea, 22–26 October 2018; pp. 2040–2048.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv 2016, arXiv:1602.07360.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Srinivas, S.; Babu, R.V. Data-Free Parameter Pruning for Deep Neural Networks. arXiv 2015, arXiv:1507.06149.
- Han, S.; Pool, J.; Tran, J.; Dally, W.J. Learning Both Weights and Connections for Efficient Neural Networks. arXiv 2015, arXiv:1506.02626.
- Chen, W.; Wilson, J.; Tyree, S.; Weinberger, K.; Chen, Y. Compressing Neural Networks with the Hashing Trick. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2285–2294.
- Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 4114–4122.
- Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 525–542.
- Li, F.; Zhang, B.; Liu, B. Ternary Weight Networks. arXiv 2016, arXiv:1605.04711.
- Zhou, A.; Yao, A.; Guo, Y.; Xu, L.; Chen, Y. Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights. arXiv 2017, arXiv:1702.03044.
- Zhou, S.; Wu, Y.; Ni, Z.; Zhou, X.; Wen, H.; Zou, Y. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. arXiv 2016, arXiv:1606.06160.
- Qin, H.; Gong, R.; Liu, X.; Shen, M.; Wei, Z.; Yu, F.; Song, J. Forward and Backward Information Retention for Accurate Binary Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2250–2259.
- Ignatov, D.; Ignatov, A. Controlling Information Capacity of Binary Neural Network. Pattern Recognit. Lett. 2020, 138, 276–281.
- Qin, H.; Gong, R.; Liu, X.; Bai, X.; Song, J.; Sebe, N. Binary Neural Networks: A Survey. Pattern Recognit. 2020, 105, 107281.
- Wang, Z.; Lu, J.; Tao, C.; Zhou, J.; Tian, Q. Learning Channel-Wise Interactions for Binary Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 568–577.
- Yin, P.; Zhang, S.; Lyu, J.; Osher, S.; Qi, Y.; Xin, J. Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks. Res. Math. Sci. 2019, 6, 14.
- Liu, Z.; Wu, B.; Luo, W.; Yang, X.; Liu, W.; Cheng, K.-T. Bi-Real Net: Enhancing the Performance of 1-bit CNNs with Improved Representational Capability and Advanced Training Algorithm. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 722–737.
- Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv 2016, arXiv:1602.02830.
- Bengio, Y.; Léonard, N.; Courville, A. Estimating or Propagating Gradients through Stochastic Neurons for Conditional Computation. arXiv 2013, arXiv:1308.3432.
- Darabi, S.; Belbahri, M.; Courbariaux, M.; Nia, V.P. BNN+: Improved Binary Network Training. In Proceedings of the Sixth International Conference on Learning Representations, Vancouver, BC, Canada, 29 April–3 May 2018; pp. 1–10.
- Kim, H.; Kim, J.; Kim, J. BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations. In Proceedings of the 8th International Conference on Learning Representations (ICLR 2020), Addis Ababa, Ethiopia, 26–30 April 2020.
- Liu, S.; Zhu, H. Binary Convolutional Neural Network with High Accuracy and Compression Rate. In Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 20–22 December 2019; pp. 43–48.
- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
- Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630.
- Japkowicz, N.; Shah, M. Evaluating Learning Algorithms: A Classification Perspective; Cambridge University Press: Cambridge, UK, 2011; pp. 42–72.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 448–456.
- Krizhevsky, A.; Nair, V.; Hinton, G.E. The CIFAR-10 Dataset. 2014, p. 6. Available online: http://www.cs.toronto.edu/~kriz/cifar.html (accessed on 9 August 2021).
| | Method 1 [18] | Method 2 (ML) | Referenced Baseline [14] |
|---|---|---|---|
| Mean Accuracy (%) | 83.85 | 84.30 | 80.33 |
| STD (%) | 0.32 | 0.32 | 0.39 |
| λ | 10⁻⁶ | 10⁻⁵ | 10⁻⁴ | 10⁻³ | 10⁻² |
|---|---|---|---|---|---|
| Mean Accuracy (%) | 83.67 | 84.12 | 84.30 | 83.96 | 84.07 |
| STD (%) | 0.41 | 0.29 | 0.28 | 0.34 | 0.37 |
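The λ values above presumably scale the median-loss term against the task loss. A natural form of the combined objective, consistent with the two losses in Algorithm 1 (an assumed reconstruction, not quoted from the paper), is

$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda\,\mathcal{L}_{\mathrm{ML}},$$

so the best-performing setting λ = 10⁻⁴ keeps the regularizer small relative to the cross-entropy term while still centering the weight medians.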
| Topology | Method | Bit-Width (W/A) | Accuracy (%) |
|---|---|---|---|
| ResNet-20 | XNOR | 1/1 | 80.33 |
| | XNOR+ML | 1/1 | 84.30 |
| | XNOR+BMA | 1/1 | 83.31 |
| | XNOR+ML+BMA | 1/1 | 85.00 |
| Topology | Method | Bit-Width (W/A) | Accuracy (%) |
|---|---|---|---|
| BiReal-18 | IR-Net | 1/1 | 86.30 |
| | IR-Net+ML | 1/1 | 86.60 |
| | IR-Net+BMA | 1/1 | 86.49 |
| | IR-Net+ML+BMA | 1/1 | 86.83 |
| Topology | Method | Bit-Width (W/A) | Top-1 (%) | Top-5 (%) |
|---|---|---|---|---|
| ResNet-18 | Floating Point | 32/32 | 69.6 | 89.2 |
| | ABC-Net | 1/1 | 42.7 | 67.6 |
| | XNOR | 1/1 | 51.2 | 73.2 |
| | Bi-Real | 1/1 | 56.4 | 79.5 |
| | CI-BCNN | 1/1 | 56.7 | 80.1 |
| | IR-Net | 1/1 | 58.1 | 80.0 |
| | Bi-M ¹ | 1/1 | 57.0 | 79.9 |
| | Bi-B ² | 1/1 | 56.7 | 79.7 |
| | Bi-MB ³ | 1/1 | 57.7 | 80.2 |
| | Bi-IR-MB ⁴ | 1/1 | 58.5 | 80.8 |
| ResNet-34 | Floating Point | 32/32 | 73.3 | 91.3 |
| | ABC-Net | 1/1 | 52.4 | 76.5 |
| | Bi-Real | 1/1 | 62.2 | 83.9 |
| | CI-BCNN | 1/1 | 62.4 | 84.8 |
| | IR-Net | 1/1 | 62.9 | 84.1 |
| | Bi-M ¹ | 1/1 | 62.7 | 84.1 |
| | Bi-B ² | 1/1 | 62.4 | 83.9 |
| | Bi-MB ³ | 1/1 | 63.1 | 84.3 |
| | Bi-IR-MB ⁴ | 1/1 | 63.2 | 84.3 |
| Topology | Method | Storage Cost (Mbit) | FLOPs |
|---|---|---|---|
| ResNet-18 | Floating-Point | 374.1 | 1.81 × 10⁹ |
| | XNOR | 33.7 | 1.67 × 10⁸ |
| | Bi-Real | 33.6 | 1.63 × 10⁸ |
| | Bi-Real+ML+BMA | 33.6 | 1.63 × 10⁸ |
| ResNet-34 | Floating-Point | 697.3 | 3.66 × 10⁹ |
| | XNOR | 43.9 | 1.98 × 10⁸ |
| | Bi-Real | 43.7 | 1.93 × 10⁸ |
| | Bi-Real+ML+BMA | 43.7 | 1.93 × 10⁸ |
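As a sanity check, the FLOPs column is consistent with the common Bi-Real Net convention of counting 64 binary (XNOR/popcount) operations as one FLOP; the split below between full-precision and binarizable operations is back-solved from the table's own numbers, not taken from the paper.

```python
# Back-of-envelope check of the ResNet-18 FLOPs row, assuming the Bi-Real Net
# convention that 64 binary operations count as one FLOP.
total_ops = 1.81e9   # full-precision ResNet-18 operations (from the table)
fp_ops = 1.37e8      # assumed ops kept in full precision (first conv, FC, ...)
binary_ops = total_ops - fp_ops
flops = fp_ops + binary_ops / 64
print(f"{flops:.2e}")  # -> 1.63e+08, matching the Bi-Real rows above
```

Note also that Bi-Real and Bi-Real+ML+BMA have identical storage and FLOPs: ML is only a training-time loss term and BMA merely swaps a mean for a median, so neither changes the inference cost.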