Abstract
Various compression approaches, including pruning techniques, have been developed to reduce the computational complexity of neural networks. Most pruning techniques set the threshold for pruning weights or input features through statistical analysis of the weight values after training is complete. Their compression performance is limited because they do not take into account the contribution of the weights to the output during training. To solve this problem, we propose an entropy-based pruning technique that determines the threshold by considering the average amount of information the weights contribute to the output during training. In the experiment section, we demonstrate and analyze our method on a convolutional neural network image classifier trained on the Modified National Institute of Standards and Technology (MNIST) dataset. The experimental results show that our technique improves compression performance by more than 28% overall compared with a well-known pruning technique, and improves pruning speed by 14%.
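The threshold rule itself is not reproduced on this page, so the sketch below is only a hedged illustration of the general idea the abstract describes: estimate the Shannon entropy of a layer's weight-magnitude distribution with a histogram and use it to modulate a magnitude-based pruning threshold, so that layers whose weights carry less information are pruned more aggressively. The function names (`entropy_based_threshold`, `prune`), the histogram estimator, and the `num_bins` and `scale` parameters are illustrative assumptions, not the authors' published method.

```python
import numpy as np

def entropy_based_threshold(weights, num_bins=32, scale=1.0):
    # Histogram-based estimate of the Shannon entropy of |w| (an
    # assumed estimator; the paper's exact rule is not shown here).
    w = np.abs(weights).ravel()
    hist, _ = np.histogram(w, bins=num_bins)
    p = hist / hist.sum()
    p = p[p > 0]                        # drop empty bins (0 log 0 := 0)
    entropy = -np.sum(p * np.log2(p))   # entropy in bits
    max_entropy = np.log2(num_bins)     # upper bound: uniform histogram
    # Less informative weight distributions get a higher threshold,
    # i.e. they are pruned more aggressively.
    return scale * np.std(w) * (1.0 - entropy / max_entropy)

def prune(weights, threshold):
    # Zero out connections whose magnitude falls below the threshold.
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(128, 64))   # a toy fully connected layer
t = entropy_based_threshold(w)
w_pruned, mask = prune(w, t)
print(f"threshold={t:.4f}, kept {mask.mean():.1%} of weights")
```

In a training loop, a threshold like this would be recomputed per layer as the weights evolve, which matches the abstract's emphasis on measuring the weights' information content during training rather than only after it.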
Acknowledgement
This work was supported by an Inha University Research Grant.
Cite this article
Hur, C., Kang, S. Entropy-based pruning method for convolutional neural networks. J Supercomput 75, 2950–2963 (2019). https://doi.org/10.1007/s11227-018-2684-z