
Structured Pruning of Deep Convolutional Neural Networks

Published: 09 February 2017

Abstract

Real-time application of deep learning algorithms is often hindered by high computational complexity and frequent memory accesses. Network pruning is a promising technique to solve this problem. However, pruning usually results in irregular network connections that not only demand extra representation efforts but also do not fit well on parallel computation. We introduce structured sparsity at various scales for convolutional neural networks: feature map-wise, kernel-wise, and intra-kernel strided sparsity. This structured sparsity is very advantageous for direct computational resource savings on embedded computers, in parallel computing environments, and in hardware-based systems. To decide the importance of network connections and paths, the proposed method uses a particle filtering approach. The importance weight of each particle is assigned by assessing the misclassification rate with a corresponding connectivity pattern. The pruned network is retrained to compensate for the losses due to pruning. While implementing convolutions as matrix products, we particularly show that intra-kernel strided sparsity with a simple constraint can significantly reduce the size of the kernel and feature map tensors. The proposed work shows that when pruning granularities are applied in combination, we can prune the CIFAR-10 network by more than 70% with less than a 1% loss in accuracy.
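The pruning granularities described above can be illustrated as masks on a 4-D convolution weight tensor. The sketch below is a minimal illustration, not the paper's method: it substitutes a simple L1-norm heuristic for the particle-filter importance estimate, and the layer shapes are made up.

```python
import numpy as np

# Hypothetical layer: 8 output feature maps, 4 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 4, 3, 3))

# Feature map-wise pruning: zero entire output maps. An L1-norm heuristic
# stands in here for the paper's particle-filter importance weights.
l1 = np.abs(weights).sum(axis=(1, 2, 3))
fmap_mask = l1 >= np.median(l1)               # keep the stronger half
pruned = weights * fmap_mask[:, None, None, None]

# Intra-kernel strided sparsity: keep weights at a fixed stride and offset
# inside every kernel. Because the same pattern applies to all kernels in
# the layer, whole rows of the im2col-lowered matrix can be dropped,
# shrinking the matrix product uniformly. (Kernel-wise pruning would
# analogously zero entire (output, input) 3x3 kernels.)
flat = np.zeros(9, dtype=bool)
flat[::2] = True                              # stride 2, offset 0: 5 of 9 kept
pruned *= flat.reshape(3, 3)

sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size
print(f"overall sparsity: {sparsity:.2f}")
```

Because the surviving weights form regular blocks and strided patterns rather than scattered nonzeros, the pruned layer still maps onto dense, smaller matrix products, which is what makes this kind of sparsity useful on embedded and parallel hardware.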



    Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 13, Issue 3
Special Issue on Hardware and Algorithms for Learning On-a-chip and Special Issue on Alternative Computing Systems
July 2017, 418 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3051701
Editor: Yuan Xie
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 09 February 2017
Accepted: 01 October 2016
Revised: 01 August 2016
Received: 01 March 2016
Published in JETC Volume 13, Issue 3


    Author Tags

    1. Deep convolutional neural networks
    2. feature map pruning
    3. intra-kernel strided sparsity
    4. structured pruning

    Qualifiers

    • Research-article
    • Research
    • Refereed


