
Structured Pruning of Deep Convolutional Neural Networks

Published: 09 February 2017

Abstract

Real-time application of deep learning algorithms is often hindered by high computational complexity and frequent memory accesses. Network pruning is a promising technique to solve this problem. However, pruning usually results in irregular network connections that not only demand extra representation efforts but also do not fit well on parallel computation. We introduce structured sparsity at various scales for convolutional neural networks: feature map-wise, kernel-wise, and intra-kernel strided sparsity. This structured sparsity is very advantageous for direct computational resource savings on embedded computers, in parallel computing environments, and in hardware-based systems. To decide the importance of network connections and paths, the proposed method uses a particle filtering approach. The importance weight of each particle is assigned by assessing the misclassification rate with a corresponding connectivity pattern. The pruned network is retrained to compensate for the losses due to pruning. While implementing convolutions as matrix products, we particularly show that intra-kernel strided sparsity with a simple constraint can significantly reduce the size of the kernel and feature map tensors. The proposed work shows that when pruning granularities are applied in combination, we can prune the CIFAR-10 network by more than 70% with less than a 1% loss in accuracy.
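The pruning granularities described above can be illustrated as masks on a 4-D convolution weight tensor. The sketch below is a minimal illustration, not the paper's method: it substitutes a simple L1-norm heuristic for the particle-filter importance estimate, and the layer shapes are made up.

```python
import numpy as np

# Hypothetical layer: 8 output feature maps, 4 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 4, 3, 3))

# Feature map-wise pruning: zero entire output maps. An L1-norm heuristic
# stands in here for the paper's particle-filter importance weights.
l1 = np.abs(weights).sum(axis=(1, 2, 3))
fmap_mask = l1 >= np.median(l1)               # keep the stronger half
pruned = weights * fmap_mask[:, None, None, None]

# Intra-kernel strided sparsity: keep weights at a fixed stride and offset
# inside every kernel. Because the same pattern applies to all kernels in
# the layer, whole rows of the im2col-lowered matrix can be dropped,
# shrinking the matrix product uniformly. (Kernel-wise pruning would
# analogously zero entire (output, input) 3x3 kernels.)
flat = np.zeros(9, dtype=bool)
flat[::2] = True                              # stride 2, offset 0: 5 of 9 kept
pruned *= flat.reshape(3, 3)

sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size
print(f"overall sparsity: {sparsity:.2f}")
```

Because the surviving weights form regular blocks and strided patterns rather than scattered nonzeros, the pruned layer still maps onto dense, smaller matrix products, which is what makes this kind of sparsity useful on embedded and parallel hardware.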



    Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 13, Issue 3
Special Issue on Hardware and Algorithms for Learning On-a-chip and Special Issue on Alternative Computing Systems
July 2017, 418 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3051701
Editor: Yuan Xie
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 09 February 2017
Accepted: 01 October 2016
Revised: 01 August 2016
Received: 01 March 2016
Published in JETC Volume 13, Issue 3


    Author Tags

    1. Deep convolutional neural networks
    2. feature map pruning
    3. intra-kernel strided sparsity
    4. structured pruning

    Qualifiers

    • Research-article
    • Research
    • Refereed


