DominoSearch: find layer-wise fine-grained N:M sparse schemes from dense neural networks

Published: 06 December 2021

Abstract

Neural pruning is a widely used compression technique for Deep Neural Networks (DNNs). Recent innovations in hardware architectures (e.g. the Nvidia Ampere Sparse Tensor Core) and N:M fine-grained sparse neural network algorithms (i.e. every group of M consecutive weights contains N non-zero values) reveal a promising research direction for neural pruning. However, existing N:M algorithms only address the challenge of training N:M sparse neural networks in a uniform fashion (i.e. every layer has the same N:M sparsity) and suffer from a significant accuracy drop at high sparsity (i.e. when sparsity > 80%). To tackle this problem, we present a novel technique, DominoSearch, that finds mixed N:M sparsity schemes from pre-trained dense deep neural networks and achieves higher accuracy than the uniform-sparsity scheme under equivalent complexity constraints (e.g. model size or FLOPs). For instance, at the same model size of 2.1M parameters (87.5% sparsity), our layer-wise N:M sparse ResNet18 outperforms its uniform counterpart by 2.1% top-1 accuracy on the large-scale ImageNet dataset. At the same computational complexity of 227M FLOPs, our layer-wise sparse ResNet18 outperforms the uniform one by 1.3% top-1 accuracy. Furthermore, our layer-wise fine-grained N:M sparse ResNet50 achieves 76.7% top-1 accuracy with 5.0M parameters. This is competitive with the results achieved by layer-wise unstructured sparsity, which is believed to be the upper bound of neural network pruning with respect to the accuracy-sparsity trade-off. We believe that our work can serve as a strong baseline for further sparse DNN research and encourage future hardware-algorithm co-design.
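
To make the N:M constraint and the layer-wise complexity trade-off concrete, the sketch below is a minimal PyTorch illustration, not the authors' DominoSearch implementation: it builds a magnitude-based N:M mask for a weight tensor and counts the parameters remaining under a hypothetical mixed layer-wise scheme. The function names, layer shapes, and per-layer (N, M) choices are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): magnitude-based N:M masking
# and a parameter-count check for a layer-wise mixed N:M scheme.
import torch

def nm_mask(weight: torch.Tensor, n: int, m: int) -> torch.Tensor:
    """Binary mask keeping the N largest-magnitude weights in every
    group of M consecutive weights along the flattened input dimension."""
    out_features = weight.shape[0]
    flat = weight.reshape(out_features, -1)
    assert flat.shape[1] % m == 0, "input dimension must be divisible by M"
    groups = flat.reshape(out_features, -1, m)          # (out, n_groups, M)
    topk = groups.abs().topk(n, dim=-1).indices         # N largest |w| per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return mask.reshape_as(weight)

def sparse_param_count(layer_shapes, schemes):
    """Remaining parameters for a layer-wise {layer: (N, M)} scheme."""
    total = 0
    for name, shape in layer_shapes.items():
        n, m = schemes[name]
        dense = 1
        for d in shape:
            dense *= d
        total += dense * n // m
    return total

if __name__ == "__main__":
    w = torch.randn(64, 128)
    mask = nm_mask(w, n=2, m=8)                  # 2:8 sparsity, i.e. 87.5% of weights zeroed
    print((mask.sum() / mask.numel()).item())    # ~0.25 density for 2:8

    # Hypothetical two-layer network with a mixed layer-wise scheme:
    layer_shapes = {"conv1": (64, 64, 3, 3), "conv2": (128, 64, 3, 3)}
    schemes = {"conv1": (2, 4), "conv2": (1, 8)}  # 50% and 87.5% sparsity
    print(sparse_param_count(layer_shapes, schemes))
```

In this view, searching a layer-wise scheme amounts to picking a per-layer (N, M) pair so that the total from `sparse_param_count` (or an analogous FLOP count) stays within a target budget while accuracy is maximized.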

Supplementary Material

Additional material (3540261.3541846_supp.pdf)
Supplemental material.


Published In

NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems
December 2021
30517 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Publication History

Published: 06 December 2021

Qualifiers

  • Research-article
  • Research
  • Refereed limited
