DOI: 10.1145/3649476.3658699

Resource-Aware Saliency-Guided Differentiable Pruning for Deep Neural Networks

Published: 12 June 2024

Abstract

The increasing demand for efficient deployment of deep learning models on Tiny Machine Learning (tinyML) and edge platforms necessitates automated and effective network pruning methods tailored to tinyML hardware constraints. In this paper, we present a novel differentiable pruning method that takes the total available memory of a tinyML hardware platform as a constraint and employs saliency-based measurements to identify and prune less significant connections within a deep neural network (DNN). Our approach integrates network compression into the training process, adapting resource utilization to the specific constraints of FPGAs, with a particular focus on on-chip memory. By leveraging a custom tinyML accelerator, we enable efficient hardware-software co-design. Our framework further quantizes the model to int-8 to balance model size against accuracy, which is crucial for tinyML applications. We examine the efficacy of our approach on the compression of the LeNet and VGG16 DNNs. Compared with similar state-of-the-art pruning techniques, our approach compresses LeNet by a further 1.15× with no drop in accuracy. For VGG16, we compress the model by up to 55× over the baseline implementation for a 4% drop in accuracy. A comparative analysis of our FPGA hardware accelerator against leading image classification accelerators underscores the merits of our approach, with a 1.46× improvement in throughput for LeNet and a 1.7× improvement in energy efficiency for VGG16.
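
To make the idea concrete, below is a minimal sketch, not the authors' implementation, of saliency-guided differentiable pruning under a hard on-chip memory budget, followed by int-8 quantization, written in PyTorch. The soft-mask formulation, the budget penalty, and all names (PrunableLinear, memory_penalty, MEMORY_BUDGET_BYTES) are illustrative assumptions rather than details taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Assumed on-chip memory limit of the target FPGA; int-8 deployment means 1 byte per weight.
    MEMORY_BUDGET_BYTES = 64 * 1024
    BYTES_PER_WEIGHT = 1

    class PrunableLinear(nn.Module):
        """Linear layer whose weights are gated by a learnable, differentiable soft mask."""
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.bias = nn.Parameter(torch.zeros(out_features))
            # sigmoid(mask_logits) in (0, 1) acts as a keep-probability per connection.
            self.mask_logits = nn.Parameter(torch.zeros(out_features, in_features))

        def soft_mask(self):
            return torch.sigmoid(self.mask_logits)

        def forward(self, x):
            return F.linear(x, self.weight * self.soft_mask(), self.bias)

    def memory_penalty(layers):
        """Differentiable estimate of deployed weight memory, penalized above the budget."""
        soft_count = sum(layer.soft_mask().sum() for layer in layers)
        est_bytes = soft_count * BYTES_PER_WEIGHT
        return F.relu(est_bytes - MEMORY_BUDGET_BYTES) / MEMORY_BUDGET_BYTES

    # Toy model and a single training step showing how the budget penalty joins the task loss.
    model = nn.Sequential(PrunableLinear(784, 128), nn.ReLU(), PrunableLinear(128, 10))
    prunable = [m for m in model if isinstance(m, PrunableLinear)]
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(x), y) + 0.1 * memory_penalty(prunable)
    loss.backward()   # gradients through the soft masks provide the saliency signal
    opt.step()

    # After training, connections whose mask stays below a threshold are pruned, and the
    # surviving weights are quantized to int-8 (symmetric, per-tensor) for deployment.
    with torch.no_grad():
        for layer in prunable:
            keep = (layer.soft_mask() > 0.5).float()
            w = layer.weight * keep
            scale = w.abs().max().clamp(min=1e-8) / 127.0
            w_int8 = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)

In the paper's actual framework, the budget corresponds to the FPGA's on-chip memory and pruning and quantization are co-designed with the custom accelerator; the sketch only illustrates how a memory constraint can be made differentiable and folded into training.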


Published In

GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024
June 2024
797 pages
ISBN:9798400706059
DOI:10.1145/3649476

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Convolutional Neural Networks
  2. Deep Learning
  3. FPGA
  4. Neural Networks
  5. Pruning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

GLSVLSI '24: Great Lakes Symposium on VLSI 2024
June 12-14, 2024
Clearwater, FL, USA

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%


