DOI: 10.1145/3649476.3658699

Resource-Aware Saliency-Guided Differentiable Pruning for Deep Neural Networks

Published: 12 June 2024

Abstract

The increasing demand for efficient deployment of deep learning models on Tiny Machine Learning (tinyML) and edge platforms necessitates automated and effective network pruning methods tailored to tinyML hardware constraints. In this paper, we present a novel differentiable pruning method that takes the total available memory of a tinyML hardware platform as a constraint and employs saliency-based measurements to identify and prune less significant connections within a deep neural network (DNN). Our approach integrates network compression into the training process, adapting resource utilization to the specific constraints of FPGAs, with a particular focus on on-chip memory. By leveraging a custom tinyML accelerator, we enable efficient hardware-software co-design. Our framework further quantizes the model to int-8 to balance model size against accuracy, which is crucial for tinyML applications. We examine the efficacy of our approach on the compression of the LeNet and VGG16 DNNs. Compared with similar state-of-the-art pruning techniques, our approach compresses LeNet by a further 1.15× with no drop in accuracy. For VGG16, we compress the model by up to 55× over the baseline implementation for a 4% drop in accuracy. A comparative analysis of our FPGA hardware accelerator against leading image classification accelerators underscores the merits of our approach, with a 1.46× improvement in throughput for LeNet and a 1.7× improvement in energy efficiency for VGG16.
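
To make the idea concrete, below is a minimal sketch, not the authors' implementation, of saliency-guided differentiable pruning under a hard on-chip memory budget, followed by int-8 quantization, written in PyTorch. The soft-mask formulation, the budget penalty, and all names (PrunableLinear, memory_penalty, MEMORY_BUDGET_BYTES) are illustrative assumptions rather than details taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Assumed on-chip memory limit of the target FPGA; int-8 deployment means 1 byte per weight.
    MEMORY_BUDGET_BYTES = 64 * 1024
    BYTES_PER_WEIGHT = 1

    class PrunableLinear(nn.Module):
        """Linear layer whose weights are gated by a learnable, differentiable soft mask."""
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.bias = nn.Parameter(torch.zeros(out_features))
            # sigmoid(mask_logits) in (0, 1) acts as a keep-probability per connection.
            self.mask_logits = nn.Parameter(torch.zeros(out_features, in_features))

        def soft_mask(self):
            return torch.sigmoid(self.mask_logits)

        def forward(self, x):
            return F.linear(x, self.weight * self.soft_mask(), self.bias)

    def memory_penalty(layers):
        """Differentiable estimate of deployed weight memory, penalized above the budget."""
        soft_count = sum(layer.soft_mask().sum() for layer in layers)
        est_bytes = soft_count * BYTES_PER_WEIGHT
        return F.relu(est_bytes - MEMORY_BUDGET_BYTES) / MEMORY_BUDGET_BYTES

    # Toy model and a single training step showing how the budget penalty joins the task loss.
    model = nn.Sequential(PrunableLinear(784, 128), nn.ReLU(), PrunableLinear(128, 10))
    prunable = [m for m in model if isinstance(m, PrunableLinear)]
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(x), y) + 0.1 * memory_penalty(prunable)
    loss.backward()   # gradients through the soft masks provide the saliency signal
    opt.step()

    # After training, connections whose mask stays below a threshold are pruned, and the
    # surviving weights are quantized to int-8 (symmetric, per-tensor) for deployment.
    with torch.no_grad():
        for layer in prunable:
            keep = (layer.soft_mask() > 0.5).float()
            w = layer.weight * keep
            scale = w.abs().max().clamp(min=1e-8) / 127.0
            w_int8 = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)

In the paper's actual framework, the budget corresponds to the FPGA's on-chip memory and pruning and quantization are co-designed with the custom accelerator; the sketch only illustrates how a memory constraint can be made differentiable and folded into training.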


Published In

GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024
June 2024
797 pages
ISBN:9798400706059
DOI:10.1145/3649476

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Convolutional Neural Networks
  2. Deep Learning
  3. FPGA
  4. Neural Networks
  5. Pruning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

GLSVLSI '24: Great Lakes Symposium on VLSI 2024
June 12-14, 2024
Clearwater, FL, USA

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%


