
Sparse BD-Net: A Multiplication-less DNN with Sparse Binarized Depth-wise Separable Convolution

Published: 30 January 2020

Abstract

In this work, we propose a multiplication-less binarized depthwise-separable convolutional neural network, called BD-Net. BD-Net uses a binarized depthwise-separable convolution block as a drop-in replacement for the conventional spatial convolution in deep convolutional neural networks (DNNs). In BD-Net, the computationally expensive convolution operations (i.e., multiply-and-accumulate) are converted into energy-efficient addition/subtraction operations. To further compress the model size while keeping addition/subtraction as the dominant computation, we propose a new sparse binarization method with a hardware-oriented structured sparsity pattern. To successfully train such a sparse BD-Net, we propose and leverage two techniques: (1) a modified group-lasso regularization whose group size matches the capacity of the accelerator's basic computing core and (2) a weight-penalty clipping technique that resolves the disharmony between weight binarization and lasso regularization. The experimental results show that the proposed sparse BD-Net achieves comparable or even better inference accuracy than the full-precision CNN baseline. Beyond that, we design a BD-Net-customized processing-in-memory accelerator using SOT-MRAM, which offers high channel-expansion flexibility and computational parallelism. Through detailed analysis from both software and hardware perspectives, we provide intuitive design guidance for the software/hardware co-design of DNN acceleration on mobile embedded systems. Note that this journal submission is an extended version of our previously published paper at ISVLSI 2018 [24].
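To make the two training techniques above concrete, the sketch below illustrates them in PyTorch: a depthwise-separable block whose weights are sign-binarized through a straight-through estimator [9], and a group-lasso penalty computed over fixed-size weight groups with its per-group magnitude clipped. This is a minimal illustration under assumptions, not the paper's implementation: the exact binarization scheme, group partitioning, and clipping rule may differ, and group_size, lam, and clip are hypothetical hyperparameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class _SignSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator for gradients [9]."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass gradients only where |w| <= 1 (hard-tanh style clipping).
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)

class BDBlock(nn.Module):
    """Binarized depthwise-separable block: a 3x3 depthwise convolution followed
    by a 1x1 pointwise convolution, both with sign-binarized weights, so that
    inference reduces to additions/subtractions."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        x = F.conv2d(x, _SignSTE.apply(self.dw.weight),
                     stride=self.dw.stride, padding=self.dw.padding,
                     groups=self.dw.groups)
        x = F.relu(self.bn1(x))
        x = F.conv2d(x, _SignSTE.apply(self.pw.weight))
        return F.relu(self.bn2(x))

def clipped_group_lasso(weight, group_size, lam=1e-4, clip=1.0):
    """Group-lasso penalty over contiguous weight groups, where group_size is
    meant to mirror the accelerator's compute-core capacity. Clipping bounds
    the penalty contribution of any single group so the regularizer does not
    fight weight binarization (the paper's exact clipping rule may differ)."""
    flat = weight.reshape(-1, group_size)  # assumes numel % group_size == 0
    return lam * flat.norm(p=2, dim=1).clamp(max=clip).sum()

In training, the penalty would simply be added to the task loss, e.g., loss = criterion(model(x), y) + clipped_group_lasso(block.dw.weight, group_size=16) + clipped_group_lasso(block.pw.weight, group_size=16), with group_size chosen to match the target accelerator's computing core.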

References

[1]
2011. NCSU EDA FreePDK45. Retrieved from: http://www.eda.ncsu.edu/wiki/FreePDK45:Contents.
[2]
Renzo Andri et al. 2016. YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI’16). IEEE, 236--241.
[3]
Shaahin Angizi, Zhezhi He, and Deliang Fan. 2018. DIMA: A depthwise CNN in-memory accelerator. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD’18). IEEE, 1--8.
[4]
Shaahin Angizi, Zhezhi He, and Deliang Fan. 2018. PIMA-logic: A novel processing-in-memory architecture for highly flexible and energy-efficient logic computation. In Proceedings of the 55th Design Automation Conference. ACM, 162.
[5]
Shaahin Angizi, Zhezhi He, Farhana Parveen, and Deliang Fan. 2017. RIMPA: A new reconfigurable dual-mode in-memory processing architecture with spin Hall effect-driven domain wall motion device. In Proceedings of the IEEE Computer Society Symposium on VLSI (ISVLSI’17). IEEE, 45--50.
[6]
Shaahin Angizi, Zhezhi He, Farhana Parveen, and Deliang Fan. 2018. IMCE: Energy-efficient bit-wise in-memory convolution engine for deep neural network. In Proceedings of the 23rd Asia and South Pacific Design Automation Conference. IEEE Press, 111--116.
[7]
Shaahin Angizi, Jiao Sun, Wei Zhang, and Deliang Fan. 2019. AlignS: A processing-in-memory accelerator for DNA short read alignment leveraging SOT-MRAM. In Proceedings of the 56th Design Automation Conference. ACM, 144.
[8]
Ron Banner, Itay Hubara, Elad Hoffer, and Daniel Soudry. 2018. Scalable methods for 8-bit training of neural networks. In Proceedings of the Advances in Neural Information Processing Systems Conference. 5145--5153.
[9]
Yoshua Bengio et al. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv:1308.3432 (2013).
[10]
Ke Chen et al. 2012. CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory. In Proceedings of the Design, Automation, and Test in Europe Conference (DATE’12). IEEE, 33--38.
[11]
Ping Chi et al. 2016. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In Proceedings of the International Symposium on Computer Architecture (ISCA’16). IEEE Press.
[12]
François Chollet. 2017. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1251--1258.
[13]
Matthieu Courbariaux et al. 2015. Binaryconnect: Training deep neural networks with binary weights during propagations. In Proceedings of the Advances in Neural Information Processing Systems Conference. 3123--3131.
[14]
Xiangyu Dong et al. 2014. NVSim: A circuit-level performance, energy, and area model for emerging non-volatile memory. In Emerging Memory Technologies. Springer, 15--50.
[15]
Thomas Vogelsang. 2010. Understanding the energy consumption of dynamic random access memories. In Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society.
[16]
Xuanyao Fong, Sumeet K. Gupta et al. 2011. KNACK: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque MRAM bit-cells. In Proceedings of the International Conference on Simulation of Semiconductor Processes and Devices (SISPAD’11). IEEE, 51--54.
[17]
Song Han et al. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization, and Huffman coding. In Proceedings of the International Conference on Learning Representations (ICLR’15).
[18]
Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. In Proceedings of the Advances in Neural Information Processing Systems Conference. 1135--1143.
[19]
Kaiming He et al. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR’16). 770--778.
[20]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision. 1026--1034.
[21]
Zhezhi He et al. 2017. High performance and energy-efficient in-memory computing architecture based on SOT-MRAM. In Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH’17). IEEE, 97--102.
[22]
Zhezhi He, Shaahin Angizi, and Deliang Fan. 2017. Exploring STT-MRAM based in-memory computing paradigm with application of image edge extraction. In Proceedings of the IEEE International Conference on Computer Design (ICCD’17). IEEE, 439--446.
[23]
Zhezhi He, Shaahin Angizi, and Deliang Fan. 2018. Accelerating low bit-width deep convolution neural network in MRAM. In Proceedings of the IEEE Computer Society Symposium on VLSI (ISVLSI’18). IEEE, 533--538.
[24]
Zhezhi He, Shaahin Angizi, Adnan Siraj Rakin, and Deliang Fan. 2018. BD-NET: A multiplication-less DNN with binarized depthwise separable convolution. In Proceedings of the IEEE Computer Society Symposium on VLSI (ISVLSI’18). IEEE, 130--135.
[25]
Zhezhi He and Deliang Fan. 2019. Simultaneously optimizing weight and quantizer of ternary neural network using truncated Gaussian approximation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[26]
Zhezhi He, Boqing Gong, and Deliang Fan. 2019. Optimize deep convolutional neural network with ternarized weights and high accuracy. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’19). IEEE, 913--921.
[27]
Zhezhi He, Yang Zhang, Shaahin Angizi, Boqing Gong, and Deliang Fan. 2018. Exploring a SOT-MRAM based in-memory computing for data processing. IEEE Trans. Multi-Scale Comput. Syst. 4, 4 (2018), 676--685.
[28]
Andrew G. Howard et al. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 (2017).
[29]
Itay Hubara et al. 2016. Binarized neural networks. In Proceedings of the Advances in Neural Information Processing Systems Conference. 4107--4115.
[30]
Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning. 448--456.
[31]
Felix Juefei-Xu et al. 2016. Local binary convolutional neural networks. arXiv:1608.06049 (2016).
[32]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems Conference. 1097--1105.
[33]
Vadim Lebedev and Victor Lempitsky. 2016. Fast ConvNets using group-wise brain damage. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2554--2564.
[34]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436--444.
[35]
Shuangchen Li et al. 2017. DRISA: A DRAM-based reconfigurable in-situ accelerator. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO’17). ACM, 288--301.
[36]
Shuangchen Li, Cong Xu et al. 2016. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories. In Proceedings of the Design Automation Conference (DAC’16). IEEE.
[37]
Shiyu Liang and R. Srikant. 2017. Why deep neural networks for function approximation? In Proceedings of the International Conference on Learning Representations (ICLR’17).
[38]
Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Pensky. 2015. Sparse convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 806--814.
[39]
Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. 2018. Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV’18). 19--34.
[40]
Matthieu Courbariaux et al. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv:1602.02830 (2016).
[41]
Asit Mishra, Eriko Nurvitadhi, Jeffrey J. Cook, and Debbie Marr. 2018. WRPN: Wide reduced-precision networks. In Proceedings of the International Conference on Learning Representations (ICLR’18).
[42]
Naveen Muralimanohar et al. 2009. CACTI 6.0: A tool to model large caches. HP Laboratories (2009), 22--31.
[43]
Yuval Netzer et al. 2011. Reading digits in natural images with unsupervised feature learning. In Proceedings of the NIPS Workshop, Vol. 2011. 5.
[44]
Chi-Feng Pai et al. 2012. Spin transfer torque devices utilizing the giant spin Hall effect of tungsten. Appl. Phys. Lett. 101, 12 (2012), 122404.
[45]
Wei Pan, Xiaofan Lin, and Cong Zhao. 2017. Towards accurate binary convolutional neural network. In Proceedings of the Advances in Neural Information Processing Systems Conference. 344--352.
[46]
Mohammad Rastegari et al. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. In Proceedings of the European Conference on Computer Vision. Springer, 525--542.
[47]
Vivek Seshadri et al. 2017. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO’17). ACM, 273--287.
[48]
Laurent Sifre and Stéphane Mallat. 2014. Rigid-Motion Scattering for Image Classification. Ph.D. Dissertation.
[49]
Synopsys Design Compiler, Product Version 14.9. 2014. Synopsys, Inc. Retrieved from: https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/design-compiler-nxt.html.
[50]
Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, and Matt Richardson. 2016. Do deep convolutional nets really need to be deep and convolutional? arXiv:1603.05691 (2016).
[51]
Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning structured sparsity in deep neural networks. In Proceedings of the Advances in Neural Information Processing Systems Conference. 2074--2082.
[52]
Lixue Xia, Boxun Li, Tianqi Tang, Peng Gu, Pai-Yu Chen, Shimeng Yu, Yu Cao, Yu Wang, Yuan Xie, and Huazhong Yang. 2017. MNSIM: Simulation platform for memristor-based neuromorphic computing system. IEEE Trans. Comput.-Aided Des. Integ. Circ. Syst. 37, 5 (2017), 1009--1022.
[53]
Shuchang Zhou et al. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv:1606.06160 (2016).
[54]
Barret Zoph and Quoc V. Le. 2017. Neural architecture search with reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR’17).
[55]
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2018. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8697--8710.


      Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 16, Issue 2
      April 2020
      261 pages
      ISSN:1550-4832
      EISSN:1550-4840
      DOI:10.1145/3375712
Editor: Zhaojun Bai

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Publication History

      Published: 30 January 2020
      Accepted: 01 October 2019
      Revised: 01 September 2019
      Received: 01 May 2019
      Published in JETC Volume 16, Issue 2


      Author Tags

      1. Deep neural network
      2. in-memory computing
      3. model compression

      Qualifiers

      • Research-article
      • Research
      • Refereed
