
Sparse BD-Net: A Multiplication-less DNN with Sparse Binarized Depth-wise Separable Convolution

Published: 30 January 2020

Abstract

In this work, we propose a multiplication-less binarized depthwise-separable convolutional neural network, called BD-Net. BD-Net uses a binarized depthwise-separable convolution block as a drop-in replacement for the conventional spatial convolution in deep convolutional neural networks (DNNs). In BD-Net, the computationally expensive convolution operations (i.e., multiply-and-accumulate) are converted into energy-efficient addition/subtraction operations. To further compress the model size while keeping addition/subtraction as the dominant computation, we propose a new sparse binarization method with a hardware-oriented structured sparsity pattern. To successfully train such a sparse BD-Net, we propose and leverage two techniques: (1) a modified group-lasso regularization whose group size matches the capacity of the accelerator's basic computing core and (2) a weight-penalty clipping technique that resolves the disharmony between weight binarization and lasso regularization. The experimental results show that the proposed sparse BD-Net achieves comparable or even better inference accuracy than the full-precision CNN baseline. Beyond that, we design a BD-Net-customized processing-in-memory accelerator using SOT-MRAM, which offers high channel-expansion flexibility and computational parallelism. Through detailed analysis from both software and hardware perspectives, we provide intuitive design guidance for the software/hardware co-design of DNN acceleration on mobile embedded systems. Note that this journal submission is an extended version of our previously published paper at ISVLSI 2018 [24].
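To make the two training techniques above concrete, the sketch below illustrates them in PyTorch: a depthwise-separable block whose weights are sign-binarized through a straight-through estimator [9], and a group-lasso penalty computed over fixed-size weight groups with its per-group magnitude clipped. This is a minimal illustration under assumptions, not the paper's implementation: the exact binarization scheme, group partitioning, and clipping rule may differ, and group_size, lam, and clip are hypothetical hyperparameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class _SignSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator for gradients [9]."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass gradients only where |w| <= 1 (hard-tanh style clipping).
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)

class BDBlock(nn.Module):
    """Binarized depthwise-separable block: a 3x3 depthwise convolution followed
    by a 1x1 pointwise convolution, both with sign-binarized weights, so that
    inference reduces to additions/subtractions."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        x = F.conv2d(x, _SignSTE.apply(self.dw.weight),
                     stride=self.dw.stride, padding=self.dw.padding,
                     groups=self.dw.groups)
        x = F.relu(self.bn1(x))
        x = F.conv2d(x, _SignSTE.apply(self.pw.weight))
        return F.relu(self.bn2(x))

def clipped_group_lasso(weight, group_size, lam=1e-4, clip=1.0):
    """Group-lasso penalty over contiguous weight groups, where group_size is
    meant to mirror the accelerator's compute-core capacity. Clipping bounds
    the penalty contribution of any single group so the regularizer does not
    fight weight binarization (the paper's exact clipping rule may differ)."""
    flat = weight.reshape(-1, group_size)  # assumes numel % group_size == 0
    return lam * flat.norm(p=2, dim=1).clamp(max=clip).sum()

In training, the penalty would simply be added to the task loss, e.g., loss = criterion(model(x), y) + clipped_group_lasso(block.dw.weight, group_size=16) + clipped_group_lasso(block.pw.weight, group_size=16), with group_size chosen to match the target accelerator's computing core.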

References

[1]
2011. NCSU EDA FreePDK45. Retrieved from: http://www.eda.ncsu.edu/wiki/FreePDK45:Contents.
[2]
Renzo Andri et al. 2016. YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI’16). IEEE, 236--241.
[3]
Shaahin Angizi, Zhezhi He, and Deliang Fan. 2018. DIMA: A depthwise CNN in-memory accelerator. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD’18). IEEE, 1--8.
[4]
Shaahin Angizi, Zhezhi He, and Deliang Fan. 2018. PIMA-logic: A novel processing-in-memory architecture for highly flexible and energy-efficient logic computation. In Proceedings of the 55th Design Automation Conference. ACM, 162.
[5]
Shaahin Angizi, Zhezhi He, Farhana Parveen, and Deliang Fan. 2017. RIMPA: A new reconfigurable dual-mode in-memory processing architecture with spin Hall effect-driven domain wall motion device. In Proceedings of the IEEE Computer Society Symposium on VLSI (ISVLSI’17). IEEE, 45--50.
[6]
Shaahin Angizi, Zhezhi He, Farhana Parveen, and Deliang Fan. 2018. IMCE: Energy-efficient bit-wise in-memory convolution engine for deep neural network. In Proceedings of the 23rd Asia and South Pacific Design Automation Conference. IEEE Press, 111--116.
[7]
Shaahin Angizi, Jiao Sun, Wei Zhang, and Deliang Fan. 2019. AlignS: A processing-in-memory accelerator for DNA short read alignment leveraging SOT-MRAM. In Proceedings of the 56th Design Automation Conference. ACM, 144.
[8]
Ron Banner, Itay Hubara, Elad Hoffer, and Daniel Soudry. 2018. Scalable methods for 8-bit training of neural networks. In Proceedings of the Advances in Neural Information Processing Systems Conference. 5145--5153.
[9]
Yoshua Bengio et al. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv:1308.3432 (2013).
[10]
Ke Chen et al. 2012. CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory. In Proceedings of the Design, Automation, and Test in Europe Conference (DATE’12). IEEE, 33--38.
[11]
Ping Chi et al. 2016. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In Proceedings of the International Symposium on Computer Architecture (ISCA’16). IEEE Press.
[12]
François Chollet. 2017. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1251--1258.
[13]
Matthieu Courbariaux et al. 2015. Binaryconnect: Training deep neural networks with binary weights during propagations. In Proceedings of the Advances in Neural Information Processing Systems Conference. 3123--3131.
[14]
Xiangyu Dong et al. 2014. NVSim: A circuit-level performance, energy, and area model for emerging non-volatile memory. In Emerging Memory Technologies. Springer, 15--50.
[15]
Thomas Vogelsang. 2010. Understanding the energy consumption of dynamic random access memories. In Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society.
[16]
Xuanyao Fong, Sumeet K. Gupta et al. 2011. KNACK: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque MRAM bit-cells. In Proceedings of the International Conference on Simulation of Semiconductor Processes and Devices (SISPAD’11). IEEE, 51--54.
[17]
Song Han et al. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization, and Huffman coding. In Proceedings of the International Conference on Learning Representations (ICLR’15).
[18]
Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. In Proceedings of the Advances in Neural Information Processing Systems Conference. 1135--1143.
[19]
Kaiming He et al. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR’16). 770--778.
[20]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision. 1026--1034.
[21]
Zhezhi He et al. 2017. High performance and energy-efficient in-memory computing architecture based on SOT-MRAM. In Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH’17). IEEE, 97--102.
[22]
Zhezhi He, Shaahin Angizi, and Deliang Fan. 2017. Exploring STT-MRAM based in-memory computing paradigm with application of image edge extraction. In Proceedings of the IEEE International Conference on Computer Design (ICCD’17). IEEE, 439--446.
[23]
Zhezhi He, Shaahin Angizi, and Deliang Fan. 2018. Accelerating low bit-width deep convolution neural network in MRAM. In Proceedings of the IEEE Computer Society Symposium on VLSI (ISVLSI’18). IEEE, 533--538.
[24]
Zhezhi He, Shaahin Angizi, Adnan Siraj Rakin, and Deliang Fan. 2018. BD-NET: A multiplication-less DNN with binarized depthwise separable convolution. In Proceedings of the IEEE Computer Society Symposium on VLSI (ISVLSI’18). IEEE, 130--135.
[25]
Zhezhi He and Deliang Fan. 2019. Simultaneously optimizing weight and quantizer of ternary neural network using truncated Gaussian approximation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[26]
Zhezhi He, Boqing Gong, and Deliang Fan. 2019. Optimize deep convolutional neural network with ternarized weights and high accuracy. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’19). IEEE, 913--921.
[27]
Zhezhi He, Yang Zhang, Shaahin Angizi, Boqing Gong, and Deliang Fan. 2018. Exploring a SOT-MRAM based in-memory computing for data processing. IEEE Trans. Multi-Scale Comput. Syst. 4, 4 (2018), 676--685.
[28]
Andrew G. Howard et al. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 (2017).
[29]
Itay Hubara et al. 2016. Binarized neural networks. In Proceedings of the Advances in Neural Information Processing Systems Conference. 4107--4115.
[30]
Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning. 448--456.
[31]
Felix Juefei-Xu et al. 2016. Local binary convolutional neural networks. arXiv:1608.06049 (2016).
[32]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems Conference. 1097--1105.
[33]
Vadim Lebedev and Victor Lempitsky. 2016. Fast ConvNets using group-wise brain damage. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2554--2564.
[34]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436--444.
[35]
Shuangchen Li et al. 2017. DRISA: A DRAM-based reconfigurable in-situ accelerator. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO’17). ACM, 288--301.
[36]
Shuangchen Li, Cong Xu et al. 2016. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories. In Proceedings of the Design Automation Conference (DAC’16). IEEE.
[37]
Shiyu Liang and R. Srikant. 2017. Why deep neural networks for function approximation? In Proceedings of the International Conference on Learning Representations (ICLR’17).
[38]
Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Pensky. 2015. Sparse convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 806--814.
[39]
Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. 2018. Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV’18). 19--34.
[40]
Matthieu Courbariaux et al. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv:1602.02830 (2016).
[41]
Asit Mishra, Eriko Nurvitadhi, Jeffrey J. Cook, and Debbie Marr. 2018. WRPN: Wide reduced-precision networks. In Proceedings of the International Conference on Learning Representations (ICLR’18).
[42]
Naveen Muralimanohar et al. 2009. CACTI 6.0: A tool to model large caches. HP Laboratories (2009), 22--31.
[43]
Yuval Netzer et al. 2011. Reading digits in natural images with unsupervised feature learning. In Proceedings of the NIPS Workshop, Vol. 2011. 5.
[44]
Chi-Feng Pai et al. 2012. Spin transfer torque devices utilizing the giant spin Hall effect of tungsten. Appl. Phys. Lett. 101, 12 (2012), 122404.
[45]
Wei Pan, Xiaofan Lin, and Cong Zhao. 2017. Towards accurate binary convolutional neural network. In Proceedings of the Advances in Neural Information Processing Systems Conference. 344--352.
[46]
Mohammad Rastegari et al. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. In Proceedings of the European Conference on Computer Vision. Springer, 525--542.
[47]
Vivek Seshadri et al. 2017. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO’17). ACM, 273--287.
[48]
Laurent Sifre and Stéphane Mallat. 2014. Rigid-Motion Scattering for Image Classification. Ph.D. Dissertation.
[49]
Synopsys Design Compiler, Product Version 14.9. 2014. Synopsys, Inc. Retrieved from: https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/design-compiler-nxt.html.
[50]
Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, and Matt Richardson. 2016. Do deep convolutional nets really need to be deep and convolutional? arXiv:1603.05691 (2016).
[51]
Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning structured sparsity in deep neural networks. In Proceedings of the Advances in Neural Information Processing Systems Conference. 2074--2082.
[52]
Lixue Xia, Boxun Li, Tianqi Tang, Peng Gu, Pai-Yu Chen, Shimeng Yu, Yu Cao, Yu Wang, Yuan Xie, and Huazhong Yang. 2017. MNSIM: Simulation platform for memristor-based neuromorphic computing system. IEEE Trans. Comput.-Aided Des. Integ. Circ. Syst. 37, 5 (2017), 1009--1022.
[53]
Shuchang Zhou et al. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv:1606.06160 (2016).
[54]
Barret Zoph and Quoc V. Le. 2017. Neural architecture search with reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR’17).
[55]
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2018. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8697--8710.


      Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 16, Issue 2
      April 2020
      261 pages
      ISSN:1550-4832
      EISSN:1550-4840
      DOI:10.1145/3375712
Editor: Zhaojun Bai

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Publication History

      Published: 30 January 2020
      Accepted: 01 October 2019
      Revised: 01 September 2019
      Received: 01 May 2019
      Published in JETC Volume 16, Issue 2


      Author Tags

      1. Deep neural network
      2. in-memory computing
      3. model compression

      Qualifiers

      • Research-article
      • Research
      • Refereed
