DOI: 10.1145/3240765.3240799
DIMA: A Depthwise CNN In-Memory Accelerator

Published: 05 November 2018
Abstract

In this work, we first propose a deep depthwise Convolutional Neural Network (CNN) structure, called Add-Net, which uses binarized depthwise separable convolution to replace conventional spatial convolution. In Add-Net, the computationally expensive convolution operations (i.e., multiplication and accumulation) are converted into hardware-friendly addition operations. We carefully investigate and analyze Add-Net's performance (i.e., accuracy, parameter size, and computational cost) on object recognition tasks compared to a traditional baseline CNN, using the popular large-scale ImageNet dataset. Accordingly, we propose a Depthwise CNN In-Memory Accelerator (DIMA) based on SOT-MRAM computational sub-arrays to efficiently accelerate Add-Net within non-volatile MRAM. Our device-to-architecture co-simulation results show that, with almost the same inference accuracy as the baseline CNN on different datasets, DIMA obtains ∼1.4× better energy-efficiency and 15.7× speedup compared to ASICs, and ∼1.6× better energy-efficiency and 5.6× speedup over the best processing-in-DRAM accelerators.
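
To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of a binarized depthwise convolution: when each per-channel filter weight is constrained to +1 or −1, every multiply-accumulate collapses into a sign-conditioned addition or subtraction, which is what lets Add-Net trade multiplications for additions. The array shapes and the NumPy formulation here are assumptions for illustration only.

```python
import numpy as np

def binarize(w):
    """Constrain real-valued weights to {+1, -1} by their sign."""
    return np.where(w >= 0, 1, -1)

def depthwise_conv_add(x, w_bin):
    """Depthwise convolution with binary weights: each output value is a sum
    of input values added (+1 weight) or subtracted (-1 weight), so the usual
    multiply-accumulate reduces to pure addition/subtraction.

    x:     (C, H, W) input feature map
    w_bin: (C, k, k) binarized filters, one k-by-k filter per channel
    """
    C, H, W = x.shape
    _, k, _ = w_bin.shape
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):
        pos = w_bin[c] == 1  # positions whose weight adds the input value
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = x[c, i:i + k, j:j + k]
                # No multiplications: add where the weight is +1,
                # subtract where it is -1.
                out[c, i, j] = patch[pos].sum() - patch[~pos].sum()
    return out

# Usage: 3-channel 8x8 input with 3x3 binary filters per channel.
x = np.random.randn(3, 8, 8)
w = binarize(np.random.randn(3, 3, 3))
print(depthwise_conv_add(x, w).shape)  # (3, 6, 6)
```

A full depthwise separable layer would follow this stage with a pointwise (1×1) convolution that mixes channels; in Add-Net that stage is likewise binarized, which keeps the whole layer addition-only in hardware.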


Published In

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Nov 2018
939 pages

Publisher

IEEE Press
