DOI: 10.1145/3240765.3240799
DIMA: A Depthwise CNN In-Memory Accelerator

Published: 05 November 2018
Abstract

In this work, we first propose a deep depthwise Convolutional Neural Network (CNN) structure, called Add-Net, which uses binarized depthwise separable convolution to replace conventional spatial convolution. In Add-Net, the computationally expensive convolution operations (i.e., multiplication and accumulation) are converted into hardware-friendly addition operations. We carefully investigate and analyze Add-Net's performance (i.e., accuracy, parameter size, and computational cost) on object recognition tasks compared to a traditional baseline CNN, using the popular large-scale ImageNet dataset. Accordingly, we propose a Depthwise CNN In-Memory Accelerator (DIMA) based on SOT-MRAM computational sub-arrays to efficiently accelerate Add-Net within non-volatile MRAM. Our device-to-architecture co-simulation results show that, with almost the same inference accuracy as the baseline CNN on different datasets, DIMA obtains ∼1.4× better energy-efficiency and 15.7× speedup compared to ASICs, and ∼1.6× better energy-efficiency and 5.6× speedup over the best processing-in-DRAM accelerators.
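
To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of a binarized depthwise convolution: when each per-channel filter weight is constrained to +1 or −1, every multiply-accumulate collapses into a sign-conditioned addition or subtraction, which is what lets Add-Net trade multiplications for additions. The array shapes and the NumPy formulation here are assumptions for illustration only.

```python
import numpy as np

def binarize(w):
    """Constrain real-valued weights to {+1, -1} by their sign."""
    return np.where(w >= 0, 1, -1)

def depthwise_conv_add(x, w_bin):
    """Depthwise convolution with binary weights: each output value is a sum
    of input values added (+1 weight) or subtracted (-1 weight), so the usual
    multiply-accumulate reduces to pure addition/subtraction.

    x:     (C, H, W) input feature map
    w_bin: (C, k, k) binarized filters, one k-by-k filter per channel
    """
    C, H, W = x.shape
    _, k, _ = w_bin.shape
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):
        pos = w_bin[c] == 1  # positions whose weight adds the input value
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = x[c, i:i + k, j:j + k]
                # No multiplications: add where the weight is +1,
                # subtract where it is -1.
                out[c, i, j] = patch[pos].sum() - patch[~pos].sum()
    return out

# Usage: 3-channel 8x8 input with 3x3 binary filters per channel.
x = np.random.randn(3, 8, 8)
w = binarize(np.random.randn(3, 3, 3))
print(depthwise_conv_add(x, w).shape)  # (3, 6, 6)
```

A full depthwise separable layer would follow this stage with a pointwise (1×1) convolution that mixes channels; in Add-Net that stage is likewise binarized, which keeps the whole layer addition-only in hardware.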


Published In

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Nov 2018
939 pages

Publisher

IEEE Press
