DOI: 10.1145/3370748.3406562
Open Access

Implementing binary neural networks in memory with approximate accumulation

Published: 10 August 2020

Abstract

Processing in-memory (PIM) has shown great potential to accelerate the inference tasks of binarized neural networks (BNNs) by reducing data movement between processing units and memory. However, existing PIM architectures require analog/mixed-signal circuits that do not scale with CMOS technology. In contrast, we propose BitNAP (Binarized neural network acceleration with in-memory ThreSholding), which optimizes at the operation, peripheral, and architecture levels to build an efficient BNN accelerator. BitNAP supports row-parallel bitwise operations in crossbar memory by exploiting the switching of 1-bit bipolar resistive devices and a unique hybrid tunable thresholding operation. To reduce the area overhead of sensing-based operations, BitNAP presents a memory sense amplifier sharing scheme, along with a novel operation pipelining that reduces the latency overhead of sharing. We evaluate the efficiency of BitNAP on the MNIST and ImageNet datasets using popular neural networks. BitNAP is on average 1.24× (10.7×) faster and 185.6× (10.5×) more energy-efficient than the state-of-the-art PIM accelerator for simple (complex) networks.
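To make the accelerated operation concrete, the following is a minimal, illustrative sketch (in NumPy; not code from the paper) of the XNOR-popcount-threshold primitive that BNN accelerators such as BitNAP map onto memory arrays. The function names and the example threshold are hypothetical.

    import numpy as np

    # Illustrative sketch only: the XNOR / popcount / threshold primitive
    # that BNN accelerators such as BitNAP realize inside memory arrays.
    # Bipolar values {-1, +1} are stored as bits {0, 1}.

    def binarize(x):
        # Map real values to {0, 1} bits representing {-1, +1}.
        return (np.asarray(x) >= 0).astype(np.uint8)

    def bnn_neuron(input_bits, weight_bits, threshold):
        # XNOR marks matching bit pairs; comparing the popcount against a
        # threshold replaces the exact accumulation (the "approximate
        # accumulation" of the title). In a PIM design, this comparison
        # can be carried out row-parallel by the sense amplifiers.
        matches = ~np.logical_xor(input_bits, weight_bits)
        popcount = np.count_nonzero(matches)
        return np.uint8(popcount >= threshold)

    # Hypothetical usage: a 16-input neuron that fires when at least
    # half of the input/weight bit pairs match.
    rng = np.random.default_rng(0)
    x = binarize(rng.standard_normal(16))
    w = binarize(rng.standard_normal(16))
    print(bnn_neuron(x, w, threshold=8))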


Cited By

  • FeFET-Based Logic-in-Memory Supporting SA-Free Write-Back and Fully Dynamic Access With Reduced Bitline Charging Activity and Recycled Bitline Charge. IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 70, no. 6, pp. 2398-2411, June 2023. DOI: 10.1109/TCSI.2023.3251961


Published In

ISLPED '20: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design
August 2020, 263 pages
ISBN: 9781450370530
DOI: 10.1145/3370748
This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

  • IEEE CAS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. binary neural networks
  2. memristors
  3. processing in memory

Qualifiers

  • Research-article

Funding Sources

  • NSF
  • SRC Global Research Collaboration
  • DARPA

Conference

ISLPED '20

Acceptance Rates

Overall Acceptance Rate: 398 of 1,159 submissions (34%)
