Distributed In-Memory Computing on Binary RRAM Crossbar

Published: 17 March 2017

Abstract

The recently emerging resistive random-access memory (RRAM) can provide not only nonvolatile memory storage but also intrinsic computing for matrix-vector multiplication, which makes it well suited to low-power, high-throughput in-memory accelerators for data analytics. However, existing RRAM crossbar-based computing is mainly assumed to be multilevel analog computing, whose results are sensitive to process nonuniformity and which incurs additional overhead from AD conversion and I/O. In this article, we explore a matrix-vector multiplication accelerator on a binary RRAM crossbar with adaptive 1-bit-comparator-based parallel conversion. Moreover, a distributed in-memory computing architecture is developed with a corresponding control protocol. Both the memory array and the logic accelerator are implemented on the binary RRAM crossbar, where the logic-memory pairs can be distributed with the control bus protocol. Experimental results show that, compared to the analog RRAM crossbar, the proposed binary RRAM crossbar can achieve significant area savings with better calculation accuracy. Moreover, significant speedup can be achieved for matrix-vector multiplication in neural network-based machine learning, such that both the overall training time and the testing time are reduced. In addition, large energy savings can be achieved compared to the traditional CMOS-based out-of-memory computing architecture.
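As a rough illustration of the idea in the abstract, the sketch below models a binary crossbar in which every cell stores 0 or 1 and every input is 0 or 1, so each column output is a count of coinciding ones. The column output is then digitized by a bank of 1-bit comparators acting in parallel, which yields a thermometer code. This is an idealized software model under stated assumptions, not the authors' circuit: the function names, the fixed integer thresholds, and the thermometer-code readout are all hypothetical, and device nonidealities are ignored.

```python
# Idealized model of matrix-vector multiplication on a binary crossbar.
# Assumptions (not from the article): cells and inputs are ideal 0/1 values,
# comparator thresholds sit at the integer levels 1..N, and the readout is
# a thermometer code that is converted to an integer by counting ones.

def column_sum(column, x):
    """Idealized bit-line output: number of rows where cell and input are both 1."""
    return sum(w & xi for w, xi in zip(column, x))

def thermometer_code(level, n_levels):
    """Model of n_levels 1-bit comparators, one per threshold 1..n_levels.

    In hardware the comparators would operate in parallel; here they are
    simply evaluated one after another.
    """
    return [1 if level >= t else 0 for t in range(1, n_levels + 1)]

def binary_crossbar_matvec(W, x):
    """Compute y[j] = sum_i W[i][j] * x[i] for binary W and x."""
    n_rows = len(x)
    result = []
    for j in range(len(W[0])):
        column = [W[i][j] for i in range(n_rows)]
        level = column_sum(column, x)            # analog-like accumulated value
        code = thermometer_code(level, n_rows)   # 1-bit comparator outputs
        result.append(sum(code))                 # decode thermometer code
    return result

# Hypothetical 4x3 binary weight matrix and 4-element binary input.
W = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
x = [1, 0, 1, 1]
y = binary_crossbar_matvec(W, x)
print(y)
```

Because each comparator only has to separate two adjacent integer levels, this kind of readout avoids a multilevel AD conversion in the model, which is the overhead the abstract attributes to analog crossbars.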

Published In

ACM Journal on Emerging Technologies in Computing Systems  Volume 13, Issue 3
Special Issue on Hardware and Algorithms for Learning On-a-chip and Special Issue on Alternative Computing Systems
July 2017
418 pages
ISSN:1550-4832
EISSN:1550-4840
DOI:10.1145/3051701
Editor: Yuan Xie
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 March 2017
Accepted: 01 September 2016
Revised: 01 July 2016
Received: 01 March 2016
Published in JETC Volume 13, Issue 3

Author Tags

  1. L2-norm-based machine learning
  2. RRAM crossbar
  3. hardware accelerator

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • MOE Tier-2
  • Singapore NRF-CRP
