RNNFast: An Accelerator for Recurrent Neural Networks Using Domain-Wall Memory

Published: 18 September 2020

Abstract

Recurrent Neural Networks (RNNs) are an important class of neural networks designed to retain and incorporate context into current decisions. RNNs are particularly well suited for machine learning problems in which context is important, such as speech recognition and language translation.
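To make the recurrence concrete, the sketch below is a minimal NumPy model of a vanilla RNN step, not the paper's hardware datapath; the layer sizes and random weights are arbitrary illustrations. It shows the mechanism the abstract alludes to: each new input is folded into a hidden state that summarizes everything seen so far. LSTM cells, which RNNFast targets, add gating around this same recurrence.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla-RNN step: the new hidden state mixes the current
    input with the previous hidden state, so context accumulates."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8                        # arbitrary illustrative sizes
W_x = rng.normal(scale=0.1, size=(n_hid, n_in))
W_h = rng.normal(scale=0.1, size=(n_hid, n_hid))
b = np.zeros(n_hid)

h = np.zeros(n_hid)                       # empty context
for x in rng.normal(size=(5, n_in)):      # a 5-step input sequence
    h = rnn_step(x, h, W_x, W_h, b)       # h now depends on every input seen
```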
This work presents RNNFast, a hardware accelerator for RNNs that leverages an emerging class of non-volatile memory called domain-wall memory (DWM). We show that DWM is very well suited for RNN acceleration due to its very high density and low read/write energy. At the same time, the sequential nature of input/weight processing of RNNs mitigates one of the downsides of DWM, which is the linear (rather than constant) data access time.
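The access-time argument can be illustrated with a toy cost model (our own sketch, with a made-up track length and unit shift costs, not figures from the paper). In DWM, a read head must shift the racetrack until the requested bit is under it, so cost is proportional to seek distance. Streaming weights in storage order, as an RNN's step-by-step evaluation does, pays roughly one shift per access, whereas a random access pattern pays the full seek distance each time.

```python
import random

def shifts_for_accesses(positions, start=0):
    """Count DWM shift operations: the head must move from its current
    position to each requested bit position before reading it."""
    head, shifts = start, 0
    for p in positions:
        shifts += abs(p - head)   # one shift per domain moved past the head
        head = p
    return shifts

track_len = 64                                    # bits per track (illustrative)
sequential = list(range(track_len))               # RNN-style streaming of weights
scattered = random.sample(sequential, track_len)  # random access pattern

print(shifts_for_accesses(sequential))   # 63 shifts: ~1 per access
print(shifts_for_accesses(scattered))    # expected ~track_len**2 / 3 shifts
```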
RNNFast is very efficient and highly scalable, with flexible mapping of logical neurons to RNN hardware blocks. The basic hardware primitive, the RNN processing element (PE), includes custom DWM-based multiplication, sigmoid and tanh units for high density and low energy. The accelerator is designed to minimize data movement by closely interleaving DWM storage and computation. We compare our design with a state-of-the-art GPGPU and find 21.8× higher performance with 70× lower energy.
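The paper's sigmoid and tanh units are custom DWM-based designs; the sketch below is only a generic software stand-in for the kind of low-cost approximation hardware activation units commonly use (a small table of knots with linear interpolation), not RNNFast's in-memory circuit. It shows why such units are cheap: modest table sizes already give small worst-case error over the useful input range.

```python
import numpy as np

# Generic piecewise-linear sigmoid, the style of approximation hardware
# activation units often implement. Illustration only; not the paper's
# DWM-based unit.
BREAKS = np.linspace(-6.0, 6.0, 25)          # 24 linear segments
VALUES = 1.0 / (1.0 + np.exp(-BREAKS))       # exact sigmoid at the knots

def sigmoid_pwl(x):
    """Interpolate between knots; np.interp clamps outside [-6, 6]."""
    return np.interp(x, BREAKS, VALUES)

x = np.linspace(-8, 8, 1001)
err = np.abs(sigmoid_pwl(x) - 1.0 / (1.0 + np.exp(-x)))
print(f"max abs error: {err.max():.4f}")     # small for modest table sizes
```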


      Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 16, Issue 4
Special Issue on Nanoelectronic Device, Circuit, Architecture Design, Part 2 and Regular Papers
October 2020, 202 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3418801
Editor: Ramesh Karri

      Publisher

Association for Computing Machinery, New York, NY, United States


      Publication History

      Published: 18 September 2020
      Accepted: 01 May 2020
      Revised: 01 February 2020
      Received: 01 May 2019
      Published in JETC Volume 16, Issue 4

      Author Tags

      1. LSTM
      2. Recurrent neural networks
      3. accelerator
      4. domain-wall memory

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Cited By

• (2023) RansomShield: A Visualization Approach to Defending Mobile Systems Against Ransomware. ACM Transactions on Privacy and Security 26, 3, 1--30. DOI: 10.1145/3579822
• (2022) Fast-track cache. Proceedings of the 36th ACM International Conference on Supercomputing, 1--12. DOI: 10.1145/3524059.3532383
• (2022) An Automatic-Addressing Architecture With Fully Serialized Access in Racetrack Memory for Energy-Efficient CNNs. IEEE Transactions on Computers 71, 1, 235--250. DOI: 10.1109/TC.2020.3045433
• (2022) Voice Keyword Spotting on Edge Devices. 2022 5th International Conference on Multimedia, Signal Processing and Communication Technologies (IMPACT), 1--5. DOI: 10.1109/IMPACT55510.2022.10029228
• (2022) Keyword Spotting with Deep Neural Network on Edge Devices. 2022 IEEE 12th International Conference on Electronics Information and Emergency Communication (ICEIEC), 98--102. DOI: 10.1109/ICEIEC54567.2022.9835061
• (2022) Low power multiplier based long short-term memory hardware architecture for smart grid energy management. International Journal of System Assurance Engineering and Management. DOI: 10.1007/s13198-022-01662-w
