RNNFast: An Accelerator for Recurrent Neural Networks Using Domain-Wall Memory

Published: 18 September 2020

Abstract

Recurrent Neural Networks (RNNs) are an important class of neural networks designed to retain and incorporate context into current decisions. RNNs are particularly well suited for machine learning problems in which context is important, such as speech recognition and language translation.
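To make the recurrence concrete, the sketch below is a minimal NumPy model of a vanilla RNN step, not the paper's hardware datapath; the layer sizes and random weights are arbitrary illustrations. It shows the mechanism the abstract alludes to: each new input is folded into a hidden state that summarizes everything seen so far. LSTM cells, which RNNFast targets, add gating around this same recurrence.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla-RNN step: the new hidden state mixes the current
    input with the previous hidden state, so context accumulates."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8                        # arbitrary illustrative sizes
W_x = rng.normal(scale=0.1, size=(n_hid, n_in))
W_h = rng.normal(scale=0.1, size=(n_hid, n_hid))
b = np.zeros(n_hid)

h = np.zeros(n_hid)                       # empty context
for x in rng.normal(size=(5, n_in)):      # a 5-step input sequence
    h = rnn_step(x, h, W_x, W_h, b)       # h now depends on every input seen
```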
This work presents RNNFast, a hardware accelerator for RNNs that leverages an emerging class of non-volatile memory called domain-wall memory (DWM). We show that DWM is very well suited for RNN acceleration due to its very high density and low read/write energy. At the same time, the sequential nature of input/weight processing of RNNs mitigates one of the downsides of DWM, which is the linear (rather than constant) data access time.
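The access-time argument can be illustrated with a toy cost model (our own sketch, with a made-up track length and unit shift costs, not figures from the paper). In DWM, a read head must shift the racetrack until the requested bit is under it, so cost is proportional to seek distance. Streaming weights in storage order, as an RNN's step-by-step evaluation does, pays roughly one shift per access, whereas a random access pattern pays the full seek distance each time.

```python
import random

def shifts_for_accesses(positions, start=0):
    """Count DWM shift operations: the head must move from its current
    position to each requested bit position before reading it."""
    head, shifts = start, 0
    for p in positions:
        shifts += abs(p - head)   # one shift per domain moved past the head
        head = p
    return shifts

track_len = 64                                    # bits per track (illustrative)
sequential = list(range(track_len))               # RNN-style streaming of weights
scattered = random.sample(sequential, track_len)  # random access pattern

print(shifts_for_accesses(sequential))   # 63 shifts: ~1 per access
print(shifts_for_accesses(scattered))    # expected ~track_len**2 / 3 shifts
```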
RNNFast is very efficient and highly scalable, with flexible mapping of logical neurons to RNN hardware blocks. The basic hardware primitive, the RNN processing element (PE), includes custom DWM-based multiplication, sigmoid and tanh units for high density and low energy. The accelerator is designed to minimize data movement by closely interleaving DWM storage and computation. We compare our design with a state-of-the-art GPGPU and find 21.8× higher performance with 70× lower energy.
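The paper's sigmoid and tanh units are custom DWM-based designs; the sketch below is only a generic software stand-in for the kind of low-cost approximation hardware activation units commonly use (a small table of knots with linear interpolation), not RNNFast's in-memory circuit. It shows why such units are cheap: modest table sizes already give small worst-case error over the useful input range.

```python
import numpy as np

# Generic piecewise-linear sigmoid, the style of approximation hardware
# activation units often implement. Illustration only; not the paper's
# DWM-based unit.
BREAKS = np.linspace(-6.0, 6.0, 25)          # 24 linear segments
VALUES = 1.0 / (1.0 + np.exp(-BREAKS))       # exact sigmoid at the knots

def sigmoid_pwl(x):
    """Interpolate between knots; np.interp clamps outside [-6, 6]."""
    return np.interp(x, BREAKS, VALUES)

x = np.linspace(-8, 8, 1001)
err = np.abs(sigmoid_pwl(x) - 1.0 / (1.0 + np.exp(-x)))
print(f"max abs error: {err.max():.4f}")     # small for modest table sizes
```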


      Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 16, Issue 4
Special Issue on Nanoelectronic Device, Circuit, Architecture Design, Part 2 and Regular Papers
October 2020, 202 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3418801
Editor: Ramesh Karri

      Publisher

Association for Computing Machinery, New York, NY, United States


      Publication History

      Published: 18 September 2020
      Accepted: 01 May 2020
      Revised: 01 February 2020
      Received: 01 May 2019
      Published in JETC Volume 16, Issue 4

      Author Tags

      1. LSTM
      2. Recurrent neural networks
      3. accelerator
      4. domain-wall memory

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Cited By

• (2023) RansomShield: A Visualization Approach to Defending Mobile Systems Against Ransomware. ACM Transactions on Privacy and Security 26, 3, 1--30. DOI: 10.1145/3579822
• (2022) Fast-track cache. Proceedings of the 36th ACM International Conference on Supercomputing, 1--12. DOI: 10.1145/3524059.3532383
• (2022) An Automatic-Addressing Architecture With Fully Serialized Access in Racetrack Memory for Energy-Efficient CNNs. IEEE Transactions on Computers 71, 1, 235--250. DOI: 10.1109/TC.2020.3045433
• (2022) Voice Keyword Spotting on Edge Devices. 2022 5th International Conference on Multimedia, Signal Processing and Communication Technologies (IMPACT), 1--5. DOI: 10.1109/IMPACT55510.2022.10029228
• (2022) Keyword Spotting with Deep Neural Network on Edge Devices. 2022 IEEE 12th International Conference on Electronics Information and Emergency Communication (ICEIEC), 98--102. DOI: 10.1109/ICEIEC54567.2022.9835061
• (2022) Low power multiplier based long short-term memory hardware architecture for smart grid energy management. International Journal of System Assurance Engineering and Management. DOI: 10.1007/s13198-022-01662-w
