
CLU: A Near-Memory Accelerator Exploiting the Parallelism in Convolutional Neural Networks

Published: 15 April 2021

Abstract

Convolutional/Deep Neural Networks (CNNs/DNNs) are rapidly growing workloads in emerging AI-based systems. The gap between processing speed and memory-access latency in multi-core systems limits the performance and energy efficiency of CNN/DNN tasks. This article aims to alleviate this gap by providing a simple yet efficient near-memory accelerator-based system that expedites CNN inference. Toward this goal, we first design an efficient parallel algorithm to accelerate CNN/DNN tasks, partitioning the data across multiple memory channels (vaults) to support its parallel execution. Second, we design a hardware unit, the convolutional logic unit (CLU), that implements this algorithm; the CLU operates in three phases for layer-wise processing of the data. Last, to harness the benefits of near-memory processing (NMP), we integrate homogeneous CLUs on the logic layer of a 3D memory, specifically the Hybrid Memory Cube (HMC). Together, these contributions yield a high-performing and energy-efficient system for CNNs/DNNs. The proposed system achieves substantial performance gains and energy reductions over multi-core CPU- and GPU-based systems, with a minimal area overhead of 2.37%.
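To make the vault-level parallelism concrete, here is a minimal software sketch of the partitioning idea described in the abstract: the input feature map is split into row slabs (with halo rows), and each slab is convolved independently, each slab standing in for one CLU working out of its own HMC vault. The row-slab partitioning, the vault count, and all function names are illustrative assumptions, not the paper's exact algorithm or hardware interface.

```python
# Minimal sketch of vault-parallel convolution. The real CLU is a hardware
# unit on the HMC logic layer; the row-slab partitioning used here is an
# assumption for illustration, not the paper's exact data layout.
import numpy as np

NUM_VAULTS = 32  # an HMC exposes up to 32 vaults (used here only for scale)

def conv2d_single(x, w):
    """Naive valid 2D convolution (cross-correlation), single channel."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def vault_parallel_conv2d(x, w, num_vaults=NUM_VAULTS):
    """Split output rows across vaults; each 'vault' convolves its own slab."""
    kh = w.shape[0]
    oh = x.shape[0] - kh + 1
    bounds = np.linspace(0, oh, num_vaults + 1, dtype=int)
    slabs = []
    for v in range(num_vaults):
        lo, hi = bounds[v], bounds[v + 1]
        if lo == hi:
            continue  # more vaults than output rows
        # Each vault stores its output rows plus a (kh - 1)-row halo,
        # so the per-vault convolutions are fully independent.
        slabs.append(conv2d_single(x[lo:hi + kh - 1], w))
    return np.vstack(slabs)

# The partitioned result matches the monolithic convolution.
x = np.random.rand(64, 64)
w = np.random.rand(3, 3)
assert np.allclose(vault_parallel_conv2d(x, w), conv2d_single(x, w))
```

In hardware, each slab's convolution would run on the CLU attached to the vault holding that slab, so the per-vault work proceeds in parallel with local memory accesses rather than sequentially as in this software model.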



Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 17, Issue 2: Hardware and Algorithms for Efficient Machine Learning
April 2021, 360 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3446841
Editor: Ramesh Karri

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 April 2021
Accepted: 01 September 2020
Revised: 01 August 2020
Received: 01 May 2020
Published in JETC Volume 17, Issue 2

Author Tags

  1. 3D-stacked memory
  2. convolutional neural networks
  3. near-data processing
  4. near-memory accelerators

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 45
  • Downloads (last 6 weeks): 7

Reflects downloads up to 23 Dec 2024.

Cited By
  • (2023) DDAM: Data Distribution-Aware Mapping of CNNs on Processing-In-Memory Systems. ACM Transactions on Design Automation of Electronic Systems 28, 3 (2023), 1-30. https://doi.org/10.1145/3576196. Online publication date: 19-Mar-2023.
  • (2023) PreCog: Near-Storage Accelerator for Heterogeneous CNN Inference. In Proceedings of the 2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 45-52. https://doi.org/10.1109/ASAP57973.2023.00021. Online publication date: Jul-2023.
  • (2022) A CNN Hardware Accelerator Using Triangle-based Convolution. ACM Journal on Emerging Technologies in Computing Systems 18, 4 (2022), 1-23. https://doi.org/10.1145/3544975. Online publication date: 13-Oct-2022.
  • (2022) A Near Memory Computing FPGA Architecture for Neural Network Acceleration. In Proceedings of the 2022 2nd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT), 543-548. https://doi.org/10.1109/ICFEICT57213.2022.00100. Online publication date: Aug-2022.
