Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going

Published: 30 May 2019
Abstract

    Deep neural networks have proven to be particularly effective in visual and audio recognition tasks. Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardware-oriented approximation have become a hot topic. Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms of both throughput and energy efficiency. Application-tailored accelerators, when co-designed with approximation-based network training methods, transform large, dense, and computationally expensive networks into small, sparse, and hardware-efficient alternatives, increasing the feasibility of network deployment. In this article, we provide a comprehensive evaluation of approximation methods for high-performance network inference along with in-depth discussion of their effectiveness for custom hardware implementation. We also include proposals for future research based on a thorough analysis of current trends. This article represents the first survey providing detailed comparisons of custom hardware accelerators featuring approximation for both convolutional and recurrent neural networks, through which we hope to inspire exciting new developments in the field.
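
    As a concrete illustration of the kind of approximation the survey covers, the sketch below applies magnitude-based weight pruning followed by uniform fixed-point quantization to a single weight matrix. It is a minimal, hypothetical NumPy example written for this summary, not code from the article or the surveyed works: the layer shape, sparsity target, and bit width are arbitrary assumptions.

import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights so that roughly `sparsity` of them become zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_uniform(weights, bits):
    """Map weights onto a signed, uniformly spaced fixed-point grid with `bits` bits."""
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    codes = np.round(weights / scale)      # integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1]
    return codes * scale                   # dequantized values seen by subsequent layers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)  # hypothetical dense layer
    w_sparse = prune_by_magnitude(w, sparsity=0.9)       # keep roughly 10% of the weights
    w_approx = quantize_uniform(w_sparse, bits=8)        # 8-bit fixed-point representation
    print("non-zero fraction:", np.count_nonzero(w_sparse) / w.size)
    print("max quantization error:", float(np.max(np.abs(w_sparse - w_approx))))

    In a custom hardware implementation, the zeroed weights can be skipped entirely and the remaining integer codes stored and multiplied in narrow fixed-point arithmetic; this is the broad route to the throughput and energy-efficiency gains the survey discusses.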

    Published In

    ACM Computing Surveys, Volume 52, Issue 2
    March 2020
    770 pages
    ISSN: 0360-0300
    EISSN: 1557-7341
    DOI: 10.1145/3320149
    Editor: Sartaj Sahni
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 May 2019
    Accepted: 01 January 2019
    Revised: 01 December 2018
    Received: 01 September 2018
    Published in CSUR Volume 52, Issue 2

    Author Tags

    1. ASICs
    2. FPGAs
    3. approximation methods
    4. convolutional neural networks
    5. recurrent neural networks

    Qualifiers

    • Survey
    • Research
    • Refereed
