
DyVEDeep: Dynamic Variable Effort Deep Neural Networks

Published: 11 June 2020

Abstract

Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety of machine learning tasks and are deployed in increasing numbers of products and services. However, the computational requirements of training and evaluating large-scale DNNs are growing at a much faster pace than the capabilities of the underlying hardware platforms that they are executed upon. To address this challenge, one promising approach is to exploit the error resilient nature of DNNs by skipping or approximating computations that have negligible impact on classification accuracy. Almost all prior efforts in this direction propose static DNN approximations by either pruning network connections, implementing computations at lower precision, or compressing weights.
In this work, we propose Dynamic Variable Effort Deep Neural Networks (DyVEDeep) to reduce the computational requirements of DNNs during inference. Complementary to the aforementioned static approaches, DyVEDeep is a dynamic approach that exploits heterogeneity in the DNN inputs to improve their compute efficiency with comparable classification accuracy and without requiring any re-training. DyVEDeep equips DNNs with dynamic effort mechanisms that identify computations critical to classifying a given input and focus computational effort only on those critical computations, while skipping or approximating the rest. We propose three dynamic effort mechanisms that operate at different levels of granularity, viz., the neuron, feature, and layer levels. We build DyVEDeep versions of six popular image recognition benchmarks (CIFAR-10, AlexNet, OverFeat, VGG-16, SqueezeNet, and Deep-Compressed-AlexNet) within the Caffe deep-learning framework. We evaluate DyVEDeep on two platforms: a high-performance server with a 2.7 GHz Intel Xeon E5-2680 processor and 128 GB memory, and a low-power Raspberry Pi board with an ARM Cortex A53 processor and 1 GB memory. Across all benchmarks, DyVEDeep achieves a 2.47x to 5.15x reduction in the number of scalar operations, which translates to 1.94x to 2.23x and 1.46x to 3.46x performance improvement over well-optimized baselines on the Xeon server and the Raspberry Pi, respectively, with comparable classification accuracy.
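The abstract describes the dynamic effort mechanisms only at a high level. As a hedged illustration of the general idea of input-dependent effort at the neuron level (the function name, the sampling scheme, and the threshold below are assumptions for illustration, not the mechanisms proposed in the paper), a neuron's pre-activation can be estimated cheaply from a small subset of its inputs, with the full multiply-accumulate performed only when the estimate suggests the neuron is likely to matter:

# Hypothetical sketch of neuron-level dynamic effort; NOT the paper's actual
# mechanism. Estimate the pre-activation from a sampled subset of inputs and
# spend full effort only when the estimate is not clearly below zero, i.e.,
# when a ReLU would not zero the output anyway.
import numpy as np

def dynamic_effort_neuron(weights, inputs, sample_frac=0.25, margin=0.0):
    """Approximate max(0, dot(weights, inputs)) with input-dependent effort."""
    n = len(weights)
    k = max(1, int(sample_frac * n))
    idx = np.random.choice(n, size=k, replace=False)

    # Cheap estimate: scale the sampled partial sum up to the full length.
    partial = float(np.dot(weights[idx], inputs[idx]))
    estimate = partial * (n / k)

    if estimate < margin:
        # Predicted non-critical: skip the remaining multiply-accumulates.
        return 0.0
    # Predicted critical: fall back to the exact computation.
    return max(0.0, float(np.dot(weights, inputs)))

# Usage: one neuron with 1024 inputs.
rng = np.random.default_rng(0)
w, x = rng.standard_normal(1024), rng.standard_normal(1024)
print(dynamic_effort_neuron(w, x))

Skipping work for neurons that a ReLU would zero out anyway preserves accuracy only to the extent that the cheap estimate is reliable, so any such mechanism trades a small estimation overhead against the computation it skips.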


Cited By

  • (2023) Energy-Efficient Approximate Edge Inference Systems. ACM Transactions on Embedded Computing Systems 22(4), 1-50. https://doi.org/10.1145/3589766. Online publication date: 31-Mar-2023.
  • (2022) Dynamically throttleable neural networks. Machine Vision and Applications 33(4). https://doi.org/10.1007/s00138-022-01311-z. Online publication date: 7-Jul-2022.


    Published In

    ACM Transactions on Embedded Computing Systems, Volume 19, Issue 3
    May 2020
    156 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/3400880

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 June 2020
    Online AM: 07 May 2020
    Accepted: 01 November 2019
    Revised: 01 September 2019
    Received: 01 August 2019
    Published in TECS Volume 19, Issue 3


    Author Tags

    1. DyVEDeep
    2. deep neural networks

    Qualifiers

    • Research-article
    • Research
    • Refereed


