DOI: 10.5555/3130379.3130725

Understanding the impact of precision quantization on the accuracy and energy of neural networks

Published: 27 March 2017

Abstract

Deep neural networks are gaining in popularity as they produce state-of-the-art results for a variety of computer vision and machine learning applications. At the same time, these networks have grown in depth and complexity in order to solve harder problems. Given the limited power budgets available to such networks, the importance of low-power, low-memory solutions has been stressed in recent years. While a large number of dedicated hardware designs using different precisions have recently been proposed, there exists no comprehensive study of different bit precisions and arithmetic for both inputs and network parameters. In this work, we address this issue and study different bit precisions in neural networks, from floating-point to fixed-point, powers of two, and binary. In our evaluation, we analyze the effect of precision scaling on both network accuracy and hardware metrics, including memory footprint, power and energy consumption, and design area. We also investigate training-time methodologies that compensate for the reduction in accuracy due to limited bit precision, and we demonstrate that in most cases precision scaling delivers significant benefits in design metrics at the cost of very modest decreases in network accuracy. In addition, we show that a small portion of the benefits achieved at lower precisions can be traded back to increase the network size, and therefore the accuracy. We run our experiments on three well-recognized networks and datasets to show the generality of our approach, investigating the trade-offs and highlighting the benefits of lower precisions in terms of energy and memory footprint.
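
As a concrete illustration of the precision-scaling schemes named in the abstract, the sketch below quantizes a weight array to fixed-point, power-of-two, and binary representations. This is a minimal, hypothetical example and not the paper's implementation: the function names, the (integer bits, fractional bits) parameterization, and the unscaled sign-based binarization are illustrative assumptions.

# Minimal sketch (not the paper's code) of the three reduced-precision schemes
# the abstract names: fixed-point, power-of-two, and binary weights.
import numpy as np

def to_fixed_point(w, int_bits=2, frac_bits=6):
    """Round to a signed fixed-point grid with int_bits + frac_bits total bits (assumed layout)."""
    scale = 2.0 ** frac_bits
    q_max = 2 ** (int_bits + frac_bits - 1) - 1          # largest signed code
    q = np.clip(np.round(w * scale), -q_max - 1, q_max)  # saturate out-of-range values
    return q / scale

def to_power_of_two(w):
    """Snap each weight's magnitude to the nearest power of two, keeping its sign."""
    sign = np.where(w >= 0, 1.0, -1.0)
    mag = np.maximum(np.abs(w), 1e-12)                   # avoid log2(0)
    return sign * 2.0 ** np.round(np.log2(mag))

def to_binary(w):
    """Constrain weights to +1 / -1 by their sign."""
    return np.where(w >= 0, 1.0, -1.0)

if __name__ == "__main__":
    w = np.random.randn(3, 3).astype(np.float32)
    for quantize in (to_fixed_point, to_power_of_two, to_binary):
        print(quantize.__name__, "\n", quantize(w))

Power-of-two and binary weights are attractive in hardware because multiplications reduce to shifts or sign flips; the fixed-point bit widths above are only placeholders for the precision sweep the paper describes.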

Published In

DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe
March 2017
1814 pages

Publisher

European Design and Automation Association
Leuven, Belgium

Cited By

• Energy Time Fairness: Balancing Fair Allocation of Energy and Time for GPU Workloads. Proceedings of the Eighth ACM/IEEE Symposium on Edge Computing, pages 53-66, Dec 2023. DOI: 10.1145/3583740.3628435
• Ax-BxP: Approximate Blocked Computation for Precision-reconfigurable Deep Neural Network Acceleration. ACM Transactions on Design Automation of Electronic Systems, 27(3):1-20, Jan 2022. DOI: 10.1145/3492733
• Hardware Acceleration for Embedded Keyword Spotting: Tutorial and Survey. ACM Transactions on Embedded Computing Systems, 20(6):1-25, Oct 2021. DOI: 10.1145/3474365
• A Reconfigurable Multiplier for Signed Multiplications with Asymmetric Bit-Widths. ACM Journal on Emerging Technologies in Computing Systems, 17(4):1-16, Jun 2021. DOI: 10.1145/3446213
• BiScaled-DNN. Proceedings of the 56th Annual Design Automation Conference 2019, pages 1-6, Jun 2019. DOI: 10.1145/3316781.3317783
• ApproxLP. Proceedings of the 56th Annual Design Automation Conference 2019, pages 1-6, Jun 2019. DOI: 10.1145/3316781.3317774
• Performance-Efficiency Trade-off of Low-Precision Numerical Formats in Deep Neural Networks. Proceedings of the Conference for Next Generation Arithmetic 2019, pages 1-9, Mar 2019. DOI: 10.1145/3316279.3316282
• A Mixed Signal Architecture for Convolutional Neural Networks. ACM Journal on Emerging Technologies in Computing Systems, 15(2):1-26, Mar 2019. DOI: 10.1145/3304110
• Synergy. ACM Transactions on Embedded Computing Systems, 18(2):1-23, Mar 2019. DOI: 10.1145/3301278
• Exploring energy and accuracy tradeoff in structure simplification of trained deep neural networks. Proceedings of the 23rd Asia and South Pacific Design Automation Conference, pages 331-336, Jan 2018. DOI: 10.5555/3201607.3201693