DOI: 10.5555/3130379.3130725

Understanding the impact of precision quantization on the accuracy and energy of neural networks

Published: 27 March 2017

Abstract

Deep neural networks are gaining in popularity as they produce state-of-the-art results for a variety of computer vision and machine learning applications. At the same time, these networks have grown in depth and complexity in order to solve harder problems. Given the limited power budgets available to such networks, the importance of low-power, low-memory solutions has been stressed in recent years. While a large number of dedicated hardware designs using different precisions have recently been proposed, there exists no comprehensive study of different bit precisions and arithmetic for both inputs and network parameters. In this work, we address this issue and study different bit precisions in neural networks, from floating-point to fixed-point, powers of two, and binary. In our evaluation, we analyze the effect of precision scaling on both network accuracy and hardware metrics, including memory footprint, power and energy consumption, and design area. We also investigate training-time methodologies that compensate for the reduction in accuracy due to limited bit precision, and we demonstrate that in most cases precision scaling delivers significant benefits in design metrics at the cost of very modest decreases in network accuracy. In addition, we show that a small portion of the benefits achieved at lower precisions can be traded back to increase the network size, and therefore the accuracy. We run our experiments on three well-recognized networks and datasets to show the generality of our approach, investigating the trade-offs and highlighting the benefits of lower precisions in terms of energy and memory footprint.
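
As a concrete illustration of the precision-scaling schemes named in the abstract, the sketch below quantizes a weight array to fixed-point, power-of-two, and binary representations. This is a minimal, hypothetical example and not the paper's implementation: the function names, the (integer bits, fractional bits) parameterization, and the unscaled sign-based binarization are illustrative assumptions.

# Minimal sketch (not the paper's code) of the three reduced-precision schemes
# the abstract names: fixed-point, power-of-two, and binary weights.
import numpy as np

def to_fixed_point(w, int_bits=2, frac_bits=6):
    """Round to a signed fixed-point grid with int_bits + frac_bits total bits (assumed layout)."""
    scale = 2.0 ** frac_bits
    q_max = 2 ** (int_bits + frac_bits - 1) - 1          # largest signed code
    q = np.clip(np.round(w * scale), -q_max - 1, q_max)  # saturate out-of-range values
    return q / scale

def to_power_of_two(w):
    """Snap each weight's magnitude to the nearest power of two, keeping its sign."""
    sign = np.where(w >= 0, 1.0, -1.0)
    mag = np.maximum(np.abs(w), 1e-12)                   # avoid log2(0)
    return sign * 2.0 ** np.round(np.log2(mag))

def to_binary(w):
    """Constrain weights to +1 / -1 by their sign."""
    return np.where(w >= 0, 1.0, -1.0)

if __name__ == "__main__":
    w = np.random.randn(3, 3).astype(np.float32)
    for quantize in (to_fixed_point, to_power_of_two, to_binary):
        print(quantize.__name__, "\n", quantize(w))

Power-of-two and binary weights are attractive in hardware because multiplications reduce to shifts or sign flips; the fixed-point bit widths above are only placeholders for the precision sweep the paper describes.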

Published In

DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe
March 2017
1814 pages

Publisher

European Design and Automation Association
Leuven, Belgium

Cited By

• Energy Time Fairness: Balancing Fair Allocation of Energy and Time for GPU Workloads. Proceedings of the Eighth ACM/IEEE Symposium on Edge Computing, pages 53-66, Dec 2023. DOI: 10.1145/3583740.3628435
• Ax-BxP: Approximate Blocked Computation for Precision-reconfigurable Deep Neural Network Acceleration. ACM Transactions on Design Automation of Electronic Systems, 27(3):1-20, Jan 2022. DOI: 10.1145/3492733
• Hardware Acceleration for Embedded Keyword Spotting: Tutorial and Survey. ACM Transactions on Embedded Computing Systems, 20(6):1-25, Oct 2021. DOI: 10.1145/3474365
• A Reconfigurable Multiplier for Signed Multiplications with Asymmetric Bit-Widths. ACM Journal on Emerging Technologies in Computing Systems, 17(4):1-16, Jun 2021. DOI: 10.1145/3446213
• BiScaled-DNN. Proceedings of the 56th Annual Design Automation Conference 2019, pages 1-6, Jun 2019. DOI: 10.1145/3316781.3317783
• ApproxLP. Proceedings of the 56th Annual Design Automation Conference 2019, pages 1-6, Jun 2019. DOI: 10.1145/3316781.3317774
• Performance-Efficiency Trade-off of Low-Precision Numerical Formats in Deep Neural Networks. Proceedings of the Conference for Next Generation Arithmetic 2019, pages 1-9, Mar 2019. DOI: 10.1145/3316279.3316282
• A Mixed Signal Architecture for Convolutional Neural Networks. ACM Journal on Emerging Technologies in Computing Systems, 15(2):1-26, Mar 2019. DOI: 10.1145/3304110
• Synergy. ACM Transactions on Embedded Computing Systems, 18(2):1-23, Mar 2019. DOI: 10.1145/3301278
• Exploring energy and accuracy tradeoff in structure simplification of trained deep neural networks. Proceedings of the 23rd Asia and South Pacific Design Automation Conference, pages 331-336, Jan 2018. DOI: 10.5555/3201607.3201693