Low Power GPGPU Computation with Imprecise Hardware

Authors:

Hang Zhang,

Mateja Putic,

John Lach

DAC '14: Proceedings of the 51st Annual Design Automation Conference

Pages 1 - 6

https://doi.org/10.1145/2593069.2593156

Published: 01 June 2014

Abstract

Massively parallel computation in GPUs significantly boosts performance of compute-intensive applications but creates power and thermal issues that limit further performance scaling. This paper demonstrates significant GPGPU power savings by relaxing application accuracy requirements and enabling the use of low power imprecise hardware (IHW). A synthesized set of novel imprecise floating point arithmetic units is presented. GPGPU-Sim and GPUWattch are used to estimate impacts of IHW units on output quality and system-level power consumption, providing a quality-power tradeoff model for application-specific optimization. Experimental results for a 45 nm process show up to 32% power savings with negligible impacts on output quality.

Index Terms

Low Power GPGPU Computation with Imprecise Hardware
1. Mathematics of computing
  1. Mathematical analysis
    1. Functional analysis
      1. Approximation
2. Theory of computation
  1. Design and analysis of algorithms
    1. Approximation algorithms analysis

Reviews

Reviewer: Kai Diethelm

In high-performance computing, reducing the energy required to perform computations has recently become a highly important issue. In this paper, Zhang et al. address this topic for hardware platforms based on general-purpose computing on graphics processing units (GPGPU). The authors observe that certain arithmetic operations are very energy intensive and could be replaced by corresponding first-order approximations that require significantly less energy. Thus, they suggest using so-called “imprecise hardware” where, for example, a classical hardware multiplier is implemented in such a way that the usual 24×24-bit mantissa multiplication is replaced by a 25×25-bit addition. Combined with suitable handling of the exponents, this yields an approximate product. Using appropriate simulation tools, the authors demonstrate that their approach leads to substantially smaller energy requirements. Similar ideas are introduced for other frequently used arithmetic operations. Clearly, such an approach has a negative impact on the accuracy of the final result, but theoretical analysis and some concrete examples show that the degradation of the output is usually not severe.

Online Computing Reviews Service
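
The multiplier idea described in the review can be illustrated with a minimal Python sketch. It assumes that "first-order approximation" means dropping the cross term in (1 + a)(1 + b) ≈ 1 + a + b, where a and b are the fractional parts of the two significands. The function names `approx_mul` and `rel_error` are hypothetical, and `math.frexp`/`math.ldexp` merely stand in for hardware field extraction; this is not the authors' circuit, and infinities and NaN are not handled. Under this assumption the relative error is ab/((1 + a)(1 + b)), which stays below 25%.

```python
import math


def approx_mul(x: float, y: float) -> float:
    """Approximate x * y by adding significand fractions (illustrative only)."""
    if x == 0.0 or y == 0.0:
        return 0.0
    # The sign of the product is handled separately, as in sign-magnitude formats.
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    # frexp returns m in [0.5, 1) and e such that |x| = m * 2**e.
    mx, ex = math.frexp(abs(x))
    my, ey = math.frexp(abs(y))
    # Rescale to the IEEE-style form (1 + f) * 2**(e - 1), with f in [0, 1).
    fx = 2.0 * mx - 1.0
    fy = 2.0 * my - 1.0
    # First-order step: (1 + fx)(1 + fy) is replaced by 1 + fx + fy,
    # so the cross term fx * fy is dropped and only an addition remains.
    significand = 1.0 + fx + fy
    # Exponents are added; ldexp also absorbs a significand in [2, 3).
    return sign * math.ldexp(significand, (ex - 1) + (ey - 1))


def rel_error(x: float, y: float) -> float:
    """Relative error of approx_mul against the exact product."""
    exact = x * y
    return abs(exact - approx_mul(x, y)) / abs(exact)


if __name__ == "__main__":
    print(approx_mul(1.5, 1.5))   # 2.0 (the exact product is 2.25)
    print(approx_mul(2.0, 4.0))   # 8.0 (exact, since both fractions are zero)
    print(rel_error(1.5, 1.5))
```

Products of powers of two come out exact because both fractions are zero; the error grows as both significands approach 2. A real design could add a correction term to reduce this error, but that is beyond what the review states.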

Information & Contributors

Information

Published In

DAC '14: Proceedings of the 51st Annual Design Automation Conference

June 2014

1249 pages

ISBN:9781450327305

DOI:10.1145/2593069

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

In-Cooperation

EDAC: Electronic Design Automation Consortium
SIGBED: ACM Special Interest Group on Embedded Systems
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

DAC '14

DAC '14: The 51st Annual Design Automation Conference 2014

June 1 - 5, 2014

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics & Citations

Article Metrics

Total Citations: 22
Total Downloads: 359
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 1

Reflects downloads up to 26 Jul 2024

Cited By

Balasubramanian P, Maskell D (2024). Monotonic Asynchronous Two-Bit Full Adder. Electronics 13:9 (1717). Online publication date: 29-Apr-2024.
https://doi.org/10.3390/electronics13091717
LIU W, CHEN K, WU B, DENG E, WANG Y, GONG Y, CUI Y, WANG C (2024). High-efficiency and high-security emerging computing chips: development, challenges, and prospects. SCIENTIA SINICA Informationis 54:1 (34). Online publication date: 3-Jan-2024.
https://doi.org/10.1360/SSI-2023-0316
Balasubramanian P, Mastorakis N (2023). Speed, Power and Area Optimized Monotonic Asynchronous Array Multipliers. Journal of Low Power Electronics and Applications 14:1 (1). Online publication date: 24-Dec-2023.
https://doi.org/10.3390/jlpea14010001
Balasubramanian P, Nayar R, Maskell D (2022). Digital Image Compression Using Approximate Addition. Electronics 11:9 (1361). Online publication date: 25-Apr-2022.
https://doi.org/10.3390/electronics11091361
Balasubramanian P, Nayar R, Min O, Maskell D (2022). Approximator: A Software Tool for Automatic Generation of Approximate Arithmetic Circuits. Computers 11:1 (11). Online publication date: 8-Jan-2022.
https://doi.org/10.3390/computers11010011
Dal Lago U, Gavazzo F (2022). Effectful program distancing. Proceedings of the ACM on Programming Languages 6:POPL (1-30). Online publication date: 12-Jan-2022.
https://dl.acm.org/doi/10.1145/3498680
Balasubramanian P, Mastorakis N (2022). Quasi delay insensitive implementation of approximate multiplication. Ain Shams Engineering Journal 13:3 (101629). Online publication date: May-2022.
https://doi.org/10.1016/j.asej.2021.10.024
Balasubramanian P, Nayar R, Maskell D (2021). Approximate Array Multipliers. Electronics 10:5 (630). Online publication date: 9-Mar-2021.
https://doi.org/10.3390/electronics10050630
Bu T, Yan K, Tan J (2021). Towards Fine-Grained Online Adaptive Approximation Control for Dense SLAM on Embedded GPUs. ACM Transactions on Design Automation of Electronic Systems 27:2 (1-19). Online publication date: 2-Nov-2021.
https://dl.acm.org/doi/10.1145/3486612
Tsiokanos I, Miskelly J, Gu C, O’neill M, Karakonstantis G (2021). DTA-PUF: Dynamic Timing-aware Physical Unclonable Function for Resource-constrained Devices. ACM Journal on Emerging Technologies in Computing Systems 17:3 (1-24). Online publication date: 12-Aug-2021.
https://dl.acm.org/doi/10.1145/3434281
