DOI: 10.1145/2593069.2593156
research-article

Low Power GPGPU Computation with Imprecise Hardware

Published: 01 June 2014

    Abstract

    Massively parallel computation in GPUs significantly boosts performance of compute-intensive applications but creates power and thermal issues that limit further performance scaling. This paper demonstrates significant GPGPU power savings by relaxing application accuracy requirements and enabling the use of low power imprecise hardware (IHW). A synthesized set of novel imprecise floating point arithmetic units is presented. GPGPU-Sim and GPUWattch are used to estimate impacts of IHW units on output quality and system-level power consumption, providing a quality-power tradeoff model for application-specific optimization. Experimental results for a 45 nm process show up to 32% power savings with negligible impacts on output quality.


    Reviews

    Kai Diethelm

    In high-performance computing, reducing the energy required to perform the actual computations has recently become a highly important issue. In this paper, Zhang et al. address this topic for hardware platforms based on general-purpose computing on graphics processing units (GPGPU). The authors observe that certain arithmetical operations are very energy intensive and could be replaced by corresponding first-order approximations that require significantly less energy. Thus, they suggest using so-called “imprecise hardware” where, for example, a classical hardware multiplier is implemented in such a way that the usual 24×24-bit mantissa multiplication is replaced by a 25×25-bit addition. Combined with suitable handling of the exponents, this yields an approximation of the product. Using appropriate simulation tools, the authors demonstrate that their approach leads to substantially smaller energy requirements. Similar ideas are introduced for other frequently used arithmetical operations. Clearly, such an approach has a negative impact on the accuracy of the final result, but theoretical analysis and some concrete examples show that the degradation of the output is usually not severe. (Online Computing Reviews Service)
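    The idea of trading a mantissa multiplication for an addition can be illustrated with a small software sketch. It assumes one textbook first-order approximation, (1 + a)(1 + b) ≈ 1 + a + b, which drops the a·b term; this is an assumption made for illustration and may differ from the authors' actual hardware design. The names `split`, `approx_mul`, and `rel_error` are hypothetical.

```python
import math


def split(x):
    """Return (f, e) with |x| = (1 + f) * 2**e and 0 <= f < 1 (x finite, nonzero)."""
    m, e = math.frexp(abs(x))      # |x| = m * 2**e with 0.5 <= m < 1
    return 2.0 * m - 1.0, e - 1    # rescale so the mantissa has the form 1.f


def approx_mul(x, y):
    """Approximate x * y using only an addition on the fractional mantissa parts.

    Assumption: (1 + fx) * (1 + fy) is replaced by 1 + fx + fy (the fx * fy term
    is dropped). Exponents are added exactly, as in an ordinary multiplier.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    fx, ex = split(x)
    fy, ey = split(y)
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    mantissa = 1.0 + fx + fy       # an addition stands in for the multiplication
    return sign * math.ldexp(mantissa, ex + ey)


def rel_error(x, y):
    """Relative error of approx_mul against the exact product (x, y nonzero)."""
    exact = x * y
    return abs(approx_mul(x, y) - exact) / abs(exact)


if __name__ == "__main__":
    for a, b in [(2.0, 4.0), (1.5, 1.5), (-3.0, 2.0), (1.9, 1.9)]:
        print(a, b, approx_mul(a, b), rel_error(a, b))
```

    Under this assumption the relative error equals fx·fy / ((1 + fx)(1 + fy)), so it vanishes when either operand is a power of two and stays below 25% otherwise. Whether an error of that size is acceptable depends on the application, which is the quality-power tradeoff the paper studies.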

    Published In

    DAC '14: Proceedings of the 51st Annual Design Automation Conference
    June 2014
    1249 pages
    ISBN:9781450327305
    DOI:10.1145/2593069
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Approximate Computing
    2. Floating Point Unit
    3. GPGPU
    4. Imprecise Hardware
    5. Special Function Unit

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    DAC '14

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

    Cited By

    • (2024) Monotonic Asynchronous Two-Bit Full Adder. Electronics 13:9 (1717). DOI: 10.3390/electronics13091717. Online publication date: 29-Apr-2024
    • (2024) High-efficiency and high-security emerging computing chips: development, challenges, and prospects. SCIENTIA SINICA Informationis 54:1 (34). DOI: 10.1360/SSI-2023-0316. Online publication date: 3-Jan-2024
    • (2023) Speed, Power and Area Optimized Monotonic Asynchronous Array Multipliers. Journal of Low Power Electronics and Applications 14:1 (1). DOI: 10.3390/jlpea14010001. Online publication date: 24-Dec-2023
    • (2022) Digital Image Compression Using Approximate Addition. Electronics 11:9 (1361). DOI: 10.3390/electronics11091361. Online publication date: 25-Apr-2022
    • (2022) Approximator: A Software Tool for Automatic Generation of Approximate Arithmetic Circuits. Computers 11:1 (11). DOI: 10.3390/computers11010011. Online publication date: 8-Jan-2022
    • (2022) Effectful program distancing. Proceedings of the ACM on Programming Languages 6:POPL (1-30). DOI: 10.1145/3498680. Online publication date: 12-Jan-2022
    • (2022) Quasi delay insensitive implementation of approximate multiplication. Ain Shams Engineering Journal 13:3 (101629). DOI: 10.1016/j.asej.2021.10.024. Online publication date: May-2022
    • (2021) Approximate Array Multipliers. Electronics 10:5 (630). DOI: 10.3390/electronics10050630. Online publication date: 9-Mar-2021
    • (2021) Towards Fine-Grained Online Adaptive Approximation Control for Dense SLAM on Embedded GPUs. ACM Transactions on Design Automation of Electronic Systems 27:2 (1-19). DOI: 10.1145/3486612. Online publication date: 2-Nov-2021
    • (2021) DTA-PUF: Dynamic Timing-aware Physical Unclonable Function for Resource-constrained Devices. ACM Journal on Emerging Technologies in Computing Systems 17:3 (1-24). DOI: 10.1145/3434281. Online publication date: 12-Aug-2021
