
Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement

Published: 10 June 2013
Abstract

State-of-the-art graphics processing units (GPUs) employ single-instruction, multiple-data (SIMD) execution to achieve both high computational throughput and energy efficiency. As prior work has shown, SIMD execution contains significant computational redundancy, where different execution lanes operate on the same operand values; such value locality is referred to as uniform vectors. In this paper, we first show that, besides the redundancy within a uniform vector, different vectors can also hold identical values. We then propose detailed architecture designs to exploit both types of redundancy. For redundancy within a uniform vector, we propose to either extend the vector register file with token bits or add a small separate scalar register file, eliminating both redundant computation and redundant data storage. For redundancy across different uniform vectors, we adopt instruction reuse, originally proposed for CPU architectures, to detect and eliminate repeated computations. Eliminating redundant computation and storage yields both significant energy savings and performance improvement. Furthermore, we propose to leverage such redundancy to protect arithmetic-logic units (ALUs) and register files against hardware errors. Our detailed evaluation shows that the proposed design has low hardware overhead and achieves performance gains of up to 23.9% (12.0% on average), energy savings of up to 24.8% (12.6% on average), and 21.1% and 14.1% error-protection coverage for ALUs and register files, respectively.
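To make the two kinds of redundancy concrete, the following is a minimal, illustrative sketch in Python, not the authors' hardware design or simulator code: a warp-wide register carries a "uniform" token bit when every lane holds the same value, so the operation is issued once on a scalar path, and a small reuse buffer keyed by opcode and scalar operand values catches repeated computations across different uniform-vector instructions, in the spirit of dynamic instruction reuse. The warp size, register layout, and all names below are assumptions made for illustration.

# Illustrative sketch only: uniform-vector detection plus a reuse buffer.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

WARP_SIZE = 32  # assumed SIMD width

@dataclass
class VectorReg:
    """A vector register with a per-register 'uniform' token bit.

    When `uniform` is set, only `scalar` is valid, so per-lane storage
    and per-lane ALU work can be skipped.
    """
    lanes: List[int] = field(default_factory=lambda: [0] * WARP_SIZE)
    uniform: bool = False
    scalar: int = 0

    def read(self) -> List[int]:
        return [self.scalar] * WARP_SIZE if self.uniform else list(self.lanes)

    def write(self, values: List[int]) -> None:
        # Detect uniformity at write-back and set the token bit.
        if all(v == values[0] for v in values):
            self.uniform, self.scalar = True, values[0]
        else:
            self.uniform, self.lanes = False, list(values)

# Reuse buffer keyed by (opcode, scalar operands): if two different
# uniform-vector instructions present identical inputs, the second
# result is read out instead of recomputed.
ReuseKey = Tuple[str, int, int]
reuse_buffer: Dict[ReuseKey, int] = {}

ALU_OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def execute(op: str, src0: VectorReg, src1: VectorReg, dst: VectorReg) -> str:
    """Execute one vector instruction, exploiting uniform operands when possible."""
    if src0.uniform and src1.uniform:
        key = (op, src0.scalar, src1.scalar)
        if key in reuse_buffer:                 # redundancy across vectors
            result, path = reuse_buffer[key], "reused"
        else:                                   # redundancy within a vector
            result = ALU_OPS[op](src0.scalar, src1.scalar)
            reuse_buffer[key] = result
            path = "scalar"
        dst.uniform, dst.scalar = True, result
        return path
    # Divergent operands: fall back to full per-lane SIMD execution.
    dst.write([ALU_OPS[op](a, b) for a, b in zip(src0.read(), src1.read())])
    return "simd"

# Usage: r2 = r0 + r1 runs once on the scalar path because every lane of
# r0 and r1 holds the same value; repeating the operation hits the buffer.
r0, r1, r2 = VectorReg(), VectorReg(), VectorReg()
r0.write([7] * WARP_SIZE)
r1.write([3] * WARP_SIZE)
print(execute("add", r0, r1, r2))   # -> "scalar"
print(execute("add", r0, r1, r2))   # -> "reused"

Because a uniform instruction leaves the remaining SIMD lanes idle, the same detection logic could, as the paper's opportunistic reliability scheme suggests, re-execute the scalar operation on otherwise idle lanes and compare results to catch ALU and register-file errors at little additional cost.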





      Information

      Published In

      ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing
      June 2013
      512 pages
      ISBN:9781450321303
      DOI:10.1145/2464996
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 10 June 2013


      Author Tags

      1. GPGPU
      2. redundancy

      Qualifiers

      • Research-article

      Conference

ICS'13: International Conference on Supercomputing
June 10 - 14, 2013
Eugene, Oregon, USA

      Acceptance Rates

ICS '13 Paper Acceptance Rate: 43 of 202 submissions, 21%
Overall Acceptance Rate: 629 of 2,180 submissions, 29%


      Article Metrics

• Downloads (last 12 months): 9
• Downloads (last 6 weeks): 1


      Cited By

• (2023) R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-14. DOI: 10.1145/3579371.3589039. Published online: 17-Jun-2023.
• (2022) ValueExpert: exploring value patterns in GPU-accelerated applications. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 171-185. DOI: 10.1145/3503222.3507708. Published online: 28-Feb-2022.
• (2020) Approximate Cache in GPGPUs. ACM Transactions on Embedded Computing Systems, 19(5), pp. 1-22. DOI: 10.1145/3407904. Published online: 26-Sep-2020.
• (2020) GVPROF: A Value Profiler for GPU-Based Clusters. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.1109/SC41405.2020.00093. Published online: Nov-2020.
• (2020) Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores. 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 725-737. DOI: 10.1109/MICRO50266.2020.00065. Published online: Oct-2020.
• (2020) DC-Patch: A Microarchitectural Fault Patching Technique for GPU Register Files. IEEE Access, 8, pp. 173276-173288. DOI: 10.1109/ACCESS.2020.3025899. Published online: 2020.
• (2019) An Aging-Aware GPU Register File Design Based on Data Redundancy. IEEE Transactions on Computers, 68(1), pp. 4-20. DOI: 10.1109/TC.2018.2849376. Published online: 1-Jan-2019.
• (2018) An efficient control flow validation method using redundant computing capacity of dual-processor architecture. PLOS ONE, 13(8), e0201127. DOI: 10.1371/journal.pone.0201127. Published online: 1-Aug-2018.
• (2018) Efficiently Managing the Impact of Hardware Variability on GPUs' Streaming Processors. ACM Transactions on Design Automation of Electronic Systems, 24(1), pp. 1-15. DOI: 10.1145/3287308. Published online: 21-Dec-2018.
• (2018) Scratch That (But Cache This): A Hybrid Register Cache/Scratchpad for GPUs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(11), pp. 2779-2789. DOI: 10.1109/TCAD.2018.2857043. Published online: Nov-2018.
