Research article, MICRO-46 Conference Proceedings. DOI: 10.1145/2540708.2540719

Warped gates: gating aware scheduling and power gating for GPGPUs

Published: 07 December 2013

Abstract

With the widespread adoption of GPGPUs in varied application domains, new opportunities open up to improve GPGPU energy efficiency. Due to inherent application-level inefficiencies, GPGPU execution units experience significant idle time. In this work, we propose to power gate idle execution units to eliminate leakage power, which is becoming a significant concern with technology scaling. We show that GPGPU execution units are idle for short windows of time and that conventional microprocessor power gating techniques cannot exploit these idle windows efficiently because of the power gating overhead. Current warp schedulers greedily intersperse integer and floating-point instructions, which limits power gating opportunities for any given execution unit type. To improve power gating opportunities in GPGPU execution units, we propose a Gating Aware Two-level warp scheduler (GATES) that issues clusters of instructions of the same type before switching to another instruction type. We also propose a new power gating scheme, called Blackout, that forces a power-gated execution unit to sleep for at least the break-even time necessary to overcome the power gating overhead before returning to the active state. The combination of GATES and Blackout, which we call Warped Gates, can save 31.6% of integer unit static energy and 46.5% of floating-point unit static energy. The proposed solutions incur less than 1% performance and area overhead.
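Why clustering and Blackout help can be illustrated with a toy idle-period model. This is a minimal sketch under stated assumptions, not the GATES scheduler or the Blackout hardware described in the paper: the issue trace uses one character per cycle ('I' integer, 'F' floating point, '-' nothing issued), a unit is gated after a hypothetical `idle_detect` window, and each gating event is assumed to cost leakage equal to `break_even` cycles. The function names, the trace format, and all cycle counts are hypothetical; wake-up latency and the knock-on effect of delays on later idle windows are ignored.

```python
from itertools import groupby


def idle_periods(issue_trace: str, unit: str) -> list[int]:
    """Return the lengths of maximal runs of cycles in which `unit` is unused.

    Hypothetical trace format: one character per cycle, 'I' = integer
    instruction issued, 'F' = floating-point issued, '-' = nothing issued.
    """
    return [
        len(list(run))
        for used, run in groupby(issue_trace, key=lambda c: c == unit)
        if not used
    ]


def gating_savings(periods: list[int], idle_detect: int, break_even: int,
                   blackout: bool) -> tuple[int, int]:
    """Return (net leakage cycles saved, total cycles instructions wait).

    Assumed model: the unit is gated after `idle_detect` idle cycles, and each
    gating event costs leakage equivalent to `break_even` cycles, so a sleep of
    S cycles nets S - break_even (negative when the sleep is short).
    With `blackout`, the unit may not wake before `break_even` sleep cycles,
    so no gating event loses energy, but an early instruction must wait.
    """
    net_saved, delay = 0, 0
    for length in periods:
        sleep = length - idle_detect
        if sleep <= 0:
            continue  # idle window too short: the unit is never gated
        if blackout and sleep < break_even:
            delay += break_even - sleep  # next instruction waits out the blackout
            sleep = break_even
        net_saved += sleep - break_even
    return net_saved, delay


if __name__ == "__main__":
    # Same instruction mix, two issue orders (toy traces, not real workloads).
    interleaved = "IF" * 8
    clustered = "I" * 8 + "F" * 8
    for name, trace in [("interleaved", interleaved), ("clustered", clustered)]:
        periods = idle_periods(trace, "F")
        print(name, periods,
              gating_savings(periods, idle_detect=2, break_even=4, blackout=False))

    # Short idle windows: conventional gating loses energy, Blackout does not.
    windows = [5, 5, 30]
    print("conventional", gating_savings(windows, 2, 4, blackout=False))
    print("blackout", gating_savings(windows, 2, 4, blackout=True))
```

With these hypothetical parameters, the interleaved trace never leaves the floating-point unit idle long enough to gate it, whereas the clustered trace yields one 8-cycle window with a net saving of 2 cycles. For idle windows of 5, 5, and 30 cycles, conventional gating nets 22 cycles because the two short sleeps each lose energy, while Blackout nets 24 cycles at the cost of 2 cycles of delay.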


Cited By

  • (2024) Cross-Core Data Sharing for Energy-Efficient GPUs. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3653019. Online publication date: 18-Mar-2024.
  • (2024) PC-oriented Prediction-based Runtime Power Management for GPGPU using Knowledge Transfer. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 359-370. DOI: 10.1145/3626183.3659981. Online publication date: 17-Jun-2024.
  • (2024) Memento: An Adaptive, Compiler-Assisted Register File Cache for GPUs. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 978-990. DOI: 10.1109/ISCA59077.2024.00075. Online publication date: 29-Jun-2024.


    Published In

    MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
    December 2013
    498 pages
    ISBN:9781450326384
    DOI:10.1145/2540708
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPGPUs
    2. power gating
    3. warp scheduling

    Qualifiers

    • Research-article

    Conference

    MICRO-46

    Acceptance Rates

    MICRO-46 paper acceptance rate: 39 of 239 submissions (16%)
    Overall acceptance rate: 484 of 2,242 submissions (22%)

    Article Metrics

    • Downloads (last 12 months): 16
    • Downloads (last 6 weeks): 8
    Reflects downloads up to 25 Dec 2024.

    Citations

    • (2023) Re-Cache: Mitigating cache contention by exploiting locality characteristics with reconfigurable memory hierarchy for GPGPUs. Microelectronics Journal 138 (105825). DOI: 10.1016/j.mejo.2023.105825. Online publication date: Aug-2023.
    • (2023) PTTS: Power-aware tensor cores using two-sided sparsity. Journal of Parallel and Distributed Computing 173, pp. 70-82. DOI: 10.1016/j.jpdc.2022.11.004. Online publication date: Mar-2023.
    • (2021) MAPA. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.1145/3458817.3480853. Online publication date: 14-Nov-2021.
    • (2021) Reducing Energy in GPGPUs through Approximate Trivial Bypassing. ACM Transactions on Embedded Computing Systems 20:2, pp. 1-27. DOI: 10.1145/3429440. Online publication date: 4-Jan-2021.
    • (2021) Sparsity-aware Power Gating for Tensor Cores. 2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 94-103. DOI: 10.1109/SBAC-PAD53543.2021.00021. Online publication date: Oct-2021.
    • (2021) Balancing Energy Efficiency and Real-Time Performance in GPU Scheduling. 2021 IEEE Real-Time Systems Symposium (RTSS), pp. 110-122. DOI: 10.1109/RTSS52674.2021.00021. Online publication date: Dec-2021.
    • (2021) BlockMaestro. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 333-346. DOI: 10.1109/ISCA52012.2021.00034. Online publication date: 14-Jun-2021.
