DOI: 10.5555/2648668.2648673

Compiler assisted dynamic register file in GPGPU

Published: 04 September 2013

Abstract

The large register file (RF) in general-purpose graphics processing units (GPGPUs) consumes substantial chip area and energy. To sustain RF growth in future GPGPUs, emerging on-chip memory technologies such as embedded DRAM (eDRAM) have been proposed as replacements for conventional SRAM, offering higher density and lower leakage at the possible cost of periodic refresh operations. This paper shows that the refresh penalty can be effectively mitigated by exploiting characteristics unique to GPGPU operation. A compiler-assisted refresh rescheduling policy greatly reduces the refresh overhead needed to keep RF contents correct. The proposed scheme exploits features of both the architecture and the compiler, and delivers performance comparable to its SRAM counterpart. At the same time, the energy saved by eliminating the large SRAM leakage amply compensates for the additional refresh energy. This study promotes the eDRAM-based RF as a promising alternative that enables larger capacity and better power efficiency in future GPGPUs.

Published In

ISLPED '13: Proceedings of the 2013 International Symposium on Low Power Electronics and Design
September 2013
440 pages
ISBN:9781479912353

Publisher

IEEE Press

Author Tags

  1. GPGPU
  2. RF
  3. compiler
  4. eDRAM
  5. refresh

Qualifiers

  • Research-article

Conference

ISLPED'13
Acceptance Rates

Overall Acceptance Rate 398 of 1,159 submissions, 34%

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 01 Nov 2024

Cited By

  • (2018) LTRF. ACM SIGPLAN Notices 53:2 (489-502). DOI: 10.1145/3296957.3173211. Online publication date: 19-Mar-2018
  • (2018) Software-Directed Techniques for Improved GPU Register File Utilization. ACM Transactions on Architecture and Code Optimization 15:3 (1-23). DOI: 10.1145/3243905. Online publication date: 24-Sep-2018
  • (2018) LTRF. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (489-502). DOI: 10.1145/3173162.3173211. Online publication date: 19-Mar-2018
  • (2017) Regless. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (151-164). DOI: 10.1145/3123939.3123974. Online publication date: 14-Oct-2017
  • (2016) Cache-emulated register file. The 49th Annual IEEE/ACM International Symposium on Microarchitecture (1-12). DOI: 10.5555/3195638.3195655. Online publication date: 15-Oct-2016
  • (2016) Architecture supported register stash for GPGPU. Journal of Parallel and Distributed Computing 89:C (25-36). DOI: 10.1016/j.jpdc.2015.12.003. Online publication date: 1-Mar-2016
  • (2015) A STT-RAM-based low-power hybrid register file for GPGPUs. Proceedings of the 52nd Annual Design Automation Conference (1-6). DOI: 10.1145/2744769.2744785. Online publication date: 7-Jun-2015
  • (2014) eDRAM-based tiered-reliability memory with applications to low-power frame buffers. Proceedings of the 2014 international symposium on Low power electronics and design (333-338). DOI: 10.1145/2627369.2627626. Online publication date: 11-Aug-2014
