A Write-Aware STTRAM-Based Register File Architecture for GPGPU

Published: 03 August 2015

Abstract

The massively parallel processing capacity of GPGPUs requires a large register file (RF), and its size keeps growing from generation to generation to support more concurrent threads. Traditional SRAM-based RFs raise concerns in both area cost and energy consumption, and will soon become impractical. In this work, we analyze the feasibility of STTRAM-based RF designs, which offer smaller silicon area and zero standby leakage power. However, the long write latency and high write energy of STTRAM bring new challenges. We therefore propose a write-aware STTRAM-based RF architecture (WarRF) that combines two techniques: Split Bank Write, which modifies the arbitrator design to increase the parallelism of read and write accesses within the same bank, and Write Pool, which reduces the number of repeated write accesses to the RF. Our experiments show that WarRF improves the performance of the STTRAM-based RF by 13%, and by up to 23%. In addition, energy consumption is reduced by 38% on average compared to SRAM-based RFs.
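
The abstract describes Split Bank Write only at a high level. The sketch below is one plausible reading, not the authors' design: it assumes each RF bank is divided into sub-banks, that a multi-cycle STTRAM write occupies only the sub-bank holding its register, and that the arbitrator issues at most one request per cycle from an in-order queue. The latencies, the `reg % num_sub_banks` mapping, and the `drain_cycles` helper are all hypothetical. It compares an unsplit bank with a two-way split on a short request trace.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical latencies in cycles; the paper's actual numbers are not assumed.
WRITE_LATENCY = 4
READ_LATENCY = 1


@dataclass
class Request:
    kind: str  # "R" for read, "W" for write
    reg: int   # register index inside one RF bank


def drain_cycles(trace, num_sub_banks):
    """Cycles needed to drain an in-order request queue for one RF bank.

    num_sub_banks=1 models an unsplit bank, where a multi-cycle STTRAM
    write blocks every later access to the bank. With more sub-banks, a
    write occupies only the sub-bank holding its register (hypothetical
    mapping: reg % num_sub_banks), so accesses to other sub-banks proceed.
    At most one request is issued per cycle.
    """
    busy_until = [0] * num_sub_banks  # first free cycle of each sub-bank
    queue = deque(trace)
    cycle = 0
    while queue:
        req = queue[0]
        sub = req.reg % num_sub_banks
        if busy_until[sub] <= cycle:  # target sub-bank is idle: issue it
            queue.popleft()
            latency = WRITE_LATENCY if req.kind == "W" else READ_LATENCY
            busy_until[sub] = cycle + latency
        cycle += 1
    # Done when the last request has issued and every sub-bank is idle.
    return max(cycle, *busy_until)


TRACE = [Request("W", 0), Request("R", 1), Request("R", 3),
         Request("W", 2), Request("R", 1)]
print(drain_cycles(TRACE, 1), drain_cycles(TRACE, 2))  # 11 8
```

With the split, the two reads that follow the first write no longer wait for it to complete, which illustrates the kind of read/write parallelism within a bank that the technique targets.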

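Write Pool is likewise described only by its goal of reducing repeated writes to the RF. A minimal sketch, assuming a small fully associative buffer with LRU replacement in front of the STTRAM array: a write to a register already in the pool overwrites the pooled copy, reads check the pool first, and STTRAM is written only when an entry is evicted or flushed. The `WritePool` class, its capacity, and the replacement policy are hypothetical choices for illustration, not the paper's implementation.

```python
from collections import OrderedDict


class WritePool:
    """Hypothetical LRU write buffer in front of an STTRAM register file.

    Repeated writes to a register already held in the pool are absorbed
    there; the STTRAM array is written only on eviction or flush.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # reg -> value, least recently written first
        self.rf = {}                  # backing STTRAM register file
        self.sttram_writes = 0

    def write(self, reg, value):
        if reg in self.entries:
            self.entries.move_to_end(reg)  # coalesce: no STTRAM write
        elif len(self.entries) >= self.capacity:
            old_reg, old_val = self.entries.popitem(last=False)  # evict LRU
            self._write_back(old_reg, old_val)
        self.entries[reg] = value

    def read(self, reg):
        if reg in self.entries:
            return self.entries[reg]  # newest value lives in the pool
        return self.rf.get(reg, 0)

    def flush(self):
        while self.entries:
            reg, val = self.entries.popitem(last=False)
            self._write_back(reg, val)

    def _write_back(self, reg, value):
        self.rf[reg] = value
        self.sttram_writes += 1


pool = WritePool(capacity=2)
for reg, value in [(1, 1), (1, 2), (2, 5), (1, 3), (3, 7)]:
    pool.write(reg, value)
assert pool.read(2) == 5  # evicted entry is now served from STTRAM
pool.flush()
print(pool.sttram_writes)  # 3
```

Here five register writes reach STTRAM as three, because the two rewrites of register 1 are absorbed by the pool.
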
    Published In

    ACM Journal on Emerging Technologies in Computing Systems, Volume 12, Issue 1
    July 2015
    210 pages
    ISSN: 1550-4832
    EISSN: 1550-4840
    DOI: 10.1145/2810396
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 August 2015
    Accepted: 01 October 2014
    Revised: 01 September 2014
    Received: 01 February 2014
    Published in JETC Volume 12, Issue 1

    Author Tags

    1. GPGPU
    2. Nonvolatile memory
    3. STTRAM
    4. register file

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 50
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 10 Nov 2024.

    Cited By

    • (2024) Memento: An Adaptive, Compiler-Assisted Register File Cache for GPUs. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 978-990. DOI: 10.1109/ISCA59077.2024.00075. Online publication date: 29-Jun-2024.
    • (2023) A Survey of Memory-Centric Energy Efficient Computer Architecture. IEEE Transactions on Parallel and Distributed Systems 34, 10, pp. 2657-2670. DOI: 10.1109/TPDS.2023.3297595. Online publication date: Oct-2023.
    • (2021) Highly Concurrent Latency-tolerant Register Files for GPUs. ACM Transactions on Computer Systems 37, 1-4, pp. 1-36. DOI: 10.1145/3419973. Online publication date: 4-Jan-2021.
    • (2021) Exploring Applications of STT-RAM in GPU Architectures. IEEE Transactions on Circuits and Systems I: Regular Papers 68, 1, pp. 238-249. DOI: 10.1109/TCSI.2020.3031895. Online publication date: Jan-2021.
    • (2019) FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 426-439. DOI: 10.1109/HPCA.2019.00055. Online publication date: Feb-2019.
    • (2018) LTRF. ACM SIGPLAN Notices 53, 2, pp. 489-502. DOI: 10.1145/3296957.3173211. Online publication date: 19-Mar-2018.
    • (2018) LTRF. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 489-502. DOI: 10.1145/3173162.3173211. Online publication date: 19-Mar-2018.
    • (2018) FineReg. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 364-376. DOI: 10.1109/MICRO.2018.00037. Online publication date: 20-Oct-2018.
    • (2017) State-Transition-Aware Spilling Heuristic for MLC STT-RAM-Based Registers. VLSI Design 2017. DOI: 10.1155/2017/1030249. Online publication date: 1-Jan-2017.
    • (2017) Pipeline Optimizations of Architecting STT-RAM as Registers in Rad-Hard Environment. 2017 IEEE Trustcom/BigDataSE/ICESS, pp. 844-852. DOI: 10.1109/Trustcom/BigDataSE/ICESS.2017.321. Online publication date: Aug-2017.