Aging-Aware Compilation for GP-GPUs

Published: 08 July 2015

Abstract

General-purpose graphics processing units (GP-GPUs) offer high computational throughput using thousands of integrated processing elements (PEs). These PEs are stressed during workload execution, and negative bias temperature instability (NBTI) adversely affects their reliability by introducing new delay-induced faults. However, the effect of these delay variations is not uniformly spread across the PEs: some are affected more, and are hence less reliable, than others. This variation causes a significant reduction in the lifetime of GP-GPU parts. In this article, we address the problem of “wear leveling” across processing units to mitigate lifetime uncertainty in GP-GPUs. We propose innovations in the static compiled code that can improve healing in PEs and stream cores (SCs) based on their degradation status. PE healing is a fine-grained very long instruction word (VLIW) slot assignment scheme that balances the stress of instructions across the PEs within an SC. SC healing is a coarse-grained workload allocation scheme that distributes workload across SCs in GP-GPUs. Both schemes share a common property: they adaptively shift workload from less reliable units to more reliable units, either spatially or temporally. These software schemes are based on online calibration with NBTI monitoring that equalizes the expected lifetime of PEs and SCs by regenerating adaptive compiled codes to respond to the specific health state of the GP-GPUs. We evaluate the effectiveness of the proposed schemes for various OpenCL kernels from the AMD APP SDK on Evergreen and Southern Islands GPU architectures. Compared to the naive kernels, the aging-aware healthy kernels generated by the PE healing scheme reduce the NBTI-induced threshold voltage shift by 30% with no performance penalty, and those generated by the SC healing scheme reduce it by 77% with a moderate performance penalty.
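The idea of shifting stress from less reliable to more reliable PEs can be illustrated with a small sketch. This is not the article's algorithm, only a greedy pairing written for illustration: it assumes every VLIW slot may be mapped to any PE, that a per-slot stress estimate (for example, a utilization fraction) and a per-PE degradation reading (for example, a measured threshold voltage shift) are available, and that a one-to-one permutation is wanted. The function name `assign_slots` and all numbers are hypothetical.

```python
def assign_slots(slot_stress, pe_degradation):
    """Map each VLIW slot to a PE so that heavier slots land on healthier PEs.

    slot_stress[i]    : assumed stress estimate of slot i (higher = more stress)
    pe_degradation[j] : assumed degradation reading of PE j (higher = more aged)
    Returns a list `mapping` where mapping[i] is the PE chosen for slot i.
    """
    if len(slot_stress) != len(pe_degradation):
        raise ValueError("need exactly one PE per slot")

    n = len(slot_stress)
    # Slots ordered from most to least stressful.
    slots_by_stress = sorted(range(n), key=lambda s: slot_stress[s], reverse=True)
    # PEs ordered from least to most degraded (healthiest first).
    pes_by_health = sorted(range(n), key=lambda p: pe_degradation[p])

    mapping = [0] * n
    for slot, pe in zip(slots_by_stress, pes_by_health):
        mapping[slot] = pe
    return mapping


# Hypothetical example with five slots and five PEs.
stress = [0.90, 0.60, 0.30, 0.10, 0.05]   # illustrative slot utilization
aging = [12.0, 3.0, 8.0, 1.0, 5.0]        # illustrative degradation readings
mapping = assign_slots(stress, aging)
```

In this example the busiest slot (index 0) is mapped to the least degraded PE (index 3), and the least busy slot is mapped to the most degraded PE (index 0). Rerunning the pairing whenever new degradation readings arrive is one simple way to picture the adaptive regeneration the abstract describes.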

Published In

ACM Transactions on Architecture and Code Optimization, Volume 12, Issue 2
July 2015
410 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2775085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 July 2015
Accepted: 01 May 2015
Revised: 01 March 2015
Received: 01 January 2015
Published in TACO Volume 12, Issue 2

Author Tags

  1. GP-GPUs
  2. NBTI
  3. VLIW
  4. adaptive kernel
  5. aging-aware compilation

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • FP7 P-SOCRATES
  • NSF Variability Expeditions
  • ERC-AdG MultiTherman

Article Metrics

  • Downloads (Last 12 months): 44
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 15 Oct 2024

Cited By

  • (2021) Mitigating the processor aging through dynamic concurrency throttling. Journal of Parallel and Distributed Computing 156 (86-100). DOI: 10.1016/j.jpdc.2021.05.006. Online publication date: Oct-2021
  • (2020) Kernel-Based Resource Allocation for Improving GPU Throughput While Minimizing the Activity Divergence of SMs. IEEE Transactions on Circuits and Systems I: Regular Papers 67:2 (428-440). DOI: 10.1109/TCSI.2019.2933245. Online publication date: Feb-2020
  • (2019) An Aging-Aware GPU Register File Design Based on Data Redundancy. IEEE Transactions on Computers 68:1 (4-20). DOI: 10.1109/TC.2018.2849376. Online publication date: 1-Jan-2019
  • (2019) Transparent Aging-Aware Thread Throttling. 2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (1-8). DOI: 10.1109/SBAC-PAD.2019.00014. Online publication date: Oct-2019
  • (2018) Aging-Aware Workload Management on Embedded GPU Under Process Variation. IEEE Transactions on Computers 67:7 (920-933). DOI: 10.1109/TC.2018.2789904. Online publication date: 1-Jul-2018
  • (2017) Low-overhead Aging-aware Resource Management on Embedded GPUs. Proceedings of the 54th Annual Design Automation Conference 2017 (1-6). DOI: 10.1145/3061639.3062277. Online publication date: 18-Jun-2017
  • (2017) Exploiting Data Compression to Mitigate Aging in GPU Register Files. 2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (57-64). DOI: 10.1109/SBAC-PAD.2017.15. Online publication date: Oct-2017
  • (2016) Low Power Aging-Aware On-Chip Memory Structure Design by Duty Cycle Balancing. Journal of Circuits, Systems and Computers 25:09 (1650115). DOI: 10.1142/S0218126616501152. Online publication date: Sep-2016
