DOI: 10.1145/2966986.2980080
Research article

Dynamic reliability management for near-threshold dark silicon processors

Published: 07 November 2016

Abstract

In this article, we propose a new dynamic reliability management (DRM) technique at the system level for emerging low-power dark silicon manycore microprocessors operating in the near-threshold region. We mainly consider electromigration (EM) failures. To leverage the EM recovery effects, which were ignored in the past, at the system level, we propose a new equivalent DC current model that accounts for recovery effects under general time-varying current waveforms so that existing compact EM models can be applied. The new equivalent DC current is calculated in two steps: first, an equivalent square waveform is calculated so that the peak and terminal stresses are matched; second, the parameterized equivalent DC current is derived in terms of the parameters of the periodic fitted square waveform from the first step. The new recovery EM model allows EM-induced lifetime to be better managed at the system level. The system-level energy optimization problem, which considers EM lifetime subject to power and performance constraints, is framed as finding the best voltage and on/off status for the dark silicon cores. The resulting problem is solved by the State-Action-Reward-State-Action (SARSA) reinforcement learning algorithm. Experimental results on a 64-core near-threshold dark silicon processor show that the new equivalent EM DC currents fully capture the recovery effects at the system level, so that the trade-off between EM lifetime and energy/performance can be easily made. We further show that the proposed learning-based energy optimization can effectively manage and optimize energy subject to reliability, given power budget and performance limits. When the recovery effects are considered, the new optimization method can achieve 8.6× longer lifetime at the cost of 2.0× more energy and 3.3× more performance degradation.
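
For readers unfamiliar with SARSA, the following is a minimal sketch of the generic tabular, on-policy update that the algorithm is built on. It is not the paper's formulation: the single-core setting, the discretized stress state, the three actions, and every reward value in `toy_step` are hypothetical stand-ins chosen only to show how a controller could trade performance against recovery time.

```python
import random

# Hypothetical action set for one core: power it off or run at one of two voltages.
ACTIONS = ["off", "low_v", "high_v"]
MAX_STRESS = 3  # hypothetical discretized EM stress level (0 = fully recovered)


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """Standard SARSA rule: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    old = q.get((s, a), 0.0)
    target = reward + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]


def toy_step(stress, action):
    """Invented environment: all numbers below are illustrative assumptions."""
    if action == "off":
        next_stress = max(stress - 1, 0)           # stress relaxes while idle
        reward = -1.0                              # no work done
    elif action == "low_v":
        next_stress = stress                       # stress holds steady
        reward = -0.5                              # reduced performance
    else:  # "high_v"
        next_stress = min(stress + 1, MAX_STRESS)  # stress builds up
        reward = 0.0                               # full performance
    if next_stress == MAX_STRESS:
        reward -= 5.0                              # reliability penalty
    return next_stress, reward


def train(episodes=200, steps=20, epsilon=0.1, seed=0):
    """Run tabular SARSA on the toy environment and return the Q-table."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q, s, ACTIONS, epsilon, rng)
        for _ in range(steps):
            s_next, r = toy_step(s, a)
            a_next = epsilon_greedy(q, s_next, ACTIONS, epsilon, rng)
            sarsa_update(q, s, a, r, s_next, a_next)
            s, a = s_next, a_next  # on-policy: the chosen next action is executed
    return q
```

Calling `train()` returns a dictionary keyed by `(state, action)` pairs. The defining feature of SARSA, as opposed to Q-learning, is that the update uses the action actually selected for the next step rather than the maximum over all actions.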


Published In

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Nov 2016
946 pages

Publisher

IEEE Press
