research-article

Implications of accelerated self-healing as a key design knob for cross-layer resilience

Published: 01 January 2017

Abstract

In this paper we propose a cross-layer accelerated self-healing (CLASH) system that "repairs" its wearout in a physical sense through accelerated and active recovery: wearout can be reversed by actively applying accelerated self-healing techniques such as high temperature and negative voltage. Previous solutions cope with wearout (e.g., BTI) by "tolerating", "slowing down", or "compensating" for it, and therefore leave the irreversible (permanent) wearout component unchecked. The proposed solution, in contrast, can fully avoid irreversible wearout through periodic rejuvenation. This approach is inspired by the frequency-dependent behavior of wearout and of (accelerated and active) recovery, which we explore through measurements on FPGAs. We demonstrate a case in which the chip can always be brought back to its fresh state by employing a pattern of 31 h of regular operation (at room temperature and nominal voltage) followed by 1 h of accelerated self-healing (at high temperature and negative voltage).

The proposed system integrates accelerated self-healing across multiple layers of the system stack. At the circuit level, a negative voltage generator and heating elements are designed and implemented; at the architecture level, cores can be allocated so that dark silicon or redundant resources are healed by the active elements; at the system level, the system scheduler can apply the right balance of stress and accelerated/active recovery to fully mitigate wearout; and various wearout sensors act as the medium between the layers. Overall, these techniques work together to ensure that the whole system spends more of its time at higher levels of performance and power efficiency by fully exploiting the extra opportunities that accelerated self-healing enables.

Highlights

  • Wearout (e.g., BTI) can be significantly reversed by accelerated and active recovery (accelerated self-healing) techniques.
  • The irreversible component of wearout can be fully avoided through optimal active and sleep scheduling.
  • A cross-layer accelerated self-healing system that integrates accelerated self-healing techniques across the system stack is proposed.
  • Accelerated self-healing should be introduced as a key design knob for cross-layer resilience.
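The 31-h/1-h pattern reported in the abstract implies a simple duty-cycle calculation, and the architecture- and system-level ideas suggest staggering healing windows across cores so that only a small fraction of the chip is offline at any moment. The Python sketch below illustrates both under stated assumptions: the 31-h and 1-h figures come from the reported case, while the one-hour time slots, the evenly spaced per-core offsets, and the function names `duty_cycle` and `staggered_schedule` are hypothetical illustrations, not the CLASH scheduler itself.

```python
from fractions import Fraction

# Figures from the case reported in the abstract: 31 h of regular operation
# followed by 1 h of accelerated self-healing, i.e. a 32-h period.
ACTIVE_HOURS = 31
HEAL_HOURS = 1
PERIOD_HOURS = ACTIVE_HOURS + HEAL_HOURS


def duty_cycle(active: int = ACTIVE_HOURS, heal: int = HEAL_HOURS) -> Fraction:
    """Fraction of each period a core spends in regular operation."""
    return Fraction(active, active + heal)


def staggered_schedule(num_cores: int, hours: int,
                       period: int = PERIOD_HOURS,
                       heal: int = HEAL_HOURS) -> list[list[int]]:
    """Hypothetical rotation: return, for each 1-h slot, the cores that are healing.

    Core c starts its healing window at offset (c * period) // num_cores, so
    the windows are spread evenly over the period and each core heals for
    `heal` hours out of every `period` hours.
    """
    schedule = []
    for t in range(hours):
        healing = [
            c for c in range(num_cores)
            if (t - (c * period) // num_cores) % period < heal
        ]
        schedule.append(healing)
    return schedule


if __name__ == "__main__":
    print(float(duty_cycle()))            # 0.96875
    plan = staggered_schedule(num_cores=32, hours=64)
    print(plan[:3])                       # [[0], [1], [2]]
```

With 32 cores and a 32-h period, exactly one core heals in any given hour, so 31 of 32 cores (about 96.9%) remain available, matching the per-core duty cycle. Whether a real chip tolerates such a rotation would depend on the heating and negative-voltage hardware described at the circuit level, which this sketch does not model.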


Published In

Integration, the VLSI Journal, Volume 56, Issue C, January 2017, 181 pages

Publisher

Elsevier Science Publishers B. V., Netherlands

Author Tags

  1. Accelerated self-healing
  2. BTI
  3. Cross-layer
  4. Frequency dependence
  5. Wearout
