Self-Healing Many-Core Architecture

Published: 01 July 2016

Abstract

More pronounced aging effects, more frequent early-life failures, and incomplete testing and verification processes due to time-to-market pressure in new fabrication technologies impose reliability challenges on forthcoming systems. A promising solution to these reliability challenges is self-test and self-reconfiguration with no or limited external control. In this work, a scalable self-test mechanism for periodic online testing of many-core processors is proposed. This test mechanism facilitates autonomous detection and omission of faulty cores and makes graceful degradation of the many-core architecture possible. Several test components are incorporated in the many-core architecture; they distribute test stimuli, suspend normal operation of individual processing cores, apply the test, and detect faulty cores. Testing is performed concurrently with normal system operation, without any noticeable downtime at the application level. Experimental results show that the proposed test architecture is highly scalable in terms of both hardware overhead and performance overhead, which makes it applicable to many-core processors with more than a thousand processing cores.
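The test flow summarized in the abstract (distribute stimuli, suspend one core, apply the test, detect and omit faulty cores) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism of the article: the `Core` class, the shift-and-XOR `signature` function, the stuck-at-1 fault model, and the `STIMULI` values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical test stimuli broadcast to every core (assumption, not from the article).
STIMULI = [0x0000, 0xFFFF, 0x5555, 0xAAAA]


def signature(words: List[int]) -> int:
    """Compact a response stream into a 16-bit signature (simple shift-and-XOR)."""
    sig = 0
    for word in words:
        sig = ((sig << 1) ^ word) & 0xFFFF
    return sig


@dataclass
class Core:
    core_id: int
    stuck_bit: Optional[int] = None  # simulated stuck-at-1 fault; None means healthy
    suspended: bool = False          # True while the core is under test
    enabled: bool = True             # False once the core has been omitted

    def run_test(self, stimuli: List[int]) -> int:
        """Apply the stimuli; a faulty core corrupts each word it processes."""
        responses = []
        for word in stimuli:
            if self.stuck_bit is not None:
                word |= 1 << self.stuck_bit
            responses.append(word)
        return signature(responses)


def periodic_test_round(cores: List[Core], stimuli: List[int]) -> List[int]:
    """Test enabled cores one at a time; return the ids of cores still in service."""
    golden = signature(stimuli)  # response expected from a fault-free core
    for core in cores:
        if not core.enabled:
            continue
        core.suspended = True    # only this core pauses; the others keep running
        response = core.run_test(stimuli)
        core.suspended = False
        if response != golden:
            core.enabled = False  # omit the faulty core (graceful degradation)
    return [c.core_id for c in cores if c.enabled]


if __name__ == "__main__":
    cores = [Core(i) for i in range(8)]
    cores[5].stuck_bit = 3  # inject a fault into one core
    print(periodic_test_round(cores, STIMULI))
```

In this model only one core is suspended at a time, which mirrors the idea that testing runs alongside normal operation; a real design would also have to migrate or reschedule the suspended core's workload, which the sketch does not attempt.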

Published In

VLSI Design, Volume 2016
July 2016
39 pages
ISSN:1065-514X
EISSN:1563-5171
Publisher

Hindawi Limited

London, United Kingdom
