DOI: 10.5555/1266366.1266614

Low-cost protection for SER upsets and silicon defects

Published: 16 April 2007

Abstract

Extreme transistor scaling trends in silicon technology are soon to reach a point where manufactured systems will suffer from limited device reliability and severely reduced lifetime, due to early transistor failures, gate oxide wear-out, manufacturing defects, and radiation-induced soft errors (SER). In this paper we present a low-cost technique to harden a microprocessor pipeline and caches against these reliability threats. Our approach uses online built-in self-test (BIST) and microarchitectural checkpointing to detect, diagnose, and recover computation impaired by silicon defects or SER events. It works by periodically testing the processor to determine whether the system is broken. If it is, we reconfigure the processor to avoid using the broken component. A similar mechanism is used to detect SER faults, with the difference that recovery is implemented by re-execution. By relying on low-cost techniques to address defects and SER, we keep protection costs significantly lower than traditional fault-tolerance approaches while providing high levels of coverage for a wide range of faults. Using detailed gate-level simulation, we find that our approach provides 95% coverage for silicon defects and 99% coverage for SER events, with only a 14% area overhead.
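
The control flow described in the abstract (checkpoint, run, test, then either reconfigure or re-execute) can be illustrated with a minimal software sketch. This is not the authors' hardware design: `ToyCore`, `run_epoch`, `demo`, the `"soft"`/`"defect"` labels, and the unit name `alu1` are all hypothetical names introduced here, and the sketch assumes the check step can already tell a transient fault from a permanent one.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set, Tuple

# Hypothetical fault report: None, ("soft", None) or ("defect", unit_name).
Fault = Optional[Tuple[str, Optional[str]]]


@dataclass
class ToyCore:
    """Toy stand-in for a processor: one state word plus disabled units."""
    state: int = 0
    disabled_units: Set[str] = field(default_factory=set)


def run_epoch(core: ToyCore,
              work: Callable[[ToyCore], None],
              check: Callable[[ToyCore], Fault]) -> str:
    """Run one checkpointed epoch and recover if the check reports a fault."""
    checkpoint = core.state          # snapshot taken before the epoch starts
    work(core)                       # computation that might be corrupted
    fault = check(core)              # stands in for the periodic test
    if fault is None:
        return "committed"
    core.state = checkpoint          # roll back the suspect epoch
    kind, unit = fault
    if kind == "defect" and unit is not None:
        core.disabled_units.add(unit)  # reconfigure around the broken unit
    work(core)                       # re-execute (not re-checked in this sketch)
    return "re-executed"


def demo() -> Tuple[int, Set[str]]:
    """Three epochs: clean, transient fault, permanent defect."""
    core = ToyCore()
    reports = iter([None, ("soft", None), ("defect", "alu1")])

    def work(c: ToyCore) -> None:
        c.state += 1

    def check(c: ToyCore) -> Fault:
        return next(reports)

    for _ in range(3):
        run_epoch(core, work, check)
    return core.state, core.disabled_units
```

In this sketch both fault kinds share the rollback path, and only a reported defect also marks a unit as disabled before the epoch is re-run, mirroring the distinction the abstract draws between defect recovery and SER recovery.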

Cited By

  • (2010) Application-aware diagnosis of runtime hardware faults. Proceedings of the International Conference on Computer-Aided Design, pages 487-492. DOI: 10.5555/2133429.2133531. Online publication date: 7-Nov-2010
  • (2009) A unified online fault detection scheme via checking of stability violation. Proceedings of the Conference on Design, Automation and Test in Europe, pages 496-501. DOI: 10.5555/1874620.1874742. Online publication date: 20-Apr-2009

Published In

DATE '07: Proceedings of the conference on Design, automation and test in Europe
April 2007
1741 pages
ISBN:9783981080124

Publisher

EDA Consortium

San Jose, CA, United States

Qualifiers

  • Article

Conference

DATE07
Sponsor:
  • EDAA
  • SIGDA
  • The Russian Academy of Sciences
DATE07: Design, Automation and Test in Europe
April 16 - 20, 2007
Nice, France

Acceptance Rates

Overall Acceptance Rate 518 of 1,794 submissions, 29%
