DOI: 10.1145/1152154.1152168
Article

Self-checking instructions: reducing instruction redundancy for concurrent error detection

Published: 16 September 2006

Abstract

With shrinking feature sizes, increasing chip capacity, and increasing clock speeds, microprocessors are becoming increasingly susceptible to transient (soft) errors. Redundant multi-threading (RMT) is an attractive approach for concurrent error detection. However, redundant thread execution has a significant impact on performance and energy consumption in the chip. In this paper, we propose reducing instruction redundancy (the instructions that are redundantly executed) as a means to mitigate the performance and energy impact of redundancy. We experiment with a decoupled RMT approach in which the frontend pipeline stages are protected through error codes, while the backend pipeline stages are protected through redundant execution. In this approach, we define two categories of instructions: self-checking and semi self-checking instructions. Self-checking instructions are those whose results are checked for errors when their "main" copies are executed; these instructions are not redundantly executed. Semi self-checking instructions are those for which a major part of the result is checked when the "main" copy is executed, and the remaining part is checked using a small amount of additional hardware. Reducing instruction redundancy with this approach provides the same fault coverage as the base architecture, in which all instructions are redundantly executed. The techniques are evaluated in terms of their performance, power, and vulnerability impact on the RMT processor. Our experiments show that the techniques reduce instruction redundancy by about 58% and recover about 51% of the performance lost due to redundant execution. They also recover about 40% of the energy consumption increase in the key data-path structures.
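
To make the "reduction in instruction redundancy" metric concrete, the following is a minimal sketch and not the authors' simulator. It assumes that every dynamic instruction falls into exactly one of three categories and that neither self-checking nor semi self-checking instructions are issued a redundant copy. The names `Kind` and `redundancy_reduction` and the instruction mix are hypothetical, chosen only for illustration.

```python
from collections import Counter
from enum import Enum


class Kind(Enum):
    REGULAR = "regular"          # assumed to need a full redundant copy
    SELF_CHECKING = "self"       # checked while the main copy executes
    SEMI_SELF_CHECKING = "semi"  # mostly checked by the main copy, rest by extra hardware


def redundancy_reduction(stream):
    """Fraction of redundant executions avoided, relative to duplicating every instruction."""
    counts = Counter(stream)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Assumption: both categories below skip redundant execution entirely.
    skipped = counts[Kind.SELF_CHECKING] + counts[Kind.SEMI_SELF_CHECKING]
    return skipped / total


# Hypothetical instruction mix, not data from the paper.
example = [Kind.REGULAR] * 4 + [Kind.SELF_CHECKING] * 4 + [Kind.SEMI_SELF_CHECKING] * 2
print(redundancy_reduction(example))
```

Under these assumptions, the reported figure of about 58% would correspond to roughly 58 of every 100 dynamic instructions falling into the two non-duplicated categories.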

      Published In

      PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques
      September 2006
      308 pages
      ISBN:159593264X
      DOI:10.1145/1152154

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. RISC/CISC
      2. VLIW architectures
      3. concurrent error detection
      4. reducing instruction redundancy
      5. redundant multi-threading
      6. self-checking instructions

      Qualifiers

      • Article

      Conference

      PACT06

      Acceptance Rates

      Overall Acceptance Rate 121 of 471 submissions, 26%

      Cited By

      • (2018) A user‐assisted thread‐level vulnerability assessment tool. Concurrency and Computation: Practice and Experience 31:13. DOI: 10.1002/cpe.5085. Online publication date: 20-Nov-2018
      • (2012) Statistical Reliability Estimation of Microprocessor-Based Systems. IEEE Transactions on Computers 61:11 (1521-1534). DOI: 10.1109/TC.2011.188. Online publication date: 1-Nov-2012
      • (2010) Implementation and Performance Evaluation of an Intelligent Online Argumentation Assessment System. Proceedings of the 2010 International Conference on Electrical and Control Engineering (2560-2563). DOI: 10.1109/iCECE.2010.632. Online publication date: 25-Jun-2010
      • (2010) Low Latency Recovery from Transient Faults for Pipelined Processor Architectures. Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (219-225). DOI: 10.1109/DSD.2010.87. Online publication date: 1-Sep-2010
      • (2008) Efficient fault tolerance in multi-media applications through selective instruction replication. Proceedings of the 2008 workshop on Radiation effects and fault tolerance in nanometer technologies (339-346). DOI: 10.1145/1366224.1366227. Online publication date: 5-May-2008
      • (2008) Speculative instruction validation for performance-reliability trade-off. 2008 IEEE 14th International Symposium on High Performance Computer Architecture (405-414). DOI: 10.1109/HPCA.2008.4658656. Online publication date: Feb-2008
      • (2007) Error Detection Using Dynamic Dataflow Verification. 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007) (104-118). DOI: 10.1109/PACT.2007.4336204. Online publication date: Sep-2007
