DOI:10.1145/1086228.1086266
Article

Compiler-guided register reliability improvement against soft errors

Published: 18 September 2005

Abstract

With the scaling of technology, transient errors caused by external particle strikes have become a critical challenge for microprocessor design. As embedded processors are widely used in reliability-sensitive environments, it becomes increasingly important to develop cost-effective techniques to improve processor reliability against soft errors. This paper studies the immunity of the register file against soft errors, since modern processors typically employ a large number of registers that are accessed very frequently. As a result, soft errors occurring in registers can easily propagate to functional units or the memory system, leading to silent data corruption (SDC) or a system crash.

To develop cost-effective techniques to fight soft errors in embedded processors, the first step is to accurately understand the register file's susceptibility to soft errors and its impact on system reliability. Toward this goal, this paper proposes the concept of the register vulnerability factor (RVF) to characterize the probability that transient register errors escape the register file and thus potentially affect system reliability. Building on the RVF concept, we then propose two cost-effective compiler-guided techniques that improve register file reliability by lowering the RVF value. Our experiments indicate that, on average, the RVF can be reduced to 9.1% and 9.5% by hyperblock-based instruction re-scheduling and reliability-oriented register assignment, respectively, which can potentially lower the reliability cost significantly while protecting register files against transient errors.
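
As a rough illustration of the RVF idea, the sketch below estimates a vulnerability figure from a register access trace. It rests on an assumption that is not taken from the paper: a register is treated as vulnerable only between a write and the last read of that value, since an error that strikes outside such an interval is overwritten before it can leave the register file. The trace format and the names `vulnerable_cycles` and `rvf` are hypothetical.

```python
from collections import defaultdict


def vulnerable_cycles(trace):
    """Return {register: cycles during which a stored value still awaits a read}.

    trace: iterable of (cycle, op, register) tuples sorted by cycle,
           where op is "W" (write) or "R" (read).
    Assumption: a value is vulnerable from its write until its last read.
    Reads with no earlier write in the trace (live-in values) are ignored.
    """
    last_write = {}
    last_read = {}
    total = defaultdict(int)

    def close(reg):
        # Account for the interval of the value currently held in reg, if it was read.
        if reg in last_write and reg in last_read:
            total[reg] += last_read[reg] - last_write[reg]
        last_read.pop(reg, None)

    for cycle, op, reg in trace:
        if op == "W":
            close(reg)               # the old value dies at this write
            last_write[reg] = cycle
        elif op == "R" and reg in last_write:
            last_read[reg] = cycle   # extend the interval to the latest read

    for reg in list(last_write):
        close(reg)                   # flush values still live at the end of the trace
    return dict(total)


def rvf(trace, num_registers, total_cycles):
    """Fraction of all register-cycles that fall inside a vulnerable interval."""
    per_reg = vulnerable_cycles(trace)
    return sum(per_reg.values()) / (num_registers * total_cycles)


# Hypothetical trace: r1 is written twice, r2 once.
trace = [
    (0, "W", "r1"), (2, "W", "r2"), (4, "R", "r1"),
    (5, "R", "r2"), (6, "W", "r1"), (9, "R", "r1"),
]
print(vulnerable_cycles(trace))                      # {'r1': 7, 'r2': 3}
print(rvf(trace, num_registers=2, total_cycles=10))  # 0.5
```

Under this model, moving a write later or a read earlier shortens the write-to-last-read interval, which lowers the estimate.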

      Reviews

      Juan A. Carrasco

      As the size of processors shrinks, transient errors caused by external particle strikes, also known as soft errors, are becoming a significant source of system failures. This paper analyzes the impact of such soft errors in the register file of a typical microprocessor with a very long instruction word (VLIW) architecture. It also analyzes the effectiveness of three compiler-based techniques aimed at reducing the impact of soft errors in the register file.

      The analyzed processor includes four integer arithmetic logic units (ALUs), two floating point ALUs, one load/store unit, one branch unit, and a register file with 64 registers. The analysis is performed using ten benchmark applications. The impact of soft errors is measured using a new metric, the register vulnerability factor (RVF), defined as the probability that a soft error in the register file will be propagated to other hardware elements. The RVF metric can be easily estimated by simulation.

      The three compiler-based techniques for reducing the RVF are instruction rescheduling (advancing read operations and delaying write operations); register reassignment (some registers are protected by an error correcting code (ECC), and the registers showing higher RVF are reassigned to them); and a hybrid technique combining the first two. Simulation experiments show that instruction rescheduling reduces the RVF to 9.1 percent, on average; register reassignment with four ECC-protected registers reduces the RVF to an average of 10.5 percent; and the hybrid technique with four ECC-protected registers reduces the RVF to 6.1 percent, on average.

      The simulation results reported in the paper are interesting. The authors also report results from previous experiments analyzing the impact of soft errors in microprocessors.

      Online Computing Reviews Service
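
The register reassignment idea described in the review can be sketched as a greedy selection. The example assumes that per-register vulnerable-cycle counts are already available and that a value held in an ECC-protected register contributes nothing to the RVF. Both are simplifications made for illustration, and the paper's actual algorithm may differ. The function names and the sample numbers are hypothetical.

```python
def assign_to_ecc(vulnerable, num_ecc):
    """Pick the num_ecc registers with the most vulnerable cycles.

    vulnerable: {register: vulnerable cycle count} (hypothetical input).
    Ties are broken by register name so the result is deterministic.
    """
    ranked = sorted(vulnerable, key=lambda r: (-vulnerable[r], r))
    return set(ranked[:num_ecc])


def residual_rvf(vulnerable, protected, num_registers, total_cycles):
    """RVF that remains once the protected registers are assumed error-free."""
    exposed = sum(c for r, c in vulnerable.items() if r not in protected)
    return exposed / (num_registers * total_cycles)


# Made-up vulnerable-cycle counts over a 100-cycle run with four registers.
vulnerable = {"r1": 40, "r2": 5, "r3": 25, "r4": 10}
protected = assign_to_ecc(vulnerable, num_ecc=2)
before = residual_rvf(vulnerable, set(), num_registers=4, total_cycles=100)
after = residual_rvf(vulnerable, protected, num_registers=4, total_cycles=100)
print(sorted(protected))  # ['r1', 'r3']
print(before, after)
```

Protecting only the most exposed registers is what keeps the cost low in this sketch: two ECC registers out of four remove most of the exposed cycles in the sample data.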

      Information & Contributors

      Information

      Published In

      EMSOFT '05: Proceedings of the 5th ACM international conference on Embedded software
      September 2005
      390 pages
      ISBN:1595930914
      DOI:10.1145/1086228
      Conference Chair: Wayne Wolf

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. register file
      2. register lifetime
      3. reliability
      4. soft errors

      Qualifiers

      • Article

      Conference

      EMSOFT05

      Acceptance Rates

      Overall Acceptance Rate 60 of 203 submissions, 30%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 6
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 12 Nov 2024

      Citations

      Cited By

      • (2024) BEC: Bit-Level Static Analysis for Reliability against Soft Errors. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 283-295. DOI: 10.1109/CGO57630.2024.10444844. Online publication date: 2-Mar-2024
      • (2022) Studying error propagation on application data structure and hardware. The Journal of Supercomputing 78:17, pp. 18691-18724. DOI: 10.1007/s11227-022-04625-x. Online publication date: 13-Jun-2022
      • (2022) Quantifying the impact of data replication on error propagation. Cluster Computing 26:3, pp. 1985-1999. DOI: 10.1007/s10586-022-03726-9. Online publication date: 13-Sep-2022
      • (2020) Background on Soft Errors. Soft Error Reliability Using Virtual Platforms, pp. 9-17. DOI: 10.1007/978-3-030-55704-1_2. Online publication date: 3-Nov-2020
      • (2018) Efficient Protection of the Register File in Soft-Processors Implemented on Xilinx FPGAs. IEEE Transactions on Computers 67:2, pp. 299-304. DOI: 10.1109/TC.2017.2737996. Online publication date: 1-Feb-2018
      • (2017) Classification of Resilience Techniques Against Functional Errors at Higher Abstraction Layers of Digital Systems. ACM Computing Surveys 50:4, pp. 1-38. DOI: 10.1145/3092699. Online publication date: 4-Oct-2017
      • (2017) Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques. ACM Computing Surveys 50:4, pp. 1-36. DOI: 10.1145/3092566. Online publication date: 25-Aug-2017
      • (2017) Corrections to “A Menagerie of Timed Automata”. ACM Computing Surveys 50:3, pp. 1-8. DOI: 10.1145/3078809. Online publication date: 29-Jun-2017
      • (2017) Searchable Symmetric Encryption. ACM Computing Surveys 50:3, pp. 1-37. DOI: 10.1145/3064005. Online publication date: 26-May-2017
      • (2017) A Survey on Ensemble Learning for Data Stream Classification. ACM Computing Surveys 50:2, pp. 1-36. DOI: 10.1145/3054925. Online publication date: 27-Mar-2017