DOI: 10.1145/1250662.1250719

Examining ACE analysis reliability estimates using fault-injection

Published: 09 June 2007

Abstract

ACE analysis is a technique to provide an early reliability estimate for microprocessors. ACE analysis couples data from abstract performance models with low level design details to identify and rule out transient faults that will not cause incorrect execution. While many transient faults are analyzable in ACE analysis frameworks, some are not. As a result, ACE analysis is conservative and provides a lower bound for the reliability of a processor design. Bounding the reliability of a design is useful since it can guarantee that the given design will meet reliability goals.
In this work, we quantify and identify the sources of ACE analysis conservatism by comparing an ACE analysis methodology against a rigorous fault-injection study. We evaluate two flavors of ACE analysis: a "simple" analysis and a refined analysis, finding that even the refined analysis overestimates the soft error vulnerability of an instruction scheduler by 2-3x. The conservatism stems from two key sources: from lack of detail in abstract performance models and from what we term Y-Bits, a result of the single-pass simulation methodology that is typical of ACE analysis. We also examine the efficacy of applying ACE analysis to a class of "partial coverage" error mitigation techniques. In particular, we perform a case study on one such technique and extrapolate our findings to others.
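
The comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes the common definition of a structure's vulnerability as the fraction of its bit-cycles that hold ACE state, and a simple failures-over-trials estimate from fault injection. The function names and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def vulnerability_from_ace(ace_bit_cycles, num_bits, num_cycles):
    """Fraction of a structure's bit-cycles classified as ACE (assumed definition)."""
    if num_bits <= 0 or num_cycles <= 0:
        raise ValueError("num_bits and num_cycles must be positive")
    return ace_bit_cycles / (num_bits * num_cycles)


def vulnerability_from_injection(failing_trials, total_trials):
    """Fraction of injected faults that led to incorrect execution."""
    if total_trials <= 0:
        raise ValueError("total_trials must be positive")
    return failing_trials / total_trials


def conservatism_factor(ace_estimate, injection_estimate):
    """How many times larger the ACE estimate is than the fault-injection estimate."""
    if injection_estimate <= 0:
        raise ValueError("injection_estimate must be positive")
    return ace_estimate / injection_estimate


if __name__ == "__main__":
    # Hypothetical numbers: a 1000-bit structure observed for 1000 cycles.
    ace = vulnerability_from_ace(ace_bit_cycles=250_000, num_bits=1000, num_cycles=1000)
    inj = vulnerability_from_injection(failing_trials=100, total_trials=1000)
    print(f"ACE estimate:        {ace:.2f}")
    print(f"Injection estimate:  {inj:.2f}")
    print(f"Conservatism factor: {conservatism_factor(ace, inj):.2f}x")
```

With these made-up inputs the ACE estimate is 2.5 times the fault-injection estimate, which falls in the 2-3x range the abstract reports for the refined analysis of the instruction scheduler.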

      Published In

      ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
      June 2007
      542 pages
      ISBN: 9781595937063
      DOI: 10.1145/1250662
      General Chair: Dean Tullsen
      Program Chair: Brad Calder

      ACM SIGARCH Computer Architecture News, Volume 35, Issue 2
      May 2007
      527 pages
      ISSN: 0163-5964
      DOI: 10.1145/1273440
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. fault tolerance
      2. measurement techniques
      3. microprocessors
      4. soft errors

      Qualifiers

      • Article

      Conference

      SPAA07
      Acceptance Rates

      Overall Acceptance Rate 543 of 3,203 submissions, 17%
