research-article

A Flexible Software-Based Framework for Online Detection of Hardware Defects

Published: 01 August 2009

Abstract

This work proposes a new software-based defect detection and diagnosis technique. We introduce a novel set of instructions, called Access-Control Extensions (ACE), that can access and control the microprocessor's internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified or upgraded in the field to trade off performance against reliability without requiring any change to the hardware. We describe and evaluate different execution models for using the ACE framework. We also describe how the proposed ACE framework can be extended and utilized to improve the quality of post-silicon debugging and manufacturing testing of modern processors. We evaluated our technique on a commercial chip-multiprocessor based on Sun's Niagara and found that it can provide very high coverage, with 99.22 percent of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5 percent. Based on a detailed register transfer level (RTL) implementation of our technique, we find its area and power consumption overheads to be modest, with a 5.8 percent increase in total chip area and a 4 percent increase in the chip's overall power consumption.
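
The control flow described in the abstract (periodically suspend execution, run directed tests through state-access instructions, then reconfigure around a failing resource) can be illustrated with a toy simulation. This is only a sketch under stated assumptions and not the authors' implementation: the `Unit` class, the `ace_set`/`ace_get` methods, the 8-bit state word, the stuck-at-zero defect model, and the test patterns are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Unit:
    """Hypothetical hardware unit with an 8-bit internal state word."""
    name: str
    stuck_at_zero_mask: int = 0  # simulated defect: these bits always read 0
    state: int = 0

    def ace_set(self, value: int) -> None:
        # Stand-in for an instruction that writes internal state (assumed name).
        self.state = value & ~self.stuck_at_zero_mask & 0xFF

    def ace_get(self) -> int:
        # Stand-in for an instruction that reads internal state (assumed name).
        return self.state


def directed_test(unit: Unit, patterns=(0x00, 0xFF, 0xAA, 0x55)) -> bool:
    """Write each pattern and read it back; True means the unit passed."""
    for pattern in patterns:
        unit.ace_set(pattern)
        if unit.ace_get() != pattern:
            return False
    return True


def run_test_epoch(units: List[Unit], disabled: Set[str]) -> List[str]:
    """One firmware test pass: test enabled units and disable any that fail."""
    newly_failed = []
    for unit in units:
        if unit.name in disabled:
            continue  # already reconfigured out of the system
        if not directed_test(unit):
            disabled.add(unit.name)  # toy "repair": stop using the unit
            newly_failed.append(unit.name)
    return newly_failed


def run(units: List[Unit], total_cycles: int,
        test_interval: int) -> Tuple[Set[str], List[Tuple[int, List[str]]]]:
    """Interleave normal execution with a test epoch every test_interval cycles."""
    disabled: Set[str] = set()
    log = []
    for cycle in range(1, total_cycles + 1):
        # Normal program execution would happen here.
        if cycle % test_interval == 0:
            failed = run_test_epoch(units, disabled)
            if failed:
                log.append((cycle, failed))
    return disabled, log


if __name__ == "__main__":
    units = [Unit("alu0"), Unit("alu1", stuck_at_zero_mask=0x04)]
    print(run(units, total_cycles=300, test_interval=100))
```

In this sketch a shorter `test_interval` detects a defect sooner but spends more cycles on testing, which is the kind of performance versus reliability trade-off the abstract says can be tuned in software.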

          Published In

          IEEE Transactions on Computers  Volume 58, Issue 8
          August 2009
          144 pages

          Publisher

          IEEE Computer Society

          United States

          Publication History

          Published: 01 August 2009

          Author Tags

          1. Reliability
          2. hardware defects
          3. manufacturing test
          4. online defect detection
          5. online self-test
          6. post-silicon debugging
          7. testing

          Qualifiers

          • Research-article

          Cited By

          • (2022) Pass/Fail Data for Logic Diagnosis Under Bounded Transparent Scan. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:11 (4862-4872). DOI: 10.1109/TCAD.2021.3134901. Online publication date: 1-Nov-2022
          • (2021) A Holistic Solution for Reliability of 3D Parallel Systems. ACM Journal on Emerging Technologies in Computing Systems 18:1 (1-27). DOI: 10.1145/3488900. Online publication date: 16-Nov-2021
          • (2020) Logic Diagnosis with Hybrid Fail Data. ACM Transactions on Design Automation of Electronic Systems 26:3 (1-13). DOI: 10.1145/3433929. Online publication date: 17-Dec-2020
          • (2018) Exploring System Availability During Software-Based Self-Testing of Multi-core CPUs. Journal of Electronic Testing: Theory and Applications 34:1 (67-81). DOI: 10.1007/s10836-018-5706-0. Online publication date: 1-Feb-2018
          • (2017) 3DFAR. Proceedings of the Conference on Design, Automation & Test in Europe (310-313). DOI: 10.5555/3130379.3130451. Online publication date: 27-Mar-2017
          • (2017) A Processor and Cache Online Self-Testing Methodology for OS-Managed Platform. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25:8 (2346-2359). DOI: 10.1109/TVLSI.2017.2698506. Online publication date: 24-Jul-2017
          • (2016) Self-Healing Many-Core Architecture. VLSI Design 2016 (2). DOI: 10.1155/2016/9767139. Online publication date: 1-Jul-2016
          • (2016) DaemonGuard. IEEE Transactions on Computers 65:5 (1453-1466). DOI: 10.1109/TC.2015.2449840. Online publication date: 1-May-2016
          • (2010) Reliability, thermal, and power modeling and optimization. Proceedings of the International Conference on Computer-Aided Design (181-184). DOI: 10.5555/2133429.2133466. Online publication date: 7-Nov-2010
