DOI: 10.1109/ISED.2013.32
Article

Lifetime Reliability-Aware Checkpointing Mechanism: Modelling and Analysis

Published: 10 December 2013

Abstract

Checkpointing is used to tolerate transient faults by rolling back to a previously saved system state. In this paper, we propose a novel checkpointing mechanism that provides fault tolerance in a duplex system in the presence of both transient and permanent faults. The main objective of the proposed mechanism is to extend the lifetime reliability of the duplex system by avoiding, or even tolerating, permanent faults in microprocessors. In addition, we propose migrating tasks from a 'near-to-die' processor to a spare processor when the current Mean-Time-To-Failure (MTTF) value is less than or equal to a predetermined threshold MTTF value. We validate the proposed mechanism and perform overhead analysis using various case studies. We then compare it with one of the most popular existing checkpointing mechanisms, namely the roll-forward checkpointing scheme [9]. We show that, unlike roll-back or roll-forward mechanisms, our proposed mechanism gives significantly higher lifetime reliability with reasonable system overheads.
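
The migration condition stated in the abstract (current MTTF less than or equal to a threshold MTTF) can be illustrated with a minimal sketch. This is not the paper's model: the names `ProcessorStatus`, `Action`, and `checkpoint_decision`, the use of hours as the unit, and the choice to check the migration condition before the roll-back condition are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"    # states agree and processor is healthy: keep running
    ROLL_BACK = "roll_back"  # states disagree: restore the last saved checkpoint
    MIGRATE = "migrate"      # processor is near end of life: move its task to the spare


@dataclass
class ProcessorStatus:
    name: str
    # Hypothetical field: how the current MTTF estimate is obtained is not
    # described in the abstract and is outside the scope of this sketch.
    current_mttf_hours: float


def checkpoint_decision(states_match: bool,
                        processor: ProcessorStatus,
                        threshold_mttf_hours: float) -> Action:
    """Pick an action at a checkpoint in a duplex system (illustrative only)."""
    # Condition from the abstract: migrate when current MTTF <= threshold MTTF.
    # Checking it first is an assumption of this sketch, not a claim about the paper.
    if processor.current_mttf_hours <= threshold_mttf_hours:
        return Action.MIGRATE
    # A mismatch between the two processors' states is treated as a transient fault.
    if not states_match:
        return Action.ROLL_BACK
    return Action.CONTINUE
```

For example, with a threshold of 1000 hours, a processor whose estimated MTTF has dropped to 900 hours would be selected for migration even if the duplex states still match, while a healthy processor with mismatched states would roll back.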

Cited By

  • (2017) Low-overhead Aging-aware Resource Management on Embedded GPUs. Proceedings of the 54th Annual Design Automation Conference 2017, pp. 1-6. DOI: 10.1145/3061639.3062277. Online publication date: 18-Jun-2017

    Published In

    ISED '13: Proceedings of the 2013 International Symposium on Electronic System Design
    December 2013
    197 pages
    ISBN:9781479923014

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    Checkpointing, fault tolerance, microprocessors, lifetime reliability

    Qualifiers

    • Article
