DOI: 10.1145/2593069.2593085
research-article

An Efficient Real Time Fault Detection and Tolerance Framework Validated on the Intel SCC Processor

Published: 01 June 2014

Abstract

We present a new framework that efficiently detects and tolerates timing faults in real-time systems. A timing fault occurs when the inputs and/or outputs of a system fail to meet their desired timing properties, such as I/O rates. Most current approaches rely either on heartbeat monitoring, which is too restrictive, or on statistical or inexact methods, which are not suitable for embedded real-time systems. Current approaches based on the abstract real-time model of the given application are resource intensive and may not be suitable for embedded systems. Our framework uses active replication and builds on existing timing models for real-time applications to develop fault detection and tolerance strategies. The approach does not require any timekeeping at runtime and is efficient in terms of computational resources used. Experiments using three realistic applications on the Intel Baremetal SCC demonstrate the efficiency of our framework in both memory and computational resources used.


Cited By

  • (2016) Dynamic many-process applications on many-tile embedded systems and HPC clusters. Journal of Systems Architecture: the EUROMICRO Journal, 69:C (29-53). DOI: 10.1016/j.sysarc.2015.11.008. Online publication date: 1-Sep-2016
  • (2014) EURETILE Design Flow. Proceedings of the 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications (182-189). DOI: 10.1109/ISPA.2014.32. Online publication date: 26-Aug-2014

    Published In

    DAC '14: Proceedings of the 51st Annual Design Automation Conference
    June 2014
    1249 pages
    ISBN:9781450327305
    DOI:10.1145/2593069

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    DAC '14

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
