DOI: 10.5555/1874620.1874740

Fault-tolerant average execution time optimization for general-purpose multi-processor system-on-chips

Published: 20 April 2009

Abstract

Advances in semiconductor technology have made fault tolerance important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. For general-purpose systems, however, the goal is not to guarantee that deadlines are always met but to minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a given soft (transient) error probability, we define mathematical formulas for AET that include bus communication overhead, for both voting (active replication) and rollback-recovery with checkpointing (RRC). For a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize AET, including bus communication overhead, when: (1) selecting the number of checkpoints for RRC, (2) finding the number of processors and the job-to-processor assignment for voting, and (3) selecting the fault-tolerance scheme (voting or RRC) for each job and defining how it is used. Experiments demonstrate significant savings in AET.
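The trade-off behind item (1), choosing the number of checkpoints for RRC, can be illustrated with a small calculation. The sketch below is a minimal illustration under a simplified model that is an assumption here, not the formulas from the paper: a job with fault-free execution time `T` completes without a soft error with probability `P`; it is split into `n` equal segments separated by checkpoints; each segment pays a fixed `overhead` that stands in for checkpointing and bus communication cost; and a failed segment is re-executed from the last checkpoint until it succeeds. The paper formulates checkpoint selection as an ILP, whereas this sketch simply enumerates candidate values of `n` (a different, brute-force technique). The names `rrc_aet` and `best_checkpoints` are hypothetical.

```python
def rrc_aet(T: float, P: float, n: int, overhead: float) -> float:
    """Expected execution time of one job under RRC with n segments.

    Simplified, assumed model (not the paper's formulas):
      T         fault-free execution time of the job
      P         probability that a run of length T sees no soft error
      n         number of equal segments (checkpointed intervals)
      overhead  fixed per-segment cost, a stand-in for checkpointing
                and bus communication
    Errors are assumed to be detected at the end of a segment, which is
    then re-executed from the last checkpoint until it succeeds.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 < P <= 1.0:
        raise ValueError("P must be in (0, 1]")
    seg = T / n + overhead  # duration of one attempt at a segment
    # Constant error rate assumed, so the no-error probability for a span
    # of length seg is P ** (seg / T); the overhead is exposed to errors too.
    p_ok = P ** (seg / T)
    # Attempts per segment are geometric with mean 1 / p_ok.
    return n * seg / p_ok


def best_checkpoints(T: float, P: float, overhead: float, n_max: int = 100) -> int:
    """Return the n in [1, n_max] that minimizes rrc_aet (brute force)."""
    return min(range(1, n_max + 1), key=lambda n: rrc_aet(T, P, n, overhead))


if __name__ == "__main__":
    print(round(rrc_aet(1000.0, 0.5, 1, 10.0), 1))  # a single segment
    n = best_checkpoints(1000.0, 0.5, 10.0)
    print(n, round(rrc_aet(1000.0, 0.5, n, 10.0), 1))
```

With `T = 1000`, `P = 0.5` and `overhead = 10`, a single segment gives an expected time of roughly 2034, while `best_checkpoints` picks 9 segments at roughly 1185. More checkpoints shrink the work lost on each error but add overhead, so under this model the AET curve has an interior minimum.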


    Information

    Published In

    DATE '09: Proceedings of the Conference on Design, Automation and Test in Europe
    April 2009
    1776 pages
    ISBN:9783981080155

    Sponsors

    • EDAA: European Design Automation Association
    • ECSI
    • EDAC: Electronic Design Automation Consortium
    • SIGDA: ACM Special Interest Group on Design Automation
    • The IEEE Computer Society TTTC
    • The IEEE Computer Society DATC
    • The Russian Academy of Sciences

    Publisher

    European Design and Automation Association

    Leuven, Belgium

    Qualifiers

    • Research-article

    Conference

    DATE '09

    Acceptance Rates

    Overall Acceptance Rate 518 of 1,794 submissions, 29%


    Cited By

    • (2016) Optimizing the Level of Confidence for Multiple Jobs. IEEE Transactions on Computers, 65:4, pp. 1239-1252. DOI: 10.1109/TC.2015.2439254. Online publication date: 1-Apr-2016.
    • (2014) OCEAN. ACM Transactions on Embedded Computing Systems, 13:4s, pp. 1-26. DOI: 10.1145/2584667. Online publication date: 1-Apr-2014.
    • (2013) Aging-aware hardware-software task partitioning for reliable reconfigurable multiprocessor systems. Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pp. 1-10. DOI: 10.5555/2555729.2555730. Online publication date: 29-Sep-2013.
    • (2013) System-level memory management based on statistical variability compensation for frame-based applications. ACM Transactions on Embedded Computing Systems, 13:1s, pp. 1-28. DOI: 10.1145/2536747.2536757. Online publication date: 6-Dec-2013.
    • (2011) A self-checking hardware journal for a fault-tolerant processor architecture. International Journal of Reconfigurable Computing, vol. 2011, pp. 11-11. DOI: 10.1155/2011/962062. Online publication date: 1-Jan-2011.
