DOI: 10.1145/1176887.1176917
Article

Implementing fault-tolerance in real-time systems by automatic program transformations

Published: 22 October 2006
    Abstract

    We present a formal approach to implement and certify fault-tolerance in real-time embedded systems. The fault-intolerant initial system consists of a set of independent periodic tasks scheduled onto a set of fail-silent processors. We transform the tasks such that, assuming the availability of an additional spare processor, the system tolerates one failure at a time (transient or permanent). Failure detection is implemented using heartbeating, and failure masking using checkpointing and roll-back. These techniques are described and implemented by automatic program transformations on the tasks' programs. The proposed formal approach to fault-tolerance by program transformation highlights the benefits of separation of concerns and allows us to establish correctness properties.
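    The two mechanisms named in the abstract, heartbeat-based failure detection and checkpoint/roll-back masking, can be illustrated with a small hand-written sketch. This is not the paper's transformation, which is applied automatically to the tasks' programs; it only shows the intended run-time behaviour under simplifying assumptions: discrete time ticks, a single task, and a task state that fits in a dictionary. All names (`HeartbeatMonitor`, `CheckpointedTask`, `run_demo`, the processor name `p0`) are hypothetical.

```python
import copy


class HeartbeatMonitor:
    """Flags a processor as failed when its last heartbeat is older than `timeout` ticks."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # processor name -> tick of the last heartbeat

    def beat(self, proc, now):
        self.last_seen[proc] = now

    def failed(self, now):
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


class CheckpointedTask:
    """Holds a task state plus one saved copy that roll-back can restore."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


def step(state):
    """One unit of work: the task accumulates 1 + 2 + ... + 5."""
    state["i"] += 1
    state["acc"] += state["i"]


def run_demo():
    monitor = HeartbeatMonitor(timeout=2)
    task = CheckpointedTask({"i": 0, "acc": 0})

    # Ticks 0..2: processor p0 is alive, sends a heartbeat and runs one step per tick.
    for now in range(3):
        monitor.beat("p0", now)
        step(task.state)
        if task.state["i"] == 2:
            task.checkpoint()  # state after two steps is saved

    # p0 then fails silently: no more heartbeats, and the third step is lost.
    detected_at = next(now for now in range(3, 10) if monitor.failed(now))

    # The spare processor restarts the task from the last checkpoint.
    task.rollback()
    while task.state["i"] < 5:
        step(task.state)
    return detected_at, task.state["acc"]


if __name__ == "__main__":
    print(run_demo())
```

    In this sketch the failure is detected at the first tick at which the heartbeat is more than `timeout` ticks old, and the work done after the last checkpoint is recomputed on the spare. Both delays would have to be accounted for when checking that deadlines are still met.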

    Cited By

    • (2020) Enhancing System Reliability Through Targeting Fault Propagation Scope. Soft Computing Methods for System Dependability, pp. 131-160. DOI: 10.4018/978-1-7998-1718-5.ch004. Online publication date: 2020
    • (2015) Improving reliability through fault propagation scope in embedded systems. 2015 Fifth International Conference on Digital Information Processing and Communications (ICDIPC), pp. 300-305. DOI: 10.1109/ICDIPC.2015.7323045. Online publication date: Oct-2015
    • (2008) System level energy aware fault tolerance approach for real time system. TENCON 2008 - 2008 IEEE Region 10 Conference, pp. 1-6. DOI: 10.1109/TENCON.2008.4766854. Online publication date: Nov-2008
    • (1993) A control structure for fault-tolerant operation of robotic manipulators. [1993] Proceedings IEEE International Conference on Robotics and Automation, pp. 684-690. DOI: 10.1109/ROBOT.1993.291826. Online publication date: 1993

    Published In

    EMSOFT '06: Proceedings of the 6th ACM & IEEE International Conference on Embedded Software
    October 2006
    346 pages
    ISBN:1595935428
    DOI:10.1145/1176887

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. checkpointing
    2. fault-tolerance
    3. heartbeating
    4. program transformations

    Qualifiers

    • Article

    Conference

    ESWEEK06
    ESWEEK06: Second Embedded Systems Week 2006
    October 22 - 25, 2006
    Seoul, Korea

    Acceptance Rates

    Overall Acceptance Rate 60 of 203 submissions, 30%
