research-article

Demystifying Soft-Error Mitigation by Control-Flow Checking -- A New Perspective on its Effectiveness

Published: 27 September 2017

Abstract

    Soft errors are a challenging and pressing problem in the domain of safety-critical embedded systems. For decades, checking schemes have been investigated and improved to mitigate soft-error effects for the class of control-flow faults, with current industrial standards strongly recommending their use.
    However, reality looks different: Taking a systems perspective, we implemented four representative Control-Flow Checking (CFC) schemes and put them through their paces in 396 fault-injection campaigns. In contrast to previous work, which typically relied on probability-based vulnerability metrics, we accounted for the influence of memory and time overheads on the fault-space dimensions and applied those in full-scan fault injections. This change in procedure alone severely degraded the perceived effectiveness of CFC.
    In addition, we expanded the perspective to data-flow faults and their influence on the overall susceptibility, an aspect that has so far been largely ignored. Our results suggest that, without accompanying measures, any improvement regarding control-flow faults is dominated by the increase in data faults caused by the increased attack surface in terms of memory and runtime overhead. Moreover, CFC performance depended less on the detection capabilities than on general aspects of the concrete binary compilation and execution.
    In conclusion, incorporating CFC is not as straightforward as often assumed, and the vulnerability of systems with hardened control flow may in many cases even be increased by the schemes themselves.
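
    The argument above can be illustrated with a minimal sketch that contrasts a coverage-style metric with an absolute failure count over a full fault space of (cycle, bit) coordinates. The `Variant` class, its field names, and all numbers are hypothetical and chosen only for illustration; they are not results from the article, and the uniform single-bit-flip fault model is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One program variant under an assumed uniform single-bit-flip model."""
    name: str
    runtime_cycles: int        # execution time in cycles
    memory_bits: int           # bits of state exposed to faults
    failing_control_flow: int  # fault-space points that fail via control-flow faults
    failing_data_flow: int     # fault-space points that fail via data-flow faults

    @property
    def fault_space(self) -> int:
        # Every (cycle, bit) pair is one possible injection point.
        return self.runtime_cycles * self.memory_bits

    @property
    def failures(self) -> int:
        # Absolute failure count across the whole fault space.
        return self.failing_control_flow + self.failing_data_flow

    @property
    def coverage(self) -> float:
        # Fraction of fault-space points that do NOT end in a failure.
        return 1.0 - self.failures / self.fault_space


# Hypothetical numbers, NOT taken from the article.
baseline = Variant("baseline", runtime_cycles=1000, memory_bits=100,
                   failing_control_flow=2000, failing_data_flow=8000)
# The hardened variant runs longer and uses more memory, so its fault space grows.
hardened = Variant("hardened", runtime_cycles=1500, memory_bits=140,
                   failing_control_flow=500, failing_data_flow=14000)

for v in (baseline, hardened):
    print(f"{v.name}: fault space={v.fault_space}, "
          f"failures={v.failures}, coverage={v.coverage:.3f}")
```

    In this made-up case, coverage rises from 0.90 to about 0.93 while the absolute number of failing fault-space points grows from 10,000 to 14,500. Under a uniform fault rate per bit and cycle, the absolute count is what tracks the likelihood of failure, so the hardened variant would be worse off despite its better coverage figure.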

    Published In

    ACM Transactions on Embedded Computing Systems, Volume 16, Issue 5s
    Special Issue ESWEEK 2017, CASES 2017, CODES + ISSS 2017 and EMSOFT 2017
    October 2017
    1448 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/3145508
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 September 2017
    Accepted: 01 June 2017
    Revised: 01 June 2017
    Received: 01 March 2017
    Published in TECS Volume 16, Issue 5s

    Author Tags

    1. CFC
    2. CFCSS
    3. Soft error mitigation
    4. YACCA
    5. absolute-failure-count metrics
    6. control-flow checking
    7. fault-coverage
    8. fault-injection experiments
    9. reliability metrics
    10. software-based fault tolerance

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2024) Generic Soft Error Data and Control Flow Error Detection by Instruction Duplication. IEEE Transactions on Dependable and Secure Computing 21:1 (78-92). DOI: 10.1109/TDSC.2023.3245842. Online publication date: Jan-2024
    • (2022) Survey of Control-flow Integrity Techniques for Real-time Embedded Systems. ACM Transactions on Embedded Computing Systems 21:4 (1-32). DOI: 10.1145/3538275. Online publication date: 18-Jul-2022
    • (2022) In-ConcReTeS: Interactive Consistency meets Distributed Real-Time Systems, Again! 2022 IEEE Real-Time Systems Symposium (RTSS) (211-224). DOI: 10.1109/RTSS55097.2022.00027. Online publication date: Dec-2022
    • (2022) Software-based Control-Flow Error Detection with Hardware Performance Counters in ARM Processors. 2022 CPSSI 4th International Symposium on Real-Time and Embedded Systems and Technologies (RTEST) (1-8). DOI: 10.1109/RTEST56034.2022.9850096. Online publication date: 30-May-2022
    • (2022) Comprehensive Analysis of Software-Based Fault Tolerance with Arithmetic Coding for Performant Encoding of Integer Calculations. Computer Safety, Reliability, and Security (144-157). DOI: 10.1007/978-3-031-14835-4_10. Online publication date: 6-Jun-2022
    • (2021) Exploiting Application Tolerance for Functional Safety. 2021 IEEE International Test Conference (ITC) (399-408). DOI: 10.1109/ITC50571.2021.00056. Online publication date: Oct-2021
    • (2021) Investigating real-time control-flow error detection in hardware: How fast can we detect errors and take action? Microelectronics Reliability 126 (114264). DOI: 10.1016/j.microrel.2021.114264. Online publication date: Nov-2021
    • (2020) Fine Grained Control Flow Checking with Dedicated FPGA Monitors. 2020 IEEE 33rd International System-on-Chip Conference (SOCC) (219-224). DOI: 10.1109/SOCC49529.2020.9524751. Online publication date: 8-Sep-2020
    • (2018) Energy Optimization and Fault Tolerance to Embedded System Based on Adaptive Heterogeneous Multi-Core Hardware Architecture. 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C) (316-323). DOI: 10.1109/QRS-C.2018.00063. Online publication date: Jul-2018
