DOI: 10.1145/3395363.3397359
Research article · Open access

Intermittently failing tests in the embedded systems domain

Published: 18 July 2020

Abstract

Software testing is sometimes plagued with intermittently failing tests and finding the root causes of such failing tests is often difficult. This problem has been widely studied at the unit testing level for open source software, but there has been far less investigation at the system test level, particularly the testing of industrial embedded systems. This paper describes our investigation of the root causes of intermittently failing tests in the embedded systems domain, with the goal of better understanding, explaining and categorizing the underlying faults. The subject of our investigation is a currently-running industrial embedded system, along with the system level testing that was performed. We devised and used a novel metric for classifying test cases as intermittent. From more than a half million test verdicts, we identified intermittently and consistently failing tests, and identified their root causes using multiple sources. We found that about 1-3% of all test cases were intermittently failing. From analysis of the case study results and related work, we identified nine factors associated with test case intermittence. We found that a fix for a consistently failing test typically removed a larger number of failures detected by other tests than a fix for an intermittent test. We also found that more effort was usually needed to identify fixes for intermittent tests than for consistent tests. An overlap between root causes leading to intermittent and consistent tests was identified. Many root causes of intermittence are the same in industrial embedded systems and open source software. However, when comparing unit testing to system level testing, especially for embedded systems, we observed that the test environment itself is often the cause of intermittence.
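The classification step described above can be illustrated with a minimal sketch. The paper's actual intermittence metric is not reproduced here; the rule below (a test is intermittent if its recorded verdicts contain both passes and failures) is a simplifying assumption, and all names are hypothetical.

```python
from typing import Dict, List

def classify_outcomes(verdicts: List[str]) -> str:
    """Classify a test's verdict history.

    Naive illustration: a test that both passed and failed across its
    recorded runs is treated as intermittent. (Hypothetical rule; the
    paper's metric is more involved and not reproduced here.)
    """
    outcomes = set(verdicts)
    if outcomes == {"pass"}:
        return "consistently passing"
    if outcomes == {"fail"}:
        return "consistently failing"
    return "intermittent"

def intermittence_rate(history: Dict[str, List[str]]) -> float:
    """Fraction of test cases classified as intermittent."""
    flaky = sum(1 for v in history.values()
                if classify_outcomes(v) == "intermittent")
    return flaky / len(history)

# Toy verdict history for three test cases.
history = {
    "tc1": ["pass", "pass", "pass"],
    "tc2": ["pass", "fail", "pass", "fail"],
    "tc3": ["fail", "fail"],
}
print(classify_outcomes(history["tc2"]))   # intermittent
print(round(intermittence_rate(history), 3))  # 0.333
```

In practice the study worked from more than half a million verdicts, so such a rule would typically be applied per test case over a sliding window of recent runs rather than over the whole history.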



Published In

ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2020
591 pages
ISBN:9781450380089
DOI:10.1145/3395363
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. embedded systems
  2. flaky tests
  3. intermittently failing tests
  4. non-deterministic tests
  5. system level test automation

Conference

ISSTA '20

Acceptance Rates

Overall Acceptance Rate: 58 of 213 submissions, 27%


Cited By

  • (2024) Cost of Flaky Tests in Continuous Integration: An Industrial Case Study. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), 329-340. DOI: 10.1109/ICST60714.2024.00037. Online publication date: 27-May-2024.
  • (2024) Experiences and challenges from developing cyber-physical systems in industry-academia collaboration. Software: Practice and Experience 54, 6, 1193-1212. DOI: 10.1002/spe.3312. Online publication date: 17-Jan-2024.
  • (2023) Practical Flaky Test Prediction using Common Code Evolution and Test History Data. 2023 IEEE Conference on Software Testing, Verification and Validation (ICST), 210-221. DOI: 10.1109/ICST57152.2023.00028. Online publication date: Apr-2023.
  • (2023) Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learning Model. IEEE Access 11, 63916-63931. DOI: 10.1109/ACCESS.2023.3288156. Online publication date: 2023.
  • (2023) Making Sense of Failure Logs in an Industrial DevOps Environment. ITNG 2023: 20th International Conference on Information Technology-New Generations, 217-226. DOI: 10.1007/978-3-031-28332-1_25. Online publication date: 21-Feb-2023.
  • (2022) Unreliable test infrastructures in automotive testing setups. Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice, 307-308. DOI: 10.1145/3510457.3513069. Online publication date: 21-May-2022.
  • (2022) On Determinism of Game Engines Used for Simulation-Based Autonomous Vehicle Verification. IEEE Transactions on Intelligent Transportation Systems 23, 11, 20538-20552. DOI: 10.1109/TITS.2022.3177887. Online publication date: Nov-2022.
  • (2022) Why Did the Test Execution Fail? Failure Classification Using Association Rules (Practical Experience Report). 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE), 517-528. DOI: 10.1109/ISSRE55969.2022.00056. Online publication date: Oct-2022.
  • (2022) To Seed or Not to Seed? An Empirical Analysis of Usage of Seeds for Testing in Machine Learning Projects. 2022 IEEE Conference on Software Testing, Verification and Validation (ICST), 151-161. DOI: 10.1109/ICST53961.2022.00026. Online publication date: Apr-2022.
  • (2022) Peeler: Learning to Effectively Predict Flakiness without Running Tests. 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), 257-268. DOI: 10.1109/ICSME55016.2022.00031. Online publication date: Oct-2022.
