DOI: 10.1145/3395363.3397359
Research article · Open access

Intermittently failing tests in the embedded systems domain

Published: 18 July 2020

Abstract

Software testing is sometimes plagued with intermittently failing tests and finding the root causes of such failing tests is often difficult. This problem has been widely studied at the unit testing level for open source software, but there has been far less investigation at the system test level, particularly the testing of industrial embedded systems. This paper describes our investigation of the root causes of intermittently failing tests in the embedded systems domain, with the goal of better understanding, explaining and categorizing the underlying faults. The subject of our investigation is a currently-running industrial embedded system, along with the system level testing that was performed. We devised and used a novel metric for classifying test cases as intermittent. From more than a half million test verdicts, we identified intermittently and consistently failing tests, and identified their root causes using multiple sources. We found that about 1-3% of all test cases were intermittently failing. From analysis of the case study results and related work, we identified nine factors associated with test case intermittence. We found that a fix for a consistently failing test typically removed a larger number of failures detected by other tests than a fix for an intermittent test. We also found that more effort was usually needed to identify fixes for intermittent tests than for consistent tests. An overlap between root causes leading to intermittent and consistent tests was identified. Many root causes of intermittence are the same in industrial embedded systems and open source software. However, when comparing unit testing to system level testing, especially for embedded systems, we observed that the test environment itself is often the cause of intermittence.
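The classification step described above can be illustrated with a minimal sketch. The paper's actual intermittence metric is not reproduced here; the rule below (a test is intermittent if its recorded verdicts contain both passes and failures) is a simplifying assumption, and all names are hypothetical.

```python
from typing import Dict, List

def classify_outcomes(verdicts: List[str]) -> str:
    """Classify a test's verdict history.

    Naive illustration: a test that both passed and failed across its
    recorded runs is treated as intermittent. (Hypothetical rule; the
    paper's metric is more involved and not reproduced here.)
    """
    outcomes = set(verdicts)
    if outcomes == {"pass"}:
        return "consistently passing"
    if outcomes == {"fail"}:
        return "consistently failing"
    return "intermittent"

def intermittence_rate(history: Dict[str, List[str]]) -> float:
    """Fraction of test cases classified as intermittent."""
    flaky = sum(1 for v in history.values()
                if classify_outcomes(v) == "intermittent")
    return flaky / len(history)

# Toy verdict history for three test cases.
history = {
    "tc1": ["pass", "pass", "pass"],
    "tc2": ["pass", "fail", "pass", "fail"],
    "tc3": ["fail", "fail"],
}
print(classify_outcomes(history["tc2"]))   # intermittent
print(round(intermittence_rate(history), 3))  # 0.333
```

In practice the study worked from more than half a million verdicts, so such a rule would typically be applied per test case over a sliding window of recent runs rather than over the whole history.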



Published In

ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2020
591 pages
ISBN:9781450380089
DOI:10.1145/3395363
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. embedded systems
  2. flaky tests
  3. intermittently failing tests
  4. non-deterministic tests
  5. system level test automation

Conference

ISSTA '20

Acceptance Rates

Overall Acceptance Rate: 58 of 213 submissions, 27%


Cited By

  • (2024) Cost of Flaky Tests in Continuous Integration: An Industrial Case Study. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), 329-340. DOI: 10.1109/ICST60714.2024.00037. Online publication date: 27-May-2024.
  • (2024) Experiences and challenges from developing cyber-physical systems in industry-academia collaboration. Software: Practice and Experience 54, 6, 1193-1212. DOI: 10.1002/spe.3312. Online publication date: 17-Jan-2024.
  • (2023) Practical Flaky Test Prediction using Common Code Evolution and Test History Data. 2023 IEEE Conference on Software Testing, Verification and Validation (ICST), 210-221. DOI: 10.1109/ICST57152.2023.00028. Online publication date: Apr-2023.
  • (2023) Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learning Model. IEEE Access 11, 63916-63931. DOI: 10.1109/ACCESS.2023.3288156. Online publication date: 2023.
  • (2023) Making Sense of Failure Logs in an Industrial DevOps Environment. ITNG 2023: 20th International Conference on Information Technology-New Generations, 217-226. DOI: 10.1007/978-3-031-28332-1_25. Online publication date: 21-Feb-2023.
  • (2022) Unreliable test infrastructures in automotive testing setups. Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice, 307-308. DOI: 10.1145/3510457.3513069. Online publication date: 21-May-2022.
  • (2022) On Determinism of Game Engines Used for Simulation-Based Autonomous Vehicle Verification. IEEE Transactions on Intelligent Transportation Systems 23, 11, 20538-20552. DOI: 10.1109/TITS.2022.3177887. Online publication date: Nov-2022.
  • (2022) Why Did the Test Execution Fail? Failure Classification Using Association Rules (Practical Experience Report). 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE), 517-528. DOI: 10.1109/ISSRE55969.2022.00056. Online publication date: Oct-2022.
  • (2022) To Seed or Not to Seed? An Empirical Analysis of Usage of Seeds for Testing in Machine Learning Projects. 2022 IEEE Conference on Software Testing, Verification and Validation (ICST), 151-161. DOI: 10.1109/ICST53961.2022.00026. Online publication date: Apr-2022.
  • (2022) Peeler: Learning to Effectively Predict Flakiness without Running Tests. 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), 257-268. DOI: 10.1109/ICSME55016.2022.00031. Online publication date: Oct-2022.
