research-article
Open access

Bounding Random Test Set Size with Computational Learning Theory

Published: 12 July 2024

Abstract

Random testing approaches work by generating inputs at random, or by selecting inputs randomly from some pre-defined operational profile. A long-standing question that arises in this and other testing contexts is: when can we stop testing? At what point can we be certain that executing further tests in this manner will not explore previously untested (and potentially buggy) software behaviors? This is analogous to the question in Machine Learning of how many training examples are required to infer an accurate model. In this paper we show how probabilistic approaches to answering this question in Machine Learning (arising from Computational Learning Theory) can be applied in our testing context. This enables us to produce an upper bound on the number of tests that are required to achieve a given level of adequacy. We are the first to enable this from knowing only the number of coverage targets (e.g., lines of code) in the source code, without needing to observe a sample of test executions. We validate this bound on a large set of Java units and an autonomous driving system.
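The flavor of bound the abstract describes can be illustrated with the classic PAC sample-complexity result for a finite hypothesis class, m ≥ (1/ε)(ln |H| + ln(1/δ)). The sketch below is an illustration under our own assumptions, not the paper's exact derivation: the function name is hypothetical, and we assume |H| = 2^k, i.e. one hypothesis per subset of the k coverage targets, so that the bound depends only on the number of coverage targets, mirroring the paper's setting.

```python
import math

def pac_test_bound(num_targets: int, epsilon: float, delta: float) -> int:
    """Classic finite-hypothesis-class PAC bound:
        m >= (1/epsilon) * (ln|H| + ln(1/delta))
    Assumption (ours, for illustration): |H| = 2^k for k coverage
    targets, so ln|H| = k * ln 2. Returns the smallest integer m
    satisfying the inequality.
    """
    ln_h = num_targets * math.log(2)
    return math.ceil((ln_h + math.log(1.0 / delta)) / epsilon)

# e.g. a unit with 500 coverage targets, epsilon = delta = 0.05:
print(pac_test_bound(500, 0.05, 0.05))
```

Note the bound grows only linearly in the number of coverage targets and logarithmically in 1/δ, which is what makes a purely static estimate (no observed executions) plausible.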



Published In

Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE
July 2024, 2770 pages
EISSN: 2994-970X
DOI: 10.1145/3554322
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. PAC learning
  2. Sample complexity
  3. Test saturation

Qualifiers

  • Research-article

Funding Sources

  • EPSRC
