research-article
Open access

Bounding Random Test Set Size with Computational Learning Theory

Published: 12 July 2024

Abstract

Random testing approaches work by generating inputs at random, or by selecting inputs randomly from some pre-defined operational profile. A long-standing question that arises in this and other testing contexts is: when can we stop testing? At what point can we be certain that executing further tests in this manner will not explore previously untested (and potentially buggy) software behaviors? This is analogous to the question in Machine Learning of how many training examples are required to infer an accurate model. In this paper we show how probabilistic approaches to answering this question in Machine Learning (arising from Computational Learning Theory) can be applied in our testing context. This enables us to produce an upper bound on the number of tests that are required to achieve a given level of adequacy. We are the first to enable this from knowing only the number of coverage targets (e.g., lines of code) in the source code, without needing to observe a sample of test executions. We validate this bound on a large set of Java units and an autonomous driving system.
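The flavor of bound the abstract describes can be illustrated with the classic PAC sample-complexity result for a finite hypothesis class, m ≥ (1/ε)(ln |H| + ln(1/δ)). The sketch below is an illustration under our own assumptions, not the paper's exact derivation: the function name is hypothetical, and we assume |H| = 2^k, i.e. one hypothesis per subset of the k coverage targets, so that the bound depends only on the number of coverage targets, mirroring the paper's setting.

```python
import math

def pac_test_bound(num_targets: int, epsilon: float, delta: float) -> int:
    """Classic finite-hypothesis-class PAC bound:
        m >= (1/epsilon) * (ln|H| + ln(1/delta))
    Assumption (ours, for illustration): |H| = 2^k for k coverage
    targets, so ln|H| = k * ln 2. Returns the smallest integer m
    satisfying the inequality.
    """
    ln_h = num_targets * math.log(2)
    return math.ceil((ln_h + math.log(1.0 / delta)) / epsilon)

# e.g. a unit with 500 coverage targets, epsilon = delta = 0.05:
print(pac_test_bound(500, 0.05, 0.05))
```

Note the bound grows only linearly in the number of coverage targets and logarithmically in 1/δ, which is what makes a purely static estimate (no observed executions) plausible.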



Published In

Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE
July 2024, 2770 pages
EISSN: 2994-970X
DOI: 10.1145/3554322
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. PAC learning
  2. Sample complexity
  3. Test saturation

Qualifiers

  • Research-article

Funding Sources

  • EPSRC
