Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria

Published: 01 August 2006

Abstract

The empirical assessment of test techniques plays an important role in software testing research. One common practice is to seed faults in subject software, either manually or by using a program that generates all possible mutants based on a set of mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults, thus facilitating the statistical analysis of the fault detection effectiveness of test suites; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. Focusing on four common control and data flow criteria (Block, Decision, C-Use, and P-Use), this paper investigates this important issue based on a middle-sized industrial program with a comprehensive pool of test cases and known faults. Based on the data available thus far, the results are consistent across the investigated criteria and show that the use of mutation operators yields trustworthy results: generated mutants can be used to predict the detection effectiveness of real faults. Applying such a mutation analysis, we then investigate the relative cost and effectiveness of the above-mentioned criteria by revisiting fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage. Although such questions have been partially investigated in previous studies, we are able to use a large number of mutants, which decreases the impact of random variation in our analysis and allows us to use a different analysis approach. Our results are then compared with published studies, plausible reasons for the differences are provided, and the research leads us to suggest a way to tune the mutation analysis process to possible differences in fault detection probabilities in a specific environment.
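
To make the procedure described above concrete, the sketch below (Python, illustrative only) shows how a mutation score might be computed for a test suite: every generated mutant is executed against the suite and counted as killed when at least one test produces an output that differs from the original program's. The toy model of programs as functions, the helper names (kills, mutation_score), and the manually supplied equivalent-mutant count are assumptions of this sketch, not the tooling used in the paper.

# Minimal sketch of mutation-score computation (illustrative only; not the
# paper's tooling). A mutant is "killed" when some test input makes its
# output differ from the original program's output.

from typing import Callable, Iterable, List

Program = Callable[[int], int]   # toy model: a program maps one input to one output

def kills(test_input: int, original: Program, mutant: Program) -> bool:
    """True if this test input distinguishes the mutant from the original."""
    try:
        return mutant(test_input) != original(test_input)
    except Exception:
        return True              # a crash also counts as detection

def mutation_score(original: Program, mutants: Iterable[Program],
                   suite: List[int], num_equivalent: int = 0) -> float:
    """Killed mutants divided by non-equivalent mutants (num_equivalent is
    assumed to be known, e.g. from manual inspection)."""
    mutants = list(mutants)
    killed = sum(1 for m in mutants
                 if any(kills(t, original, m) for t in suite))
    return killed / (len(mutants) - num_equivalent)

# Toy example: the original program computes abs(x); each "mutant" mimics the
# effect of applying one mutation operator to it.
original = lambda x: abs(x)
mutants = [lambda x: -abs(x),     # operator mutation
           lambda x: abs(x) + 1,  # constant mutation
           lambda x: abs(x)]      # equivalent mutant (behaviour unchanged)

print(mutation_score(original, mutants, suite=[-3, 0, 7], num_equivalent=1))  # 1.0

In the study itself, such per-suite scores are related to the coverage achieved under the Block, Decision, C-Use, and P-Use criteria and to test suite size; the sketch covers only the scoring step.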


    Published In

    IEEE Transactions on Software Engineering, Volume 32, Issue 8
    August 2006
    96 pages

    Publisher

    IEEE Press

    Author Tags

    1. Testing and debugging
    2. Experimental design
    3. Test coverage of code
    4. Testing strategies

    Qualifiers

    • Research-article

