DOI: 10.1145/3609437.3609453
research-article

A Fine-Grained Evaluation of Mutation Operators for Deep Learning Systems: A Selective Mutation Approach

Published: 05 October 2023

Abstract

The widespread adoption of deep learning (DL) has made it critical to ensure its reliability. Mutation testing has been employed in DL testing to assess test data quality, but it can be costly because of the large number of generated mutants. The cost can be reduced by selecting a sufficient subset of mutation operators. However, it remains unclear to what extent individual DL mutation operators contribute to test effectiveness, which makes it difficult to decide which operators are useful in a selective mutation approach.
In this paper, we perform a fine-grained evaluation of DL mutation operators by incorporating the classification results of DL models. Based on the finding that mutants generated by some mutation operators can be redundant or useless in guiding the generation of high-quality test cases, we introduce two measures of the usefulness of DL mutation operators: redundancy score (RS), which quantifies the redundancy of operators, and quality score (QS), which quantifies the ability of operators to guide the generation of high-quality test cases. Our empirical evaluation on three widely used datasets and four DL models shows that RS and QS offer a dual evaluation of DL mutation operators. In a selective mutation strategy, prioritizing mutation operators based on RS can reduce the number of mutants by 46.15% to 69.23% while retaining over 90% of the mutation score. Similarly, prioritizing mutation operators based on QS can reduce the number of mutants by 46.15% to 50.35% while retaining over 87% of test cases. Our study shows that RS-based and QS-based selective mutation can significantly reduce the number of mutants while maintaining high test effectiveness.
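The selective mutation loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the formulas for RS or QS, so the operator ranking is taken as an input (`priority`), and every name here (`kills`, `operator_of`, `selective_mutation`, `min_retained`) is hypothetical. The sketch also assumes one common reading of "retained mutation score": a test subset that kills the selected mutants is re-scored against the full mutant set and compared with the score of the full test set.

```python
from typing import Dict, List, Set, Tuple


def greedy_test_subset(kills: Dict[str, Set[str]], mutants: Set[str]) -> Set[str]:
    """Greedily pick tests until every killable mutant in `mutants` is killed."""
    remaining = {m for m in mutants if kills.get(m)}
    chosen: Set[str] = set()
    while remaining:
        # Count how many still-alive mutants each candidate test would kill.
        gain: Dict[str, int] = {}
        for m in remaining:
            for t in kills[m]:
                gain[t] = gain.get(t, 0) + 1
        # Ties are broken by test id so the result is deterministic.
        best = max(sorted(gain), key=lambda t: gain[t])
        chosen.add(best)
        remaining = {m for m in remaining if best not in kills[m]}
    return chosen


def mutation_score(kills: Dict[str, Set[str]], tests: Set[str], mutants: Set[str]) -> float:
    """Fraction of `mutants` killed by at least one test in `tests`."""
    if not mutants:
        return 0.0
    killed = sum(1 for m in mutants if kills.get(m, set()) & tests)
    return killed / len(mutants)


def selective_mutation(
    kills: Dict[str, Set[str]],      # mutant id -> ids of tests that kill it
    operator_of: Dict[str, str],     # mutant id -> operator that produced it
    priority: List[str],             # operators ranked by some score (assumed given)
    min_retained: float = 0.9,       # assumed threshold on retained mutation score
) -> Tuple[List[str], float, float]:
    """Add operators in priority order until enough mutation score is retained.

    Returns (selected operators, mutant reduction ratio, retained score ratio).
    """
    all_mutants = set(operator_of)
    all_tests: Set[str] = set().union(*kills.values())
    full_score = mutation_score(kills, all_tests, all_mutants)

    selected_ops: List[str] = []
    selected_mutants: Set[str] = set()
    retained = 0.0
    for op in priority:
        selected_ops.append(op)
        selected_mutants |= {m for m, o in operator_of.items() if o == op}
        # Tests adequate for the selected mutants, evaluated on ALL mutants.
        tests = greedy_test_subset(kills, selected_mutants)
        score = mutation_score(kills, tests, all_mutants)
        retained = score / full_score if full_score else 0.0
        if retained >= min_retained:
            break

    reduction = 1 - len(selected_mutants) / len(all_mutants) if all_mutants else 0.0
    return selected_ops, reduction, retained
```

Taking the ranking as an argument keeps the sketch independent of how RS or QS is actually computed; any per-operator score could be sorted into `priority` and passed in.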

Published In

Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
August 2023
332 pages
ISBN:9798400708947
DOI:10.1145/3609437
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Software testing
  2. deep learning
  3. mutation testing
  4. selective mutation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Internetware 2023

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 59
  • Downloads (Last 12 months): 59
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 16 Oct 2024
