A Fine-Grained Evaluation of Mutation Operators for Deep Learning Systems: A Selective Mutation Approach

Authors:

Zhiqiu Huang

Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware

Pages 123 - 133

https://doi.org/10.1145/3609437.3609453

Published: 05 October 2023

Abstract

The widespread adoption of deep learning (DL) has made it critical to ensure its reliability. Mutation testing has been employed in DL testing to assess test data quality, but it can be costly because of the large number of mutants it generates. The cost can be reduced by selecting a sufficient subset of mutation operators. However, it remains unclear to what extent each DL mutation operator contributes to test effectiveness, which makes it difficult to decide which operators are worth keeping in a selective mutation approach.

In this paper, we perform a fine-grained evaluation of DL mutation operators by incorporating the classification results of DL models. Based on the finding that mutants generated by some mutation operators can be redundant or useless in guiding the generation of high-quality test cases, we introduce two measures of the usefulness of DL mutation operators: redundancy score (RS), which quantifies the redundancy of operators, and quality score (QS), which quantifies the ability of operators to guide the generation of high-quality test cases. Our empirical evaluation on three widely used datasets and four DL models shows that RS and QS offer a dual evaluation of DL mutation operators. In a selective mutation strategy, prioritizing mutation operators based on RS can reduce the number of mutants by 46.15% to 69.23% while retaining over 90% of the mutation score. Similarly, prioritizing mutation operators based on QS can reduce the number of mutants by 46.15% to 50.35% while retaining over 87% of the test cases. Our study shows that RS-based and QS-based selective mutation can significantly reduce the number of mutants while maintaining high test effectiveness.
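The selective mutation workflow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not define RS or QS, so the `scores` mapping is a stand-in for any per-operator usefulness score (higher meaning more useful), and the operator names, mutant ids, test ids, and kill data are all hypothetical. The sketch keeps the top-ranked operators, greedily picks test inputs that kill their mutants, and reports how many mutants were saved and how much of the full mutation score those tests retain.

```python
from typing import Dict, List, Set

# Hypothetical kill data: operator -> mutant id -> ids of test inputs that kill it.
# Mutant ids are assumed to be unique across operators.
KillData = Dict[str, Dict[str, Set[str]]]


def top_operators(scores: Dict[str, float], keep: int) -> List[str]:
    """Return the `keep` operators with the highest score (ties broken by name)."""
    return sorted(scores, key=lambda op: (-scores[op], op))[:keep]


def mutants_of(kills: KillData, operators: List[str]) -> Dict[str, Set[str]]:
    """Flatten the mutants of the given operators into one mapping."""
    return {m: tests for op in operators for m, tests in kills[op].items()}


def greedy_tests(mutants: Dict[str, Set[str]]) -> Set[str]:
    """Greedily pick tests until every killable mutant is killed."""
    alive = {m for m, tests in mutants.items() if tests}
    chosen: Set[str] = set()
    while alive:
        candidates = sorted({t for m in alive for t in mutants[m]})
        best = max(candidates, key=lambda t: sum(t in mutants[m] for m in alive))
        chosen.add(best)
        alive = {m for m in alive if best not in mutants[m]}
    return chosen


def mutation_score(mutants: Dict[str, Set[str]], tests: Set[str]) -> float:
    """Fraction of mutants killed by at least one of the given tests."""
    if not mutants:
        return 0.0
    return sum(1 for killers in mutants.values() if killers & tests) / len(mutants)


def evaluate_selection(kills: KillData, selected: List[str]) -> Dict[str, float]:
    """Mutant reduction and retained mutation score for a chosen operator subset."""
    all_mutants = mutants_of(kills, list(kills))
    sel_mutants = mutants_of(kills, selected)
    tests = greedy_tests(sel_mutants)
    all_tests: Set[str] = set().union(*all_mutants.values())
    full = mutation_score(all_mutants, all_tests)
    return {
        "mutant_reduction": 1 - len(sel_mutants) / len(all_mutants),
        "retained_score": mutation_score(all_mutants, tests) / full if full else 0.0,
    }


# Toy data, invented purely for illustration.
kills: KillData = {
    "op_a": {"a1": {"t1"}, "a2": {"t2"}},
    "op_b": {"b1": {"t1", "t3"}},
    "op_c": {"c1": {"t3"}, "c2": set()},
}
scores = {"op_a": 0.9, "op_b": 0.2, "op_c": 0.5}

selected = top_operators(scores, keep=1)
report = evaluate_selection(kills, selected)
print(selected, report)
```

Raising `keep` trades a smaller mutant reduction for a higher retained score, which is the trade-off the reported percentages summarize. How a DL mutant counts as "killed" is left abstract here; the sketch only assumes that the set of killing test inputs is known for each mutant.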

Index Terms

A Fine-Grained Evaluation of Mutation Operators for Deep Learning Systems: A Selective Mutation Approach
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability

Information & Contributors

Information

Published In

Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware

August 2023

332 pages

ISBN:9798400708947

DOI:10.1145/3609437

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

Internetware 2023

Internetware 2023: 14th Asia-Pacific Symposium on Internetware

August 4 - 6, 2023

Hangzhou, China

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%

Bibliometrics

Total Citations: 0
Total Downloads: 59
Downloads (last 12 months): 59
Downloads (last 6 weeks): 2

Reflects downloads up to 16 Oct 2024
