research-article

Multi-objective metamorphic follow-up test case selection for deep learning systems

Author:

Aitor Arrieta

GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference

Pages 1327 - 1335

https://doi.org/10.1145/3512290.3528697

Published: 08 July 2022

Abstract

Deep Learning (DL) components are increasingly present in safety- and mission-critical software systems. Ensuring high dependability of DL systems requires robust verification methods, for which automation is highly beneficial (e.g., more test cases can be executed). Metamorphic Testing (MT) is a technique that has been shown to alleviate the test oracle problem when testing DL systems, thereby increasing test automation. However, a drawback of this technique lies in the need for multiple test executions (named the source and the follow-up test cases) to obtain the test verdict, which incurs additional testing cost. In this paper, we propose an approach based on multi-objective search to select follow-up test cases. Our approach uses source test cases to measure the uncertainty that such test inputs provoke in the DL model and, based on that, selects failure-revealing follow-up test cases. We integrate our approach with the NSGA-II algorithm. An empirical evaluation on three DL models tackling the image classification problem, along with five different metamorphic relations, demonstrates that our approach outperformed the baseline algorithm by between 17.09% and 59.20% on average when considering the revisited Hypervolume quality indicator.
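To make the idea in the abstract concrete, the following is a minimal Python sketch of uncertainty-guided follow-up selection. It is not the paper's implementation. The abstract does not name the uncertainty metric, the objectives, or the metamorphic relations, so everything here is an assumption made for illustration: Shannon entropy of the softmax output stands in for the uncertainty measure, the two objectives are total uncertainty (maximized) and total execution cost (minimized), the relation is a simple label-preserving one, and an exhaustive enumeration of subsets replaces NSGA-II so that the example stays dependency-free. All function names are hypothetical.

```python
import math
from itertools import combinations


def entropy(probs):
    """Shannon entropy (in nats) of a softmax vector; higher means more uncertain.
    Assumed stand-in for the uncertainty measure, which the abstract does not specify."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mr_violated(source_probs, followup_probs):
    """Assumed label-preserving metamorphic relation: the predicted class of the
    follow-up input must equal the predicted class of the source input."""
    def predicted(v):
        return max(range(len(v)), key=v.__getitem__)
    return predicted(source_probs) != predicted(followup_probs)


def dominates(a, b):
    """a and b are (uncertainty, cost) pairs; uncertainty is maximized, cost minimized."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_subsets(uncertainty, cost):
    """Enumerate every non-empty subset of follow-up test cases and keep the
    non-dominated ones. This brute-force search is only viable for tiny inputs;
    a search algorithm such as NSGA-II would approximate the same front."""
    n = len(uncertainty)
    scored = []
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            u = sum(uncertainty[i] for i in subset)
            c = sum(cost[i] for i in subset)
            scored.append((subset, (u, c)))
    return [s for s, f in scored if not any(dominates(g, f) for _, g in scored)]
```

In use, one would compute `entropy` on the model output of each source test case, pass those values and the per-test costs to `pareto_subsets`, execute only the follow-up test cases in a chosen subset, and call `mr_violated` on each source and follow-up output pair to obtain the verdict. The design choice being illustrated is that the uncertainty is computed from source executions alone, so no follow-up test case has to be run before the selection is made.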

Index Terms

Multi-objective metamorphic follow-up test case selection for deep learning systems
1. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering

Recommendations

Feedback-Directed Metamorphic Testing
Over the past decade, metamorphic testing has gained rapidly increasing attention from both academia and industry, particularly thanks to its high efficacy on revealing real-life software faults in a wide variety of application domains. On the basis of a ...
Multi-objective black-box test case selection for system testing
GECCO '17: Proceedings of the Genetic and Evolutionary Computation Conference

Testing is a fundamental task to ensure software quality. Regression testing aims to ensure that changes to software do not introduce new failures. As resources are often limited and testing comprises a vast amount of test cases, different regression ...
Fault detection effectiveness of source test case generation strategies for metamorphic testing
MET '18: Proceedings of the 3rd International Workshop on Metamorphic Testing

Metamorphic testing is a well known approach to tackle the oracle problem in software testing. This technique requires the use of source test cases that serve as seeds for the generation of follow-up test cases. Systematic design of test cases is ...

Published In

GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference

July 2022

1472 pages

ISBN:9781450392372

DOI:10.1145/3512290

Editor: Jonathan E. Fieldsend (University of Exeter)

General Chair: Markus Wagner (The University of Adelaide)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

European Commission

Conference

GECCO '22: Genetic and Evolutionary Computation Conference

Sponsor: SIGEVO

July 9 - 13, 2022

Boston, Massachusetts

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Cited By

Ugarte M, Valle P, Illarramendi M, Arrieta A (2025). Enhancing multi-objective test case selection through the mutation operator. Automated Software Engineering 32:1. Online publication date: 30-Jan-2025.
https://doi.org/10.1007/s10515-025-00489-6

Kalaee A, Parsa S (2025). Metamorphic testing of deep neural network-based autonomous driving systems using behavioural domain adequacy. Neural Computing and Applications. Online publication date: 23-Jan-2025.
https://doi.org/10.1007/s00521-024-10794-y

Jafari F, Nadeem A (2024). Measuring Effectiveness of Metamorphic Relations for Image Processing Using Mutation Testing. Journal of Imaging 10:4 (87). Online publication date: 6-Apr-2024.
https://doi.org/10.3390/jimaging10040087

Ayerdi J, Iriarte A, Valle P, Roman I, Illarramendi M, Arrieta A (2024). MarMot: Metamorphic Runtime Monitoring of Autonomous Driving Systems. ACM Transactions on Software Engineering and Methodology 34:1 (1-35). Online publication date: 15-Jul-2024.
https://dl.acm.org/doi/10.1145/3678171

Aghababaeyan Z, Abdellatif M, Dadkhah M, Briand L (2024). DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks. ACM Transactions on Software Engineering and Methodology 33:6 (1-29). Online publication date: 27-Jun-2024.
https://dl.acm.org/doi/10.1145/3644388

Arrieta A, Valle P, Iriarte A, Illarramendi M (2023). How Do Deep Learning Faults Affect AI-Enabled Cyber-Physical Systems in Operation? A Preliminary Study Based on DeepCrime Mutation Operators. 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (1-7). Online publication date: 26-Oct-2023.
https://doi.org/10.1109/ESEM56168.2023.10304794

Arrieta A, Illarramendi M (2023). A Novel Mutation Operator for Search-Based Test Case Selection. Search-Based Software Engineering (84-98). Online publication date: 8-Dec-2023.
https://dl.acm.org/doi/10.1007/978-3-031-48796-5_6

Sobania D, Briesch M, Röchner P, Rothlauf F (2023). MTGP: Combining Metamorphic Testing and Genetic Programming. Genetic Programming (324-338). Online publication date: 12-Apr-2023.
https://dl.acm.org/doi/10.1007/978-3-031-29573-7_21
