DOI: 10.1145/3512290.3528697
research-article

Multi-objective metamorphic follow-up test case selection for deep learning systems

Published: 08 July 2022 Publication History

Abstract

Deep Learning (DL) components are increasingly present in safety- and mission-critical software systems. Ensuring high dependability of DL systems requires robust verification methods, for which automation is highly beneficial (e.g., more test cases can be executed). Metamorphic Testing (MT) is a technique that has been shown to alleviate the test oracle problem when testing DL systems, thereby increasing test automation. However, a drawback of this technique is that it needs multiple test executions (the source and the follow-up test cases) to obtain a test verdict, which adds testing cost. In this paper, we propose an approach based on multi-objective search to select follow-up test cases. Our approach uses the source test cases to measure the uncertainty that such test inputs provoke in the DL model and, based on that, selects failure-revealing follow-up test cases. We integrate our approach with the NSGA-II algorithm. An empirical evaluation on three DL models tackling the image classification problem, along with five different metamorphic relations, demonstrates that our approach outperformed the baseline algorithm by between 17.09% and 59.20% on average when considering the revisited Hypervolume quality indicator.
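
The selection idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation, and several things in it are assumptions: uncertainty is measured here as the Shannon entropy of the model's softmax output on each source test case (the abstract does not say which uncertainty measure is used), the two objectives are total follow-up execution cost and total source uncertainty, and the softmax vectors and costs are made-up values. For brevity, the sketch enumerates every possible selection and keeps the Pareto-optimal ones, where the paper uses NSGA-II to search the same kind of space.

```python
import math
from itertools import combinations

# Hypothetical softmax outputs of a DL classifier on four source test cases.
SOURCE_OUTPUTS = {
    "t1": [0.98, 0.01, 0.01],
    "t2": [0.40, 0.35, 0.25],
    "t3": [0.34, 0.33, 0.33],
    "t4": [0.90, 0.05, 0.05],
}
# Hypothetical cost of executing the follow-up test case of each source test.
FOLLOW_UP_COST = {"t1": 1.0, "t2": 1.0, "t3": 2.0, "t4": 1.0}


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def evaluate(selection):
    """Return (total cost, negated total uncertainty); both are minimized."""
    cost = sum(FOLLOW_UP_COST[t] for t in selection)
    uncertainty = sum(entropy(SOURCE_OUTPUTS[t]) for t in selection)
    return (cost, -uncertainty)


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the selections whose objective vector no other selection dominates."""
    scored = [(sel, evaluate(sel)) for sel in candidates]
    return [
        (sel, obj)
        for sel, obj in scored
        if not any(dominates(other, obj) for _, other in scored)
    ]


# Exhaustive enumeration of all subsets; feasible only for tiny suites.
names = sorted(SOURCE_OUTPUTS)
all_selections = [
    combo for size in range(len(names) + 1) for combo in combinations(names, size)
]
front = pareto_front(all_selections)

for sel, (cost, neg_unc) in sorted(front, key=lambda item: item[1][0]):
    print(f"cost={cost:.1f} uncertainty={-neg_unc:.3f} select={sel}")
```

Each printed line is one trade-off between cost and the uncertainty covered by the selected follow-up test cases. Exhaustive enumeration grows exponentially with the number of test cases, which is why a search algorithm such as NSGA-II is used on realistic test suites.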


    Published In

    GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2022
    1472 pages
    ISBN: 9781450392372
    DOI: 10.1145/3512290
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep learning systems
    2. metamorphic testing
    3. multi-objective search
    4. test case selection

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

    Cited By

    • (2025) Enhancing multi-objective test case selection through the mutation operator. Automated Software Engineering 32:1. DOI: 10.1007/s10515-025-00489-6. Online publication date: 30-Jan-2025
    • (2025) Metamorphic testing of deep neural network-based autonomous driving systems using behavioural domain adequacy. Neural Computing and Applications. DOI: 10.1007/s00521-024-10794-y. Online publication date: 23-Jan-2025
    • (2024) Measuring Effectiveness of Metamorphic Relations for Image Processing Using Mutation Testing. Journal of Imaging 10:4 (87). DOI: 10.3390/jimaging10040087. Online publication date: 6-Apr-2024
    • (2024) MarMot: Metamorphic Runtime Monitoring of Autonomous Driving Systems. ACM Transactions on Software Engineering and Methodology 34:1 (1-35). DOI: 10.1145/3678171. Online publication date: 15-Jul-2024
    • (2024) DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks. ACM Transactions on Software Engineering and Methodology 33:6 (1-29). DOI: 10.1145/3644388. Online publication date: 27-Jun-2024
    • (2023) How Do Deep Learning Faults Affect AI-Enabled Cyber-Physical Systems in Operation? A Preliminary Study Based on DeepCrime Mutation Operators. 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (1-7). DOI: 10.1109/ESEM56168.2023.10304794. Online publication date: 26-Oct-2023
    • (2023) A Novel Mutation Operator for Search-Based Test Case Selection. Search-Based Software Engineering (84-98). DOI: 10.1007/978-3-031-48796-5_6. Online publication date: 8-Dec-2023
    • (2023) MTGP: Combining Metamorphic Testing and Genetic Programming. Genetic Programming (324-338). DOI: 10.1007/978-3-031-29573-7_21. Online publication date: 12-Apr-2023
