DOI: 10.1145/3510003.3510232
research-article

Adaptive test selection for deep neural networks

Published: 05 July 2022
Abstract

    Deep neural networks (DNNs) have developed tremendously over the past decade. While many DNN-driven software applications have been deployed to solve various tasks, they can also behave incorrectly and cause massive losses. To reveal such incorrect behaviors and improve the quality of DNN-driven applications, developers often need rich labeled data for testing and optimizing DNN models. In practice, however, collecting diverse data from application scenarios and labeling it properly is often expensive and time-consuming.
    In this paper, we propose an adaptive test selection method for deep neural networks, named ATS, to alleviate this problem. ATS leverages the differences between model outputs to measure the behavioral diversity of DNN test data, and it aims to select a subset of diverse tests from a massive unlabelled dataset. We evaluate ATS on four well-designed DNN models and four widely used datasets, comparing it against various neuron coverage (NC) criteria. The results demonstrate that ATS significantly outperforms all compared test selection methods in both the fault detection and the model improvement capability of the selected test suites. ATS is therefore promising for saving data labeling and model retraining costs for deep neural networks.
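    The general idea of selecting unlabelled inputs whose model outputs are mutually diverse can be illustrated with a generic max-min (farthest-first) selection over output probability vectors, in the spirit of adaptive random testing. This is a minimal sketch under stated assumptions, not the ATS algorithm described in the paper: Euclidean distance between softmax vectors stands in for the paper's behavior-diversity measure, the seeding rule (start from the least confident input) is an arbitrary choice, and the names `select_diverse`, `euclidean`, and the toy output vectors are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two output vectors (assumed diversity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(outputs: List[Sequence[float]], budget: int) -> List[int]:
    """Greedy max-min selection over model outputs.

    Repeatedly picks the unlabelled input whose output vector is farthest
    from its nearest already-selected output. Hypothetical helper for
    illustration only; it is not the ATS algorithm itself.
    """
    if budget <= 0 or not outputs:
        return []
    budget = min(budget, len(outputs))
    # Assumed seeding rule: start from the least confident input
    # (lowest top-class probability) so the sketch is deterministic.
    first = min(range(len(outputs)), key=lambda i: max(outputs[i]))
    selected = [first]
    # min_dist[i] holds the distance from input i to its nearest selected input.
    min_dist = [euclidean(o, outputs[first]) for o in outputs]
    while len(selected) < budget:
        # Choose the not-yet-selected input farthest from the current selection.
        nxt = max(
            (i for i in range(len(outputs)) if i not in selected),
            key=lambda i: min_dist[i],
        )
        selected.append(nxt)
        # Refresh nearest-selected distances now that nxt is in the subset.
        for i, o in enumerate(outputs):
            min_dist[i] = min(min_dist[i], euclidean(o, outputs[nxt]))
    return selected


if __name__ == "__main__":
    # Toy softmax outputs for five unlabelled inputs over three classes.
    outputs = [
        [0.98, 0.01, 0.01],
        [0.97, 0.02, 0.01],
        [0.40, 0.35, 0.25],
        [0.05, 0.90, 0.05],
        [0.02, 0.03, 0.95],
    ]
    print(select_diverse(outputs, budget=3))
```

    With a budget of 3, the call returns `[2, 4, 0]`: the least confident input first, then the confident class-2 prediction, then one of the two near-identical class-0 predictions, while its near-duplicate (index 1) is skipped. Only the selected inputs would then need labels, which is where the labeling savings the abstract mentions would come from.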

    Cited By

    • (2024) Keeper: Automated Testing and Fixing of Machine Learning Software. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3672451. Online publication date: 13-Jun-2024.
    • (2024) DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks. ACM Transactions on Software Engineering and Methodology 33, 6 (1-29). DOI: 10.1145/3644388. Online publication date: 27-Jun-2024.
    • (2024) Test Optimization in DNN Testing: A Survey. ACM Transactions on Software Engineering and Methodology 33, 4 (1-42). DOI: 10.1145/3643678. Online publication date: 27-Jan-2024.


    Information

    Published In

    ICSE '22: Proceedings of the 44th International Conference on Software Engineering
    May 2022
    2508 pages
    ISBN: 9781450392211
    DOI: 10.1145/3510003
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adaptive random testing
    2. deep learning testing
    3. deep neural networks
    4. test selection

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Science, Technology and Innovation Commission of Shenzhen Municipality

    Conference

    ICSE '22

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 226
    • Downloads (last 6 weeks): 16
    Reflects downloads up to 12 Aug 2024

    Citations

    • (2024) MultiTest: Physical-Aware Object Insertion for Testing Multi-sensor Fusion Perception Systems. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3639191. Online publication date: 20-May-2024.
    • (2024) LUNA: A Model-Based Universal Analysis Framework for Large Language Models. IEEE Transactions on Software Engineering (1-28). DOI: 10.1109/TSE.2024.3411928. Online publication date: 2024.
    • (2024) FATS: Feature Distribution Analysis-Based Test Selection for Deep Learning Enhancement. IEEE Transactions on Big Data 10, 2 (132-145). DOI: 10.1109/TBDATA.2023.3334648. Online publication date: Apr-2024.
    • (2024) A Survey on Test Input Selection and Prioritization for Deep Neural Networks. 2024 10th International Symposium on System Security, Safety, and Reliability (ISSSR) (232-243). DOI: 10.1109/ISSSR61934.2024.00035. Online publication date: 16-Mar-2024.
    • (2024) DeepSense: test prioritization for neural network based on multiple mutation and manifold spatial distribution. Evolutionary Intelligence. DOI: 10.1007/s12065-024-00961-4. Online publication date: 30-Jul-2024.
    • (2024) Towards Exploring the Limitations of Test Selection Techniques on Graph Neural Networks: An Empirical Study. Empirical Software Engineering 29, 5. DOI: 10.1007/s10664-024-10515-y. Online publication date: 22-Jul-2024.
    • (2023) LaF: Labeling-free Model Selection for Automated Deep Neural Network Reusing. ACM Transactions on Software Engineering and Methodology 33, 1 (1-28). DOI: 10.1145/3611666. Online publication date: 31-Jul-2023.
