Can Coverage Criteria Guide Failure Discovery for Image Classifiers? An Empirical Study

Abstract

    Quality assurance of deep neural networks (DNNs) is crucial for deploying DNN-based software, especially in mission- and safety-critical tasks. Inspired by structural white-box testing of traditional software, many test criteria have been proposed for DNNs: they aim to expose erroneous behaviors by activating previously uncovered test units, such as new neurons, neuron values, and decision paths. Many studies have evaluated the effectiveness of these criteria, but existing empirical work focuses mainly on how well DNN test criteria improve adversarial robustness, leaving the correctness of DNNs under test largely unexamined. To fill this gap, we conduct a comprehensive study of 11 structural coverage criteria, 6 widely used image datasets, and 9 popular DNNs. We investigate the effectiveness of DNN coverage criteria on natural inputs from 4 aspects: (1) the correlation between test coverage and test diversity; (2) the effects of criterion parameters and target DNNs; (3) the effectiveness in prioritizing in-distribution natural inputs that trigger erroneous behaviors; and (4) the capability to detect out-of-distribution (OOD) natural samples. Our findings are as follows. (1) For measuring diversity, criteria that account for the relationships between neurons are more effective than criteria that treat each neuron independently; for instance, the neuron-path criteria (SNPC and ANPC) correlate strongly with test diversity and are promising diversity measures for DNNs. (2) Hyper-parameters strongly influence the effectiveness of criteria, especially those that control a criterion's granularity; computational complexity is also an important concern when designing deep learning test coverage criteria, particularly for large-scale models. (3) Criteria related to the data distribution (LSA, DSA, SNAC, and NBC) can prioritize both in-distribution natural faults and out-of-distribution inputs. Moreover, for OOD detection, the boundary metrics (SNAC and NBC) are effective indicators with lower computational cost and higher detection efficiency than LSA and DSA. These findings motivate follow-up research on scalable test coverage criteria that improve the correctness of DNNs.
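
    The boundary metrics above are simple enough to sketch. The following is a minimal illustration, not the paper's implementation: it assumes a PyTorch classifier and introduces a hypothetical helper, penultimate_activations, that exposes the monitored layer's outputs, then scores each input by how many of its neuron activations fall outside the per-neuron range observed on training data, in the spirit of DeepGauge-style NBC/SNAC.

```python
# Minimal sketch (assumptions: a PyTorch model with a `features` module whose
# flattened outputs are the monitored neurons; `train_loader` yields (x, y);
# the model is in eval mode). An illustration, not the paper's code.
import torch

def penultimate_activations(model, x):
    # Hypothetical helper: adapt to whichever layer you actually monitor.
    return model.features(x).flatten(start_dim=1)  # shape: (batch, n_neurons)

@torch.no_grad()
def activation_bounds(model, train_loader, device="cpu"):
    """Per-neuron [low, high] activation ranges observed on training data."""
    low = high = None
    for x, _ in train_loader:
        acts = penultimate_activations(model, x.to(device))
        b_low, b_high = acts.min(dim=0).values, acts.max(dim=0).values
        low = b_low if low is None else torch.minimum(low, b_low)
        high = b_high if high is None else torch.maximum(high, b_high)
    return low, high

@torch.no_grad()
def boundary_scores(model, x, low, high):
    """NBC-style score counts neurons outside [low, high] on either side;
    SNAC-style score counts only neurons exceeding the upper bound."""
    acts = penultimate_activations(model, x)
    above = (acts > high).float()
    below = (acts < low).float()
    nbc = torch.clamp(above + below, max=1.0).mean(dim=1)   # per-input score
    snac = above.mean(dim=1)                                 # per-input score
    return nbc, snac  # higher scores suggest inputs farther from training data
```

    Under this sketch, test inputs with higher scores would be ranked first for inspection or flagged as likely out-of-distribution. The per-neuron bounds are computed once over the training set, which is what keeps the cost low relative to per-input density estimates such as LSA and DSA.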


    Published In

ACM Transactions on Software Engineering and Methodology (Just Accepted)
    ISSN: 1049-331X
    EISSN: 1557-7392

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Online AM: 13 June 2024
    Accepted: 14 May 2024
    Revised: 29 March 2024
    Received: 11 October 2022

    Qualifiers

    • Research-article
