Seed Selection for Testing Deep Neural Networks

Published: 24 November 2023

Abstract

Deep learning (DL) has been applied in many domains, and the quality of DL systems has become a major concern. To evaluate this quality, a number of DL testing techniques have been proposed. These techniques require a set of initial seed inputs from which test cases are generated, and existing tools usually construct the seed corpus by randomly sampling inputs from the training or test dataset. To date, there has been no study of how the initial seed inputs affect the performance of DL testing, or of how to construct an optimal seed corpus. To fill this gap, we conduct the first systematic study of the impact of seed selection strategies on DL testing. Specifically, targeting three popular goals of DL testing (i.e., coverage, failure detection, and robustness), we develop five seed selection strategies: three based on single-objective optimization (SOO) and two based on multi-objective optimization (MOO). We evaluate these strategies on seven testing tools. Our results demonstrate that the selection of initial seed inputs greatly affects testing performance: SOO-based selection constructs seed corpora that best boost DL testing with respect to a specific testing goal, while MOO-based selection constructs seed corpora that achieve balanced improvements across multiple objectives.
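To make the two strategy families concrete, the sketch below illustrates in Python what selection over a seed corpus might look like: single-objective selection ranks seeds by one testing goal, while multi-objective selection keeps the seeds that are Pareto-optimal (non-dominated) across several goals. This is a minimal, hypothetical illustration, not the paper's implementation; the `Seed` record and its scoring fields (coverage, uncertainty, robustness proxies) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Seed:
    input_id: int
    coverage: float     # hypothetical: coverage gain this seed contributes
    uncertainty: float  # hypothetical: proxy for failure-detection potential
    robustness: float   # hypothetical: proxy for robustness-testing value


def objectives(s: Seed):
    return (s.coverage, s.uncertainty, s.robustness)


def select_soo(seeds: List[Seed], score: Callable[[Seed], float],
               budget: int) -> List[Seed]:
    """SOO: rank the corpus by one testing objective, keep the top seeds."""
    return sorted(seeds, key=score, reverse=True)[:budget]


def dominates(a: Seed, b: Seed) -> bool:
    """a dominates b if a is at least as good on every objective
    and strictly better on at least one."""
    return (all(x >= y for x, y in zip(objectives(a), objectives(b)))
            and objectives(a) != objectives(b))


def select_moo(seeds: List[Seed], budget: int) -> List[Seed]:
    """MOO: keep the non-dominated (Pareto-optimal) seeds, up to the budget."""
    front = [s for s in seeds if not any(dominates(o, s) for o in seeds)]
    return front[:budget]


if __name__ == "__main__":
    corpus = [
        Seed(0, coverage=0.9, uncertainty=0.1, robustness=0.5),
        Seed(1, coverage=0.4, uncertainty=0.8, robustness=0.6),
        Seed(2, coverage=0.2, uncertainty=0.2, robustness=0.1),
    ]
    # Coverage-oriented SOO ranks seed 0 first; MOO keeps the Pareto front {0, 1}.
    print([s.input_id for s in select_soo(corpus, lambda s: s.coverage, budget=2)])
    print([s.input_id for s in select_moo(corpus, budget=2)])
```

A real implementation would replace the illustrative scores with the paper's actual measures (e.g., a coverage criterion such as neuron coverage) and, for MOO, a full non-dominated sorting scheme in the NSGA-II family rather than this single Pareto-front filter.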

Cited By

  • (2024) GIST: Generated Inputs Sets Transferability in Deep Learning. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3672457. Online publication date: 13 June 2024.
  • (2024) Exploiting the Adversarial Example Vulnerability of Transfer Learning of Source Code. IEEE Transactions on Information Forensics and Security 19, 5880–5894. DOI: 10.1109/TIFS.2024.3402153. Online publication date: 16 May 2024.

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 1 (January 2024), 933 pages
EISSN: 1557-7392
DOI: 10.1145/3613536
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 24 November 2023
Online AM: 07 July 2023
Accepted: 16 June 2023
Revised: 15 June 2023
Received: 05 January 2023
Published in TOSEM Volume 33, Issue 1

Author Tags

  1. Deep learning testing
  2. seed selection
  3. coverage
  4. robustness

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • National Natural Science Foundation of China
  • Shaanxi Province Key Industry Innovation Program
  • National Research Foundation, Singapore, and the Cyber Security Agency under its National Cybersecurity R&D Programme
  • Ministry of Education, Singapore under its Academic Research Tier 3
