research-article
Free access
Just Accepted

GIST: Generated Inputs Sets Transferability in Deep Learning

Online AM: 13 June 2024

Abstract

To foster the verifiability and testability of Deep Neural Networks (DNNs), an increasing number of test case generation techniques are being developed.
When testing DNN models, the user can apply any existing test generation technique. However, they need to do so for each technique and each DNN model under test, which can be expensive. A paradigm shift could therefore benefit this testing process: rather than regenerating the test set independently for each DNN model under test, we could transfer test sets from existing DNN models.
This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets. Given a property selected by a user (e.g., neurons covered, faults), GIST enables the selection, among available test sets, of those that are good from the point of view of this property. This allows the user to recover properties on the transferred test sets similar to those they would have obtained by generating the test set from scratch with a test case generation technique. Experimental results show that GIST can select effective test sets to transfer for the given property. Moreover, GIST scales better than reapplying test case generation techniques from scratch on DNN models under test.
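
The selection step described above can be pictured with a minimal sketch. It is not the authors' implementation: the abstract does not say how GIST scores candidate test sets, so `proxy_score`, `rank_test_sets`, and `select_test_set` are hypothetical names, and the scoring function is assumed to be some inexpensive estimate of how well a test set would preserve the chosen property on the model under test.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_test_sets(
    candidates: Dict[str, Sequence],
    proxy_score: Callable[[Sequence], float],
) -> List[Tuple[str, float]]:
    """Rank available test sets by a user-supplied proxy score (higher is better).

    `proxy_score` is a hypothetical stand-in for whatever estimate relates a
    candidate test set to the property of interest; it is an assumption of
    this sketch, not something taken from the paper.
    """
    scored = [(name, proxy_score(test_set)) for name, test_set in candidates.items()]
    # sorted() is stable, so ties keep their original insertion order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_test_set(
    candidates: Dict[str, Sequence],
    proxy_score: Callable[[Sequence], float],
) -> str:
    """Return the name of the highest-ranked candidate test set."""
    if not candidates:
        raise ValueError("no candidate test sets available")
    return rank_test_sets(candidates, proxy_score)[0][0]
```

In this sketch the cost of selection is one call to `proxy_score` per available test set, which is the kind of saving the abstract contrasts with rerunning a test case generation technique for every model under test.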

Cited By

  • (2024) Bridging the Gap between Real-world and Synthetic Images for Testing Autonomous Driving Systems. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 732-744. DOI: 10.1145/3691620.3695067. Online publication date: 27-Oct-2024

Information & Contributors

Information

Published In

ACM Transactions on Software Engineering and Methodology Just Accepted
EISSN:1557-7392
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 13 June 2024
Accepted: 15 May 2024
Revised: 10 May 2024
Received: 30 October 2023

Author Tags

  1. test sets generation
  2. deep learning
  3. DNN
  4. testing
  5. transferability

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 121
  • Downloads (Last 6 weeks): 48
Reflects downloads up to 09 Nov 2024
