
A Post-training Framework for Improving the Performance of Deep Learning Models via Model Transformation

Published: 15 March 2024

Abstract

Deep learning (DL) techniques have attracted much attention in recent years and have been applied to many application scenarios. To improve the performance of DL models with respect to different properties, such as robustness and fairness, many approaches have been proposed to meet the requirements of practical use. Among them, post-training is an effective method that has been widely adopted in practice due to its high efficiency and good performance. Nevertheless, its performance is still limited by the incompleteness of training data. Moreover, existing approaches are typically designed for a single task, such as improving model robustness, and cannot be reused for other purposes.
In this article, we aim to fill this gap and propose an effective and general post-training framework that can be adapted to improve model performance along different dimensions. Specifically, it incorporates a novel model transformation technique that transforms a classification model into an isomorphic regression model for fine-tuning, which effectively mitigates the problem of incomplete training data by forcing the model to strengthen its memory of crucial input features, ultimately improving model performance. To evaluate the framework, we adapted it to two emerging tasks for improving DL models, i.e., robustness improvement and fairness improvement, and conducted extensive studies comparing it with state-of-the-art approaches. The experimental results demonstrate that our framework is indeed general, as it is effective in both tasks. Specifically, in the robustness-improvement task, our approach Dare achieved the best results in 61.1% of cases (vs. 11.1% for the baselines). In the fairness-improvement task, our approach FMT effectively improved fairness without sacrificing the accuracy of the models.
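The abstract's central mechanism — recasting a classifier as an isomorphic regression model and fine-tuning it — can be illustrated with a minimal sketch. This is not the paper's implementation: the tiny linear model, the one-hot regression targets, and the plain MSE update below are illustrative assumptions chosen to fit in a few lines; the actual framework operates on deep classification models.

```python
import math
import random

def softmax(z):
    """Classification head: turn logits into probabilities."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class LinearModel:
    """A tiny linear model; the same weights serve both the
    classification view (softmax over logits) and the regression view
    (raw logits), so the two models are 'isomorphic'."""
    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]

    def logits(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.W]

def regression_finetune_step(model, x, label, n_out, lr=0.1):
    """One fine-tuning step in the regression view: regress the raw
    logits toward a one-hot target with an MSE update (no softmax)."""
    target = [1.0 if k == label else 0.0 for k in range(n_out)]
    z = model.logits(x)
    for k in range(n_out):
        err = z[k] - target[k]          # d(MSE)/d z_k, up to a constant
        for j in range(len(x)):
            model.W[k][j] -= lr * err * x[j]
    return sum((zk - tk) ** 2 for zk, tk in zip(z, target))

model = LinearModel(n_in=2, n_out=2)
x, y = [1.0, 0.5], 1
losses = [regression_finetune_step(model, x, y, n_out=2) for _ in range(50)]
# Reading the fine-tuned weights back through the classification head:
pred = max(range(2), key=lambda k: softmax(model.logits(x))[k])
```

The design point the sketch tries to convey is that only the training signal changes (MSE on logits instead of cross-entropy on softmax outputs) while the architecture and weights are shared, so the fine-tuned regression model can be read back as the original classifier.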

[142]
Ziyuan Zhong, Yuchi Tian, and Baishakhi Ray. 2021. Understanding local robustness of deep neural networks under natural variations. Fundam. Approach. Softw. Eng. 12649 (2021), 313.
[143]
Husheng Zhou, Wei Li, Zelun Kong, Junfeng Guo, Yuqun Zhang, Bei Yu, Lingming Zhang, and Cong Liu. 2020. DeepBillboard: Systematic physical-world testing of autonomous driving systems. In 42nd International Conference on Software Engineering. ACM, 347–358.
[144]
Donglin Zhuang, Xingyao Zhang, Shuaiwen Song, and Sara Hooker. 2022. Randomness in neural network training: Characterizing the impact of tooling. In Conference on Machine Learning and Systems. mlsys.org.


Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 3 (March 2024), 943 pages
EISSN: 1557-7392
DOI: 10.1145/3613618
Editor: Mauro Pezzé

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 March 2024
Online AM: 23 October 2023
Accepted: 09 October 2023
Revised: 11 August 2023
Received: 07 January 2023
Published in TOSEM Volume 33, Issue 3

Author Tags

  1. Deep neural network
  2. Delta debugging
  3. Model robustness
  4. Model fairness

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
