Patch Correctness Assessment: A Survey

Published: 20 January 2025

Abstract

Most automated program repair methods rely on test cases to determine whether generated patches are correct. However, because available test suites are incomplete, some patches that pass every test case may still be incorrect. This issue is known as the patch overfitting problem, and it is a longstanding problem in automated program repair. Because of overfitting, the patches produced by automated program repair tools require further validation to determine their correctness. Researchers have proposed many methods to assess patch correctness automatically, but no systematic review provides a detailed introduction to this problem, the existing solutions, and the open challenges. To address this gap, we systematically review the existing approaches to patch correctness assessment. We first present a few examples of overfitting patches to give a more detailed understanding of the problem. We then propose a comprehensive categorization of publicly available techniques and datasets, examine the commonly used evaluation metrics, and perform an in-depth analysis of how effectively existing models address the challenge of overfitting. Based on our analysis, we discuss the difficulties encountered by current methodologies and possible avenues for future research.
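
To make the overfitting problem concrete, the following is a minimal illustrative sketch in Python. It is not taken from the survey: the function names (`buggy_max`, `overfitting_patch`, `correct_patch`, `passes`) and both test lists are hypothetical, chosen only to show how a patch can pass every available test and still be wrong on inputs the suite does not cover.

```python
from typing import Callable, List, Tuple

# A test case is ((a, b), expected_result). All names here are hypothetical.
TestCase = Tuple[Tuple[int, int], int]


def buggy_max(a: int, b: int) -> int:
    """Intended to return the larger of a and b, but ignores b (the bug)."""
    return a


def overfitting_patch(a: int, b: int) -> int:
    """Special-cases the one failing test input instead of fixing the logic."""
    if (a, b) == (1, 2):
        return 2
    return a


def correct_patch(a: int, b: int) -> int:
    """Implements the intended behaviour for all inputs."""
    return a if a >= b else b


def passes(candidate: Callable[[int, int], int], tests: List[TestCase]) -> bool:
    """Return True if the candidate produces the expected result on every test."""
    return all(candidate(*args) == expected for args, expected in tests)


# The incomplete suite that a repair tool would validate against (assumed).
AVAILABLE_TESTS: List[TestCase] = [((3, 1), 3), ((1, 2), 2)]

# Additional inputs the suite does not cover, standing in for the true specification.
HELD_OUT_TESTS: List[TestCase] = [((0, 5), 5), ((7, 7), 7)]


def report() -> None:
    """Print, for each candidate, whether it passes each test list."""
    for name, fn in [
        ("buggy", buggy_max),
        ("overfitting patch", overfitting_patch),
        ("correct patch", correct_patch),
    ]:
        print(name, passes(fn, AVAILABLE_TESTS), passes(fn, HELD_OUT_TESTS))
```

Calling `report()` shows that both patches pass `AVAILABLE_TESTS`, while only `correct_patch` also passes `HELD_OUT_TESTS`. A test-based validator that sees only the available suite cannot tell the two patches apart, which is why a separate correctness assessment step is needed.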

References

[1]
Valgrind. 2016. Retrieved from https://valgrind.org/
[2]
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. Code2Vec: Learning distributed representations of code. Proc. ACM Program. Lang. 3, POPL (2019), 40:1–40:29. DOI:
[3]
Gareth Bennett, Tracy Hall, and David Bowes. 2022. Some automatically generated patches are more likely to be correct than others: An analysis of Defects4J patch features. In Proceedings of the 3rd International Workshop on Automated Program Repair, 46–52.
[4]
Marcel Böhme and Abhik Roychoudhury. 2014. CoREBench: Studying complexity of regression errors. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA ’14). Corina S. Pasareanu and Darko Marinov (Eds.), ACM, 105–115. DOI:
[5]
Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16 (2002), 321–357. DOI:
[6]
Liushan Chen, Yu Pei, and Carlo A. Furia. 2017. Contract-based program repair without the contracts. In Proceedings of the 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 637–647. DOI:
[7]
Zimin Chen, Steve Kommrusch, Michele Tufano, Louis-Noël Pouchet, Denys Poshyvanyk, and Martin Monperrus. 2021. SequenceR: Sequence-to-sequence learning for end-to-end program repair. IEEE Trans. Software Eng. 47, 9 (2021), 1943–1959. DOI:
[8]
Zimin Chen and Martin Monperrus. 2019. The remarkable role of similarity in redundancy-based program repair. arxiv:1811.05703. Retrieved from https://arxiv.org/pdf/1811.05703
[9]
Davide Chicco and Giuseppe Jurman. 2020. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21 (2020), 1–13.
[10]
Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 1 (1960), 37–46.
[11]
Daniela S. Cruzes and Tore Dyba. 2011. Recommended steps for thematic synthesis in software engineering. In Proceedings of the 2011 International Symposium on Empirical Software Engineering and Measurement. IEEE, 275–284.
[12]
Viktor Csuvik, Dániel Horváth, Ferenc Horváth, and László Vidács. 2020. Utilizing source code embeddings to identify correct patches. In Proceedings of the 2020 IEEE 2nd International Workshop on Intelligent Bug Fixing (IBF), 18–25. DOI:
[13]
Richard A. DeMillo, Richard J. Lipton, and Frederick G. Sayward. 1978. Hints on test data selection: Help for the practicing programmer. Computer 11, 4 (1978), 34–41.
[14]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT ’19), Vol. 1 (Long and Short Papers). Jill Burstein, Christy Doran, and Thamar Solorio (Eds.), Association for Computational Linguistics, 4171–4186. DOI:
[15]
Yukun Dong, Xiaotong Cheng, Yufei Yang, Lulu Zhang, Shuqi Wang, and Lingjie Kong. 2024. A method to identify overfitting program repair patches based on expression tree. Sci. Comput. Program. 235, 1 (2024), 103105. DOI:
[16]
Yukun Dong, Daolong Tang, Xiaotong Cheng, and Yufei Yang. 2022. Quality evaluation method of automatic software repair using syntax distance metrics. Symmetry 14, 8 (Aug. 2022), 1751. DOI:
[17]
Yukun Dong, Meng Wu, Li Zhang, Wenjing Yin, Mengying Wu, and Haojie Li. 2020. Priority measurement of patches for program repair based on semantic distance. Symmetry 12, 12 (Dec. 2020), 2102. DOI:
[18]
Thomas Durieux, Fernanda Madeiral, Matias Martinez, and Rui Abreu. 2019. Empirical review of Java program repair tools: A large-scale experiment on 2,141 bugs and 23,551 repair attempts. In Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/SIGSOFT FSE ’19). Marlon Dumas, Dietmar Pfahl, Sven Apel, and Alessandra Russo (Eds.), ACM, 302–313. DOI:
[19]
Thomas Durieux and Martin Monperrus. 2016. DynaMoth: Dynamic code synthesis for automatic program repair. In Proceedings of the 11th International Workshop on Automation of Software Test (AST@ICSE ’16). Christof J. Budnik, Gordon Fraser, and Francesca Lonetti (Eds.), ACM, 85–91. DOI:
[20]
Ahmed Elnaggar, Wei Ding, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Silvia Severini, Florian Matthes, and Burkhard Rost. 2021. Codetrans: Towards cracking the language of silicon’s code through self-supervised deeplearning and high performance computing. arXiv:2104.02443. Retrieved from https://arxiv.org/pdf/2104.02443
[21]
Gordon Fraser and Andrea Arcuri. 2011. EvoSuite: Automatic test suite generation for object-oriented software. In Proceedings of the SIGSOFT/FSE ’11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC ’11: 13th European Software Engineering Conference (ESEC-13). Tibor Gyimóthy and Andreas Zeller (Eds.), ACM, 416–419. DOI:
[22]
Xiang Gao, Sergey Mechtaev, and Abhik Roychoudhury. 2019. Crash-avoiding program repair. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’19). ACM, New York, NY, 8–18. DOI:
[23]
Luca Gazzola, Daniela Micucci, and Leonardo Mariani. 2019. Automatic software repair: A survey. IEEE Trans. Software Eng. 45, 1 (2019), 34–67. DOI:
[24]
Ali Ghanbari. 2020. ObjSim: Lightweight automatic patch prioritization via object similarity. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’20). ACM, New York, NY, 541–544. DOI:
[25]
Ali Ghanbari. 2022. Revisiting object similarity-based patch ranking in automated program repair: An extensive study. In Proceedings of the 3rd IEEE/ACM International Workshop on Automated Program Repair (APR@ICSE ’22). IEEE, 16–23. DOI:
[26]
Ali Ghanbari and Andrian Marcus. 2022. Patch correctness assessment in automated program repair based on the impact of patches on production and test code. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’22). Sukyoung Ryu and Yannis Smaragdakis (Eds.), ACM, 654–665. DOI:
[27]
Ali Ghanbari and Andrian Marcus. 2022. Shibboleth: Hybrid patch correctness assessment in automated program repair. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 1–4.
[28]
Ali Ghanbari and Lingming Zhang. 2019. PraPR: Practical program repair via bytecode mutation. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 1118–1121. DOI:
[29]
Patrice Godefroid. 2014. Micro execution. In Proceedings of the 36th International Conference on Software Engineering (ICSE ’14). Pankaj Jalote, Lionel C. Briand, and André van der Hoek (Eds.), ACM, 539–549. DOI:
[30]
Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2012. GenProg: A generic method for automatic software repair. IEEE Trans. Software Eng. 38, 1 (2012), 54–72. DOI:
[31]
Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated program repair. Commun. ACM 62, 12 (2019), 56–65.
[32]
Sylvain Hallé. 2022. Test suite generation for Boolean conditions with equivalence class partitioning. In Proceedings of the 2022 IEEE/ACM 10th International Conference on Formal Methods in Software Engineering (FormaliSE), 23–33. DOI:
[33]
Abram Hindle, Earl T. Barr, Mark Gabel, Zhendong Su, and Premkumar Devanbu. 2016. On the naturalness of software. Commun. ACM 59, 5 (2016), 122–131.
[34]
Thong Hoang, Hong Jin Kang, David Lo, and Julia Lawall. 2020. CC2Vec: Distributed representations of code changes. In Proceedings of the 42nd International Conference on Software Engineering (ICSE ’20). Gregg Rothermel and Doo-Hwan Bae (Eds.), ACM, 518–529. DOI:
[35]
Jinru Hua, Mengshi Zhang, Kaiyuan Wang, and Sarfraz Khurshid. 2018. Towards practical program repair with on-demand candidate generation. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, 12–23. DOI:
[36]
Xin Huang, He Zhang, Xin Zhou, Muhammad Ali Babar, and Song Yang. 2018. Synthesizing qualitative research in software engineering: A critical review. In Proceedings of the 40th International Conference on Software Engineering, 1207–1218.
[37]
Elkhan Ismayilzada, Md Mazba Ur Rahman, Dongsun Kim, and Jooyong Yi. 2023. Poracle: Testing patches under preservation conditions to combat the overfitting problem of program repair. ACM Trans. Softw. Eng. Methodol. 33, 2 (Dec 2023), Article 44, 39 pages. DOI:
[38]
Jiajun Jiang, Yingfei Xiong, Hongyu Zhang, Qing Gao, and Xiangqun Chen. 2018. Shaping program repair space with existing patches and similar code. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’18). Frank Tip and Eric Bodden (Eds.), ACM, 298–309. DOI:
[39]
René Just, Darioush Jalali, and Michael D. Ernst. 2014. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA ’14). Corina S. Pasareanu and Darko Marinov (Eds.), ACM, 437–440. DOI:
[40]
Sungmin Kang and Shin Yoo. 2022. Language models can prioritize patches for practical program patching. In Proceedings of the 3rd IEEE/ACM International Workshop on Automated Program Repair (APR@ICSE ’22). IEEE, 8–15. DOI:
[41]
Rafael-Michael Karampatsis and Charles Sutton. 2020. How often do single-statement bugs occur? The Manysstubs4j dataset. In Proceedings of the 17th International Conference on Mining Software Repositories (MSR ’20). Sunghun Kim, Georgios Gousios, Sarah Nadi, and Joseph Hejderup (Eds.), ACM, 573–577. DOI:
[42]
Maria Kechagia, Sergey Mechtaev, Federica Sarro, and Mark Harman. 2022. Evaluating automatic program repair capabilities to repair API misuses. IEEE Trans. Softw. Eng. 48, 7 (2022), 2658–2679. DOI:
[43]
B. Kitchenham and S. Charters. 2007. Guidelines for performing systematic literature reviews in software engineering. Technical Report EBSE-2007-01. School of Computer Science and Mathematics, Keele University, Keele. Retrieved from https://legacyfileshare.elsevier.com/promis_misc/525444systematicreviewsguide.pdf
[44]
Anil Koyuncu, Kui Liu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, and Yves Le Traon. 2020. FixMiner: Mining relevant fix patterns for automated program repair. Empir. Softw. Eng. 25, 3 (2020), 1980–2024. DOI:
[45]
Quoc V. Le and Tomás Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31th International Conference on Machine Learning (ICML ’14). JMLR.org, 1188–1196. Retrieved from http://proceedings.mlr.press/v32/le14.html
[46]
Xuan-Bach Dinh Le, Lingfeng Bao, David Lo, Xin Xia, Shanping Li, and Corina S. Pasareanu. 2019. On reliability of patch correctness assessment. In Proceedings of the 41st International Conference on Software Engineering (ICSE ’19). Joanne M. Atlee, Tevfik Bultan, and Jon Whittle (Eds.), IEEE/ACM, 524–535. DOI:
[47]
Xuan-Bach Dinh Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser. 2017. S3: Syntax- and semantic-guided repair synthesis via programming by examples. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE ’17). Eric Bodden, Wilhelm Schäfer, Arie van Deursen, and Andrea Zisman (Eds.), ACM, 593–604. DOI:
[48]
Xuan-Bach Dinh Le, David Lo, and Claire Le Goues. 2016. History driven program repair. In Proceedings of the IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER ’16). IEEE Computer Society, 213–224. DOI:
[49]
Thanh Le-Cong, Duc-Minh Luong, Xuan Bach D. Le, David Lo, Nhat-Hoa Tran, Bui Quang-Huy, and Quyet-Thang Huynh. 2023. Invalidator: Automated patch correctness assessment via semantic and syntactic reasoning. IEEE Trans. Softw. Eng. 49, 6 (2023), 3411–3429. DOI:
[50]
Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A systematic study of automated program repair: Fixing 55 out of 105 bugs for \(\textdollar\)8 each. In Proceedings of the 2012 34th International Conference on Software Engineering (ICSE), 3–13. DOI:
[51]
Claire Le Goues, Michael Pradel, Abhik Roychoudhury, and Satish Chandra. 2021. Automatic program repair. IEEE Softw. 38, 4 (2021), 22–27.
[52]
Teven Le Scao, Thomas Wang, Daniel Hesslow, Stas Bekman, M. Saiful Bari, Stella Biderman, Hady Elsahar, Niklas Muennighoff, Jason Phang, Ofir Press, et al. 2022. What language model to train if you have one million GPU hours? In Findings of the Association for Computational Linguistics (EMNLP ’22). Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.), Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 765–782. DOI:
[53]
Yi Li, Shaohua Wang, and Tien N. Nguyen. 2020. DLFix: Context-based code transformation learning for automated program repair. In Proceedings of the 42nd International Conference on Software Engineering (ICSE ’20). Gregg Rothermel and Doo-Hwan Bae (Eds.), ACM, 602–614. DOI:
[54]
Jingjing Liang, Ruyi Ji, Jiajun Jiang, Shurui Zhou, Yiling Lou, Yingfei Xiong, and Gang Huang. 2021. Interactive patch filtering as debugging aid. In Proceedings of the 2021 IEEE International Conference on Software Maintenance and Evolution (ICSME), 239–250. DOI:
[55]
Bo Lin, Shangwen Wang, Ming Wen, and Xiaoguang Mao. 2022. Context-aware code change embedding for better patch correctness assessment. ACM Trans. Softw. Eng. Methodol. 31, 3, (May 2022), Article 51, 29 pages. DOI:
[56]
Derrick Lin, James Koppel, Angela Chen, and Armando Solar-Lezama. 2017. QuixBugs: A multi-lingual program repair benchmark set based on the quixey challenge. In Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH ’17). Gail C. Murphy (Ed.), ACM, 55–56. DOI:
[57]
Kui Liu, Anil Koyuncu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, and Yves Le Traon. 2019. You cannot fix what you cannot find! An investigation of fault localization bias in benchmarking automated program repair systems. In Proceedings of the 12th IEEE Conference on Software Testing, Validation and Verification (ICST ’19). IEEE, 102–113. DOI:
[58]
Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. 2019. AVATAR: Fixing semantic bugs with fix patterns of static analysis violations. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER ’19). Xinyu Wang, David Lo, and Emad Shihab (Eds.), IEEE, 456–467. DOI:
[59]
Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. 2019. TBar: Revisiting template-based automated program repair. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’19). Dongmei Zhang and Anders Møller (Eds.), ACM, 31–42. DOI:
[60]
Kui Liu, Li Li, Anil Koyuncu, Dongsun Kim, Zhe Liu, Jacques Klein, and Tegawendé F. Bissyandé. 2021. A critical review on the evaluation of automated program repair systems. J. Syst. Softw. 171 (2021), 110817.
[61]
Kui Liu, Shangwen Wang, Anil Koyuncu, Kisub Kim, Tegawendé F. Bissyandé, Dongsun Kim, Peng Wu, Jacques Klein, Xiaoguang Mao, and Yves Le Traon. 2020. On the efficiency of test suite based program repair: A systematic assessment of 16 automated repair systems for Java programs. In Proceedings of the 42nd International Conference on Software Engineering (ICSE ’20). Gregg Rothermel and Doo-Hwan Bae (Eds.), ACM, 615–627. DOI:
[62]
Xuliang Liu and Hao Zhong. 2018. Mining stackoverflow for program repair. In Proceedings of the 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), 118–129. DOI:
[63]
Yu Liu, Sergey Mechtaev, Pavle Subotic, and Abhik Roychoudhury. 2023. Program repair guided by datalog-defined static analysis. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’23). Satish Chandra, Kelly Blincoe, and Paolo Tonella (Eds.), ACM, 1216–1228. DOI:
[64]
Fan Long and Martin C. Rinard. 2015. Staged program repair with condition synthesis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE ’15). Elisabetta Di Nitto, Mark Harman, and Patrick Heymans (Eds.), ACM, 166–178. DOI:
[65]
Thibaud Lutellier, Hung Viet Pham, Lawrence Pang, Yitong Li, Moshi Wei, and Lin Tan. 2020. CoCoNuT: Combining context-aware neural translation models using ensemble for program repair. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’20). Sarfraz Khurshid and Corina S. Pasareanu (Eds.), ACM, 101–114. DOI:
[66]
Fernanda Madeiral, Simon Urli, Marcelo de Almeida Maia, and Martin Monperrus. 2019. BEARS: An extensible Java bug benchmark for automatic program repair studies. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER ’19). Xinyu Wang, David Lo, and Emad Shihab (Eds.), IEEE, 468–478. DOI:
[67]
Matias Martinez, Thomas Durieux, Romain Sommerard, Jifeng Xuan, and Martin Monperrus. 2017. Automatic repair of real bugs in Java: A large-scale experiment on the Defects4j dataset. Empir. Softw. Eng. 22, 4 (2017), 1936–1964. DOI:
[68]
Matias Martinez, Maria Kechagia, Anjana Perera, Justyna Petke, Federica Sarro, and Aldeida Aleti. 2024. Test-based patch clustering for automatically-generated patches assessment. Empir. Softw. Eng. 29, 5 (2024), 116. DOI:
[69]
Matias Martinez and Martin Monperrus. 2016. ASTOR: A program repair library for Java (demo). In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA ’16). Andreas Zeller and Abhik Roychoudhury (Eds.), ACM, 441–444. DOI:
[70]
Matias Martinez and Martin Monperrus. 2017. Open-ended exploration of the program repair search space with mined templates: The next 8935 patches for Defects4j. arXiv:1712.03854. Retrieved from https://arxiv.org/pdf/1712.03854v1
[71]
Matias Martinez and Martin Monperrus. 2018. Ultra-large repair search space with automatically mined templates: The cardumen mode of astor. In Proceedings of the 10th International Symposium on Search-Based Software Engineering (SSBSE ’18). Thelma Elita Colanzi and Phil McMinn (Eds.), Lecture Notes in Computer Science, Vol. 11036, Springer, 65–86. DOI:
[72]
Derrick McKee, Nathan Burow, and Mathias Payer. 2019. Software ethology: An accurate, resilient, and cross-architecture binary analysis framework. arXiv:1906.02928. Retrieved from https://arxiv.org/abs/1906.02928
[73]
Sergey Mechtaev, Manh-Dung Nguyen, Yannic Noller, Lars Grunske, and Abhik Roychoudhury. 2018. Semantic program repair using a reference implementation. In Proceedings of the 40th International Conference on Software Engineering, 129–139.
[74]
Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2016. Angelix: Scalable multiline program patch synthesis via symbolic analysis. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). Laura K. Dillon, Willem Visser, and Laurie A. Williams (Eds.), ACM, 691–701. DOI:
[75]
Amirfarhad Nilizadeh, Marlon Calvo, Gary T. Leavens, and Xuan-Bach D. Le. 2021. More reliable test suites for dynamic APR by using counterexamples. In Proceedings of the 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), 208–219. DOI:
[76]
Amirfarhad Nilizadeh, Gary T. Leavens, Xuan-Bach D. Le, Corina S. Păsăreanu, and David R. Cok. 2021. Exploring true test overfitting in dynamic automated program repair using formal methods. In Proceedings of the 2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST). IEEE, 229–240.
[77]
Carlos Pacheco and Michael D. Ernst. 2007. Randoop: Feedback-directed random testing for Java. In Proceedings Companion of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA ’07). Richard P. Gabriel, David F. Bacon, Cristina Videira Lopes, and Guy L. Steele Jr. (Eds.), ACM, 815–816. DOI:
[78]
Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, and Baishakhi Ray. 2020. Trex: Learning execution semantics from micro-traces for binary similarity. arXiv:2012.08680. Retrieved from https://arxiv.org/abs/2012.08680
[79]
Quang-Ngoc Phung, Misoo Kim, and Eunseok Lee. 2022. Identifying incorrect patches in program repair based on meaning of source code. IEEE Access 10 (2022), 12012–12030. DOI:
[80]
Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. 2014. The strength of random search on automated program repair. In Proceedings of the 36th International Conference on Software Engineering (ICSE ’14). Pankaj Jalote, Lionel C. Briand, and André van der Hoek (Eds.), ACM, 254–265. DOI:
[81]
Zichao Qi, Fan Long, Sara Achour, and Martin C. Rinard. 2015. An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA ’15). Michal Young and Tao Xie (Eds.), ACM, 24–36. DOI:
[82]
Ripon K. Saha, Yingjun Lyu, Wing Lam, Hiroaki Yoshida, and Mukul R. Prasad. 2018. Bugs.jar: A large-scale, diverse dataset of real-world Java bugs. In Proceedings of the 15th International Conference on Mining Software Repositories (MSR ’18). Andy Zaidman, Yasutaka Kamei, and Emily Hill (Eds.), ACM, 10–13. DOI:
[83]
Ripon K. Saha, Yingjun Lyu, Hiroaki Yoshida, and Mukul R. Prasad. 2017. Elixir: Effective object-oriented program repair. In Proceedings of the 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 648–659. DOI:
[84]
Sina Shamshiri, René Just, José Miguel Rojas, Gordon Fraser, Phil McMinn, and Andrea Arcuri. 2015. Do automatically generated unit tests find real faults? An empirical study of effectiveness and challenges (T). In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE ’15). Myra B. Cohen, Lars Grunske, and Michael Whalen (Eds.), IEEE Computer Society, 201–211. DOI:
[85]
Ridwan Shariffdeen, Yannic Noller, Lars Grunske, and Abhik Roychoudhury. 2021. Concolic program repair. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI ’21). ACM, New York, NY, 390–405. DOI:
[86]
Edward K. Smith, Earl T. Barr, Claire Le Goues, and Yuriy Brun. 2015. Is the cure worse than the disease? Overfitting in automated program repair. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE ’15). Elisabetta Di Nitto, Mark Harman, and Patrick Heymans (Eds.), ACM, 532–543. DOI:
[87]
Ming Tan, Cicero dos Santos, Bing Xiang, and Bowen Zhou. 2015. Lstm-based deep learning models for non-factoid answer selection. arXiv:1511.04108. Retrieved from https://arxiv.org/pdf/1511.04108
[88]
Ming Tan, Lin Tan, Sashank Dara, and Caleb Mayeux. 2015. Online defect prediction for imbalanced data. In Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE ’15). Antonia Bertolino, Gerardo Canfora, and Sebastian G. Elbaum (Eds.), IEEE Computer Society, 99–108. DOI:
[89]
Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, and Abhik Roychoudhury. 2017. Codeflaws: A programming competition benchmark for evaluating automated program repair tools. In Proceedings of the 39th International Conference on Software Engineering (ICSE ’17). Sebastián Uchitel, Alessandro Orso, and Martin P. Robillard (Eds.), IEEE Computer Society, 180–182. DOI:
[90]
Shin Hwei Tan, Hiroaki Yoshida, Mukul R. Prasad, and Abhik Roychoudhury. 2016. Anti-patterns in search-based program repair. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE ’16). ACM, New York, NY, 727–738. DOI:
[91]
Xunzhu Tang, Haoye Tian, Zhenghan Chen, Weiguo Pian, Saad Ezzini, Abdoul Kader Kabore, Andrew Habib, Jacques Klein, and Tegawende F. Bissyande. 2024. Learning to represent patches. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (Lisbon, Portugal) (ICSE-Companion ’24). Association for Computing Machinery, New York, NY, 396–397. DOI:
[92]
Haoye Tian, Yinghua Li, Weiguo Pian, Abdoul Kader Kabore, Kui Liu, Andrew Habib, Jacques Klein, and Tegawendé F. Bissyandé. 2022. Predicting patch correctness based on the similarity of failing test cases. ACM Trans. Softw. Eng. Methodol. 31, 4 (2022), 1–30.
[93]
Haoye Tian, Kui Liu, Abdoul Kader Kaboré, Anil Koyuncu, Li Li, Jacques Klein, and Tegawendé F. Bissyandé. 2020. Evaluating representation learning of code changes for predicting patch correctness in program repair. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 981–992.
[94]
Haoye Tian, Kui Liu, Yinghua Li, Abdoul Kader Kaboré, Anil Koyuncu, Andrew Habib, Li Li, Junhao Wen, Jacques Klein, and Tegawendé F. Bissyandé. 2023. The best of both worlds: Combining learned embeddings with engineered features for accurate prediction of correct patches. ACM Trans. Softw. Eng. Methodol. 32, 4, (May 2023), Article 92, 34 pages. DOI:
[95]
Haoye Tian, Xunzhu Tang, Andrew Habib, Shangwen Wang, Kui Liu, Xin Xia, Jacques Klein, and Tegawendé F. Bissyandé. 2022. Is this change the answer to that problem? Correlating descriptions of bug and code changes for evaluating patch correctness. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 1–13.
[96]
Lewis Tunstall, Leandro Von Werra, and Thomas Wolf. 2022. Natural Language Processing with Transformers. O’Reilly Media, Inc.
[97]
Rijnard van Tonder and Claire Le Goues. 2018. Static automated program repair for heap properties. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, 151–162. DOI:
[98]
Shangwen Wang, Ming Wen, Bo Lin, Hongjun Wu, Yihao Qin, Deqing Zou, Xiaoguang Mao, and Hai Jin. 2020. Automated patch correctness assessment: How far are we? In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 968–980.
[99]
Westley Weimer, Zachary P. Fry, and Stephanie Forrest. 2013. Leveraging program equivalence for adaptive program repair: Models and first results. In Proceedings of the 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE ’13). Ewen Denney, Tevfik Bultan, and Andreas Zeller (Eds.), IEEE, 356–366. DOI:
[100]
Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009. Automatically finding patches using genetic programming. In Proceedings of the 2009 IEEE 31st International Conference on Software Engineering. IEEE, 364–374.
[101]
Ming Wen, Junjie Chen, Rongxin Wu, Dan Hao, and Shing-Chi Cheung. 2018. Context-aware patch generation for better automated program repair. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark Harman (Eds.), ACM, 1–11. DOI:
[102]
Martin White, Michele Tufano, Matías Martínez, Martin Monperrus, and Denys Poshyvanyk. 2019. Sorting and transforming program repair ingredients via deep learning code similarities. In Proceedings of the 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 479–490. DOI:
[103]
Qiushi Wu and Kangjie Lu. 2021. On the feasibility of stealthily introducing vulnerabilities in open-source software via hypocrite commits. Proc. Oakland (2021). Retrieved from https://linuxreviews.org/images/d/d9/OpenSourceInsecurity.pdf
[104]
Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023. Automated program repair in the era of large pre-trained language models. In Proceedings of the 45th International Conference on Software Engineering (ICSE ’23). IEEE Press, 1482–1494. DOI:
[105]
Chunqiu Steven Xia and Lingming Zhang. 2022. Less training, more repairing please: Revisiting automated program repair via zero-shot learning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’22). ACM, New York, NY, 959–971. DOI:
[106]
Chunqiu Steven Xia and Lingming Zhang. 2024. Automated program repair via conversation: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT. In Proceedings of the 33rd ACM SIGSOFT International Symposium on SoftwareTesting and Analysis (Vienna, Austria) (ISSTA 2024). Association for Computing Machinery, New York, NY, 819–831. DOI:
[107]
Qi Xin and Steven P. Reiss. 2017. Identifying test-suite-overfitted patches through test case generation. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis. Tevfik Bultan and Koushik Sen (Eds.), ACM, 226–236. DOI:
[108]
Qi Xin and Steven P. Reiss. 2017. Leveraging syntax-related code for automated program repair. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE ’17). Grigore Rosu, Massimiliano Di Penta, and Tien N. Nguyen (Eds.), IEEE Computer Society, 660–670. DOI:
[109]
Yingfei Xiong, Xinyuan Liu, Muhan Zeng, Lu Zhang, and Gang Huang. 2018. Identifying patch correctness in test-based program repair. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark Harman (Eds.), ACM, 789–799. DOI:
[110]
Yingfei Xiong, Jie Wang, Runfa Yan, Jiachen Zhang, Shi Han, Gang Huang, and Lu Zhang. 2017. Precise condition synthesis for program repair. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), 416–426. DOI:
[111]
Yingfei Xiong, Jie Wang, Runfa Yan, Jiachen Zhang, Shi Han, Gang Huang, and Lu Zhang. 2017. Precise condition synthesis for program repair. In Proceedings of the 39th International Conference on Software Engineering (ICSE ’17). Sebastián Uchitel, Alessandro Orso, and Martin P. Robillard (Eds.), IEEE/ACM, 416–426. DOI:
[112]
Jifeng Xuan, Matias Martinez, Favio Demarco, Maxime Clement, Sebastian R. Lamelas Marcote, Thomas Durieux, Daniel Le Berre, and Martin Monperrus. 2017. Nopol: Automatic repair of conditional statement bugs in Java programs. IEEE Trans. Software Eng. 43, 1 (2017), 34–55. DOI:
[113]
Dapeng Yan, Kui Liu, Yuqing Niu, Li Li, Zhe Liu, Zhiming Liu, Jacques Klein, and Tegawendé F. Bissyandé. 2022. Crex: Predicting patch correctness in automated repair of C programs through transfer learning of execution semantics. Inf. Softw. Technol. 152 (2022), 107043.
[114]
Bo Yang and Jinqiu Yang. 2020. Exploring the differences between plausible and correct patches at fine-grained level. In Proceedings of the 2020 IEEE 2nd International Workshop on Intelligent Bug Fixing (IBF), 1–8.
[115]
Jun Yang, Yuehan Wang, Yiling Lou, Ming Wen, and Lingming Zhang. 2023. Attention: Not just another dataset for patch-correctness checking. arXiv:2207.06590. Retrieved from https://arxiv.org/pdf/2207.06590
[116]
Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, and Lin Tan. 2017. Better test cases for better automated program repair. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE ’17). Eric Bodden, Wilhelm Schäfer, Arie van Deursen, and Andrea Zisman (Eds.), ACM, 831–841.
[117]
He Ye, Jian Gu, Matias Martinez, Thomas Durieux, and Martin Monperrus. 2022. Automated classification of overfitting patches with statically extracted code features. IEEE Trans. Softw. Eng. 48, 8 (2022), 2920–2938.
[118]
He Ye, Matias Martinez, Thomas Durieux, and Martin Monperrus. 2021. A comprehensive study of automatic program repair on the QuixBugs benchmark. J. Syst. Softw. 171 (2021), 110825.
[119]
He Ye, Matias Martinez, and Martin Monperrus. 2021. Automated patch assessment for program repair at scale. Empir. Softw. Eng. 26, 2 (2021), 20.
[120]
Zhongxing Yu, Matias Martinez, Benjamin Danglot, Thomas Durieux, and Martin Monperrus. 2017. Test case generation for program repair: A study of feasibility and effectiveness. arXiv:1703.00198. Retrieved from http://arxiv.org/abs/1703.00198
[121]
Zhongxing Yu, Matias Martinez, Benjamin Danglot, Thomas Durieux, and Martin Monperrus. 2019. Alleviating patch overfitting with automatic test generation: A study of feasibility and effectiveness for the Nopol repair system. Empir. Softw. Eng. 24, 1 (2019), 33–67.
[122]
Yuan Yuan and Wolfgang Banzhaf. 2020. ARJA: Automated repair of Java programs via multi-objective genetic programming. IEEE Trans. Softw. Eng. 46, 10 (2020), 1040–1067.
[123]
Yuan Yuan and Wolfgang Banzhaf. 2020. Toward better evolutionary program repair: An integrated approach. ACM Trans. Softw. Eng. Methodol. 29, 1 (2020), 1–53.
[124]
He Zhang, Muhammad Ali Babar, and Paolo Tell. 2011. Identifying relevant studies in software engineering. Inf. Softw. Technol. 53, 6 (2011), 625–637.
[125]
Quanjun Zhang, Chunrong Fang, Yuxiang Ma, Weisong Sun, and Zhenyu Chen. 2023. A survey of learning-based automated program repair. ACM Trans. Softw. Eng. Methodol. 33, 2, Article 55 (Dec. 2023), 69 pages.
[126]
Quanjun Zhang, Chunrong Fang, Weisong Sun, Yan Liu, Tieke He, Xiaodong Hao, and Zhenyu Chen. 2023. Boosting automated patch correctness prediction via pre-trained language model. arXiv:2301.12453.
[127]
Wenkang Zhong, Chuanyi Li, Jidong Ge, and Bin Luo. 2022. Neural program repair: Systems, challenges and solutions. In Proceedings of the 13th Asia-Pacific Symposium on Internetware, 96–106.
[128]
Xin Zhou, Bowen Xu, Kisub Kim, DongGyun Han, Thanh Le-Cong, Junda He, Bach Le, and David Lo. 2023. PatchZero: Zero-shot automatic patch correctness assessment. arXiv:2303.00202. Retrieved from https://arxiv.org/pdf/2303.00202
[129]
Qihao Zhu, Zeyu Sun, Yuan-an Xiao, Wenjie Zhang, Kang Yuan, Yingfei Xiong, and Lu Zhang. 2021. A syntax-guided edit decoder for neural program repair. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’21). ACM, New York, NY, 341–353.

Published In

ACM Transactions on Software Engineering and Methodology  Volume 34, Issue 2
February 2025
904 pages
EISSN:1557-7392
DOI:10.1145/3703017

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 January 2025
Online AM: 08 November 2024
Accepted: 28 September 2024
Revised: 28 September 2024
Received: 14 August 2023
Published in TOSEM Volume 34, Issue 2

Author Tags

  1. Automated Program Repair
  2. Patch Correctness Assessment
  3. Patch Overfitting Problem

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • Natural Science Foundation of Jiangsu Province
  • Cooperation Fund of Huawei-NJU Creative Laboratory for the Next Programming
