DOI: 10.1145/3533767.3534368
research-article
Open access

Patch correctness assessment in automated program repair based on the impact of patches on production and test code

Published: 18 July 2022

Abstract

Test-based generate-and-validate automated program repair (APR) systems often generate many patches that pass the test suite without fixing the bug. Developers must inspect the generated patches manually, so previous research has proposed various techniques for automatic correctness assessment of APR-generated patches. Among them, dynamic patch correctness assessment techniques rely on the assumption that, when the originally passing test cases are run, correct patches will not alter the program behavior in a significant way, e.g., by removing the code that implements correct functionality of the program. In this paper, we propose and evaluate a novel technique, named Shibboleth, for automatic correctness assessment of the patches generated by test-based generate-and-validate APR systems. Unlike existing work, Shibboleth captures the impact of the patches along three complementary facets, allowing more effective patch correctness assessment. Specifically, we measure the impact of patches on both production code (via syntactic and semantic similarity) and test code (via code coverage of passing tests) to single out the patches that result in similar programs and do not delete desired program elements. Shibboleth assesses the correctness of patches via both ranking and classification. We evaluated Shibboleth on 1,871 patches, generated by 29 Java-based APR systems for Defects4J programs. The technique outperforms state-of-the-art ranking and classification techniques. Specifically, in our ranking data set, in 43% (66%) of the cases, Shibboleth ranks the correct patch in top-1 (top-2) positions, and in classification mode applied on our classification data set, it achieves an accuracy and F1-score of 0.887 and 0.852, respectively.
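
The abstract names two assessment modes (ranking and classification) and three kinds of evidence (syntactic similarity, semantic similarity, and coverage of passing tests). The following Python sketch illustrates only how such scores could feed a ranking and how top-k, accuracy, and F1 are computed. It is not Shibboleth's implementation: the `Patch` fields, the equal-weight averaging in `rank_patches`, and all function names are assumptions made for illustration, and the abstract does not say how the three facets are actually combined.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Patch:
    """Hypothetical record: three facet scores in [0, 1] plus a ground-truth label."""
    name: str
    syntactic_sim: float   # similarity of patched vs. original source (assumed scale)
    semantic_sim: float    # similarity of runtime behavior (assumed scale)
    coverage_kept: float   # share of passing-test coverage retained (assumed scale)
    correct: bool


def rank_patches(patches):
    """Sort patches from highest to lowest combined score.

    Equal-weight averaging is an assumption of this sketch, not the paper's method.
    """
    def score(p):
        return mean([p.syntactic_sim, p.semantic_sim, p.coverage_kept])
    return sorted(patches, key=score, reverse=True)


def top_k_hit(ranked, k):
    """True if a correct patch appears among the first k ranked patches."""
    return any(p.correct for p in ranked[:k])


def accuracy_and_f1(labels, predictions):
    """Accuracy and F1-score for boolean labels, with 'correct patch' as positive."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for l, p in pairs if l and p)
    fp = sum(1 for l, p in pairs if not l and p)
    fn = sum(1 for l, p in pairs if l and not p)
    tn = sum(1 for l, p in pairs if not l and not p)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    candidates = [
        Patch("p1", 0.90, 0.80, 1.00, True),
        Patch("p2", 0.40, 0.50, 0.60, False),
        Patch("p3", 0.95, 0.30, 0.70, False),
    ]
    ranked = rank_patches(candidates)
    print([p.name for p in ranked], top_k_hit(ranked, 1))
    print(accuracy_and_f1([True, False, True, False], [True, False, False, False]))
```

A top-1 rate like the 43% reported above would be the fraction of bugs for which `top_k_hit(ranked, 1)` holds, under whatever scoring the real technique uses.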

    Published In

    ISSTA 2022: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis
    July 2022
    808 pages
    ISBN:9781450393799
    DOI:10.1145/3533767
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Automated Program Repair
    2. Branch Coverage
    3. Patch Correctness Assessment
    4. Similarity

    Qualifiers

    • Research-article

    Conference

    ISSTA '22
    Acceptance Rates

    Overall Acceptance Rate 58 of 213 submissions, 27%
