
Alleviating patch overfitting with automatic test generation: a study of feasibility and effectiveness for the Nopol repair system

Authors:

Matias Martinez,

Benjamin Danglot,

Thomas Durieux,

Martin Monperrus

Empirical Software Engineering, Volume 24, Issue 1

Pages 33 - 67

https://doi.org/10.1007/s10664-018-9619-4

Published: 01 February 2019

Abstract

Among the many kinds of program repair techniques, one widely studied family is test suite based repair. However, test suites are in essence input-output specifications and are thus typically inadequate for completely specifying the expected behavior of the program under repair. Consequently, the patches generated by test suite based repair techniques may merely overfit the test suite used and fail to generalize to other tests. We analyze the overfitting problem in program repair in depth and propose a classification of it. This classification is intended to help the community better understand the problem and design techniques to defeat it. We further propose and evaluate an approach called UnsatGuided, which aims to alleviate the overfitting problem for synthesis-based repair techniques by means of automatic test case generation. The approach uses additional automatically generated tests to strengthen the repair constraint used by synthesis-based repair techniques. We analyze the effectiveness of UnsatGuided: 1) analytically, with respect to alleviating two different kinds of overfitting issues; 2) empirically, based on an experiment over the 224 bugs of the Defects4J repository. The main result is that automatic test generation is effective in alleviating one kind of overfitting issue, regression introduction, but, due to the oracle problem, has minimal positive impact on alleviating the other kind, incomplete fixing.
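The abstract states only that UnsatGuided adds automatically generated tests to strengthen the repair constraint of a synthesis-based repair technique. The Python sketch below is a toy illustration under explicit assumptions, not Nopol or UnsatGuided itself. The "repair" is a hypothetical synthesizer, `synthesize`, that searches thresholds `c` for a condition `x > c` consistent with every test. The loop `unsat_guided` assumes a filtering rule that keeps a generated test only if the strengthened constraint is still satisfiable. All function names, inputs, and expected values are invented for illustration.

```python
from typing import List, Optional, Tuple

# A test pairs an input x with the expected value of the repaired condition.
Test = Tuple[int, bool]


def synthesize(tests: List[Test], max_c: int = 20) -> Optional[int]:
    """Toy synthesis-based repair (hypothetical): return the smallest
    threshold c in [0, max_c] such that `x > c` agrees with every test,
    or None when the constraint formed by the tests is unsatisfiable."""
    for c in range(max_c + 1):
        if all((x > c) == expected for x, expected in tests):
            return c
    return None


def unsat_guided(manual: List[Test], generated: List[Test]) -> Optional[int]:
    """Assumed filtering loop: add generated tests one at a time and keep
    a test only if the strengthened constraint is still satisfiable."""
    suite = list(manual)
    patch = synthesize(suite)
    if patch is None:
        return None  # no initial patch: nothing to strengthen
    for test in generated:
        candidate = synthesize(suite + [test])
        if candidate is None:
            continue  # unsatisfiable: discard the test as a likely wrong oracle
        suite.append(test)
        patch = candidate
    return patch


if __name__ == "__main__":
    manual_tests = [(12, True), (3, False)]
    generated_tests = [(8, False), (2, True), (10, False)]
    print(synthesize(manual_tests))                     # 3, i.e. x > 3
    print(unsat_guided(manual_tests, generated_tests))  # 10, i.e. x > 10
```

With only the two manual tests, the smallest consistent threshold is 3. The patch `x > 3` therefore passes the suite yet disagrees with the hypothetical intended condition `x > 10` on inputs 4 to 10, which is overfitting in the sense described above. The generated tests then behave as follows:

- `(8, False)` is kept.
- `(2, True)` is discarded because it makes the constraint unsatisfiable.
- `(10, False)` is kept, which yields `x > 10`.

In this toy, a wrong oracle is caught only when it contradicts tests already kept. A wrong but satisfiable oracle would be accepted silently.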



Index Terms

Software and its engineering > Software creation and management > Software verification and validation > Software defect analysis


Information & Contributors

Information

Published In


Empirical Software Engineering Volume 24, Issue 1

February 2019

535 pages

ISSN: 1382-3256


Copyright © 2019 Springer Science+Business Media, LLC, part of Springer Nature.

Publisher

Kluwer Academic Publishers

United States


Qualifiers

Article

Bibliometrics & Citations

Total Citations: 21

Total Downloads: 0

Downloads (Last 12 months): 0

Downloads (Last 6 weeks): 0

Reflects downloads up to 23 Jan 2025

Cited By

Fei Z, Ge J, Li C, Wang T, Li Y, Zhang H, Huang L, Luo B (2024) Patch Correctness Assessment: A Survey. ACM Transactions on Software Engineering and Methodology 34(2):1-50. Online publication date: 8-Nov-2024.
https://dl.acm.org/doi/10.1145/3702972

Eladawy H, Le Goues C, Brun Y, Roychoudhury A, Paiva A, Abreu R, Storey M (2024) Automated Program Repair, What Is It Good For? Not Absolutely Nothing! Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp 1-13. Online publication date: 20-May-2024.
https://dl.acm.org/doi/10.1145/3597503.3639095

Ismayilzada E, Rahman M, Kim D, Yi J (2023) Poracle: Testing Patches under Preservation Conditions to Combat the Overfitting Problem of Program Repair. ACM Transactions on Software Engineering and Methodology 33(2):1-39. Online publication date: 26-Sep-2023.
https://dl.acm.org/doi/10.1145/3625293

Li L, Liang Y, Liu Z, Yu Z, Chandra S, Blincoe K, Tonella P (2023) Understanding Solidity Event Logging Practices in the Wild. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 300-312. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3616342

Du Y, Yu Z, Chandra S, Blincoe K, Tonella P (2023) Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 579-591. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3616338

First E, Rabe M, Ringer T, Brun Y, Chandra S, Blincoe K, Tonella P (2023) Baldur: Whole-Proof Generation and Repair with Large Language Models. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 1229-1241. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3616243

Tian H, Liu K, Li Y, Kaboré A, Koyuncu A, Habib A, Li L, Wen J, Klein J, Bissyandé T (2023) The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches. ACM Transactions on Software Engineering and Methodology 32(4):1-34. Online publication date: 27-May-2023.
https://dl.acm.org/doi/10.1145/3576039

Geethal C, Böhme M, Pham V (2023) Human-in-the-Loop Automatic Program Repair. IEEE Transactions on Software Engineering 49(10):4526-4549. Online publication date: 1-Oct-2023.
https://dl.acm.org/doi/10.1109/TSE.2023.3305052

Yu Z, Martinez M, Chen Z, Bissyandé T, Monperrus M (2023) Learning the Relation Between Code Features and Code Transforms With Structured Prediction. IEEE Transactions on Software Engineering 49(7):3872-3900. Online publication date: 1-Jul-2023.
https://dl.acm.org/doi/10.1109/TSE.2023.3275380

Le-Cong T, Luong D, Le X, Lo D, Tran N, Quang-Huy B, Huynh Q (2023) Invalidator: Automated Patch Correctness Assessment Via Semantic and Syntactic Reasoning. IEEE Transactions on Software Engineering 49(6):3411-3429. Online publication date: 10-Mar-2023.
https://dl.acm.org/doi/10.1109/TSE.2023.3255177
