DOI: 10.1145/3106237.3106274

Better test cases for better automated program repair

Published: 21 August 2017

Abstract

Automated generate-and-validate program repair techniques (G&V techniques) suffer from generating many overfitted patches because the given test cases are inadequate. Such overfitted patches are incorrect patches that make all given test cases pass yet fail to fix the bugs. In this work, we propose an overfitted patch detection framework named Opad (Overfitted PAtch Detection). Opad helps improve G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases, and employs two test oracles (crash and memory-safety) to strengthen the validity checking of automatically generated patches. Opad also uses a novel metric (named O-measure) to decide whether automatically generated patches overfit.

Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) of the overfitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically generated test cases may further improve G&V techniques if empowered with better test oracles (in addition to the crash and memory-safety oracles employed by Opad).
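The abstract compresses Opad's pipeline into a few clauses, so a small sketch may help make the flow concrete. The Python below is a minimal illustration, not the authors' implementation: the names Verdict, run_with_oracles, o_measure, and is_overfitted are hypothetical, the Valgrind wiring is an assumption, the fuzz-generated inputs are assumed to be produced beforehand (e.g., by an AFL-style fuzzer), and the O-measure is rendered only in the loose sense the abstract suggests: count the new inputs on which the patched program behaves worse than the original buggy program, and flag the patch if that count is positive.

```python
# A minimal sketch of Opad-style overfitted-patch filtering (not the
# authors' implementation). Helper names and the Valgrind wiring are
# hypothetical; fuzz-generated inputs are assumed precomputed.
import subprocess
from dataclasses import dataclass


@dataclass
class Verdict:
    crashed: bool
    memory_unsafe: bool

    @property
    def failed(self) -> bool:
        # Crash and memory-safety oracles: either failure mode counts
        # as "worse" behavior on this input.
        return self.crashed or self.memory_unsafe


def run_with_oracles(binary: str, test_input: str) -> Verdict:
    """Run one fuzz-generated input under Valgrind, using it as both a
    crash oracle and a memory-safety oracle (hypothetical wiring)."""
    proc = subprocess.run(
        ["valgrind", "--error-exitcode=99", binary, test_input],
        capture_output=True,
    )
    return Verdict(
        crashed=proc.returncode < 0,          # killed by a signal
        memory_unsafe=proc.returncode == 99,  # Valgrind-detected error
    )


def o_measure(original: str, patched: str, fuzz_tests: list[str]) -> int:
    """One plausible reading of the O-measure: the number of new test
    cases on which the patched program behaves worse than the original."""
    return sum(
        1
        for t in fuzz_tests
        if run_with_oracles(patched, t).failed
        and not run_with_oracles(original, t).failed
    )


def is_overfitted(original: str, patched: str, fuzz_tests: list[str]) -> bool:
    # A genuinely correct patch should never degrade behavior on any
    # input, so any positive O-measure flags the patch as overfitted.
    return o_measure(original, patched, fuzz_tests) > 0
```

The design point the sketch tries to capture is that these oracles are implicit: a fuzz-generated input needs no human-written assertion about expected output, because crashing or violating memory safety is "wrong" regardless of what the program should compute.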

Contents

  1 Introduction
  2 Background on Automated G&V Program Repair
  3 Approach
    3.1 Generating New Test Cases Using Fuzz Testing
    3.2 Generating Memory-Safety Oracles
    3.3 Measuring the Overfitness of a Patch Using an Overfitness Metric (O-measure)
    3.4 An Optimized Setting of Opad
  4 Evaluation
  5 Threats to Validity
  6 Related Work
  7 Conclusions



Information

Published In

ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
August 2017
1073 pages
ISBN: 9781450351058
DOI: 10.1145/3106237
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Overfitting in automated program repair
  2. Patch validation
  3. Testing

Qualifiers

  • Research-article

Conference

ESEC/FSE'17

Acceptance Rates

Overall acceptance rate: 112 of 543 submissions (21%)


Article Metrics

  • Downloads (last 12 months): 98
  • Downloads (last 6 weeks): 13
Reflects downloads up to 01 Feb 2025.


Cited By

  • (2025) Structuring Semantic-Aware Relations Between Bugs and Patches for Accurate Patch Evaluation. Journal of Software: Evolution and Process 37:2. DOI: 10.1002/smr.70001. Online: 2 Feb 2025.
  • (2024) Patch Correctness Assessment: A Survey. ACM Transactions on Software Engineering and Methodology 34:2, 1–50. DOI: 10.1145/3702972. Online: 8 Nov 2024.
  • (2024) Evolving Paradigms in Automated Program Repair: Taxonomy, Challenges, and Opportunities. ACM Computing Surveys 57:2, 1–43. DOI: 10.1145/3696450. Online: 10 Oct 2024.
  • (2024) B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 1693–1705. DOI: 10.1145/3691620.3695536. Online: 27 Oct 2024.
  • (2024) Tests4Py: A Benchmark for System Testing. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 557–561. DOI: 10.1145/3663529.3663798. Online: 10 Jul 2024.
  • (2024) The Patch Overfitting Problem in Automated Program Repair: Practical Magnitude and a Baseline for Realistic Benchmarking. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 452–456. DOI: 10.1145/3663529.3663776. Online: 10 Jul 2024.
  • (2024) Automated Program Repair, What Is It Good For? Not Absolutely Nothing! Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1–13. DOI: 10.1145/3597503.3639095. Online: 20 May 2024.
  • (2024) Leveraging Large Language Model for Automatic Patch Correctness Assessment. IEEE Transactions on Software Engineering 50:11, 2865–2883. DOI: 10.1109/TSE.2024.3452252. Online: Nov 2024.
  • (2024) APPT: Boosting Automated Patch Correctness Prediction via Fine-Tuning Pre-Trained Models. IEEE Transactions on Software Engineering 50:3, 474–494. DOI: 10.1109/TSE.2024.3354969. Online: Mar 2024.
  • (2024) Improving Patch Correctness Analysis via Random Testing and Large Language Models. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), 317–328. DOI: 10.1109/ICST60714.2024.00036. Online: 27 May 2024.
