DOI: 10.1145/3106237.3106274

Better test cases for better automated program repair

Published: 21 August 2017

Abstract

Automated generate-and-validate program repair techniques (G&V techniques) suffer from generating many overfitted patches because the given test cases are inadequate. Such overfitted patches are incorrect patches that make all given test cases pass yet fail to fix the bugs. In this work, we propose an overfitted patch detection framework named Opad (Overfitted PAtch Detection). Opad helps improve G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases, and employs two test oracles (crash and memory-safety) to strengthen the validity checking of automatically generated patches. Opad also uses a novel metric (named O-measure) to decide whether automatically generated patches overfit.

Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) of the overfitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically generated test cases may further improve G&V techniques if empowered with better test oracles (in addition to the crash and memory-safety oracles employed by Opad).
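The abstract compresses Opad's pipeline into a few clauses, so a small sketch may help make the flow concrete. The Python below is a minimal illustration, not the authors' implementation: the names Verdict, run_with_oracles, o_measure, and is_overfitted are hypothetical, the Valgrind wiring is an assumption, the fuzz-generated inputs are assumed to be produced beforehand (e.g., by an AFL-style fuzzer), and the O-measure is rendered only in the loose sense the abstract suggests: count the new inputs on which the patched program behaves worse than the original buggy program, and flag the patch if that count is positive.

```python
# A minimal sketch of Opad-style overfitted-patch filtering (not the
# authors' implementation). Helper names and the Valgrind wiring are
# hypothetical; fuzz-generated inputs are assumed precomputed.
import subprocess
from dataclasses import dataclass


@dataclass
class Verdict:
    crashed: bool
    memory_unsafe: bool

    @property
    def failed(self) -> bool:
        # Crash and memory-safety oracles: either failure mode counts
        # as "worse" behavior on this input.
        return self.crashed or self.memory_unsafe


def run_with_oracles(binary: str, test_input: str) -> Verdict:
    """Run one fuzz-generated input under Valgrind, using it as both a
    crash oracle and a memory-safety oracle (hypothetical wiring)."""
    proc = subprocess.run(
        ["valgrind", "--error-exitcode=99", binary, test_input],
        capture_output=True,
    )
    return Verdict(
        crashed=proc.returncode < 0,          # killed by a signal
        memory_unsafe=proc.returncode == 99,  # Valgrind-detected error
    )


def o_measure(original: str, patched: str, fuzz_tests: list[str]) -> int:
    """One plausible reading of the O-measure: the number of new test
    cases on which the patched program behaves worse than the original."""
    return sum(
        1
        for t in fuzz_tests
        if run_with_oracles(patched, t).failed
        and not run_with_oracles(original, t).failed
    )


def is_overfitted(original: str, patched: str, fuzz_tests: list[str]) -> bool:
    # A genuinely correct patch should never degrade behavior on any
    # input, so any positive O-measure flags the patch as overfitted.
    return o_measure(original, patched, fuzz_tests) > 0
```

The design point the sketch tries to capture is that these oracles are implicit: a fuzz-generated input needs no human-written assertion about expected output, because crashing or violating memory safety is "wrong" regardless of what the program should compute.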

Contents

  1 Introduction
  2 Background on Automated G&V Program Repair
  3 Approach
    3.1 Generating New Test Cases Using Fuzz Testing
    3.2 Generating Memory-Safety Oracles
    3.3 Measuring the Overfitness of a Patch Using an Overfitness Metric (O-measure)
    3.4 An Optimized Setting of Opad
  4 Evaluation
  5 Threats to Validity
  6 Related Work
  7 Conclusions



Information

Published In

ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
August 2017
1073 pages
ISBN: 9781450351058
DOI: 10.1145/3106237
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Overfitting in automated program repair
  2. Patch validation
  3. Testing

Qualifiers

  • Research-article

Conference

ESEC/FSE'17

Acceptance Rates

Overall acceptance rate: 112 of 543 submissions (21%)


Article Metrics

  • Downloads (last 12 months): 98
  • Downloads (last 6 weeks): 13
Reflects downloads up to 01 Feb 2025.


Cited By

  • (2025) Structuring Semantic-Aware Relations Between Bugs and Patches for Accurate Patch Evaluation. Journal of Software: Evolution and Process 37:2. DOI: 10.1002/smr.70001. Online: 2 Feb 2025.
  • (2024) Patch Correctness Assessment: A Survey. ACM Transactions on Software Engineering and Methodology 34:2, 1–50. DOI: 10.1145/3702972. Online: 8 Nov 2024.
  • (2024) Evolving Paradigms in Automated Program Repair: Taxonomy, Challenges, and Opportunities. ACM Computing Surveys 57:2, 1–43. DOI: 10.1145/3696450. Online: 10 Oct 2024.
  • (2024) B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 1693–1705. DOI: 10.1145/3691620.3695536. Online: 27 Oct 2024.
  • (2024) Tests4Py: A Benchmark for System Testing. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 557–561. DOI: 10.1145/3663529.3663798. Online: 10 Jul 2024.
  • (2024) The Patch Overfitting Problem in Automated Program Repair: Practical Magnitude and a Baseline for Realistic Benchmarking. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 452–456. DOI: 10.1145/3663529.3663776. Online: 10 Jul 2024.
  • (2024) Automated Program Repair, What Is It Good For? Not Absolutely Nothing! Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1–13. DOI: 10.1145/3597503.3639095. Online: 20 May 2024.
  • (2024) Leveraging Large Language Model for Automatic Patch Correctness Assessment. IEEE Transactions on Software Engineering 50:11, 2865–2883. DOI: 10.1109/TSE.2024.3452252. Online: Nov 2024.
  • (2024) APPT: Boosting Automated Patch Correctness Prediction via Fine-Tuning Pre-Trained Models. IEEE Transactions on Software Engineering 50:3, 474–494. DOI: 10.1109/TSE.2024.3354969. Online: Mar 2024.
  • (2024) Improving Patch Correctness Analysis via Random Testing and Large Language Models. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), 317–328. DOI: 10.1109/ICST60714.2024.00036. Online: 27 May 2024.
