Learning Quick Fixes from Code Repositories

Authors:

Reudismam Sousa,

Gustavo Soares,

Loris D'Antoni

SBES '21: Proceedings of the XXXV Brazilian Symposium on Software Engineering

Pages 74 - 83

https://doi.org/10.1145/3474624.3474650

Published: 05 October 2021

Abstract

Code analyzers such as Error Prone and FindBugs detect code patterns symptomatic of bugs, performance issues, or bad style. These tools express patterns as quick fixes that detect and rewrite unwanted code. However, it is difficult to come up with new quick fixes and to decide which ones are useful and appear frequently in real code. We propose to rely on the collective wisdom of programmers and learn quick fixes from revision histories in software repositories. We present Revisar, a tool for discovering common Java edit patterns in code repositories. Given code repositories and their revision histories, Revisar (i) identifies code edits from revisions and (ii) clusters edits into sets that can be described using an edit pattern. The designers of code analyzers can then inspect the patterns and add the corresponding quick fixes to their tools. We ran Revisar on nine popular GitHub projects, and it discovered 89 useful edit patterns that appeared in 3 or more projects. Moreover, 64% of the discovered patterns did not appear in existing tools. We then conducted a survey with 164 programmers from 124 projects and found that programmers significantly preferred eight of nine of the discovered patterns. Finally, we submitted 16 pull requests applying our patterns to 9 projects and, at the time of writing, programmers accepted 7 (63.6%) of them. The results of this work aid toolsmiths in discovering quick fixes and in making informed decisions about which quick fixes to prioritize, based on patterns programmers actually apply in practice.

Published In

SBES '21: Proceedings of the XXXV Brazilian Symposium on Software Engineering

September 2021

473 pages

ISBN:9781450390613

DOI:10.1145/3474624

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SBES '21

SBES '21: Brazilian Symposium on Software Engineering

September 27 - October 1, 2021

Joinville, Brazil

Acceptance Rates

Overall Acceptance Rate 147 of 427 submissions, 34%

Cited By

D'Antoni L, Ding S, Goel A, Ramesh M, Rungta N, Sung C (2024). Automatically Reducing Privilege for Access Control Policies. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (763-790). Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3689738
Cerna D, Buran M (2024). One or Nothing: Anti-unification over the Simply-Typed Lambda Calculus. ACM Transactions on Computational Logic 25:3 (1-12). Online publication date: 17-Jun-2024.
https://dl.acm.org/doi/10.1145/3654798
Bairi R, Sonwane A, Kanade A, C. V, Iyer A, Parthasarathy S, Rajamani S, Ashok B, Shet S (2024). CodePlan: Repository-Level Coding using LLMs and Planning. Proceedings of the ACM on Software Engineering 1:FSE (675-698). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3643757
Dilhara M, Bellur A, Bryksin T, Dig D (2024). Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example. Proceedings of the ACM on Software Engineering 1:FSE (631-653). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3643755
Ayala-Rincón M, Cerna D, Barragán A, Kutsia T (2024). Equational Anti-unification over Absorption Theories. Automated Reasoning (317-337). Online publication date: 2-Jul-2024.
https://doi.org/10.1007/978-3-031-63501-4_17
Karatzas M, Diamantopoulos T, Symeonidis A (2024). Extracting Fix Patterns for Static Analysis Violations Based on Collective Developer Knowledge. Software: Practice and Experience. Online publication date: 24-Oct-2024.
https://doi.org/10.1002/spe.3384
Cerna D, Kutsia T, Elkind E (2023). Anti-unification and generalization. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (6563-6573). Online publication date: 19-Aug-2023.
https://dl.acm.org/doi/10.24963/ijcai.2023/736
Grazia L, Bredl P, Pradel M (2023). DiffSearch: A Scalable and Precise Search Engine for Code Changes. IEEE Transactions on Software Engineering 49:4 (2366-2380). Online publication date: 1-Apr-2023.
https://dl.acm.org/doi/10.1109/TSE.2022.3218859
Zhang Y, Bajpai Y, Gupta P, Ketkar A, Allamanis M, Barik T, Gulwani S, Radhakrishna A, Raza M, Soares G, Tiwari A (2022). Overwatch: learning patterns in code edit sequences. Proceedings of the ACM on Programming Languages 6:OOPSLA2 (395-423). Online publication date: 31-Oct-2022.
https://dl.acm.org/doi/10.1145/3563302
Cerqueira L, Silva J, Alvim Í, Mendonça M, Santos J (2022). The who, what and how of the current research at the Brazilian Symposium on Software Engineering. Proceedings of the XXXVI Brazilian Symposium on Software Engineering (11-20). Online publication date: 5-Oct-2022.
https://dl.acm.org/doi/10.1145/3555228.3555241
