
A structural model for contextual code changes

Published: 13 November 2020

Abstract

We address the problem of predicting edit completions based on a learned model that was trained on past edits. Given a code snippet that is partially edited, our goal is to predict a completion of the edit for the rest of the snippet. We refer to this task as the EditCompletion task and present a novel approach for tackling it. The main idea is to directly represent structural edits. This allows us to model the likelihood of the edit itself, rather than learning the likelihood of the edited code. We represent an edit operation as a path in the program’s Abstract Syntax Tree (AST), originating from the source of the edit to the target of the edit. Using this representation, we present a powerful and lightweight neural model for the EditCompletion task.
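The path-based edit representation described above can be illustrated with a toy sketch. This is a hypothetical illustration, not the authors' C3PO implementation: it uses Python's built-in `ast` module and represents an "edit path" as the sequence of AST node-type names from a source node up to the lowest common ancestor (LCA) and down to a target node.

```python
# Toy sketch (hypothetical, not the paper's implementation): represent an
# edit as a path of AST node-type names from the edit's source node, up to
# the lowest common ancestor, and down to the edit's target node.
import ast

def node_paths(tree):
    """Map id(node) -> (node, root-to-node path of node-type names)."""
    paths = {}
    def walk(node, prefix):
        path = prefix + [type(node).__name__]
        paths[id(node)] = (node, path)
        for child in ast.iter_child_nodes(node):
            walk(child, path)
    walk(tree, [])
    return paths

def edit_path(tree, src, dst):
    """Path of node-type names from src through the LCA down to dst."""
    paths = node_paths(tree)
    up = paths[id(src)][1]
    down = paths[id(dst)][1]
    # Length of the shared root prefix locates the lowest common ancestor.
    k = 0
    while k < min(len(up), len(down)) and up[k] == down[k]:
        k += 1
    # Ascend from src (excluding the LCA), then descend from the LCA to dst.
    return up[:k-1:-1] + down[k-1:]

tree = ast.parse("x = a + b")
assign = tree.body[0]
src = assign.targets[0]   # the Name node for `x`
dst = assign.value.right  # the Name node for `b`
print(edit_path(tree, src, dst))  # → ['Name', 'Assign', 'BinOp', 'Name']
```

Here the path climbs from `x` to the enclosing `Assign` (the LCA) and descends through the `BinOp` to `b`; a learned model can then score such paths directly, rather than scoring the entire edited program.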
We conduct a thorough evaluation, comparing our approach to a variety of representation and modeling approaches that are driven by multiple strong models such as LSTMs, Transformers, and neural CRFs. Our experiments show that our model achieves a 28% relative gain over state-of-the-art sequential models and 2× higher accuracy than syntactic models that learn to generate the edited code, as opposed to modeling the edits directly.
Our code, dataset, and trained models are publicly available at https://github.com/tech-srl/c3po/.

Supplementary Material

Auxiliary Presentation Video (oopsla20main-p467-p-video.mp4)
A video presentation for the paper "A Structural Model for Contextual Code Changes" by Shaked Brody, Uri Alon, and Eran Yahav.



Published In

Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA
November 2020
3108 pages
EISSN:2475-1421
DOI:10.1145/3436718
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 November 2020
Published in PACMPL Volume 4, Issue OOPSLA

Author Tags

  1. Edit Completions
  2. Machine Learning
  3. Neural Models of Code

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months)300
  • Downloads (Last 6 weeks)37
Reflects downloads up to 09 Nov 2024

Cited By

  • (2024)A Pilot Study in Surveying Data Challenges of Automatic Software Engineering TasksProceedings of the 4th International Workshop on Software Engineering and AI for Data Quality in Cyber-Physical Systems/Internet of Things10.1145/3663530.3665020(6-11)Online publication date: 15-Jul-2024
  • (2024)CodePlan: Repository-Level Coding using LLMs and PlanningProceedings of the ACM on Software Engineering10.1145/36437571:FSE(675-698)Online publication date: 12-Jul-2024
  • (2024)GPT4D: Automatic Cross-Version Linux Driver Upgrade ToolkitMachine Learning and Intelligent Communication10.1007/978-3-031-71716-1_11(132-141)Online publication date: 20-Sep-2024
  • (2023)Grace: Language Models Meet Code EditsProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616253(1483-1495)Online publication date: 30-Nov-2023
  • (2023)Learning the Relation Between Code Features and Code Transforms With Structured PredictionIEEE Transactions on Software Engineering10.1109/TSE.2023.327538049:7(3872-3900)Online publication date: Jul-2023
  • (2023)Slice-Based Code Change Representation Learning2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)10.1109/SANER56733.2023.00038(319-330)Online publication date: Mar-2023
  • (2023)KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program RepairProceedings of the 45th International Conference on Software Engineering10.1109/ICSE48619.2023.00111(1251-1263)Online publication date: 14-May-2023
  • (2023)When to Say What: Learning to Find Condition-Message Inconsistencies2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)10.1109/ICSE48619.2023.00081(868-880)Online publication date: May-2023
  • (2023)CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)10.1109/ICSE48619.2023.00014(17-29)Online publication date: May-2023
  • (2023)Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE56229.2023.00103(585-597)Online publication date: 11-Sep-2023
