
A structural model for contextual code changes

Published: 13 November 2020

Abstract

We address the problem of predicting edit completions based on a learned model that was trained on past edits. Given a code snippet that is partially edited, our goal is to predict a completion of the edit for the rest of the snippet. We refer to this task as the EditCompletion task and present a novel approach for tackling it. The main idea is to directly represent structural edits. This allows us to model the likelihood of the edit itself, rather than learning the likelihood of the edited code. We represent an edit operation as a path in the program’s Abstract Syntax Tree (AST), originating from the source of the edit to the target of the edit. Using this representation, we present a powerful and lightweight neural model for the EditCompletion task.
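The path-based edit representation described above can be illustrated with a toy sketch. This is a hypothetical illustration, not the authors' C3PO implementation: it uses Python's built-in `ast` module and represents an "edit path" as the sequence of AST node-type names from a source node up to the lowest common ancestor (LCA) and down to a target node.

```python
# Toy sketch (hypothetical, not the paper's implementation): represent an
# edit as a path of AST node-type names from the edit's source node, up to
# the lowest common ancestor, and down to the edit's target node.
import ast

def node_paths(tree):
    """Map id(node) -> (node, root-to-node path of node-type names)."""
    paths = {}
    def walk(node, prefix):
        path = prefix + [type(node).__name__]
        paths[id(node)] = (node, path)
        for child in ast.iter_child_nodes(node):
            walk(child, path)
    walk(tree, [])
    return paths

def edit_path(tree, src, dst):
    """Path of node-type names from src through the LCA down to dst."""
    paths = node_paths(tree)
    up = paths[id(src)][1]
    down = paths[id(dst)][1]
    # Length of the shared root prefix locates the lowest common ancestor.
    k = 0
    while k < min(len(up), len(down)) and up[k] == down[k]:
        k += 1
    # Ascend from src (excluding the LCA), then descend from the LCA to dst.
    return up[:k-1:-1] + down[k-1:]

tree = ast.parse("x = a + b")
assign = tree.body[0]
src = assign.targets[0]   # the Name node for `x`
dst = assign.value.right  # the Name node for `b`
print(edit_path(tree, src, dst))  # → ['Name', 'Assign', 'BinOp', 'Name']
```

Here the path climbs from `x` to the enclosing `Assign` (the LCA) and descends through the `BinOp` to `b`; a learned model can then score such paths directly, rather than scoring the entire edited program.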
We conduct a thorough evaluation, comparing our approach to a variety of representation and modeling approaches that are driven by multiple strong models such as LSTMs, Transformers, and neural CRFs. Our experiments show that our model achieves a 28% relative gain over state-of-the-art sequential models and 2× higher accuracy than syntactic models that learn to generate the edited code, as opposed to modeling the edits directly.
Our code, dataset, and trained models are publicly available at https://github.com/tech-srl/c3po/.

Supplementary Material

Auxiliary Presentation Video (oopsla20main-p467-p-video.mp4)
A video presentation for the paper "A Structural Model for Contextual Code Changes" by Shaked Brody, Uri Alon, and Eran Yahav.



Published In

Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA
November 2020
3108 pages
EISSN:2475-1421
DOI:10.1145/3436718
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 November 2020
Published in PACMPL Volume 4, Issue OOPSLA

Author Tags

  1. Edit Completions
  2. Machine Learning
  3. Neural Models of Code

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months)300
  • Downloads (Last 6 weeks)37
Reflects downloads up to 09 Nov 2024

Cited By

  • (2024)A Pilot Study in Surveying Data Challenges of Automatic Software Engineering TasksProceedings of the 4th International Workshop on Software Engineering and AI for Data Quality in Cyber-Physical Systems/Internet of Things10.1145/3663530.3665020(6-11)Online publication date: 15-Jul-2024
  • (2024)CodePlan: Repository-Level Coding using LLMs and PlanningProceedings of the ACM on Software Engineering10.1145/36437571:FSE(675-698)Online publication date: 12-Jul-2024
  • (2024)GPT4D: Automatic Cross-Version Linux Driver Upgrade ToolkitMachine Learning and Intelligent Communication10.1007/978-3-031-71716-1_11(132-141)Online publication date: 20-Sep-2024
  • (2023)Grace: Language Models Meet Code EditsProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616253(1483-1495)Online publication date: 30-Nov-2023
  • (2023)Learning the Relation Between Code Features and Code Transforms With Structured PredictionIEEE Transactions on Software Engineering10.1109/TSE.2023.327538049:7(3872-3900)Online publication date: Jul-2023
  • (2023)Slice-Based Code Change Representation Learning2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)10.1109/SANER56733.2023.00038(319-330)Online publication date: Mar-2023
  • (2023)KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program RepairProceedings of the 45th International Conference on Software Engineering10.1109/ICSE48619.2023.00111(1251-1263)Online publication date: 14-May-2023
  • (2023)When to Say What: Learning to Find Condition-Message Inconsistencies2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)10.1109/ICSE48619.2023.00081(868-880)Online publication date: May-2023
  • (2023)CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)10.1109/ICSE48619.2023.00014(17-29)Online publication date: May-2023
  • (2023)Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE56229.2023.00103(585-597)Online publication date: 11-Sep-2023
