DOI: 10.1145/3611643.3613895

AdaptivePaste: Intelligent Copy-Paste in IDE

Published: 30 November 2023

Abstract

In software development, it is common for programmers to copy-paste or port code snippets and then adapt them to their use case. This scenario motivates the code adaptation task – a variant of program repair that aims to adapt variable identifiers in a pasted snippet of code to the surrounding, preexisting context. However, no existing approach has been shown to address this task effectively. In this paper, we introduce AdaptivePaste, a learning-based approach to source code adaptation built on transformers and a dedicated dataflow-aware deobfuscation pre-training task that learns meaningful representations of variable usage patterns. We demonstrate that AdaptivePaste can adapt Python source code snippets with 67.8% exact-match accuracy. We study the impact of confidence thresholds on the model predictions, showing that precision can be further improved to 85.9% with our parallel-decoder transformer model in a selective code adaptation setting. To assess the practical value of AdaptivePaste, we perform a user study among Python software developers on real-world copy-paste instances. The results show that AdaptivePaste reduces dwell time to nearly half of what manual porting takes, and helps developers avoid bugs. In addition, we use the participant feedback to identify potential avenues for improvement.
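To make the task concrete, below is a minimal sketch of the adaptation loop the abstract describes: identifiers in a pasted snippet are anonymized to placeholder symbols (mirroring the dataflow-aware deobfuscation pre-training objective), a model proposes an in-context name for each placeholder, and low-confidence predictions are abstained from, as in the selective code adaptation setting. The VAR_n placeholder scheme and the model.predict interface are hypothetical illustrations, not AdaptivePaste's actual API.

    import re

    def adapt(context: str, pasted: str, model, threshold: float = 0.9) -> str:
        """Map anonymized placeholders in `pasted` to variable names drawn
        from the surrounding `context`, keeping only predictions whose
        confidence clears `threshold` (selective code adaptation)."""
        adapted = pasted
        for placeholder in sorted(set(re.findall(r"VAR_\d+", pasted))):
            # `model.predict` is a hypothetical stand-in for a trained
            # transformer scoring candidate names for one placeholder.
            name, confidence = model.predict(context, pasted, placeholder)
            if confidence >= threshold:
                adapted = adapted.replace(placeholder, name)
        return adapted

    # Example: with surrounding context that defines `records`, a pasted
    # snippet such as
    #     for VAR_0 in VAR_1: print(VAR_0.upper())
    # would ideally be adapted to use `records` for VAR_1, while VAR_0 is
    # left unresolved if no in-context name is predicted confidently.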


Cited By

  • (2024) Adoption of automated software engineering tools and techniques in Thailand. Empirical Software Engineering 29:4. https://doi.org/10.1007/s10664-024-10472-6 (online: 10 Jun 2024)


Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023, 2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Code adaptation
  2. Machine learning

Qualifiers

  • Research-article

Conference

ESEC/FSE '23

Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions (21%)

Article Metrics

  • Downloads (Last 12 months): 186
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 25 Dec 2024

