DOI: 10.1145/3611643.3613895

AdaptivePaste: Intelligent Copy-Paste in IDE

Published: 30 November 2023

Abstract

In software development, it is common for programmers to copy-paste or port code snippets and then adapt them to their use case. This scenario motivates the code adaptation task – a variant of program repair that aims to adapt variable identifiers in a pasted snippet of code to the surrounding, preexisting context. However, no existing approach has been shown to address this task effectively. In this paper, we introduce AdaptivePaste, a learning-based approach to source code adaptation built on transformers and a dedicated dataflow-aware deobfuscation pre-training task that learns meaningful representations of variable usage patterns. We demonstrate that AdaptivePaste can adapt Python source code snippets with 67.8% exact-match accuracy. We study the impact of confidence thresholds on the model predictions, showing that precision can be further improved to 85.9% with our parallel-decoder transformer model in a selective code adaptation setting. To assess the practical value of AdaptivePaste, we perform a user study among Python software developers on real-world copy-paste instances. The results show that AdaptivePaste reduces dwell time to nearly half of what manual porting takes, and helps developers avoid bugs. In addition, we use the participant feedback to identify potential avenues for improvement.
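To make the task concrete, below is a minimal sketch of the adaptation loop the abstract describes: identifiers in a pasted snippet are anonymized to placeholder symbols (mirroring the dataflow-aware deobfuscation pre-training objective), a model proposes an in-context name for each placeholder, and low-confidence predictions are abstained from, as in the selective code adaptation setting. The VAR_n placeholder scheme and the model.predict interface are hypothetical illustrations, not AdaptivePaste's actual API.

    import re

    def adapt(context: str, pasted: str, model, threshold: float = 0.9) -> str:
        """Map anonymized placeholders in `pasted` to variable names drawn
        from the surrounding `context`, keeping only predictions whose
        confidence clears `threshold` (selective code adaptation)."""
        adapted = pasted
        for placeholder in sorted(set(re.findall(r"VAR_\d+", pasted))):
            # `model.predict` is a hypothetical stand-in for a trained
            # transformer scoring candidate names for one placeholder.
            name, confidence = model.predict(context, pasted, placeholder)
            if confidence >= threshold:
                adapted = adapted.replace(placeholder, name)
        return adapted

    # Example: with surrounding context that defines `records`, a pasted
    # snippet such as
    #     for VAR_0 in VAR_1: print(VAR_0.upper())
    # would ideally be adapted to use `records` for VAR_1, while VAR_0 is
    # left unresolved if no in-context name is predicted confidently.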


Cited By

  • (2024) Adoption of automated software engineering tools and techniques in Thailand. Empirical Software Engineering 29:4. https://doi.org/10.1007/s10664-024-10472-6 (online: 10 Jun 2024)


Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023, 2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Code adaptation
  2. Machine learning

Qualifiers

  • Research-article

Conference

ESEC/FSE '23

Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions (21%)

Article Metrics

  • Downloads (Last 12 months): 186
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 25 Dec 2024

