DOI: 10.1109/ICSE.2019.00021
research-article

On learning meaningful code changes via neural machine translation

Published: 25 May 2019

Abstract

Recent years have seen the rise of Deep Learning (DL) techniques applied to source code. Researchers have exploited DL to automate several development and maintenance tasks, such as writing commit messages, generating comments, and detecting vulnerabilities, among others. One of the long-lasting dreams of applying DL to source code is the possibility of automating non-trivial coding activities. While some steps in this direction have been taken (e.g., learning how to fix bugs), there is still a glaring lack of empirical evidence on the types of code changes that can be learned and automatically applied by DL.
Our goal is to take this first important step by quantitatively and qualitatively investigating the ability of a Neural Machine Translation (NMT) model to learn how to automatically apply code changes implemented by developers during pull requests. We train and experiment with the NMT model on a set of 236k pairs of code components before and after the implementation of the changes provided in the pull requests. We show that, when applied in a narrow enough context (i.e., small/medium-sized pairs of methods before/after the pull request changes), NMT can automatically replicate the changes implemented by developers during pull requests in up to 36% of the cases. Moreover, our qualitative analysis shows that the model is capable of learning and replicating a wide variety of meaningful code changes, especially refactorings and bug-fixing activities. Our results pave the way for novel research in the area of DL on code, such as the automatic learning and application of refactorings.
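
As an illustration of how a replication rate like the one reported above could be measured, the following minimal Python sketch counts a prediction as a success only when its token sequence is identical to the method the developer actually wrote after the pull request. The whitespace tokenizer, the function names, and the toy data are assumptions made for this example; they are not taken from the paper or its replication package.

```python
from typing import List, Sequence, Tuple


def tokenize(code: str) -> List[str]:
    """Split code on whitespace (an assumed stand-in for a real lexer)."""
    return code.split()


def replication_rate(pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (predicted, actual_after) pairs whose tokens match exactly."""
    if not pairs:
        return 0.0
    hits = sum(
        1 for predicted, actual in pairs
        if tokenize(predicted) == tokenize(actual)
    )
    return hits / len(pairs)


# Hypothetical toy data: (model output, method after the pull request).
examples = [
    ("public final int size ( ) { return n ; }",
     "public final int size ( ) { return n ; }"),
    ("int get ( ) { return x ; }",
     "int get ( ) { return this . x ; }"),
]
rate = replication_rate(examples)
print(f"{rate:.0%}")
```

Exact token equality is a deliberately strict criterion: a prediction that is functionally equivalent to the developer's change but written differently is still counted as a miss.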

Information

Published In

ICSE '19: Proceedings of the 41st International Conference on Software Engineering
May 2019
1318 pages

Publisher

IEEE Press

Author Tags

  1. empirical study
  2. neural-machine translation

Qualifiers

  • Research-article

Conference

ICSE '19

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 27
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 24 Oct 2024

Citations

Cited By

  • (2024) Understanding Code Changes Practically with Small-Scale Language Models. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 216-228. DOI: 10.1145/3691620.3694999. Online publication date: 27-Oct-2024.
  • (2024) Automatically Recommend Code Updates: Are We There Yet? ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3678167. Online publication date: 16-Jul-2024.
  • (2024) Mining Fix Patterns for System Interaction Bugs. Proceedings of the 15th Asia-Pacific Symposium on Internetware, pp. 367-376. DOI: 10.1145/3671016.3671398. Online publication date: 24-Jul-2024.
  • (2024) MineCPP: Mining Bug Fix Pairs and Their Structures. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pp. 552-556. DOI: 10.1145/3663529.3663797. Online publication date: 10-Jul-2024.
  • (2024) On the Use of ChatGPT for Code Review: Do Developers Like Reviews By ChatGPT? Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, pp. 375-380. DOI: 10.1145/3661167.3661183. Online publication date: 18-Jun-2024.
  • (2024) Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, pp. 280-292. DOI: 10.1145/3643916.3644416. Online publication date: 15-Apr-2024.
  • (2024) Automated Program Repair with the GPT Family, including GPT-2, GPT-3 and CodeX. Proceedings of the 5th ACM/IEEE International Workshop on Automated Program Repair, pp. 34-41. DOI: 10.1145/3643788.3648021. Online publication date: 20-Apr-2024.
  • (2024) On the Reliability and Explainability of Language Models for Program Generation. ACM Transactions on Software Engineering and Methodology 33:5, pp. 1-26. DOI: 10.1145/3641540. Online publication date: 3-Jun-2024.
  • (2024) Beyond Accuracy and Robustness Metrics for Large Language Models for Code. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, pp. 159-161. DOI: 10.1145/3639478.3639792. Online publication date: 14-Apr-2024.
  • (2024) Smart Contract Code Repair Recommendation based on Reinforcement Learning and Multi-metric Optimization. ACM Transactions on Software Engineering and Methodology 33:4, pp. 1-31. DOI: 10.1145/3637229. Online publication date: 18-Apr-2024.
