Code Generation Using Machine Learning: A Systematic Review
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3196347
ABSTRACT Recently, machine learning (ML) methods have been used to create powerful language
models for a broad range of natural language processing tasks. An important subset of this field is the
generation of programming-language code for automatic software development. This review provides a
broad and detailed overview of studies for code generation using ML. We selected 37 publications indexed
in arXiv and IEEE Xplore databases that train ML models on programming language data to generate code.
The three paradigms of code generation we identified in these studies are description-to-code, code-to-
description, and code-to-code. The most popular applications that work in these paradigms were found to
be code generation from natural language descriptions, documentation generation, and automatic program
repair, respectively. The most frequently used ML models in these studies include recurrent neural networks,
transformers, and convolutional neural networks. Other neural network architectures, as well as non-neural
techniques, were also observed. For each of the 37 publications, we summarize the applications, models,
datasets, results, limitations, and future work. We also discuss topics common to the reviewed literature:
comparisons of model types and of tokenizers, the volume and quality of the data used, and methods for
evaluating synthesized code. Finally, we offer three suggestions for future work on code generation
using ML.
INDEX TERMS Automatic programming, Computer languages, Data collection, Machine learning,
Natural language processing, Neural networks, Recurrent neural networks, Software debugging, Software
maintenance, Text mining.
I. INTRODUCTION
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/