This article has been accepted for publication in IEEE Access. This is the author's version which has not
been fully edited and content may change prior to final publication.
Citation information: DOI 10.1109/ACCESS.2022.3196347


Code Generation Using Machine Learning: A Systematic Review
ENRIQUE DEHAERNE1,2 (Graduate Student Member, IEEE), BAPPADITYA DEY2
(Graduate Student Member, IEEE), SANDIP HALDER2, STEFAN DE GENDT1,3 (Senior
Member, IEEE), and WANNES MEERT1 (Member, IEEE).
1 Dept. of Computer Science, KU Leuven, 3001 Leuven, Belgium (e-mail: Enrique.Dehaerne@student.kuleuven.be, Wannes.Meert@kuleuven.be)
2 Interuniversity Microelectronics Centre (imec), Kapeldreef 75, 3001 Leuven, Belgium (e-mails: {Bappaditya.Dey, Sandip.Halder, Stefan.DeGendt}@imec.be)
3 Dept. of Chemistry, KU Leuven, 3001 Leuven, Belgium (e-mail: stefan.degendt@kuleuven.be)
Corresponding author: Enrique Dehaerne (e-mail: enrique.dehaerne@student.kuleuven.be).

ABSTRACT Recently, machine learning (ML) methods have been used to create powerful language
models for a broad range of natural language processing tasks. An important subset of this field is
generating programming language code for automatic software development. This review provides a
broad and detailed overview of studies for code generation using ML. We selected 37 publications indexed
in arXiv and IEEE Xplore databases that train ML models on programming language data to generate code.
The three paradigms of code generation we identified in these studies are description-to-code, code-to-
description, and code-to-code. The most popular applications that work in these paradigms were found to
be code generation from natural language descriptions, documentation generation, and automatic program
repair, respectively. The most frequently used ML models in these studies include recurrent neural networks,
transformers, and convolutional neural networks. Other neural network architectures, as well as non-neural
techniques, were also observed. In this review, we have summarized the applications, models, datasets,
results, limitations, and future work of 37 publications. Additionally, we include discussions on topics
general to the literature reviewed. This includes comparing different model types, comparing tokenizers,
the volume and quality of data used, and methods for evaluating synthesized code. Furthermore, we provide
three suggestions for future work for code generation using ML.

INDEX TERMS Automatic programming, Computer languages, Data collection, Machine Learning,
Natural language processing, Neural networks, Recurrent neural networks, Software debugging, Software
maintenance, Text mining.

I. INTRODUCTION

Software development is a complex and time-consuming process. It consists of two main phases:
analysis and coding [1]. In the analysis phase, the requirements and architecture of the software
system are formalized. In the coding phase, source code is written and tested to meet the
requirements set in the first phase. Usually, maintenance of the system is included as an
additional phase in the software development cycle where previous steps can be adapted to
reflect changes in the needs of the system user. Figure 1 shows a flowchart for a simple
software development model. In this review, we focus on the coding phase which works
directly with source code.

FIGURE 1. An example model for software development with three phases, each consisting of
multiple steps. Software development models used in practice vary in the number of steps and
their ordering compared to the model depicted in this flowchart.

Modern society relies on complex software applications. These applications can consist of
millions of lines written in many programming languages (PLs) by many teams of developers.
Even small software projects will often leverage large


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
