DOI: 10.1145/3609437.3609465
Research article

Seq2Seq or Seq2Tree: Generating Code Using Both Paradigms via Mutual Learning

Published: 05 October 2023

Abstract

Code generation aims to automatically generate source code from given natural language (NL) descriptions, which is of great significance for automated software development. Some code generation models follow a language model-based paradigm (LMBP) and generate source code tokens sequentially. Others focus on deriving the grammatical structure by generating the program's abstract syntax tree (AST), i.e., they follow a grammatical structure-based paradigm (GSBP). Existing studies generate code using only one of these two paradigms. However, human developers typically consider both: building the grammatical structure of the code and writing source code statements according to the language model. We therefore argue that code generation should consider both GSBP and LMBP. In this paper, we use mutual learning to combine the two classes of models so that the two paradigms are trained together. To implement the mutual learning framework, we design alignment methods between code and AST. Under this framework, the models are enhanced through a shared encoder and knowledge interaction at aligned training steps. We experiment on three Python-based code generation datasets. Experimental results and an ablation analysis confirm the effectiveness of our approach and demonstrate that considering both GSBP and LMBP improves code generation performance.
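The mutual learning setup described in the abstract can be illustrated with a small sketch: a Seq2Seq token decoder (LMBP) and a Seq2Tree AST-action decoder (GSBP) share an encoder, each is trained with its own cross-entropy loss, and a symmetric KL term couples their predictive distributions at steps matched by a code/AST alignment. The PyTorch code below is a minimal illustration of this idea, not the authors' implementation; the module names, sizes, the shared token-level output space, and the precomputed aligned_pairs input are all simplifying assumptions.

```python
# Minimal mutual-learning sketch (hypothetical, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HID = 100, 64                                   # toy vocabulary / hidden size

shared_encoder = nn.GRU(HID, HID, batch_first=True)    # shared NL encoder
seq2seq_head   = nn.Linear(HID, VOCAB)                 # LMBP: next source-code token
seq2tree_head  = nn.Linear(HID, VOCAB)                 # GSBP: token emitted by a terminal AST action
                                                       # (assumed to share the token vocabulary here)

def sym_kl(logits_a, logits_b):
    """Symmetric KL divergence between two categorical distributions."""
    log_a, log_b = F.log_softmax(logits_a, -1), F.log_softmax(logits_b, -1)
    return 0.5 * (F.kl_div(log_a, log_b.exp(), reduction="batchmean")
                  + F.kl_div(log_b, log_a.exp(), reduction="batchmean"))

def mutual_loss(nl_embeds, targets, aligned_pairs, alpha=0.5):
    """Cross-entropy for both decoders plus a mutual-learning KL term at aligned steps.
    aligned_pairs: (seq2seq_step, seq2tree_step) index pairs produced by some
    code/AST alignment procedure (assumed given). For brevity, both heads are
    trained toward the same toy token targets."""
    enc, _ = shared_encoder(nl_embeds)                  # (B, T, HID)
    tok_logits  = seq2seq_head(enc)                     # (B, T, VOCAB)
    tree_logits = seq2tree_head(enc)                    # (B, T, VOCAB)

    ce = (F.cross_entropy(tok_logits.flatten(0, 1), targets.flatten())
          + F.cross_entropy(tree_logits.flatten(0, 1), targets.flatten()))

    # Knowledge interaction: each paradigm learns from the other's distribution
    # at the aligned training steps.
    kl = sum(sym_kl(tok_logits[:, i], tree_logits[:, j]) for i, j in aligned_pairs)
    return ce + alpha * kl / max(len(aligned_pairs), 1)

# Toy forward/backward pass with random data.
B, T = 2, 10
loss = mutual_loss(torch.randn(B, T, HID),
                   torch.randint(0, VOCAB, (B, T)),
                   aligned_pairs=[(2, 3), (5, 7)])
loss.backward()
```

The key design choice this sketch tries to convey is that neither decoder supervises the other in a teacher-student fashion; both are trained jointly, and the coupling term only applies at steps the alignment has matched.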


Published In

    Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
    August 2023
    332 pages
    ISBN:9798400708947
    DOI:10.1145/3609437

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. abstract syntax tree
    2. code generation
    3. mutual learning
    4. neural networks

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Internetware 2023

    Acceptance Rates

    Overall Acceptance Rate 55 of 111 submissions, 50%

