DOI: 10.1145/3609437.3609459

Measuring Efficient Code Generation with GEC

Published: 05 October 2023

Abstract

Although efficiency is one of the core metrics in programming, recent large language models often generate "inefficient code" that fails to meet the real-time requirements of practical algorithms. However, there is relatively little research on evaluating the selection of efficient algorithms, and it is difficult to rigorously assess a model's ability to choose efficient algorithmic solutions correctly. Moreover, selecting an efficient algorithmic solution often depends on applying the right problem-solving techniques, which calls for deeper research into algorithmic reasoning. To address this challenge, we introduce the Generation of Efficient Code (GEC) benchmark, which evaluates the ability to select efficient algorithmic solutions. Unlike standard code generation, our benchmark focuses on a model's ability to generate satisfactorily efficient code when given a natural language description together with an inefficient implementation. We propose two novel metrics to examine the efficiency of the generated code and to assess the model's ability to generate efficient code. Our benchmark includes 3,712 problems, 31,577 combinations of efficient and inefficient code pairs, and 13,092 alternative efficient solutions. We evaluate the performance of mainstream code generation models on the GEC benchmark. As the societal importance of code efficiency grows in the coming years, our benchmark will provide an essential measurement standard for tracking research progress. Our dataset and models are open source and can be accessed at https://github.com/CodeGeneration2/Efficient-Code-Generation-with-GEC.
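
To make the task format concrete, below is a minimal, hypothetical sketch of the kind of inefficient/efficient code pair the benchmark contrasts, together with a simple wall-clock comparison. The problem, function names, and timing harness are illustrative assumptions in Python and are not taken from the GEC dataset or the paper's actual evaluation metrics.

    # Illustrative sketch only: a hypothetical inefficient/efficient code pair of the
    # kind the GEC benchmark contrasts. The problem, names, and timing harness are
    # assumptions for illustration, not drawn from the GEC dataset itself.
    import time


    def max_subarray_sum_inefficient(nums):
        """O(n^2): explicitly sum every subarray starting at each index."""
        best = nums[0]
        for i in range(len(nums)):
            running = 0
            for j in range(i, len(nums)):
                running += nums[j]
                best = max(best, running)
        return best


    def max_subarray_sum_efficient(nums):
        """O(n): Kadane's algorithm, the kind of rewrite a model would be asked to produce."""
        best = current = nums[0]
        for x in nums[1:]:
            current = max(x, current + x)
            best = max(best, current)
        return best


    if __name__ == "__main__":
        data = list(range(-500, 1500))  # 2,000 elements, mixed signs
        for fn in (max_subarray_sum_inefficient, max_subarray_sum_efficient):
            start = time.perf_counter()
            result = fn(data)
            elapsed = time.perf_counter() - start
            print(f"{fn.__name__}: result={result}, wall time={elapsed:.4f}s")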


Cited By

  • (2025) Measuring code efficiency optimization capabilities with ACEOB. Journal of Systems and Software, 219:112250. DOI: 10.1016/j.jss.2024.112250. Online publication date: Jan 2025.
  • (2024) Software engineering education in the era of conversational AI: current trends and future directions. Frontiers in Artificial Intelligence, 7. DOI: 10.3389/frai.2024.1436350. Online publication date: 29 Aug 2024.


Published In

Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
August 2023
332 pages
ISBN: 9798400708947
DOI: 10.1145/3609437

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Benchmark Datasets
  2. Code Efficiency Optimization
  3. Code Generation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Internetware 2023

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%


