DOI: 10.1145/3524842.3528470
short-paper

An empirical evaluation of GitHub Copilot's code suggestions

Published: 17 October 2022

Abstract

GitHub and OpenAI recently launched Copilot, an "AI pair programmer" that utilizes the power of Natural Language Processing, Static Analysis, Code Synthesis, and Artificial Intelligence. Given a natural language description of the target functionality, Copilot can generate corresponding code in several programming languages. In this paper, we perform an empirical study to evaluate the correctness and understandability of Copilot's suggested code. We use 33 LeetCode questions to create queries for Copilot in four different programming languages. We evaluate the correctness of the corresponding 132 Copilot solutions by running LeetCode's provided tests, and evaluate understandability using SonarQube's cyclomatic complexity and cognitive complexity metrics. We find that Copilot's Java suggestions have the highest correctness score (57%), while its JavaScript suggestions have the lowest (27%). Overall, Copilot's suggestions have low complexity, with no notable differences between the programming languages. We also find some potential Copilot shortcomings, such as generating code that can be further simplified and code that relies on undefined helper methods.
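
As a rough illustration of the correctness measurement described in the abstract, the following Python sketch computes a per-language correctness score from test outcomes. It is not the authors' tooling: the function name `correctness_by_language`, the tuple layout, and the toy data are all hypothetical, and it assumes that a suggestion counts as correct only when it passes every provided test, which the abstract does not state explicitly.

```python
from collections import defaultdict

# 33 questions queried in 4 languages gives the 132 solutions mentioned in the abstract.
N_QUESTIONS = 33
N_LANGUAGES = 4
N_SOLUTIONS = N_QUESTIONS * N_LANGUAGES


def correctness_by_language(results):
    """Return the fraction of fully passing suggestions per language.

    `results` is an iterable of (language, tests_passed, tests_total) tuples,
    one tuple per suggestion. Assumption: "correct" means all tests pass.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, tests_passed, tests_total in results:
        total[language] += 1
        if tests_total > 0 and tests_passed == tests_total:
            correct[language] += 1
    return {language: correct[language] / total[language] for language in total}


# Hypothetical toy data, NOT the paper's results.
toy_results = [
    ("Java", 10, 10),
    ("Java", 7, 10),
    ("JavaScript", 10, 10),
    ("JavaScript", 0, 10),
    ("JavaScript", 3, 10),
    ("JavaScript", 9, 10),
]

scores = correctness_by_language(toy_results)
for language, score in scores.items():
    print(f"{language}: {score:.0%} of suggestions pass all tests")
```

A stricter or looser criterion (for example, crediting partially passing suggestions) would change the scores, so the definition of "correct" should be fixed before comparing languages.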

Published In

MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
May 2022
815 pages
ISBN: 9781450393034
DOI: 10.1145/3524842
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GitHub copilot
  2. codex
  3. empirical evaluation
  4. program synthesis

Qualifiers

  • Short-paper

Funding Sources

  • Canada Research Chairs Program

Conference

MSR '22
Article Metrics

  • Downloads (last 12 months): 1,060
  • Downloads (last 6 weeks): 74
Reflects downloads up to 13 Jan 2025

Cited By

  • (2025) CIPAC: A framework of automated software construction based on collective intelligence. Journal of Systems and Software (112335). DOI: 10.1016/j.jss.2025.112335. Online publication date: Jan-2025
  • (2025) A fine-grained taxonomy of code review feedback in TypeScript projects. Empirical Software Engineering 30:2. DOI: 10.1007/s10664-024-10604-y. Online publication date: 14-Jan-2025
  • (2024) Uma análise do uso de ferramentas de geração de código por alunos de computação. Anais do IV Simpósio Brasileiro de Educação em Computação (EDUCOMP 2024) (63-71). DOI: 10.5753/educomp.2024.237427. Online publication date: 22-Apr-2024
  • (2024) ArtWhisperer. Proceedings of the 41st International Conference on Machine Learning (49627-49654). DOI: 10.5555/3692070.3694099. Online publication date: 21-Jul-2024
  • (2024) The Current State of Generative Artificial Intelligence Tools for Accessibility in Product Development. Nafath 9:26. DOI: 10.54455/MCN2605. Online publication date: 30-Jul-2024
  • (2024) Changes in homework submission patterns with the advent of AI tools: a high school perspective. Studies in Education Sciences 5:4 (e10249). DOI: 10.54019/sesv5n4-001. Online publication date: 6-Nov-2024
  • (2024) Framework for evaluating code generation ability of large language models. ETRI Journal 46:1 (106-117). DOI: 10.4218/etrij.2023-0357. Online publication date: 14-Feb-2024
  • (2024) Cognitive Apprenticeship and Artificial Intelligence Coding Assistants. Navigating Computer Science Education in the 21st Century (261-281). DOI: 10.4018/979-8-3693-1066-3.ch013. Online publication date: 26-Feb-2024
  • (2024) Program Code Generation with Generative AIs. Algorithms 17:2 (62). DOI: 10.3390/a17020062. Online publication date: 31-Jan-2024
  • (2024) Software engineering education in the era of conversational AI: current trends and future directions. Frontiers in Artificial Intelligence 7. DOI: 10.3389/frai.2024.1436350. Online publication date: 29-Aug-2024
