DOI: 10.1145/3613372.3613413

Assessing the Readability of ChatGPT Code Snippet Recommendations: A Comparative Study

Published: 25 September 2023

Abstract

Developers often rely on code search engines to find high-quality, reusable code snippets online, such as those available on Stack Overflow. Recently, ChatGPT, a language model trained for dialog tasks, has gained attention as a promising approach to code snippet generation. However, the quality of its recommendations still calls for in-depth analysis. In this work, we evaluate the readability of code snippets generated by ChatGPT, comparing them with those recommended by CROKAGE, a state-of-the-art code search engine for Stack Overflow. We compare the snippets recommended by both approaches using the readability issues raised by the automated static analysis tool (ASAT) SonarQube. Our results show that ChatGPT can generate cleaner code snippets, with more consistent naming and coding conventions, than the human-written snippets recommended by CROKAGE. However, in some cases, ChatGPT generates code that lacks recent Java API features such as try-with-resources and lambdas. Overall, our findings suggest that ChatGPT can provide valuable assistance to developers searching for didactic, high-quality code snippets online. Nevertheless, developers should still review the generated code, either manually or with the help of an ASAT, to catch potential readability issues as well as to verify the correctness of the generated snippets.
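
To make the try-with-resources point concrete, here is an illustrative Java sketch (our own example, not drawn from the paper's dataset; the class and method names are hypothetical) of the kind of snippet an ASAT like SonarQube flags: manual close() calls in a finally block versus the Java 7+ form, which closes the resource automatically.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadFirstLine {

        // Older style that generated snippets sometimes fall back to:
        // the reader must be closed explicitly in a finally block.
        static String firstLineOldStyle(String path) throws IOException {
            BufferedReader reader = null;
            try {
                reader = new BufferedReader(new FileReader(path));
                return reader.readLine();
            } finally {
                if (reader != null) {
                    reader.close();
                }
            }
        }

        // Equivalent logic with try-with-resources (Java 7+): the reader
        // is closed automatically, even if readLine() throws.
        static String firstLine(String path) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
                return reader.readLine();
            }
        }

        public static void main(String[] args) throws IOException {
            System.out.println(firstLine(args[0]));
        }
    }

Both methods behave identically on the happy path; the second is shorter, harder to get wrong, and is the form static analysis tools recommend.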


Published In

SBES '23: Proceedings of the XXXVII Brazilian Symposium on Software Engineering
September 2023
570 pages
ISBN: 9798400707872
DOI: 10.1145/3613372
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 September 2023


Author Tags

  1. ChatGPT
  2. SonarQube
  3. Stack Overflow
  4. code snippets
  5. readability

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SBES 2023: XXXVII Brazilian Symposium on Software Engineering
September 25–29, 2023
Campo Grande, Brazil

Acceptance Rates

Overall acceptance rate: 147 of 427 submissions (34%)


Article Metrics

  • Total citations: 0
  • Total downloads: 243
  • Downloads (last 12 months): 243
  • Downloads (last 6 weeks): 15

Reflects downloads up to 18 Aug 2024.
