DOI: 10.1145/3613372.3613413

Assessing the Readability of ChatGPT Code Snippet Recommendations: A Comparative Study

Published: 25 September 2023

Abstract

Developers often rely on code search engines to find high-quality, reusable code snippets online, such as those available on Stack Overflow. Recently, ChatGPT, a language model trained for dialog tasks, has gained attention as a promising approach to code snippet generation. However, the quality of its recommendations still calls for in-depth analysis. In this work, we evaluate the readability of code snippets generated by ChatGPT, comparing them with those recommended by CROKAGE, a state-of-the-art code search engine for Stack Overflow. We compare the snippets recommended by both approaches using the readability issues raised by the automated static analysis tool (ASAT) SonarQube. Our results show that ChatGPT can generate cleaner code snippets, with more consistent naming and coding conventions, than the human-written snippets recommended by CROKAGE. However, in some cases, ChatGPT generates code that lacks recent Java API features such as try-with-resources and lambdas. Overall, our findings suggest that ChatGPT can provide valuable assistance to developers searching for didactic, high-quality code snippets online. Nevertheless, developers should still review the generated code, either manually or with the help of an ASAT, to catch potential readability issues as well as to verify the correctness of the generated snippets.
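
To make the try-with-resources point concrete, here is an illustrative Java sketch (our own example, not drawn from the paper's dataset; the class and method names are hypothetical) of the kind of snippet an ASAT like SonarQube flags: manual close() calls in a finally block versus the Java 7+ form, which closes the resource automatically.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadFirstLine {

        // Older style that generated snippets sometimes fall back to:
        // the reader must be closed explicitly in a finally block.
        static String firstLineOldStyle(String path) throws IOException {
            BufferedReader reader = null;
            try {
                reader = new BufferedReader(new FileReader(path));
                return reader.readLine();
            } finally {
                if (reader != null) {
                    reader.close();
                }
            }
        }

        // Equivalent logic with try-with-resources (Java 7+): the reader
        // is closed automatically, even if readLine() throws.
        static String firstLine(String path) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
                return reader.readLine();
            }
        }

        public static void main(String[] args) throws IOException {
            System.out.println(firstLine(args[0]));
        }
    }

Both methods behave identically on the happy path; the second is shorter, harder to get wrong, and is the form static analysis tools recommend.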


Published In

SBES '23: Proceedings of the XXXVII Brazilian Symposium on Software Engineering
September 2023
570 pages
ISBN: 9798400707872
DOI: 10.1145/3613372
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 September 2023


Author Tags

  1. ChatGPT
  2. SonarQube
  3. Stack Overflow
  4. code snippets
  5. readability

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SBES 2023: XXXVII Brazilian Symposium on Software Engineering
September 25–29, 2023
Campo Grande, Brazil

Acceptance Rates

Overall acceptance rate: 147 of 427 submissions (34%)


Article Metrics

  • Total citations: 0
  • Total downloads: 243
  • Downloads (last 12 months): 243
  • Downloads (last 6 weeks): 15

Reflects downloads up to 18 Aug 2024.
