research-article

Whodunit: Classifying Code as Human Authored or GPT-4 Generated - A case study on CodeChef problems

Authors:

Oseremen Joy Idialu,

Noble Saji Mathews,

Rungroj Maipradit,

Joanne M. Atlee, and

Mei Nagappan

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories

April 2024

Pages 394 - 406

https://doi.org/10.1145/3643991.3644926

Published: 02 July 2024

Abstract

Artificial intelligence (AI) assistants such as GitHub Copilot and ChatGPT, built on large language models like GPT-4, are revolutionizing how programming tasks are performed, raising questions about whether a given piece of code was authored by a generative AI model. Such questions are of particular interest to educators, who worry that these tools enable a new form of academic dishonesty in which students submit AI-generated code as their own work. Our research explores the viability of using code stylometry and machine learning to distinguish between GPT-4 generated and human-authored code. Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4. Our classifier outperforms baselines, achieving an F1-score and an AUC-ROC score of 0.91. A variant of our classifier that excludes gameable features (e.g., empty lines, whitespace) still performs well, with an F1-score and an AUC-ROC score of 0.89. We also evaluated our classifier by the difficulty of the programming problem and found almost no difference between easier and intermediate problems, with only slightly worse performance on harder problems. Our study shows that code stylometry is a promising approach for distinguishing between GPT-4 generated code and human-authored code.
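To make the distinction between gameable and non-gameable stylometric features concrete, the following is a minimal sketch, not the authors' feature set or classifier. It assumes the solutions are written in Python, and the function names (`layout_features`, `syntactic_features`, `f1_score`) are hypothetical. Layout features such as empty-line and whitespace ratios change when code is reformatted, whereas features derived from the abstract syntax tree do not. The last function shows how an F1-score like the ones reported above is computed from predictions.

```python
import ast
from collections import Counter


def layout_features(source: str) -> dict:
    """Layout ("gameable") features: easily changed by reformatting the code."""
    lines = source.splitlines()
    n_lines = max(len(lines), 1)   # guard against empty input
    n_chars = max(len(source), 1)
    return {
        "empty_line_ratio": sum(1 for ln in lines if not ln.strip()) / n_lines,
        "whitespace_ratio": sum(1 for ch in source if ch.isspace()) / n_chars,
    }


def syntactic_features(source: str) -> dict:
    """Relative frequency of each AST node type (assumed illustrative feature).

    These values depend only on the parsed structure, so adding or
    removing blank lines and spaces leaves them unchanged.
    """
    node_names = [type(node).__name__ for node in ast.walk(ast.parse(source))]
    counts = Counter(node_names)
    total = len(node_names)
    return {f"ast_{name}": count / total for name, count in counts.items()}


def f1_score(y_true, y_pred, positive=1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


spaced = "a = 1\n\nb = 2\n"   # original formatting
compact = "a=1\nb=2"          # same program, whitespace stripped

# Layout features differ between the two versions...
print(layout_features(spaced), layout_features(compact))
# ...while the AST-based features are identical.
print(syntactic_features(spaced) == syntactic_features(compact))
```

In a full pipeline, feature dictionaries like these would be turned into vectors and passed to a trained classifier; that step is omitted here because the abstract does not specify it.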


Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories

April 2024

788 pages

ISBN: 9798400705878

DOI: 10.1145/3643991

Chair: Diomidis Spinellis
Program Chair: Alberto Bacchelli
Program Co-chair: Eleni Constantinou

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MSR '24

Sponsor:

SIGSOFT

MSR '24: 21st International Conference on Mining Software Repositories

April 15 - 16, 2024

Lisbon, Portugal
