DOI: 10.1145/3643991.3644926

Whodunit: Classifying Code as Human Authored or GPT-4 Generated - A case study on CodeChef problems

Published: 02 July 2024

    Abstract

    Artificial intelligence (AI) assistants such as GitHub Copilot and ChatGPT, built on large language models like GPT-4, are revolutionizing how programming tasks are performed, raising questions about whether a given piece of code was authored by a generative AI model. Such questions are of particular interest to educators, who worry that these tools enable a new form of academic dishonesty, in which students submit AI-generated code as their own work. Our research explores the viability of using code stylometry and machine learning to distinguish between GPT-4 generated and human-authored code. Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4. Our classifier outperforms baselines, with an F1-score and AUC-ROC score of 0.91. A variant of our classifier that excludes gameable features (e.g., empty lines, whitespace) still performs well, with an F1-score and AUC-ROC score of 0.89. We also evaluated our classifier across problem difficulty levels and found almost no difference between easier and intermediate problems, with only slightly worse performance on harder problems. Our study shows that code stylometry is a promising approach for distinguishing between GPT-4 generated code and human-authored code.
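
    To make the terms in the abstract concrete, the following is a minimal sketch of two ideas it mentions: layout-based ("gameable") stylometric features such as empty lines and whitespace, and the F1-score used to report classifier quality. The function names, feature names, and sample input are hypothetical and chosen for illustration only; the abstract does not specify the feature set or implementation used in the study.

```python
# Illustrative sketch only: names and features are assumptions,
# not the feature set or code used in the paper.

def layout_features(source: str) -> dict[str, float]:
    """Compute a few layout-based stylometric features for one source file."""
    lines = source.splitlines()
    n_lines = len(lines) or 1   # avoid division by zero on empty input
    n_chars = len(source) or 1
    empty = sum(1 for line in lines if not line.strip())
    whitespace = sum(1 for ch in source if ch.isspace())
    return {
        "empty_line_ratio": empty / n_lines,
        "whitespace_ratio": whitespace / n_chars,
        "mean_line_length": sum(len(line) for line in lines) / n_lines,
    }


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = "a = 1\n\nb = 2\n"
    print(layout_features(sample))
    # Labels: 1 = AI-generated, 0 = human-authored (an arbitrary convention here).
    print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))
```

    Features like these are easy for a student to alter by reformatting a file, which is presumably why the abstract reports a second variant of the classifier that excludes them.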

    Published In

    MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
    April 2024
    788 pages
    ISBN: 9798400705878
    DOI: 10.1145/3643991
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. code stylometry
    2. ChatGPT
    3. AI code
    4. GPT-4 generated code
    5. authorship profiling
    6. software engineering

    Qualifiers

    • Research-article

    Conference

    MSR '24