DOI: 10.1145/3636243.3636259

Next-Step Hint Generation for Introductory Programming Using Large Language Models

Published: 29 January 2024

Abstract

Large Language Models (LLMs) possess skills such as answering questions, writing essays, and solving programming exercises. Since these models are easily accessible, researchers have investigated their capabilities and risks for programming education. This work explores how LLMs can contribute to programming education by supporting students with automated next-step hints. We investigate prompt practices that lead to effective next-step hints and use these insights to build our StAP-tutor. We evaluate this tutor by conducting an experiment with students and performing expert assessments. Our findings show that most LLM-generated feedback messages describe one specific next step and are personalised to the student’s code and approach. However, the hints may contain misleading information and lack sufficient detail when students approach the end of the assignment. This work demonstrates the potential of LLM-generated feedback, but further research is required to explore its practical implementation.
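
The prompt practices behind the StAP-tutor are detailed in the full paper and are not reproduced on this page. As a minimal, hypothetical sketch of the kind of pipeline the abstract describes (assuming the OpenAI chat completions API; the helper next_step_hint, the system prompt, and the model choice are illustrative, not the authors' actual setup), a next-step hint request might look like this:

# Hypothetical sketch, not the StAP-tutor's actual prompts or settings.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def next_step_hint(assignment: str, student_code: str,
                   model: str = "gpt-3.5-turbo") -> str:
    """Ask the model for one concrete next step, without revealing a full solution."""
    system_msg = (
        "You are a tutor for an introductory Python course. Given an assignment "
        "and a student's partial program, describe ONE concrete next step the "
        "student could take. Do not write the solution code."
    )
    user_msg = (
        f"Assignment:\n{assignment}\n\n"
        f"Student code so far:\n{student_code}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        temperature=0.2,  # low temperature for more consistent hints
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(next_step_hint(
        "Write a function average(numbers) that returns the mean of a non-empty list.",
        "def average(numbers):\n    total = 0\n",
    ))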





    Published In

    cover image ACM Other conferences
    ACE '24: Proceedings of the 26th Australasian Computing Education Conference
    January 2024
    208 pages
ISBN: 9798400716195
DOI: 10.1145/3636243

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Generative AI
    2. Large Language Models
    3. Next-step hints
    4. automated feedback
    5. learning programming

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACE 2024
ACE 2024: Australasian Computing Education Conference
    January 29 - February 2, 2024
Sydney, NSW, Australia

    Acceptance Rates

Overall acceptance rate: 161 of 359 submissions (45%)


Bibliometrics

Article Metrics

• Downloads (last 12 months): 365
• Downloads (last 6 weeks): 41

Reflects downloads up to 14 Jan 2025

    Cited By

• (2024) Strategy and Tactics for Introducing Generative Artificial Intelligence into the Instrumental Distance Learning System DL.GSU.BY. Digital Transformation 30:4 (42-49). DOI: 10.35596/1729-7648-2024-30-4-42-49. Online publication date: 5-Dec-2024.
• (2024) Risk management strategy for generative AI in computing education: how to handle the strengths, weaknesses, opportunities, and threats? International Journal of Educational Technology in Higher Education 21:1. DOI: 10.1186/s41239-024-00494-x. Online publication date: 11-Dec-2024.
• (2024) Bringing Industry-Grade Code Quality and Practices into Software Engineering Education (Doctoral Consortium). Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-2). DOI: 10.1145/3699538.3699571. Online publication date: 12-Nov-2024.
• (2024) One Step at a Time: Combining LLMs and Static Analysis to Generate Next-Step Hints for Programming Tasks. Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-12). DOI: 10.1145/3699538.3699556. Online publication date: 12-Nov-2024.
• (2024) Exploring Human-Centered Approaches in Generative AI and Introductory Programming Research: A Scoping Review. Proceedings of the 2024 Conference on United Kingdom & Ireland Computing Education Research (1-7). DOI: 10.1145/3689535.3689553. Online publication date: 5-Sep-2024.
• (2024) Propagating Large Language Models Programming Feedback. Proceedings of the Eleventh ACM Conference on Learning @ Scale (366-370). DOI: 10.1145/3657604.3664665. Online publication date: 9-Jul-2024.
• (2024) Student Perspectives on Using a Large Language Model (LLM) for an Assignment on Professional Ethics. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (478-484). DOI: 10.1145/3649217.3653624. Online publication date: 3-Jul-2024.
• (2024) Feedback-Generation for Programming Exercises With GPT-4. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (31-37). DOI: 10.1145/3649217.3653594. Online publication date: 3-Jul-2024.
• (2024) “Let Them Try to Figure It Out First”: Reasons Why Experts (Do Not) Provide Feedback to Novice Programmers. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (38-44). DOI: 10.1145/3649217.3653530. Online publication date: 3-Jul-2024.
• (2024) Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (1-10). DOI: 10.1145/3613905.3650937. Online publication date: 11-May-2024.
