DOI: 10.1145/3636243.3636259

Next-Step Hint Generation for Introductory Programming Using Large Language Models

Published: 29 January 2024

Abstract

Large Language Models (LLMs) possess skills such as answering questions, writing essays, and solving programming exercises. Since these models are easily accessible, researchers have investigated their capabilities and risks for programming education. This work explores how LLMs can contribute to programming education by supporting students with automated next-step hints. We investigate prompt practices that lead to effective next-step hints and use these insights to build our StAP-tutor. We evaluate this tutor by conducting an experiment with students and performing expert assessments. Our findings show that most LLM-generated feedback messages describe one specific next step and are personalised to the student’s code and approach. However, the hints may contain misleading information and lack sufficient detail when students approach the end of the assignment. This work demonstrates the potential of LLM-generated feedback, but further research is required to explore its practical implementation.
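
The prompt practices behind the StAP-tutor are detailed in the full paper and are not reproduced on this page. As a minimal, hypothetical sketch of the kind of pipeline the abstract describes (assuming the OpenAI chat completions API; the helper next_step_hint, the system prompt, and the model choice are illustrative, not the authors' actual setup), a next-step hint request might look like this:

# Hypothetical sketch, not the StAP-tutor's actual prompts or settings.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def next_step_hint(assignment: str, student_code: str,
                   model: str = "gpt-3.5-turbo") -> str:
    """Ask the model for one concrete next step, without revealing a full solution."""
    system_msg = (
        "You are a tutor for an introductory Python course. Given an assignment "
        "and a student's partial program, describe ONE concrete next step the "
        "student could take. Do not write the solution code."
    )
    user_msg = (
        f"Assignment:\n{assignment}\n\n"
        f"Student code so far:\n{student_code}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        temperature=0.2,  # low temperature for more consistent hints
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(next_step_hint(
        "Write a function average(numbers) that returns the mean of a non-empty list.",
        "def average(numbers):\n    total = 0\n",
    ))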





    Published In

    cover image ACM Other conferences
    ACE '24: Proceedings of the 26th Australasian Computing Education Conference
    January 2024
    208 pages
ISBN: 9798400716195
DOI: 10.1145/3636243

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Generative AI
    2. Large Language Models
    3. Next-step hints
    4. automated feedback
    5. learning programming

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACE 2024
ACE 2024: Australasian Computing Education Conference
    January 29 - February 2, 2024
Sydney, NSW, Australia

    Acceptance Rates

Overall acceptance rate: 161 of 359 submissions (45%)


Bibliometrics

Article Metrics

• Downloads (last 12 months): 365
• Downloads (last 6 weeks): 41

Reflects downloads up to 14 Jan 2025

    Cited By

• (2024) Strategy and Tactics for Introducing Generative Artificial Intelligence into the Instrumental Distance Learning System DL.GSU.BY. Digital Transformation 30:4 (42-49). DOI: 10.35596/1729-7648-2024-30-4-42-49. Online publication date: 5-Dec-2024.
• (2024) Risk management strategy for generative AI in computing education: how to handle the strengths, weaknesses, opportunities, and threats? International Journal of Educational Technology in Higher Education 21:1. DOI: 10.1186/s41239-024-00494-x. Online publication date: 11-Dec-2024.
• (2024) Bringing Industry-Grade Code Quality and Practices into Software Engineering Education (Doctoral Consortium). Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-2). DOI: 10.1145/3699538.3699571. Online publication date: 12-Nov-2024.
• (2024) One Step at a Time: Combining LLMs and Static Analysis to Generate Next-Step Hints for Programming Tasks. Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-12). DOI: 10.1145/3699538.3699556. Online publication date: 12-Nov-2024.
• (2024) Exploring Human-Centered Approaches in Generative AI and Introductory Programming Research: A Scoping Review. Proceedings of the 2024 Conference on United Kingdom & Ireland Computing Education Research (1-7). DOI: 10.1145/3689535.3689553. Online publication date: 5-Sep-2024.
• (2024) Propagating Large Language Models Programming Feedback. Proceedings of the Eleventh ACM Conference on Learning @ Scale (366-370). DOI: 10.1145/3657604.3664665. Online publication date: 9-Jul-2024.
• (2024) Student Perspectives on Using a Large Language Model (LLM) for an Assignment on Professional Ethics. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (478-484). DOI: 10.1145/3649217.3653624. Online publication date: 3-Jul-2024.
• (2024) Feedback-Generation for Programming Exercises With GPT-4. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (31-37). DOI: 10.1145/3649217.3653594. Online publication date: 3-Jul-2024.
• (2024) “Let Them Try to Figure It Out First”: Reasons Why Experts (Do Not) Provide Feedback to Novice Programmers. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (38-44). DOI: 10.1145/3649217.3653530. Online publication date: 3-Jul-2024.
• (2024) Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (1-10). DOI: 10.1145/3613905.3650937. Online publication date: 11-May-2024.
