DOI: 10.1145/3631802.3631830
Research article · Open access

CodeHelp: Using Large Language Models with Guardrails for Scalable Support in Programming Classes

Published: 06 February 2024
Abstract

    Computing educators face significant challenges in providing timely support to students, especially in large class settings. Large language models (LLMs) have emerged recently and show great promise for providing on-demand help at a large scale, but there are concerns that students may over-rely on the outputs produced by these models. In this paper, we introduce CodeHelp, a novel LLM-powered tool designed with guardrails to provide on-demand assistance to programming students without directly revealing solutions. We detail the design of the tool, which incorporates a number of useful features for instructors, and elaborate on the pipeline of prompting strategies we use to ensure generated outputs are suitable for students. To evaluate CodeHelp, we deployed it in a first-year computer and data science course with 52 students and collected student interactions over a 12-week period. We examine students’ usage patterns and perceptions of the tool, and we report reflections from the course instructor and a series of recommendations for classroom use. Our findings suggest that CodeHelp is well-received by students who especially value its availability and help with resolving errors, and that for instructors it is easy to deploy and complements, rather than replaces, the support that they provide to students.
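
    The paper elaborates on the prompting pipeline behind these guardrails. As a rough illustration of the general idea only (not the authors' actual prompts, models, or architecture), such a pipeline can be sketched as two model calls: one that generates a hint-oriented response, and a second that screens the response for complete solutions before it reaches the student. The `llm` callable and all prompt text below are hypothetical placeholders.

    ```python
    # Illustrative sketch only (not CodeHelp's actual prompts or implementation).
    # `llm` is any callable that takes a prompt string and returns the model's reply.
    from typing import Callable

    TUTOR_PROMPT = (
        "You are a teaching assistant in an introductory programming course. "
        "Help the student understand their problem, but do NOT write complete "
        "solution code; respond with explanations, hints, and questions instead.\n\n"
        "Student question: {issue}\n\nRelevant code:\n{code}\n\nError message:\n{error}\n"
    )

    SCREEN_PROMPT = (
        "Does the following response hand the student a complete solution rather "
        "than hints? Answer YES or NO only.\n\n{response}"
    )

    def answer_with_guardrails(llm: Callable[[str], str],
                               issue: str, code: str, error: str) -> str:
        """Generate a hint-oriented response, then screen it before showing the student."""
        response = llm(TUTOR_PROMPT.format(issue=issue, code=code, error=error))
        verdict = llm(SCREEN_PROMPT.format(response=response))
        if verdict.strip().upper().startswith("YES"):
            # Fall back to a generic nudge instead of exposing the flagged response.
            return ("I can't give a full solution, but start by re-reading the error "
                    "message and checking the line it points to.")
        return response
    ```

    A real deployment would additionally log each request for the instructor and handle model or network failures, which this sketch omits.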




    Published In

    Koli Calling '23: Proceedings of the 23rd Koli Calling International Conference on Computing Education Research
    November 2023
    361 pages
    ISBN:9798400716539
    DOI:10.1145/3631802
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 February 2024


    Author Tags

    1. Guardrails
    2. Intelligent programming tutors
    3. Intelligent tutoring systems
    4. Large language models
    5. Natural language interfaces
    6. Novice programmers
    7. Programming assistance

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Koli Calling '23

    Acceptance Rates

    Overall Acceptance Rate 80 of 182 submissions, 44%


    Article Metrics

    • Downloads (last 12 months): 712
    • Downloads (last 6 weeks): 157
    Reflects downloads up to 11 Aug 2024


    Cited By

    • (2024) Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study. Proceedings of the Eleventh ACM Conference on Learning @ Scale, 63–74. https://doi.org/10.1145/3657604.3662036 (published online 9 Jul 2024)
    • (2024) Towards the Integration of Large Language Models in an Object-Oriented Programming Course. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 2, 832–833. https://doi.org/10.1145/3649405.3659473 (published online 8 Jul 2024)
    • (2024) Self-Regulation, Self-Efficacy, and Fear of Failure Interactions with How Novices Use LLMs to Solve Programming Problems. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 276–282. https://doi.org/10.1145/3649217.3653621 (published online 3 Jul 2024)
    • (2024) Desirable Characteristics for AI Teaching Assistants in Programming Education. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 408–414. https://doi.org/10.1145/3649217.3653574 (published online 3 Jul 2024)
    • (2024) Automating Personalized Parsons Problems with Customized Contexts and Concepts. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 688–694. https://doi.org/10.1145/3649217.3653568 (published online 3 Jul 2024)
    • (2024) Iris: An AI-Driven Virtual Tutor for Computer Science Education. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 394–400. https://doi.org/10.1145/3649217.3653543 (published online 3 Jul 2024)
    • (2024) The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 469–486. https://doi.org/10.1145/3632620.3671116 (published online 12 Aug 2024)
    • (2024) Overcoming Barriers in Scaling Computing Education Research Programming Tools: A Developer's Perspective. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 312–325. https://doi.org/10.1145/3632620.3671113 (published online 12 Aug 2024)
    • (2024) Evaluating Contextually Personalized Programming Exercises Created with Generative AI. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 95–113. https://doi.org/10.1145/3632620.3671103 (published online 12 Aug 2024)
    • (2024) Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 114–130. https://doi.org/10.1145/3632620.3671098 (published online 12 Aug 2024)
