DOI: 10.1145/3631802.3631830
Research article · Open access

CodeHelp: Using Large Language Models with Guardrails for Scalable Support in Programming Classes

Published: 06 February 2024
Abstract

    Computing educators face significant challenges in providing timely support to students, especially in large class settings. Large language models (LLMs) have emerged recently and show great promise for providing on-demand help at a large scale, but there are concerns that students may over-rely on the outputs produced by these models. In this paper, we introduce CodeHelp, a novel LLM-powered tool designed with guardrails to provide on-demand assistance to programming students without directly revealing solutions. We detail the design of the tool, which incorporates a number of useful features for instructors, and elaborate on the pipeline of prompting strategies we use to ensure generated outputs are suitable for students. To evaluate CodeHelp, we deployed it in a first-year computer and data science course with 52 students and collected student interactions over a 12-week period. We examine students’ usage patterns and perceptions of the tool, and we report reflections from the course instructor and a series of recommendations for classroom use. Our findings suggest that CodeHelp is well-received by students who especially value its availability and help with resolving errors, and that for instructors it is easy to deploy and complements, rather than replaces, the support that they provide to students.
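
    The paper elaborates on the prompting pipeline behind these guardrails. As a rough illustration of the general idea only (not the authors' actual prompts, models, or architecture), such a pipeline can be sketched as two model calls: one that generates a hint-oriented response, and a second that screens the response for complete solutions before it reaches the student. The `llm` callable and all prompt text below are hypothetical placeholders.

    ```python
    # Illustrative sketch only (not CodeHelp's actual prompts or implementation).
    # `llm` is any callable that takes a prompt string and returns the model's reply.
    from typing import Callable

    TUTOR_PROMPT = (
        "You are a teaching assistant in an introductory programming course. "
        "Help the student understand their problem, but do NOT write complete "
        "solution code; respond with explanations, hints, and questions instead.\n\n"
        "Student question: {issue}\n\nRelevant code:\n{code}\n\nError message:\n{error}\n"
    )

    SCREEN_PROMPT = (
        "Does the following response hand the student a complete solution rather "
        "than hints? Answer YES or NO only.\n\n{response}"
    )

    def answer_with_guardrails(llm: Callable[[str], str],
                               issue: str, code: str, error: str) -> str:
        """Generate a hint-oriented response, then screen it before showing the student."""
        response = llm(TUTOR_PROMPT.format(issue=issue, code=code, error=error))
        verdict = llm(SCREEN_PROMPT.format(response=response))
        if verdict.strip().upper().startswith("YES"):
            # Fall back to a generic nudge instead of exposing the flagged response.
            return ("I can't give a full solution, but start by re-reading the error "
                    "message and checking the line it points to.")
        return response
    ```

    A real deployment would additionally log each request for the instructor and handle model or network failures, which this sketch omits.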




    Published In

    Koli Calling '23: Proceedings of the 23rd Koli Calling International Conference on Computing Education Research
    November 2023
    361 pages
    ISBN:9798400716539
    DOI:10.1145/3631802
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 February 2024


    Author Tags

    1. Guardrails
    2. Intelligent programming tutors
    3. Intelligent tutoring systems
    4. Large language models
    5. Natural language interfaces
    6. Novice programmers
    7. Programming assistance

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Koli Calling '23

    Acceptance Rates

    Overall Acceptance Rate 80 of 182 submissions, 44%


    Article Metrics

    • Downloads (last 12 months): 712
    • Downloads (last 6 weeks): 157
    Reflects downloads up to 11 Aug 2024


    Cited By

    • (2024) Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study. Proceedings of the Eleventh ACM Conference on Learning @ Scale, 63–74. https://doi.org/10.1145/3657604.3662036 (published online 9 Jul 2024)
    • (2024) Towards the Integration of Large Language Models in an Object-Oriented Programming Course. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 2, 832–833. https://doi.org/10.1145/3649405.3659473 (published online 8 Jul 2024)
    • (2024) Self-Regulation, Self-Efficacy, and Fear of Failure Interactions with How Novices Use LLMs to Solve Programming Problems. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 276–282. https://doi.org/10.1145/3649217.3653621 (published online 3 Jul 2024)
    • (2024) Desirable Characteristics for AI Teaching Assistants in Programming Education. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 408–414. https://doi.org/10.1145/3649217.3653574 (published online 3 Jul 2024)
    • (2024) Automating Personalized Parsons Problems with Customized Contexts and Concepts. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 688–694. https://doi.org/10.1145/3649217.3653568 (published online 3 Jul 2024)
    • (2024) Iris: An AI-Driven Virtual Tutor for Computer Science Education. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 394–400. https://doi.org/10.1145/3649217.3653543 (published online 3 Jul 2024)
    • (2024) The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 469–486. https://doi.org/10.1145/3632620.3671116 (published online 12 Aug 2024)
    • (2024) Overcoming Barriers in Scaling Computing Education Research Programming Tools: A Developer's Perspective. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 312–325. https://doi.org/10.1145/3632620.3671113 (published online 12 Aug 2024)
    • (2024) Evaluating Contextually Personalized Programming Exercises Created with Generative AI. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 95–113. https://doi.org/10.1145/3632620.3671103 (published online 12 Aug 2024)
    • (2024) Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 114–130. https://doi.org/10.1145/3632620.3671098 (published online 12 Aug 2024)
