DOI: 10.1145/3613372.3614197

Large Language Models for Education: Grading Open-Ended Questions Using ChatGPT

Published: 25 September 2023
  Abstract

    To tackle increasingly sophisticated problems, software professionals face the constant challenge of improving their skills. For that improvement to happen, their study and training must involve feedback that is both immediate and accurate. In software companies, where many professionals undergo training but few qualified experts are available to grade their work, delivering effective feedback becomes even more challenging. To address this challenge, this work explores the use of Large Language Models (LLMs) to support the grading of open-ended questions in technical training.
    In this study, we used ChatGPT to grade open-ended questions answered by 42 industry professionals on two topics. Evaluating the grades and feedback provided by ChatGPT, we observed that it can identify semantic details in responses that other metrics cannot capture. Furthermore, subject matter experts generally agreed with ChatGPT's grades and feedback.




      Published In

      SBES '23: Proceedings of the XXXVII Brazilian Symposium on Software Engineering
      September 2023
      570 pages
      ISBN: 9798400707872
      DOI: 10.1145/3613372

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Automated grading
      2. ChatGPT
      3. Open-ended questions

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • FACEPE
      • CNPq
      • CAPES
      • FAPESPA

      Conference

      SBES 2023
      SBES 2023: XXXVII Brazilian Symposium on Software Engineering
      September 25-29, 2023
      Campo Grande, Brazil

      Acceptance Rates

      Overall Acceptance Rate 147 of 427 submissions, 34%


      Article Metrics

      • Downloads (last 12 months): 546
      • Downloads (last 6 weeks): 59
      Reflects downloads up to 11 Aug 2024


      Cited By

      • (2024) Generative AI in Education: Technical Foundations, Applications, and Challenges. Artificial Intelligence for Quality Education [Working Title]. https://doi.org/10.5772/intechopen.1005402. Online publication date: 20-May-2024.
      • (2024) Aplicação do POGIL no ensino de Computação [Application of POGIL in Computing Education]. Anais do IV Simpósio Brasileiro de Educação em Computação (EDUCOMP 2024), 224-233. https://doi.org/10.5753/educomp.2024.237541. Online publication date: 22-Apr-2024.
      • (2024) Generative AI for Customizable Learning Experiences. Sustainability 16, 7, 3034. https://doi.org/10.3390/su16073034. Online publication date: 5-Apr-2024.
      • (2024) Making ChatGPT Work For Me. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4700354. Online publication date: 2024.
      • (2024) Unveiling the Potential of a Conversational Agent in Developer Support: Insights from Mozilla's PDF.js Project. Proceedings of the 1st ACM International Conference on AI-Powered Software, 10-18. https://doi.org/10.1145/3664646.3664758. Online publication date: 10-Jul-2024.
      • (2024) Chatbot Development Using LangChain: A Case Study to Foster Critical Thinking and Creativity. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1, 401-407. https://doi.org/10.1145/3649217.3653557. Online publication date: 3-Jul-2024.
      • (2024) Using Benchmarking Infrastructure to Evaluate LLM Performance on CS Concept Inventories: Challenges, Opportunities, and Critiques. Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1, 452-468. https://doi.org/10.1145/3632620.3671097. Online publication date: 12-Aug-2024.
      • (2024) BC4LLM: A perspective of trusted artificial intelligence when blockchain meets large language models. Neurocomputing 599, 128089. https://doi.org/10.1016/j.neucom.2024.128089. Online publication date: Sep-2024.
      • (2024) Competency and Skill-Based Educational Recommendation System. International Journal of Artificial Intelligence in Education. https://doi.org/10.1007/s40593-024-00423-z. Online publication date: 30-Jul-2024.
      • (2024) Is it all hype? ChatGPT's performance and disruptive potential in the accounting and auditing industries. Review of Accounting Studies. https://doi.org/10.1007/s11142-024-09833-9. Online publication date: 27-Jun-2024.
