Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course

Chiang, Cheng-Han; Chen, Wei-Chih; Kuan, Chun-Yi; Yang, Chienchou; Lee, Hung-yi

arXiv:2407.05216 (cs)

[Submitted on 7 Jul 2024 (v1), last revised 21 Sep 2024 (this version, v2)]

Abstract: Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research. However, it is unclear whether these LLM-based evaluators can be applied in real-world classrooms to assess student assignments. This empirical report shares how we use GPT-4 as an automatic assignment evaluator in a university course with 1,028 students. Based on student responses, we find that LLM-based assignment evaluators are generally acceptable to students when students have free access to these LLM-based evaluators. However, students also noted that the LLM sometimes fails to adhere to the evaluation instructions. Additionally, we observe that students can easily manipulate the LLM-based evaluator to output specific strings, allowing them to achieve high scores without meeting the assignment rubric. Based on student feedback and our experience, we provide several recommendations for integrating LLM-based evaluators into future classrooms. Our observations also highlight potential directions for improving LLM-based evaluators, including their instruction-following ability and vulnerability to prompt hacking.

Comments:	EMNLP 2024 main conference paper. An empirical report of our course: Introduction to Generative AI 2024 Spring
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.05216 [cs.CL]
	(or arXiv:2407.05216v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.05216

Submission history

From: Cheng-Han Chiang
[v1] Sun, 7 Jul 2024 00:17:24 UTC (832 KB)
[v2] Sat, 21 Sep 2024 07:37:32 UTC (835 KB)
