
Leveraging Large Language Models for Automated Chinese Essay Scoring

  • Conference paper
  • Artificial Intelligence in Education (AIED 2024)

Abstract

Automated Essay Scoring (AES) plays a crucial role in providing immediate feedback, reducing educators' grading workload, and improving students' learning experiences. With their strong generalization capabilities, large language models (LLMs) offer a new perspective on AES. While previous research has primarily employed deep learning architectures and models such as BERT for feature extraction and scoring, the potential of LLMs in Chinese AES remains largely unexplored. In this paper, we explore the capabilities of LLMs in Chinese AES. We investigate the effectiveness of well-established LLMs on this task, e.g., the GPT series by OpenAI and Qwen-1.8B by Alibaba Cloud. We constructed a Chinese essay dataset with carefully developed rubrics, based on which we collected grades from human raters. We then prompted LLMs, specifically GPT-4, fine-tuned GPT-3.5, and Qwen, to produce grades, adopting different strategies for prompt generation and model fine-tuning. Comparisons between the grades assigned by LLMs and by human raters suggest that the prompt-generation strategy has a remarkable impact on the agreement between LLMs and human raters. When model fine-tuning was adopted, the consistency between LLM scores and human scores improved further. Comparative experimental results demonstrate that fine-tuned GPT-3.5 and Qwen outperform BERT in QWK score. These results highlight the substantial potential of LLMs in Chinese AES and pave the way for further research on integrating LLMs into Chinese AES with varied strategies for prompt generation and model fine-tuning.
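The abstract reports agreement between model and human scores via Quadratic Weighted Kappa (QWK), the standard AES agreement metric. As a reference, QWK between two integer-valued raters can be computed as in this minimal sketch; the function name and score range are illustrative, not taken from the paper:

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic Weighted Kappa between two lists of integer ratings."""
    rater_a = np.asarray(rater_a, dtype=int)
    rater_b = np.asarray(rater_b, dtype=int)
    n = max_rating - min_rating + 1

    # Observed co-occurrence matrix of ratings
    observed = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating, b - min_rating] += 1

    # Expected matrix from the outer product of the marginal histograms
    hist_a = observed.sum(axis=1)
    hist_b = observed.sum(axis=0)
    expected = np.outer(hist_a, hist_b) / observed.sum()

    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum disagreement
    i, j = np.indices((n, n))
    weights = ((i - j) ** 2) / ((n - 1) ** 2)

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

QWK is 1.0 for perfect agreement, 0 for chance-level agreement, and negative when raters systematically disagree, which is why it is preferred over raw accuracy for ordinal essay scores.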


Notes

  1. https://aic-fe.bnu.edu.cn/cgzs/kfsj/xxszskszw/index.html
  2. https://huggingface.co/Qwen/Qwen-1_8B-Chat
  3. https://huggingface.co/google-bert/bert-base-chinese


Acknowledgements

This research is supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 1 (A Trustworthy Feedback Agent for Secondary School Chinese Language Learning), the Shenzhen Science and Technology Foundation (General Program, JCYJ20210324093212034), and the 2022 Guangdong Province Undergraduate University Quality Engineering Project (Shenzhen University Academic Affairs [2022] No. 7).

Author information

Correspondence to Yuhong Feng.


Appendix

The detailed scoring rubrics and the essay example from K3 students can be accessed at https://github.com/seamoon224/AIED-2024.
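The rubric-based prompting described in the paper can be sketched as a template that embeds the rubric and essay, then parses an integer score from the model's reply. The wording, helper names, and score range below are illustrative assumptions, not the authors' exact prompts:

```python
import re

def build_scoring_prompt(essay: str, rubric: str,
                         min_score: int = 1, max_score: int = 5) -> str:
    """Assemble a zero-shot grading prompt from a rubric and an essay
    (illustrative template, not the paper's actual prompt)."""
    return (
        "You are an experienced Chinese language teacher.\n"
        f"Grade the essay below on a scale of {min_score} to {max_score} "
        "according to this rubric:\n"
        f"{rubric}\n\n"
        f"Essay:\n{essay}\n\n"
        f"Reply with only the integer score ({min_score}-{max_score})."
    )

def parse_score(reply: str, min_score: int = 1, max_score: int = 5):
    """Extract the first integer in a model reply and clamp it to the
    valid score range; return None if no integer is found."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None
    return max(min_score, min(max_score, int(match.group())))
```

Clamping and a None fallback matter in practice, since LLM replies do not always respect the requested output format.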


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Feng, H. et al. (2024). Leveraging Large Language Models for Automated Chinese Essay Scoring. In: Olney, A.M., Chounta, I.A., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. AIED 2024. Lecture Notes in Computer Science, vol 14829. Springer, Cham. https://doi.org/10.1007/978-3-031-64302-6_32


  • DOI: https://doi.org/10.1007/978-3-031-64302-6_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64301-9

  • Online ISBN: 978-3-031-64302-6

  • eBook Packages: Computer Science (R0)
