Article

Leaf: Multiple-Choice Question Generation

Authors: Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Preslav Nakov

Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II

Pages 321–328

https://doi.org/10.1007/978-3-030-99739-7_41

Published: 10 April 2022

Abstract

Testing with quiz questions has proven to be an effective way to assess and improve the educational process. However, manually creating quizzes is tedious and time-consuming. To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for the classroom, Leaf could also be used in an industrial setting, e.g., to facilitate onboarding and knowledge sharing, or as a component of chatbots, question answering systems, or Massive Open Online Courses (MOOCs). The code and the demo are available on GitHub (https://github.com/KristiyanVachev/Leaf-Question-Generation).

Cited By

Doughty J, Wan Z, Bompelli A, Qayum J, Wang T, Zhang J, Zheng Y, Doyle A, Sridhar P, Agarwal A, Bogart C, Keylor E, Kultur C, Savelka J, and Sakr M (2024). A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education. Proceedings of the 26th Australasian Computing Education Conference, pages 114–123. DOI: 10.1145/3636243.3636256. Online publication date: 29-Jan-2024
https://dl.acm.org/doi/10.1145/3636243.3636256
Grévisse C, Pavlou M, and Schneider J (2024). Docimological Quality Analysis of LLM-Generated Multiple Choice Questions in Computer Science and Medicine. SN Computer Science 5:5. DOI: 10.1007/s42979-024-02963-6. Online publication date: 10-Jun-2024
https://dl.acm.org/doi/10.1007/s42979-024-02963-6
Maity S, Deroy A, and Sarkar S (2024). A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation Using GPT. Advances in Information Retrieval, pages 268–277. DOI: 10.1007/978-3-031-56063-7_18. Online publication date: 24-Mar-2024
https://dl.acm.org/doi/10.1007/978-3-031-56063-7_18

Published In

Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II

Apr 2022

629 pages

ISBN:978-3-030-99738-0

DOI:10.1007/978-3-030-99739-7

Editors:
Matthias Hagen (Martin Luther University Halle-Wittenberg, Halle, Germany),
Suzan Verberne (Leiden University, Leiden, The Netherlands),
Craig Macdonald (University of Glasgow, Glasgow, UK),
Christin Seifert (University of Duisburg-Essen, Essen, Germany),
Krisztian Balog (University of Stavanger, Stavanger, Norway),
Kjetil Nørvåg (Norwegian University of Science and Technology, Trondheim, Norway),
Vinay Setty (University of Stavanger, Stavanger, Norway)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022.

Publisher

Springer-Verlag

Berlin, Heidelberg
