
A comparative study of methods for a priori prediction of MCQ difficulty

Published: 01 January 2021

Abstract

Successful exams require a balance of easy, medium, and difficult questions. Question difficulty is generally either estimated by an expert or determined after an exam is taken. The latter provides no utility for the generation of new questions, and the former is costly in both time and effort. Additionally, it is not known whether expert prediction is indeed a good proxy for question difficulty.
In this paper, we analyse and compare two ontology-based measures for predicting the difficulty of multiple choice questions, and we evaluate each measure, along with the predictions of 15 experts, against the exam performance of 12 residents on a corpus of 231 medical, case-based multiple choice questions. We find that one ontology-based measure (relation strength indicativeness) performs comparably (accuracy = 47%) to expert prediction (average accuracy = 49%).
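
The reported accuracies compare each method's predicted difficulty band against the difficulty actually observed from resident performance. The sketch below illustrates one way such a comparison could be computed; the band cut-offs, function names, and toy data are illustrative assumptions and are not taken from the paper.

from typing import Dict, List

# Hypothetical cut-offs: the paper does not specify how difficulty bands
# are derived from performance; these thresholds are for illustration only.
def observed_difficulty(proportion_correct: float) -> str:
    """Map the share of residents answering correctly to a difficulty band."""
    if proportion_correct >= 0.7:
        return "easy"
    if proportion_correct >= 0.4:
        return "medium"
    return "difficult"

def prediction_accuracy(predicted: Dict[str, str],
                        resident_answers: Dict[str, List[bool]]) -> float:
    """Fraction of questions whose predicted band matches the observed band.

    predicted        : question id -> predicted band ("easy"/"medium"/"difficult")
    resident_answers : question id -> one True/False per resident (answered correctly or not)
    """
    matches = 0
    for qid, answers in resident_answers.items():
        observed = observed_difficulty(sum(answers) / len(answers))
        if predicted.get(qid) == observed:
            matches += 1
    return matches / len(resident_answers)

# Toy usage: three questions, four residents each (illustrative data only).
predicted = {"q1": "easy", "q2": "difficult", "q3": "medium"}
answers = {
    "q1": [True, True, True, False],    # 75% correct -> observed "easy"
    "q2": [False, False, True, False],  # 25% correct -> observed "difficult"
    "q3": [True, False, False, False],  # 25% correct -> observed "difficult"
}
print(prediction_accuracy(predicted, answers))  # 2 of 3 match, i.e. about 0.67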



Published In

Semantic Web, Volume 12, Issue 3: Open Science Data and the Semantic Web Journal
2021, 136 pages
ISSN: 1570-0844
EISSN: 2210-4968

            Publisher

            IOS Press

            Netherlands


            Author Tags

            1. Ontologies
            2. semantic web
            3. automatic question generation
            4. difficulty modelling
            5. difficulty prediction
            6. multiple choice questions
            7. student assessment

            Qualifiers

            • Research-article
