
A bounded ability estimation for computerized adaptive testing

Published: 30 May 2024

Abstract

Computerized adaptive testing (CAT), a tool that efficiently measures a student's ability, is widely used in standardized tests (e.g., the GMAT and GRE). The adaptivity of CAT lies in selecting the most informative questions for each student, thereby reducing test length. Existing CAT methods do not explicitly target ability-estimation accuracy, since no ground-truth ability is available for a student; consequently, they cannot guarantee that the estimate converges to the true ability given such limited responses. In this paper, we analyze the statistical properties of the estimation and identify a theoretical approximation of the true ability: the ability estimated from the full responses to the question bank. Building on this, we propose a Bounded Ability Estimation framework for CAT (BECAT) in a data-summarization manner, which selects a question subset whose gradient closely matches that of the full responses. To this end, we develop an expected gradient difference approximation to design a simple greedy selection algorithm, and establish rigorous theoretical guarantees, including an upper bound on the error of its ability estimate. Experiments on both real-world and synthetic datasets show that it reaches the same estimation accuracy using 15% fewer questions on average, significantly reducing test length.
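The core idea of the abstract (greedily choosing a question subset whose summed gradient approximates the gradient of the full response set) can be illustrated with a toy sketch. This is not the authors' BECAT implementation: it assumes a simple 1PL (Rasch) IRT model, a scalar ability, and a naive rescaled gradient-difference criterion, with all function and variable names invented for illustration.

```python
import math

def irt_gradients(theta, difficulties, responses):
    """Per-question gradients of the 1PL (Rasch) log-likelihood w.r.t.
    the ability theta: g_i = r_i - sigmoid(theta - b_i)."""
    return [r - 1.0 / (1.0 + math.exp(-(theta - b)))
            for b, r in zip(difficulties, responses)]

def greedy_gradient_match(theta, difficulties, responses, k):
    """Greedily pick k questions so that the rescaled sum of their
    gradients best matches the gradient of the full question bank
    (a toy analogue of an expected-gradient-difference criterion)."""
    n = len(difficulties)
    grads = irt_gradients(theta, difficulties, responses)
    full_grad = sum(grads)           # gradient of the full response set
    selected, current = [], 0.0
    remaining = set(range(n))
    for _ in range(k):
        size = len(selected) + 1
        # choose the question minimizing the gradient difference after adding it
        best = min(remaining,
                   key=lambda i: abs(full_grad - (n / size) * (current + grads[i])))
        selected.append(best)
        remaining.remove(best)
        current += grads[best]
    return sorted(selected)
```

A real system would recompute the ability estimate (and hence the gradients) after each observed response; here the gradients are fixed once, purely to keep the greedy selection step visible.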



Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023
80772 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

