Abstract
Learning performance data, such as correct and incorrect responses to questions in Intelligent Tutoring Systems (ITSs), is crucial for tracking and assessing learners' progress and mastery of knowledge. However, data sparsity, characterized by unexplored questions and missing attempts, hampers accurate assessment and the provision of tailored, personalized instruction within ITSs. This paper proposes using the Generative Adversarial Imputation Networks (GAIN) framework to impute sparse learning performance data, reconstructed as a three-dimensional (3D) tensor spanning the dimensions of learners, questions, and attempts. Our customized GAIN-based method imputes sparse data in this 3D tensor space, with its input and output layers enhanced by convolutional neural networks. The adaptation also uses a least squares loss function for optimization and aligns the input and output shapes with the dimensions of the question-attempt matrices along the learner dimension. Through extensive experiments on six datasets from various ITSs, including AutoTutor, ASSISTments, and MATHia, we demonstrate that the GAIN approach generally outperforms existing methods such as tensor factorization and other generative adversarial network (GAN) based approaches in terms of imputation accuracy. These findings support more comprehensive learning data modeling and analytics in AI-based education.
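As a rough illustration of the approach summarized above, the sketch below shows what one training step of a convolutional GAIN-style imputer over a learner's question-attempt matrix might look like in PyTorch. This is a minimal sketch under stated assumptions, not the authors' implementation: the network sizes, hint mechanism, loss weights, and all names (ConvNet2D, gain_step, alpha, hint_rate) are illustrative assumptions.

```python
# Illustrative GAIN-style imputation step (not the paper's code).
# Assumed data layout: a batch of question-attempt matrices, one per learner,
# shape (B, 1, Q, A); mask m marks observed cells (1) vs missing cells (0).
import torch
import torch.nn as nn

class ConvNet2D(nn.Module):
    """Small conv net (2 channels in, 1 out) reused for both generator and discriminator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def gain_step(x, m, G, D, opt_g, opt_d, alpha=10.0, hint_rate=0.9):
    """One GAIN-style training step with a least-squares adversarial loss."""
    z = torch.rand_like(x)                     # noise placed in the missing cells
    x_in = m * x + (1 - m) * z
    g_out = G(torch.cat([x_in, m], dim=1))     # generator's imputation
    x_hat = m * x + (1 - m) * g_out            # keep observed cells, fill missing ones

    # Hint channel: reveal part of the mask to the discriminator (GAIN-style).
    b = torch.bernoulli(torch.full_like(m, hint_rate))
    hint = b * m + 0.5 * (1 - b)

    # Discriminator step: least-squares loss toward the true mask.
    d_prob = D(torch.cat([x_hat.detach(), hint], dim=1))
    d_loss = ((d_prob - m) ** 2).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: fool D on missing cells + reconstruct observed cells.
    d_prob = D(torch.cat([x_hat, hint], dim=1))
    adv_loss = (((d_prob - 1.0) ** 2) * (1 - m)).mean()
    rec_loss = (((g_out - x) ** 2) * m).mean()
    g_loss = adv_loss + alpha * rec_loss
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return g_loss.item(), d_loss.item()
```

In practice one would instantiate two ConvNet2D networks as generator and discriminator, each with its own optimizer (e.g., torch.optim.Adam), iterate gain_step over mini-batches of learners' question-attempt matrices, and at inference time fill the missing cells from the generator's output.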
Notes
1. AutoTutor Moodle Website: https://sites.autotutor.org/; Adult Literacy and Adult Education Website: https://adulted.autotutor.org/.
2. ASSISTments Website: https://new.assistments.org/.
3. MATHia Website: https://www.carnegielearning.com/solutions/math/mathia/.
4. ASSISTments 2008–2009: https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=388.
5.
6. MATHia 2019–2020: https://pslcdatashop.web.cmu.edu/Project?id=720.
Acknowledgements
We are grateful to Prof. Philip I. Pavlik Jr. from the University of Memphis and Prof. Shaghayegh Sahebi from the University at Albany - SUNY for their invaluable assistance with tensor factorization in the early stages of this research. We also extend our thanks to Prof. Arthur C. Graesser of the University of Memphis for his insightful communications, which significantly enriched our understanding and inspired deeper analytical thinking.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, L., Yeasin, M., Lin, J., Havugimana, F., Hu, X. (2025). Generative Adversarial Networks for Imputing Sparse Learning Performance. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15306. Springer, Cham. https://doi.org/10.1007/978-3-031-78172-8_25
DOI: https://doi.org/10.1007/978-3-031-78172-8_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78171-1
Online ISBN: 978-3-031-78172-8
eBook Packages: Computer Science (R0)