Abstract
School dropout is a significant issue in distance learning, and early detection is crucial for addressing it. Our study aims to create a binary classification model that anticipates students’ activity levels based on their current achievements and engagement on a Canadian distance learning platform. Predicting student dropout, a common classification problem in educational data analysis, is addressed using a comprehensive dataset of 49 features ranging from socio-demographic to behavioral data. This dataset provides a unique opportunity to analyze student interactions and success factors in a distance learning environment. We developed a student profiling system and implemented a predictive approach using XGBoost, selecting the most important features for the prediction process. The methodology was implemented in Python using the widely used scikit-learn package. Alongside XGBoost, logistic regression was employed as part of a combination of strategies to enhance the model’s predictive capabilities. Our model predicts student dropout with an accuracy of approximately 82% on unseen data from the next academic year.
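The abstract does not say how feature selection and the two classifiers are combined, so the following is only a minimal sketch of one plausible arrangement: a boosted-tree model ranks the features, the more important half is kept, and a logistic regression is fitted on the result. Everything here is an assumption made for illustration: the data is synthetic (the real dataset is proprietary), scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and the 700/300 row split only imitates training on one year and evaluating on the next.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the proprietary data: 49 features, binary label
# (1 = dropout, 0 = active). Not the authors' dataset.
X, y = make_classification(
    n_samples=1000, n_features=49, n_informative=10, random_state=0
)

# Hypothetical temporal split: the first 700 rows play the role of the
# current academic year, the remaining 300 the next (unseen) year.
X_train, y_train = X[:700], y[:700]
X_next, y_next = X[700:], y[700:]

# Boosted trees rank the features; keep those at or above median importance.
# GradientBoostingClassifier is a stand-in for XGBoost in this sketch.
selector = SelectFromModel(
    GradientBoostingClassifier(random_state=0), threshold="median"
)

# Logistic regression is then fitted on the selected, standardized features.
model = make_pipeline(selector, StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
accuracy = accuracy_score(y_next, model.predict(X_next))
print(f"kept {n_selected} of {X.shape[1]} features, hold-out accuracy {accuracy:.2f}")
```

Evaluating on rows held out in order, rather than on a random split, mirrors the abstract's claim about accuracy on data from the following academic year; the accuracy printed here says nothing about the reported 82%.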
Data Availability Statement
The datasets generated and/or analyzed during this study are not publicly available because of a confidentiality agreement and because the data is hosted exclusively on ChallengeU’s server. The data is proprietary to ChallengeU and is therefore not accessible to anyone outside the company. This study was conducted within the company’s servers using this non-public data.
Funding
This work was supported by the Ministry of the Economy in Canada.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zerkouk, M., Mihoubi, M., Chikhaoui, B. et al. A machine learning based model for student’s dropout prediction in online training. Educ Inf Technol 29, 15793–15812 (2024). https://doi.org/10.1007/s10639-024-12500-w
DOI: https://doi.org/10.1007/s10639-024-12500-w