Predicting Student Performance Using Data Mining and Learning Analytics Techniques: A Systematic Literature Review
Abstract
1. Introduction
- Deeply understand the intelligent approaches and techniques developed to forecast student learning outcomes, which represent student academic performance.
- Compare the performance of existing models and techniques on different aspects, including their accuracy, strengths, and weaknesses.
- Specify the dominant predictors (e.g., factors and features) of student learning outcomes based on evidence from the synthesis.
- Identify the research challenges and limitations facing the current intelligent techniques for predicting academic performance using learning outcomes.
- Highlight future research areas for improving the prediction of student performance using learning outcomes.
2. Background and Related Works
2.1. Student Outcomes
2.2. Student Performance
2.3. Existing Student Performance Reviews and Literature Gaps
3. Survey Methodology
- RQ1-Learning Outcomes Prediction. How is student academic performance measured using learning outcomes?
- RQ2-Academic Performance Prediction Approaches. What intelligent models and techniques are devised to forecast student academic performance using learning outcomes?
- RQ3-Academic Performance Predictors. What dominant predictors of student performance using learning outcomes are reported?
3.1. Inclusion Criteria
3.2. Data Extraction
- General information about the publication, for instance, publication year, venue type, country of publication, and number of authors;
- Educational dataset and context of prediction (e.g., students, courses, schools, universities, etc.);
- Input variables used for student outcome prediction, and the form in which the outcomes were predicted;
- Intelligent models and approaches used for the prediction of academic performance;
- Significant predictors of learning outcomes.
4. Survey Results
4.1. Publication Venues and Years
4.2. Experimental Datasets and the Context of Performance Prediction
4.3. Learning Outcomes as Indicators of Student Performance
4.4. Predictive Models of Learning Outcomes
4.5. Dominant Factors Predicting Student Learning Outcomes
4.6. Quality Assessment of Reviewed Models
5. Discussion
5.1. Key Findings
- RQ1-Learning Outcomes Prediction. How is student academic performance measured using learning outcomes?
- RQ2-Academic Performance Prediction Approaches. What intelligent approaches and techniques are devised to forecast student academic performance using learning outcomes?
- RQ3-Academic Performance Predictors. What dominant predictors of student performance using learning outcomes are reported?
5.2. Challenges and Weaknesses of Existing Predictive Models
- Research challenge one: The prediction of academic performance of student cohorts to assist in the automation of course and program-level outcomes assessment.
- Research challenge two: The use and availability of multiple datasets from various disciplines to strengthen the validity of the predictive model. The datasets should comprise a large sample size of students to draw any meaningful conclusions.
- Research challenge three: The inspection of the effects of different features on the attainment of student outcomes to contribute to academic corrective interventions in higher education, i.e., the shift from predictive analytics to explanatory analytics.
- Research challenge four: The use of multiple performance evaluation metrics to assess the quality of the learning outcomes predictions.
- Research challenge five: The lack of unsupervised learning techniques devised to forecast student attainment of the learning outcomes.
- Research challenge six: The application of automated machine learning (i.e., AutoML) to the problem of student outcomes prediction has rarely been attempted, except in [84]. Addressing this challenge would enable the development of ML models that automate the machine learning pipeline, making featurization, classification, and forecasting efficient and accessible to non-technical audiences (e.g., education leaders and course instructors) in different disciplines (a minimal sketch follows this list).
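To make research challenge six concrete, here is a minimal sketch of applying AutoML to outcome classification. It assumes the open-source TPOT library and synthetic data standing in for real student records; it illustrates the idea only and is not the setup used in [84].

```python
# Hedged illustration of AutoML for student outcome prediction.
# Assumptions: TPOT is installed (pip install tpot) and synthetic data
# stands in for real student features (grades, LMS activity, demographics).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# 500 "students", 20 features, binary outcome (e.g., pass/fail).
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# TPOT searches over preprocessing + estimator pipelines automatically,
# replacing manual featurization and model selection.
automl = TPOTClassifier(generations=5, population_size=20,
                        random_state=42, verbosity=0)
automl.fit(X_train, y_train)
print(f"Held-out accuracy: {automl.score(X_test, y_test):.3f}")
```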
5.3. Threats to Validity
- Defined the methodology, including search key terms and phrases, publication venues, etc., to enable the replicability of the survey.
- Used the manual search to incorporate any missing articles in the synthesis.
- Applied the appropriate inclusion and exclusion criteria to focus on student performance modeling using learning outcomes. These constituted the selection criteria of the survey.
- Selected all studies that meet the inclusion criteria irrespective of the researchers’ background or nationality to eliminate any cultural bias.
- Ensured that the primary studies are not repeated in the synthesis by removing duplicates (see the sketch after this list).
- Defined the quality assessment criteria based on previous surveys and recommendations [3].
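As an illustration of the deduplication step above, the following is a minimal sketch that assumes the pooled search results were exported to a CSV file with doi and title columns; the file name and column names are hypothetical, and a production review would add fuzzy title matching.

```python
# Hypothetical deduplication of search results pooled from several
# bibliographic databases; 'search_results_all_databases.csv' and its
# 'doi'/'title' columns are illustrative assumptions.
import pandas as pd

records = pd.read_csv("search_results_all_databases.csv")

# Normalize keys so formatting differences do not mask duplicates.
doi = records["doi"].str.strip().str.lower()
title = (records["title"].str.strip().str.lower()
         .str.replace(r"\s+", " ", regex=True))

# Flag later occurrences of the same title, and of the same non-missing DOI.
dup_title = title.duplicated()
dup_doi = doi.notna() & doi.duplicated()
deduped = records[~(dup_title | dup_doi)]

print(f"Removed {len(records) - len(deduped)} duplicate records")
```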
5.4. Survey Limitations
6. Practical Implications and Recommendations
- Recommendation one: Formalize a clear definition of the variable ‘learning outcomes’ before embarking on the development of predictive models that measure the attainment of learning outcomes.
- Recommendation two: Build predictive models for non-technical majors, e.g., humanities, and for supporting teaching and learning in developing countries. These educational settings and contexts have different characteristics and features; therefore, specialized analytics models ought to be developed to work correctly in these settings.
- Recommendation three: Produce and share educational datasets for other researchers to explore and use after anonymizing any sensitive student data.
- Recommendation four: Build intelligent models that predict program-level outcomes as well as cohort academic performance. This would assist educational leaders in undertaking assessment activities and improving the quality of their programs.
- Recommendation five: Devise machine learning models that endeavor to explain and justify the attainment levels of student outcomes, and explore the effectiveness of hybrid models in improving the accuracy of student outcomes predictions (one explanatory technique is sketched after this list).
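One concrete route toward recommendation five is to pair any fitted classifier with a model-agnostic explanation method. The sketch below uses permutation importance from scikit-learn; the feature names are invented for illustration, and this is one possible technique rather than the method of any reviewed study.

```python
# Hedged sketch: rank which (synthetic) student features drive a fitted
# model's predictions via permutation importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

feature_names = ["midterm_score", "lms_logins", "assignment_avg",
                 "forum_posts", "attendance_rate"]  # hypothetical predictors
X, y = make_classification(n_samples=400, n_features=5, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)

# How much does shuffling each feature degrade held-out accuracy?
# Larger drops indicate more influential predictors of the outcome.
result = permutation_importance(model, X_te, y_te, n_repeats=20,
                                random_state=0)
ranked = sorted(zip(feature_names, result.importances_mean),
                key=lambda pair: -pair[1])
for name, importance in ranked:
    print(f"{name:>15}: {importance:.3f}")
```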
7. Future Directions
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Daniel, B. Big data and analytics in higher education: Opportunities and challenges. Br. J. Educ. Technol. 2015, 46, 904–920.
- Zohair, L.M.A. Prediction of student’s performance by modelling small dataset size. Int. J. Educ. Technol. High. Educ. 2019, 16, 27.
- Hellas, A.; Ihantola, P.; Petersen, A.; Ajanovski, V.V.; Gutica, M.; Hynninen, T.; Knutas, A.; Leinonen, J.; Messom, C.; Liao, S.N. Predicting academic performance: A systematic literature review. In Proceedings of the Companion of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, Larnaca, Cyprus, 2–4 July 2018; pp. 175–199.
- Baradwaj, B.K.; Pal, S. Mining educational data to analyze students’ performance. Int. J. Adv. Comput. Sci. Appl. 2012, 2, 63–69.
- Zhang, L.; Li, K.F. Education analytics: Challenges and approaches. In Proceedings of the 2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA), Krakow, Poland, 16–18 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 193–198.
- Daud, A.; Aljohani, N.R.; Abbasi, R.A.; Lytras, M.D.; Abbas, F.; Alowibdi, J.S. Predicting student performance using advanced learning analytics. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017; pp. 415–421.
- Macayan, J.V. Implementing outcome-based education (OBE) framework: Implications for assessment of students’ performance. Educ. Meas. Eval. Rev. 2017, 8, 1–10.
- Yassine, S.; Kadry, S.; Sicilia, M.A. A framework for learning analytics in Moodle for assessing course outcomes. In Proceedings of the 2016 IEEE Global Engineering Education Conference (EDUCON), Abu Dhabi, UAE, 10–13 April 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 261–266.
- Rajak, A.; Shrivastava, A.K.; Shrivastava, D.P. Automating outcome based education for the attainment of course and program outcomes. In Proceedings of the 2018 Fifth HCT Information Technology Trends (ITT), Dubai, UAE, 28–29 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 373–376.
- Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; EBSE: Keele, UK, 2007; pp. 1–65.
- Okoli, C.; Schabram, K. A guide to conducting a systematic literature review of information systems research. SSRN Electron. J. 2010, 10.
- Kaliannan, M.; Chandran, S.D. Empowering students through outcome-based education (OBE). Res. Educ. 2012, 87, 50–63.
- Premalatha, K. Course and program outcomes assessment methods in outcome-based education: A review. J. Educ. 2019, 199, 111–127.
- Kanmani, B.; Babu, K.M. Leveraging technology in outcome-based education. In Proceedings of the International Conference on Transformations in Engineering Education, New Delhi, India, 5–8 January 2015; Natarajan, R., Ed.; Springer: New Delhi, India, 2015; pp. 415–421.
- Namoun, A.; Taleb, A.; Benaida, M. An expert comparison of accreditation support tools for the undergraduate computing programs. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 2018, 9, 371–384.
- Mahajan, M.; Singh, M.K.S. Importance and benefits of learning outcomes. IOSR J. Humanit. Soc. Sci. 2017, 22, 65–67.
- Namoun, A.; Taleb, A.; Al-Shargabi, M.; Benaida, M. A learning outcome inspired survey instrument for assessing the quality of continuous improvement cycle. Int. J. Inf. Commun. Technol. Educ. (IJICTE) 2019, 15, 108–129.
- Taleb, A.; Namoun, A.; Benaida, M. A holistic quality assurance framework to acquire national and international. J. Eng. Appl. Sci. 2019, 14, 6685–6698.
- Singh, R.; Sarkar, S. Teaching Quality Counts: How Student Outcomes Relate to Quality of Teaching in Private and Public Schools in India; Young Lives: Oxford, UK, 2012; pp. 1–48.
- Philip, K.; Lee, A. Online public health education for low and middle-income countries: Factors influencing successful student outcomes. Int. J. Emerg. Technol. Learn. (IJET) 2011, 6, 65–69.
- Garbacz, S.A.; Herman, K.C.; Thompson, A.M.; Reinke, W.M. Family engagement in education and intervention: Implementation and evaluation to maximize family, school, and student outcomes. J. Sch. Psychol. 2017, 62, 1–10.
- Nonis, S.A.; Fenner, G.H. An exploratory study of student motivations for taking online courses and learning outcomes. J. Instr. Pedagog. 2012, 7, 2–13.
- Polyzou, A.; Karypis, G. Feature extraction for next-term prediction of poor student performance. IEEE Trans. Learn. Technol. 2019, 12, 237–248.
- Shahiri, A.M.; Husain, W.; Abdul Rashid, N. A review on predicting student’s performance using data mining techniques. Procedia Comput. Sci. 2015, 72, 414–422.
- Tatar, A.E.; Düştegör, D. Prediction of academic performance at undergraduate graduation: Course grades or grade point average? Appl. Sci. 2020, 10, 4967.
- Elbadrawy, A.; Polyzou, A.; Ren, Z.; Sweeney, M.; Karypis, G.; Rangwala, H. Predicting student performance using personalized analytics. Computer 2016, 49, 61–69.
- Cui, Y.; Chen, F.; Shiri, A.; Fan, Y. Predictive analytic models of student success in higher education: A review of methodology. Inf. Learn. Sci. 2019, 120, 208–227.
- Rastrollo-Guerrero, J.L.; Gómez-Pulido, J.A.; Durán-Domínguez, A. Analyzing and predicting students’ performance by means of machine learning: A review. Appl. Sci. 2020, 10, 1042.
- Alshanqiti, A.; Namoun, A. Predicting student performance and its influential factors using hybrid regression and multi-label classification. IEEE Access 2020, 8, 203827–203844.
- Mthimunye, K.; Daniels, F.M. Predictors of academic performance, success and retention amongst undergraduate nursing students: A systematic review. S. Afr. J. High. Educ. 2019, 33, 200–220.
- Dixson, D.D.; Worrell, F.C.; Olszewski-Kubilius, P.; Subotnik, R.F. Beyond perceived ability: The contribution of psychosocial factors to academic performance. Ann. N. Y. Acad. Sci. 2016, 1377, 67–77.
- Felix, I.; Ambrósio, A.P.; Lima, P.D.S.; Brancher, J.D. Data mining for student outcome prediction on Moodle: A systematic mapping. In Proceedings of the Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação-SBIE), Fortaleza, Brazil, 29 October–1 November 2018; Volume 29, p. 1393.
- Peña-Ayala, A. Educational data mining: A survey and a data mining-based analysis of recent works. Expert Syst. Appl. 2014, 41, 1432–1462.
- Kumar, M.; Singh, A.J.; Handa, D. Literature survey on student’s performance prediction in education using data mining techniques. Int. J. Educ. Manag. Eng. 2017, 7, 40–49.
- Ofori, F.; Maina, E.; Gitonga, R. Using machine learning algorithms to predict students’ performance and improve learning outcome: A literature based review. J. Inf. Technol. 2020, 4, 33–55.
- Hu, X.; Cheong, C.W.; Ding, W.; Woo, M. A systematic review of studies on predicting student learning outcomes using learning analytics. In Proceedings of the Seventh International Learning Analytics & Knowledge Conference, Vancouver, BC, Canada, 13–17 March 2017; pp. 528–529.
- Magalhães, P.; Ferreira, D.; Cunha, J.; Rosário, P. Online vs traditional homework: A systematic review on the benefits to students’ performance. Comput. Educ. 2020, 152, 103869.
- Digregorio, P.; Sobel-Lojeski, K. The effects of interactive whiteboards (IWBs) on student performance and learning: A literature review. J. Educ. Technol. Syst. 2010, 38, 255–312.
- van der Zanden, P.J.; Denessen, E.; Cillessen, A.H.; Meijer, P.C. Domains and predictors of first-year student success: A systematic review. Educ. Res. Rev. 2018, 23, 57–77.
- Bain, S.; Fedynich, L.; Knight, M. The successful graduate student: A review of the factors for success. J. Acad. Bus. Ethics 2011, 3, 1.
- Petersen, K.; Vakkalanka, S.; Kuzniarz, L. Guidelines for conducting systematic mapping studies in software engineering: An update. Inf. Softw. Technol. 2015, 64, 1–18.
- Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. BMJ 2009, 6, 1–8.
- Ming, N.C.; Ming, V.L. Predicting student outcomes from unstructured data. In UMAP Workshops; CEUR Workshop Proceedings: Aachen, Germany, 2012.
- Heise, N.; Meyer, C.A.; Garbe, B.A.; Hall, H.A.; Clapp, T.R. Table quizzes as an assessment tool in the gross anatomy laboratory. J. Med. Educ. Curric. Dev. 2020, 7.
- Shulruf, B.; Bagg, W.; Begun, M.; Hay, M.; Lichtwark, I.; Turnock, A.; Warnecke, E.; Wilkinson, T.J.; Poole, P.J. The efficacy of medical student selection tools in Australia and New Zealand. Med. J. Aust. 2018, 208, 214–218.
- Moreno-Marcos, P.M.; Pong, T.C.; Muñoz-Merino, P.J.; Kloos, C.D. Analysis of the factors influencing learners’ performance prediction with learning analytics. IEEE Access 2020, 8, 5264–5282.
- Martin, A.J.; Nejad, H.G.; Colmar, S.; Liem, G.A.D. Adaptability: How students’ responses to uncertainty and novelty predict their academic and non-academic outcomes. J. Educ. Psychol. 2013, 105, 728.
- Bowers, A.J.; Zhou, X. Receiver operating characteristic (ROC) area under the curve (AUC): A diagnostic measure for evaluating the accuracy of predictors of education outcomes. J. Educ. Stud. Placed Risk (JESPAR) 2019, 24, 20–46.
- Palmer, L.E.; Erford, B.T. Predicting student outcome measures using the ASCA national model program audit. Prof. Couns. 2012, 2, 152–159.
- Fauth, B.; Decristan, J.; Rieser, S.; Klieme, E.; Büttner, G. Student ratings of teaching quality in primary school: Dimensions and prediction of student outcomes. Learn. Instr. 2014, 29, 1–9.
- Harred, R.; Cody, C.; Maniktala, M.; Shabrina, P.; Barnes, T.; Lynch, C. How long is enough? Predicting student outcomes with same-day gameplay data in an educational math game. In Proceedings of the Educational Data Mining (Workshops), Montréal, QC, Canada, 2–5 July 2019; pp. 60–68.
- Aldrup, K.; Klusmann, U.; Lüdtke, O.; Göllner, R.; Trautwein, U. Social support and classroom management are related to secondary students’ general school adjustment: A multilevel structural equation model using student and teacher ratings. J. Educ. Psychol. 2018, 110, 1066.
- Van Ryzin, M. Secondary school advisors as mentors and secondary attachment figures. J. Community Psychol. 2010, 38, 131–154.
- Porayska-Pomsta, K.; Mavrikis, M.; Cukurova, M.; Margeti, M.; Samani, T. Leveraging non-cognitive student self-reports to predict learning outcomes. In Proceedings of the International Conference on Artificial Intelligence in Education, London, UK, 27–30 June 2018; Springer: Cham, Switzerland, 2018; pp. 458–462.
- Kórösi, G.; Esztelecki, P.; Farkas, R.; Tóth, K. Clickstream-based outcome prediction in short video MOOCs. In Proceedings of the 2018 International Conference on Computer, Information and Telecommunication Systems (CITS), Colmar, France, 11–13 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–5.
- Brinkworth, M.E.; McIntyre, J.; Juraschek, A.D.; Gehlbach, H. Teacher-student relationships: The positives and negatives of assessing both perspectives. J. Appl. Dev. Psychol. 2018, 55, 24–38.
- Mantzicopoulos, P.; Patrick, H.; Strati, A.; Watson, J.S. Predicting kindergarteners’ achievement and motivation from observational measures of teaching effectiveness. J. Exp. Educ. 2018, 86, 214–232.
- Aelterman, N.; Vansteenkiste, M.; Haerens, L. Correlates of students’ internalization and defiance of classroom rules: A self-determination theory perspective. Br. J. Educ. Psychol. 2019, 89, 22–40.
- Simjanoska, M.; Gusev, M.; Ristov, S.; Bogdanova, A.M. Intelligent student profiling for predicting e-assessment outcomes. In Proceedings of the 2014 IEEE Global Engineering Education Conference (EDUCON), Istanbul, Turkey, 3–5 April 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 616–622.
- Pang, Y.; Judd, N.; O’Brien, J.; Ben-Avie, M. Predicting students’ graduation outcomes through support vector machines. In Proceedings of the 2017 IEEE Frontiers in Education Conference (FIE), Indianapolis, IN, USA, 18–21 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8.
- Liu, K.F.R.; Chen, J.S. Prediction and assessment of student learning outcomes in calculus: A decision support of integrating data mining and Bayesian belief networks. In Proceedings of the 2011 3rd International Conference on Computer Research and Development, Shanghai, China, 11–13 March 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 299–303.
- Smith, V.C.; Lange, A.; Huston, D.R. Predictive modeling to forecast student outcomes and drive effective interventions in online community college courses. J. Asynchronous Learn. Netw. 2012, 16, 51–61.
- Pavani, M.; Teja, A.R.; Neelima, A.; Bhavishya, G.; Sukrutha, D.S. Prediction of student outcome in educational sector by using decision tree. Int. J. Technol. Res. Eng. 2017, 4, 2347–4718.
- Zacharis, N.Z. A multivariate approach to predicting student outcomes in web-enabled blended learning courses. Internet High. Educ. 2015, 27, 44–53.
- Gray, C.C.; Perkins, D. Utilizing early engagement and machine learning to predict student outcomes. Comput. Educ. 2019, 131, 22–32.
- Iatrellis, O.; Savvas, I.K.; Fitsilis, P.; Gerogiannis, V.C. A two-phase machine learning approach for predicting student outcomes. Educ. Inf. Technol. 2020, 1–20.
- Kuzilek, J.; Vaclavek, J.; Zdrahal, Z.; Fuglik, V. Analysing student VLE behaviour intensity and performance. In Proceedings of the European Conference on Technology Enhanced Learning, Delft, The Netherlands, 16–19 September 2019; Springer: Cham, Switzerland, 2019; pp. 587–590.
- Raga, R.; Raga, J. Early prediction of student performance in blended learning courses using deep neural networks. In Proceedings of the 2019 International Symposium on Educational Technology (ISET), Hradec Kralove, Czech Republic, 2–4 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 39–43.
- Walsh, K.R.; Mahesh, S. Exploratory study using machine learning to make early predictions of student outcomes. In Proceedings of the Twenty-third Americas Conference on Information Systems, Data Science and Analytics for Decision Support (SIGDSA), Boston, MA, USA, 10–12 August 2017; AIS: Atlanta, GA, USA, 2017; pp. 1–5.
- Olama, M.M.; Thakur, G.; McNair, A.W.; Sukumar, S.R. Predicting student success using analytics in course learning management systems. In Next-Generation Analyst II; International Society for Optics and Photonics: Washington, DC, USA, 2014; p. 91220M.
- Wilson, J.H.; Ryan, R.G. Professor–student rapport scale: Six items predict student outcomes. Teach. Psychol. 2013, 40, 130–133.
- Wilson, J.H.; Ryan, R.G.; Pugh, J.L. Professor–student rapport scale predicts student outcomes. Teach. Psychol. 2010, 37, 246–251.
- Kuzilek, J.; Vaclavek, J.; Fuglik, V.; Zdrahal, Z. Student drop-out modelling using virtual learning environment behaviour data. In Proceedings of the European Conference on Technology Enhanced Learning, Leeds, UK, 3–5 September 2018; Springer: Cham, Switzerland, 2018; pp. 166–171.
- Zaporozhko, V.V.; Parfenov, D.I.; Shardakov, V.M. Development approach of formation of individual educational trajectories based on neural network prediction of student learning outcomes. In Proceedings of the International Conference of Artificial Intelligence, Medical Engineering, Education, Moscow, Russia, 3–4 October 2019; Springer: Cham, Switzerland, 2019; pp. 305–314.
- Ruiz, S.; Urretavizcaya, M.; Rodríguez, C.; Fernández-Castro, I. Predicting students’ outcomes from emotional response in the classroom and attendance. Interact. Learn. Environ. 2020, 28, 107–129.
- Eagle, M.; Carmichael, T.; Stokes, J.; Blink, M.J.; Stamper, J.C.; Levin, J. Predictive student modeling for interventions in online classes. In Proceedings of the 11th International Conference on Educational Data Mining (EDM), Buffalo, NY, USA, 15–18 July 2018; pp. 619–624.
- Alonso, J.M.; Casalino, G. Explainable artificial intelligence for human-centric data analysis in virtual learning environments. In Proceedings of the International Workshop on Higher Education Learning Methodologies and Technologies Online, Novedrate, Italy, 6–7 June 2019; Springer: Cham, Switzerland, 2019; pp. 125–138.
- Kőrösi, G.; Farkas, R. MOOC performance prediction by deep learning from raw clickstream data. In Proceedings of the International Conference on Advances in Computing and Data Sciences, Maharashtra, India, 23–24 April 2020; Springer: Singapore, 2020; pp. 474–485.
- Culligan, N.; Quille, K.; Bergin, S. VEAP: A visualization engine and analyzer for PreSS#. In Proceedings of the 16th Koli Calling International Conference on Computing Education Research, Koli, Finland, 24–27 November 2016; pp. 130–134.
- Umer, R.; Mathrani, A.; Susnjak, T.; Lim, S. Mining activity log data to predict student’s outcome in a course. In Proceedings of the 2019 International Conference on Big Data and Education, London, UK, 27–29 March 2019; pp. 52–58.
- Yadav, A.; Alexander, V.; Mehta, S. Case-based instruction in undergraduate engineering: Does student confidence predict learning? Int. J. Eng. Educ. 2019, 35, 25–34.
- Strang, K.D. Beyond engagement analytics: Which online mixed-data factors predict student learning outcomes? Educ. Inf. Technol. 2017, 22, 917–937.
- Ketonen, E.; Lonka, K. Do situational academic emotions predict academic outcomes in a lecture course? Procedia Soc. Behav. Sci. 2012, 69, 1901–1910.
- Tsiakmaki, M.; Kostopoulos, G.; Kotsiantis, S.; Ragos, O. Implementing AutoML in educational data mining for prediction tasks. Appl. Sci. 2020, 10, 90.
- Al-Shabandar, R.; Hussain, A.; Laws, A.; Keight, R.; Lunn, J.; Radi, N. Machine learning approaches to predict learning outcomes in massive open online courses. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 713–720.
- Yu, C.H.; Wu, J.; Liu, A.C. Predicting learning outcomes with MOOC clickstreams. Educ. Sci. 2019, 9, 104.
- Zabriskie, C.; Yang, J.; DeVore, S.; Stewart, J. Using machine learning to predict physics course outcomes. Phys. Rev. Phys. Educ. Res. 2019, 15, 020120.
- Nguyen, V.A.; Nguyen, Q.B.; Nguyen, V.T. A model to forecast learning outcomes for students in blended learning courses based on learning analytics. In Proceedings of the 2nd International Conference on E-Society, E-Education and E-Technology, Taipei, Taiwan, 13–15 August 2018; pp. 35–41.
- Guo, S.; Wu, W. Modeling student learning outcomes in MOOCs. In Proceedings of the 4th International Conference on Teaching, Assessment, and Learning for Engineering, Zhuhai, China, 10–12 December 2015; pp. 1305–1313.
- Foung, D.; Chen, J. A learning analytics approach to the evaluation of an online learning package in a Hong Kong university. Electron. J. E-Learn. 2019, 17, 11–24.
- Akhtar, S.; Warburton, S.; Xu, W. The use of an online learning and teaching system for monitoring computer aided design student participation and predicting student success. Int. J. Technol. Des. Educ. 2017, 27, 251–270.
- Gratiano, S.M.; Palm, W.J. Can a five minute, three question survey foretell first-year engineering student performance and retention? In Proceedings of the 123rd ASEE Annual Conference & Exposition, New Orleans, LA, USA, 26–29 June 2016.
- Vasić, D.; Kundid, M.; Pinjuh, A.; Šerić, L. Predicting student’s learning outcome from learning management system logs. In Proceedings of the 2015 23rd International Conference on Software, Telecommunications and Computer Networks (SoftCOM), Bol (Island of Brac), Croatia, 16–18 September 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 210–214.
- Felix, I.; Ambrosio, A.; Duilio, J.; Simões, E. Predicting student outcome in Moodle. In Proceedings of the Conference: Academic Success in Higher Education, Porto, Portugal, 14–15 February 2019; pp. 1–2.
- Alkoot, F.M. Using classifiers to predict student outcome at HITN-PAAET. In Proceedings of the 18th International Conference on Machine Learning and Data Analysis, Tokyo, Japan, 22–24 May 2016.
- Wang, X.; Mei, X.; Huang, Q.; Han, Z.; Huang, C. Fine-grained learning performance prediction via adaptive sparse self-attention networks. Inf. Sci. 2020, 545, 223–240.
- Pianta, R.C.; Ansari, A. Does attendance in private schools predict student outcomes at age 15? Evidence from a longitudinal study. Educ. Res. 2018, 47, 419–434.
- Hill, H.C.; Charalambous, C.Y.; Chin, M.J. Teacher characteristics and student learning in mathematics: A comprehensive assessment. Educ. Policy 2019, 33, 1103–1134.
- Anderson, K.A. A national study of the differential impact of novice teacher certification on teacher traits and race-based mathematics achievement. J. Teach. Educ. 2020, 71, 247–260.
- Lima, P.D.S.N.; Ambrósio, A.P.L.; Félix, I.M.; Brancher, J.D.; Ferreira, D.J. Content analysis of student assessment exams. In Proceedings of the 2018 IEEE Frontiers in Education Conference (FIE), San Jose, CA, USA, 3–6 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–9.
- Sokkhey, P.; Okazaki, T. Developing web-based support systems for predicting poor-performing students using educational data mining techniques. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 23–32.
- Sales, A.; Botelho, A.F.; Patikorn, T.; Heffernan, N.T. Using big data to sharpen design-based inference in A/B tests. In Proceedings of the Eleventh International Conference on Educational Data Mining, Buffalo, NY, USA, 15–18 July 2018.
- Bhatia, J.; Girdhar, A.; Singh, I. An automated survey designing tool for indirect assessment in outcome based education using data mining. In Proceedings of the 2017 5th IEEE International Conference on MOOCs, Innovation and Technology in Education (MITE), Bangalore, India, 27–28 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 95–100.
- Bindra, S.K.; Girdhar, A.; Bamrah, I.S. Outcome based predictive analysis of automatic question paper using data mining. In Proceedings of the 2017 2nd International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 19–20 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 629–634.
- Joksimović, S.; Kovanović, V.; Dawson, S. The journey of learning analytics. HERDSA Rev. High. Educ. 2019, 6, 27–63.
- Kumari, P.; Jain, P.K.; Pamula, R. An efficient use of ensemble methods to predict students’ academic performance. In Proceedings of the 2018 4th International Conference on Recent Advances in Information Technology (RAIT), Dhanbad, India, 15–17 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6.
- Arroway, P.; Morgan, G.; O’Keefe, M.; Yanosky, R. Learning Analytics in Higher Education; Research Report; ECAR: Louisville, CO, USA, 2016; p. 17.
- Viberg, O.; Hatakka, M.; Bälter, O.; Mavroudi, A. The current landscape of learning analytics in higher education. Comput. Hum. Behav. 2018, 89, 98–110.
- Manjarres, A.V.; Sandoval, L.G.M.; Suárez, M.S. Data mining techniques applied in educational environments: Literature review. Digit. Educ. Rev. 2018, 33, 235–266.
- Romero, C.; Ventura, S. Educational data mining and learning analytics: An updated survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1355.
- Shmueli, G. To explain or to predict? Stat. Sci. 2010, 25, 289–310.
- Ranjeeth, S.; Latchoumi, T.P.; Paul, P.V. A survey on predictive models of learning analytics. Procedia Comput. Sci. 2020, 167, 37–46.
- Zhou, X.; Jin, Y.; Zhang, H.; Li, S.; Huang, X. A map of threats to validity of systematic literature reviews in software engineering. In Proceedings of the 2016 23rd Asia-Pacific Software Engineering Conference (APSEC), Hamilton, New Zealand, 6–9 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 153–160.
| Focus of Survey and Publication Venue | Type of Survey | Number of Bibliographic Databases Explored; Papers Reviewed | Metric of Student Performance Comparison | Models and Approaches Reviewed | Years Covered | Weakness | Strength |
|---|---|---|---|---|---|---|---|
| Prediction of student performance using data mining [24]; Indexed Conference | Systematic review | Four databases; 30 papers | Prediction accuracy (%) | Data mining techniques | 2002–Jan 2015 | | |
| Prediction of student outcome using data mining [32]; Symposium | Systematic review | 10 databases; 42 papers | Prediction accuracy (%) | Data mining techniques | Not indicated | | |
| Techniques and algorithms used for student performance prediction [5]; Indexed Conference | Traditional literature review | One database; 88 papers/projects/reports | Not reported | Education analytics | 2013–2017 | | |
| Data mining techniques to discover knowledge in education [33]; Indexed Journal | Traditional review | Databases not indicated; 240 papers | Not reported | Educational data mining | 2010–first quarter 2013 | | |
| Performance prediction using data mining techniques [34]; Unindexed Journal | Systematic review | Six databases | Prediction accuracy (%) | Data mining techniques | 2007–July 2016 | | |
| Performance prediction using machine learning [35]; Unindexed Journal | Literature survey | Not indicated | Prediction accuracy (%) | Machine learning models | Not indicated | | |
| Features predicting student performance [3]; Indexed Journal | Systematic review | Three databases; 357 papers | Different measures of performance were considered | Statistical approaches, data mining techniques, machine learning models | 2010–2018 | | |
| Preliminary results of predictive learning analytics [36]; Indexed Conference | Systematic review | Databases not indicated; 39 papers | Prediction accuracy (%) | Machine learning models | 2002–2016 | | |
Population/Problem | Intervention | Comparison | Outcome |
---|---|---|---|
Studies predicting student performance using the learning outcomes | List of intelligent models and techniques | Comparison across the identified models and techniques | Quality and accuracy of the approaches; set of performance predictors of learning outcomes |
Inclusion Criteria | Description of Criteria |
---|---|
I1. Focus of study | Studies that explicitly predict student performance with a direct reference to the learning outcomes |
I2. Empirical evidence of prediction | Studies that contain empirical evidence of the performance prediction |
I3. Language of publication | Only articles written in English are considered |
I4. Year of publication | Studies published between 2010 and 2020 (both years inclusive) |
I5. Publication venue | Studies published in peer-reviewed scientific venues (e.g., conference or journal) |
I6. Availability of text | Full text is accessible for analysis |
| Round | ACM | IEEE Xplore | Google Scholar | Science Direct | Scopus | Springer | Web of Science | Round Total |
|---|---|---|---|---|---|---|---|---|
| Round 1 (Initial Results) | 64 | 152 | 65 | 91 | 63 | 115 | 36 | 586 |
| Round 2 (Removing Duplicates) | 64 | 148 | 65 | 91 | 28 | 114 | 33 | 543 |
| Round 3 (Scanning the Title and Abstract) | 12 | 63 | 10 | 13 | 53 | 9 | 27 | 187 |
| Round 4 (Reading Full Text) | 3 | 7 | 5 | 5 | 12 | 2 | 17 | 51 |
Round 5 (Manual Searches): a further 11 articles were added through manual searches, bringing the total to 62.
| Source | Number | Number of Studies (Percentage of Occurrence) | Studies |
|---|---|---|---|
| School | One | 1 (1.61%) | [50] |
| School | Multiple | 11 (17.74%) | [47,48,49,51,52,53,54,55,56,57,58] |
| University | One | 36 (58.06%) | [42,44,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92] |
| University | Multiple | 2 (3.22%) | [45,93] |
| Not Specified | | 12 (19.35%) | [46,94,95,96,97,98,99,100,101,102,103,104] |
Learning Outcome Type | Number of Occurrences | Studies |
---|---|---|
Performance classes (Categorical; Binary, nominal and ordinal) | 34 | [45,46,55,59,60,61,62,63,64,65,66,67,68,69,70,73,74,75,76,77,79,80,83,84,85,86,87,88,89,91,93,94,100,101,103] |
Achievement/grade scores (Continuous; Interval) | 20 | [43,44,50,51,52,53,54,55,56,57,78,81,82,84,90,92,96,97,98,99,102] |
Perceived competence and achievements (Continuous; Interval scale) | 5 | [47,57,71,72,97] |
Self-reports about educational aspects (Continuous; Interval scale) | 3 | [56,81,97] |
Failure/dropout/graduation rates (Continuous; Ratio) | 3 | [48,49,95] |
Other (e.g., college enrollment, careers, time to graduate, attendance, etc.) | 6 | [45,48,49,56,58,66]
Not specified | 1 | [104]
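The two most frequent outcome formulations in this table map directly onto supervised learning task types: grade scores are regression targets, while performance classes are obtained by discretizing them. Below is a minimal sketch with synthetic grades; the 60-mark pass threshold and grade bands are illustrative assumptions.

```python
# Hedged sketch: turning a synthetic grade score into the two dominant
# target formulations (continuous regression target vs. performance classes).
import numpy as np

rng = np.random.default_rng(7)
grades = rng.normal(loc=68, scale=12, size=200).clip(0, 100)

y_regression = grades                                # achievement/grade score
y_binary = (grades >= 60).astype(int)                # pass/fail (assumed cut)
y_ordinal = np.digitize(grades, bins=[50, 65, 80])   # four ordered bands

print(y_binary[:10], y_ordinal[:10])
```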
Learning Type | Number of Studies (Percentage of Occurrence (%)) | Studies |
---|---|---|
Statistical analysis | 28 (45.16%) | [43,44,45,47,48,49,50,52,53,54,56,57,58,64,71,72,73,75,76,81,82,83,90,91,92,97,98,99] |
Supervised machine learning | 25 (40.32%) | [46,51,55,59,60,61,62,65,68,69,70,74,78,80,84,85,86,87,88,89,95,96,101,102,103] |
Data mining | 5 (8.06%) | [63,77,94,100,104] |
Supervised and unsupervised learning | 3 (4.83%) | [66,79,93] |
Unsupervised machine learning | 1 (1.61%) | [67] |
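Only one reviewed study relied purely on unsupervised learning, the gap flagged by research challenge five. As a minimal sketch of that route, the following clusters synthetic per-student features with k-means; the feature semantics and the choice of three clusters are assumptions.

```python
# Hedged sketch: group students without labelled outcomes, then inspect
# the groups against attainment to surface at-risk cohorts.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-student features (e.g., LMS activity,
# quiz averages); three latent groups are planted for illustration.
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=1)
X = StandardScaler().fit_transform(X)

clusters = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
print(clusters[:20])
```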
Learning Model | Number of Studies (Percentage of Occurrence (%)) | Studies |
---|---|---|
Statistical models (Correlation and Regression) | 32 (51.61%) | [43,44,45,46,47,49,50,52,53,54,56,57,58,64,67,69,71,72,73,76,81,82,83,88,89,90,91,92,97,98,99] |
Neural networks | 9 (14.51%) | [51,68,70,74,78,86,95,96,102] |
Tree-based models (Decision trees) | 9 (14.51%) | [55,63,65,66,75,77,85,101,104] |
Bayesian-based models | 5 (8.06%) | [61,62,79,93,94] |
Support Vector Machines | 2 (3.22%) | [59,60] |
Instance-based models | 1 (1.61%) | [103]
Other | 4 (6.45%) | [48,80,84,100] |
Top 5 Performing Prediction Models (Accuracy %) | Worst 5 Performing Prediction Models (Accuracy %) |
---|---|
Hybrid Random Forest [101]: 99.25–99.98% | Linear Regression [88]: 50% |
Feedforward 3-L Neural Networks [74]: 98.81% | Bagging [78]: 48–55% |
Random Forest [85]: 98% | Mixed-effects Logistic Regression [76]: 69% |
Naive Bayes [93]: 96.87% | Discriminant Function Analysis [45]: 64–73% |
Artificial Neural Network [86]: 95.16–97.30% | Logistic Regression [89]: 76.2% |
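Both columns of this table report accuracy alone, which research challenge four cautions against: on imbalanced pass/fail data, a majority-class predictor already achieves high accuracy. A minimal sketch of reporting complementary metrics, assuming scikit-learn and synthetic imbalanced data:

```python
# Hedged sketch: evaluate the same predictions with several metrics.
# The data is synthetic and imbalanced (~90% passing students).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, weights=[0.9], random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)[:, 1]

print(f"Accuracy           : {accuracy_score(y_te, y_pred):.3f}")
print(f"F1 (minority class): {f1_score(y_te, y_pred):.3f}")
print(f"ROC AUC            : {roc_auc_score(y_te, y_prob):.3f}")
```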
Assessment Criterion | Yes (%) | No (%) |
---|---|---|
1. Verification of predictive model with a second dataset | 8.06% | 91.94% |
2. Threats to validity reported | 12.90% | 87.10% |
3. Research implications and recommendations | 20.96% | 79.04% |
4. Well-defined research questions | 33.87% | 66.13% |
5. Use of separate training and testing datasets | 35.48% | 64.52% |
6. Research limitations and challenges | 37.09% | 62.91% |
7. Results detailed sufficiently | 56.45% | 43.55%
8. Predictor variables clearly described | 77.42% | 22.58% |
9. Predictions being made are clear | 82.25% | 17.75% |
10. Data collection instruments stated | 82.25% | 17.75% |
11. Sound research methodology | 83.87% | 16.13% |
12. Clear research contributions | 90.32% | 9.68% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).