Multi-split optimized bagging ensemble model selection for multi-class educational data mining

Applied Intelligence

Abstract

Predicting students’ academic performance has attracted considerable research interest in recent years, as many institutions seek to improve both student performance and the quality of education. Various data mining techniques can be used to analyze and predict students’ performance, and they also allow instructors to identify factors that may affect students’ final marks. To that end, this work analyzes two undergraduate datasets from two different universities and aims to predict students’ performance at two stages of course delivery (20% and 50%, respectively). This analysis guides both the choice of appropriate machine learning algorithms and the optimization of their parameters. Furthermore, this work adopts a systematic multi-split approach based on the Gini index and p-value. It does so by optimizing a suitable bagging ensemble learner built from any combination of six candidate base machine learning algorithms. Experimental results show that the proposed bagging ensemble models achieve high accuracy for the target group on both datasets.
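To make the idea of a bagging ensemble concrete, the following is a minimal sketch of bootstrap aggregating with majority voting over several classes. It is an illustration under stated assumptions, not the authors' implementation: the base learner is a hypothetical 1-nearest-neighbour stand-in rather than any of the six algorithms considered in the paper, the function names are invented for this example, the toy data and class labels are made up, and the Gini index and p-value based multi-split selection is not reproduced.

```python
import random
from collections import Counter


def nearest_neighbour_fit(rows, labels):
    """Hypothetical base learner: memorise the sample and predict the
    label of the closest stored row (squared Euclidean distance)."""
    def predict(x):
        best = min(
            range(len(rows)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], x)),
        )
        return labels[best]
    return predict


def bagging_fit(rows, labels, n_models=15, seed=0, fit=nearest_neighbour_fit):
    """Train n_models base learners, each on a bootstrap sample
    (same size as the data, drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (made up): two normalised features, three performance classes.
X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
     (0.50, 0.55), (0.60, 0.50), (0.55, 0.60),
     (0.90, 0.95), (0.95, 0.90), (0.85, 0.90)]
y = ["weak", "weak", "weak", "fair", "fair", "fair", "good", "good", "good"]

ensemble = bagging_fit(X, y)
print(bagging_predict(ensemble, (0.12, 0.18)))
print(bagging_predict(ensemble, (0.92, 0.93)))
```

Passing a different `fit` function to `bagging_fit` swaps the base learner, which is one simple way to compare ensembles built from different candidate algorithms.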


Acknowledgments

This study was funded by the Ontario Graduate Scholarship (OGS) Program.

Author information

Corresponding author

Correspondence to MohammadNoor Injadat.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Informed Consent

This study does not involve any experiments on animals.

About this article

Cite this article

Injadat, M., Moubayed, A., Nassif, A.B. et al. Multi-split optimized bagging ensemble model selection for multi-class educational data mining. Appl Intell 50, 4506–4528 (2020). https://doi.org/10.1007/s10489-020-01776-3

  • DOI: https://doi.org/10.1007/s10489-020-01776-3
